Mapping Digital Scholarly Communication Infrastructure - Final Report

Nov 24, 2020 06:14

Hello, my name is David Lewis. I’m coming to you from Lancaster, Pennsylvania, and I’m here today to talk about a project mapping the digital scholarly communications infrastructure. I’m speaking on behalf of my two colleagues, Mike Roy, the Dean of Libraries at Middlebury College, and Katherine Skinner, the Executive Director at Educopia. This was the team that has been working on this project for the last couple of years.

A little bit of background on where this project came from and how we got to where we are. It began with my little paper, “The 2.5% Commitment.” I circulated that to a number of my friends and colleagues, and Mike Roy and I began a conversation. We decided that what would be useful to do would be a map of what the open scholarly infrastructure looked like. We had an ambition of trying to create a map of that infrastructure that looked something like this map of the commercial infrastructure that Posada and Chen had put together. We were jealous of this accomplishment, and we wanted to try to do something akin to it with open.

We did a number of presentations, and that led, a couple of years ago, to a grant from the Andrew W. Mellon Foundation to do a study of the digital scholarly infrastructure. This was not quite what we had in mind. We wanted to look at both infrastructure and content in an open environment, but with the funding we got from Mellon we looked just at infrastructure. We looked at both the open infrastructure and the proprietary infrastructure, both of which were provided on occasion by both commercial firms and not-for-profits.

So this is the team that we put together. Shortly after we got the grant we brought in Katherine Skinner as a core member of the team. We put together a really nice and very talented advisory committee that we consulted with at a variety of points along the way, and the survey and visualization work that I’ll talk about
a little bit later was largely done by Nathan Brown and his firm, TrueBearing.

In the beginning it’s important to talk about what we mean by infrastructure, and this is a little bit tricky in some ways. The way we thought about it in general was that it was the tools, services, and systems that underpin scholarly communications. There are a couple of dimensions of this that become a little bit tricky. There is a gray area between content that lives on a particular system and the system itself, and in many cases it’s really hard to disentangle that. So with things like HathiTrust or arXiv, we tended to include some of those when they were very large. In general we also didn’t look at discipline-specific infrastructure; rather, the things we included needed to have some general application across disciplines.

We looked at communications, and we defined that very broadly, from discovery to access to preservation. But it was not the things and tools that were used to create scholarship. So it’s the communication of scholarship, broadly defined, not the creation of scholarship. There’s some gray area there as well: tools used to write and bring data into articles we tended to include, but if it was a data manipulation tool or a digital humanities tool, we left those out.

We tried to create a description of the different types of tools, and this is the one that we mostly worked with. This kind of categorization really needs some work, and it’s one of the things we’ll recommend at the end. We started with researcher tools: there are some writing tools and collaboration tools. Then come repositories and preprint servers, leading into a variety of publishing tools, both for monographs and journals; the whole discovery piece; evaluation and assessment of various sorts; and preservation. And then there are a series of what we’ve called general services that overlay the whole system, so things like ORCID
or DOIs, things of that sort that make it all possible, are also included.

The project had six goals. The first was to create a census of the infrastructure providers. We thought it was very important to try to understand who was out there, what they were doing, and how they all fit together, to the best of our ability. So we tried to do a survey of that, which we called the census. That was the first piece. The second piece was a literature review. We looked very diligently across the web and harvested a lot of literature about a variety of general issues and also about specific projects. Both of those looked at the provider side. We also did a number of case studies of the providers to gather some qualitative data that would enrich the numeric and quantitative material that we pulled out of the census.

The other side that we wanted to look at was how libraries in particular invested in these providers, to get a sense of who was making what kinds of investments and why. We did this in a couple of ways. We did a number of focus groups with library leaders, and we also did a survey of library investments in an attempt to capture the amount of money that was being invested by libraries of a number of sizes and types, and what things they were putting their money into in terms of supporting the infrastructure that they rely on.

And then, as I said at the beginning, our ambition was to create this map of the digital scholarly infrastructure. Ironically, we accomplished, at least to some degree, all of the first five things, but the map was the one piece that we didn’t accomplish in the way that we had really hoped for. We have a draft, but it’s very preliminary.

This is another way of looking at the things we tried to accomplish. You can look at the library side of it and the infrastructure provider side of it, and at qualitative and quantitative
views. The focus groups, the literature review, and the case studies were qualitative; the survey and the census were web-based surveys that attempted to collect mostly quantitative data.

I’m going to run through the five pieces that we really managed to accomplish. The first of those was the census of providers. This was a web-based survey. It was done largely by TrueBearing, and Katherine Skinner wrote it up. It was based on a tool that Educopia had put together that looked at the organizational maturity and financial maturity of organizations. This is a fairly extensive document; if you’re really interested, I would encourage you to take a look at it. Katherine also wrote a blog post called “Running a Red Queen’s Race,” which is her interpretation of the data, and I would recommend that piece to you as well.

As for the findings of the census, the first one, I think, is pretty stark. We only got 42 responses to the census, after haranguing and harassing a number of providers for some time. We were disappointed with that result, but we think it’s pretty important to try to continue this effort, and I’ll talk about that a little bit further on. It’s pretty clear that we need the taxonomy that I talked about a little bit at the beginning in order to get a better sense of how to think about and look at what’s out there.

The other thing that became very clear to us is that many of the providers were challenged to provide the data that we requested. A lot of what we were looking for was financial data, and open providers in particular are often projects embedded in different organizations, or are grant funded, or are multi-organizational projects. It’s often not easy for them to bring together the information that we were requesting quickly enough to make filling out our census instrument worth their while. A lot of people got frustrated with that. And so I
think that effort needs to continue, and it’s a sign that many of the providers’ organizations are not as mature or as robust as they ought to be. The data we were requesting really should be straightforward if you had a good annual report, and a lot of people really didn’t have that easily at hand. For many, especially small projects, just the time to do even a relatively short survey was a challenge. Actually, many of the open providers who did manage to do the survey found it a really valuable exercise, because it required them to pull data together that they hadn’t before.

But, as I think you would expect, many of them have a hard time raising and sustaining the level of funding that’s needed to really maintain the projects. It also became clear that many of the providers are in need of guidance, mentorship, training, and other kinds of opportunities in order to enhance their organizational and financial health. So again, we have a lot of organizations that are providing important infrastructure that are not as robust as we would hope for.

The other thing that’s quite clear in looking at this is that there is no coordinated, or even uncoordinated, end-to-end workflow in an open environment, and that the variety of technologies and strategies that the providers are using will make it a real challenge to put all of this together. This is true at a time when the commercial providers, particularly the large publishing houses, are working very diligently and with significant resources to try to put this end-to-end workflow in place. So if you believe, as we do, that open is important, this is a very dangerous sign for us. We really need to work on trying to figure out how to make this happen.

The second piece was the literature review. I did this work, and I’ve called it a bibliographic scan of the digital scholarly communication infrastructure. It is
an extensive bibliographic essay that looks at both the literature that deals with the general issues involved in this and the literature that documents the activities of the sector. I identified just over 200 projects. About two-thirds of them were not-for-profit, and many of these were very small projects. I also identified 67 commercial projects at a variety of firms. It’s insightful, I think, that a relatively small number of organizations provide a significant number of the projects, on both the commercial and the not-for-profit side. If you look at organizations like DuraSpace or the Public Knowledge Project, they have a variety of projects, and on the commercial side the big firms do as well.

This is as close as we get to a map. The workflow is the yellow arrow in the middle, running from creation, just off the screen, all the way through to preservation and assessment. You can see the number of different projects in each of those areas. I think it’s interesting that if you look at the researcher tools, the ones that are really significant, particularly around collaboration, are controlled by large commercial firms. In the discovery layer, too, the important tools are managed by large commercial firms, but in this case the firms are not really part of the scholarly communications environment. They’re Google Scholar and Microsoft, so the ones that really matter are from big firms that are outside our sector.

In the repository sector and in the publishing sector there are a large number of not-for-profit open alternatives. One might say that in some cases there are maybe too many, that there are redundancies in there that are unnecessary, and that we maybe need to look at sunsetting some of those projects or weeding them out, although how that happens is a really difficult thing to think about. There are a
variety of preservation strategies, many of them open, although probably the most important is a commercial firm. And then when we get to assessment, again the large commercial firms dominate, particularly in the CRIS systems. So both ends of the research workflow are dominated by commercial firms. In the middle there are many good open alternatives, but those are not coordinated in a way that would let you piece together a consistent workflow easily.

The next piece has to do with case studies. These were done by Katherine Skinner. You can see the four firms that she looked at. Again, we have a fairly nice publication that brings these all together, and I would encourage you to take a look at it if you have an interest in any of these projects. There are also a series of case studies that have just been released by SPARC Europe, and I would encourage you to look at those as well, along with some conversations that are similar in nature that the Invest in Open Infrastructure group has recently released. So there are a variety of case studies beyond what we did that can give you a feel for the open side of the infrastructure system in particular.

The next piece I want to talk about is the library focus groups. We did a series of groups: some at ALA in the summer of 2019, with ARL in September, and at CNI last December, and we did a series of virtual sessions in January and February of this year. The majority of participants were from large research universities, I think mostly because we were at ARL, and then there was a smattering of other kinds of libraries. We asked them how much they invest, where they invest, and why they invest, as well as what challenges and opportunities they saw from where they sit.

I think the most striking finding that we had in the focus groups was that people often really have a hard time sorting out how much money they invest in the infrastructure, particularly the open
infrastructure, and these were primarily library directors. With collections, the definitions of what ought to be counted where are pretty clear, and you could ask most library directors how much they spend on collections or staffing and they would be able to give you a number off the top of their head almost immediately. This was not the case with open. They really hadn’t done that exercise before, and so they often didn’t have a number at the tip of their tongue. I think that is really important, and I think it would be an important effort for library organizations that think about statistics to start to define these things, so that we really have a better picture of how much investment is going in which directions.

When we asked them why they invested, a lot of the answers were what you would expect. They wanted to be part of a community, particularly if it was a tool they were using that they wanted to influence. Sometimes the fact that the investment got them a seat at the governance table was useful. Interestingly, a lot of people said that it’s keeping up with the Joneses, or: I trust my friend at a comparable institution, and she invests, so I’m prepared to do it because I trust her judgment. Not so much that I’ve done the assessment, but that it’s what everybody else is doing. And a lot of people admitted that they really didn’t have a good sense of what the trade-offs were and whether or not they were making good investments.

When we asked about the factors, in the cases where they were looking carefully we got the kinds of analysis that you would expect: privacy, costs, exit strategies, that kind of thing. There was a major concern about the sustainability of the system, and as I said when we talked about the census, that’s really justifiable. The sustainability of the whole system is at risk, I think; it is not as robust as we would
hope. And there was a frustration with the funding model, which was, you know, an organization behind a tool we use comes to us and says, give us five to twenty thousand dollars, and there’s no overall strategy, and there’s no clear way of creating an overall strategy for investments that would invest in the whole system. There was also a sense, and often this was based on a campus rather than a library perspective, that you really needed to invest in strategies that were quote-unquote winners, or that had a sustainable strategy, and often those were commercial players. So bepress might be a better investment, even though you hate the idea of doing it, because your computer center says you need something you can trust for the long haul.

Oh, we have a technical glitch here. Excuse me. The next piece is the library survey. Sorry. Okay, here we go.

The library survey again was a web-based tool. We asked about numeric investments in particular tools, which we had classified, and then for some other data on staffing. We got 91 responses. Two-thirds were from large research universities, about a quarter were from small liberal arts colleges, and then there was a smattering of the other types. This was obviously a very low response rate considering the number of institutions, and we were focusing primarily on the U.S. It appears to us that there was a bias towards people who were invested in open, and often the data was incomplete. Again, I think this is the issue that the people responding to the survey often had a hard time pulling the data together.

So here is some of the data that we got. You can see that there was 14 million dollars of investment by these 91 libraries. That’s about 150,000 dollars apiece, or one and a half percent of the library budget, and two and a half percent if you take out salaries. And that was eight dollars a student. We
based our survey, I should say, in large part on a survey that the Canadian Association of Research Libraries had done, and I would strongly recommend looking at their survey if you’re interested in that data.

Here’s another couple of ways of looking at it. A large portion of that, over 10 million of the 14 million, was in staffing, so less than four million dollars actually left the campus. That’s a relatively small investment in infrastructure providers. The majority of that was for hosted repository solutions, and then you can see the rest down the way.

This is another interesting way of looking at it. You can see the graphs here: the higher up you are, the larger the percentage of your budget you’re investing as a library. You can see the one really high large library, and that’s a university that supports a very large project, so they make a significant investment in it. The majority of the respondents, regardless of size, invested less than two percent. And there’s a great deal of free riding. We asked about which projects they used, and often people would use a project but not make an investment in it.

So that’s the work that we did. I think it’s probably important to indicate that we did all of this before the pandemic set in, so our data is all based on that and may be subject to change as a result.

We have a series of recommendations that we made, and the first three are the most important. We think that efforts should continue to get a survey of the providers, particularly the open providers, so that we can get a picture of what that universe looks like and can begin to think about it as a coherent whole rather than as bits and pieces. It’s also clear that a variety of strategies that work to enhance the organizational and financial robustness of that community of providers is important. Whether it’s a community of practice or other kinds
of work, we think that’s pretty important. And we think that the library survey ought to be continued in some way, whether through groups like ARL or other library organizations that might try to collect that data. It would be useful if all libraries did it, but if we could get even some large consortia or groups to work on it, they could get some of the kinks out of it. As for the other things: a sort of annual report or survey of the sector would probably be useful, case studies continue to be useful, and some way of working with librarians to give them a better sense of what is going on would probably be important as well.

We have a variety of reports and resources. Katherine Skinner has done a couple of these, and I did the bibliographic scan. We have a couple of blog posts; as I’ve said before, I would recommend the second one here, Katherine’s Red Queen piece, which is pretty good. There are a couple of other resources that you might want to look at, and I put a PDF of this presentation up at this tiny URL so you can then use the links to get to these. If you have any questions about our project, any of us would be happy to hear from you by email.

So thank you. Sorry for the little glitches going back and forth; I couldn’t get my screen to work. But again, thank you very much.