High Fidelity: Connecting Information for Better Research Reproducibility

Nov 11, 2020 07:51 · 5163 words · 25 minute read

Cliff Lynch: Welcome, everybody. Thank you for joining us; we'll be starting in about a minute and a half to two minutes. Thanks for joining us; we will be getting started in a minute or so. Thanks for joining us, and welcome; we're going to get started in about half a minute. Okay, it's time. There's lots to talk about, so let's get started. Welcome, everybody, to the fall CNI virtual meeting. You've joined us for the first project briefing session of that meeting. I'm Cliff Lynch, the director of CNI, and I'm delighted that you're here with us.

02:54 - Cliff Lynch: I will note that the session is being recorded, and the recording will be available afterward. Just a couple of very quick things, because I know we all know much more about Zoom than we ever thought we would. Everybody's muted at the moment except for our speakers. You can use the Q&A box to ask questions of the speakers; we'll deal with all of those at the end of the presentation. There is also a chat, and you're welcome to use that for comments as we go. Closed captioning is available on this recording if you would like to make use of that. You're welcome to introduce yourself in the chat.

Don't feel you have to, but we'd be happy to have you. With that, I'm going to introduce our speakers. Diane Goldenberg-Hart of CNI will beam into existence after our presentations are complete to moderate the Q&A session. Martin Halbert, an old friend of CNI's who has worked with us for many years in many capacities and done a lot of fascinating work, is now at the National Science Foundation as a rotating program officer, and I'm delighted to have him back with us.

04:44 - Cliff Lynch: He is going to be joined by Lance Vowell of the US Department of Energy's Office of Scientific and Technical Information, and they're going to talk to us about a collaboration they've had in place to support a platform for providing access to federally funded scholarly publications, as part of the public access mandates that our funders are putting in place. With that, I will turn it over to Martin, who is going to start the presentation. Welcome, Martin. Martin Halbert: Thank you very much, Cliff. Thank you for having us here today; we're delighted to be your first set of presenters at this virtual CNI meeting. Lance, go ahead and go to the first overview slide.

05:40 - Martin Halbert: In this short presentation, we're going to review the six-year collaboration between the National Science Foundation and the Department of Energy's OSTI to create the NSF Public Access Repository, or NSF PAR. We'll give you a little bit of history about this partnership, as well as descriptions of the technical implementation details. Go ahead, next slide. Many of you will remember the Holdren memo, issued in 2013 by John P. Holdren of the White House Office of Science and Technology Policy under the Obama administration, which directed agencies with more than $100 million in sponsored programs to develop plans to support public access to the results of research funded by the federal government, especially peer-reviewed publications and digital data. Next slide. Many agencies responded to this. In the case of NSF, under Dr. France Córdova, we developed the public access plan NSF 15-52, if you want to look it up online.

It was entitled "Today's Data, Tomorrow's Discoveries: Increasing Access to the Results of Research Funded by the NSF," and it laid out a plan for responding to the Holdren memo. The NSF 15-52 plan was developed largely by Amy Friedlander, and I want to give her a lot of credit for leading the development of this initiative, with many contributions by others at NSF and, of course, at DOE, as you're going to hear. Next slide, Lance. So, a very high-level scan of the plan: it laid out a set of goals whereby NSF would create an "open, flexible, and incremental approach" (my emphasis in that quote) to develop this extended infrastructure for the deposit of publications by NSF awardees. In particular, it called out the notion that we would work with other federal agencies, especially the Department of Energy, in the collaborative infrastructure you're going to hear about. I'm very proud of this collaboration. I think it represents a great example of interagency work and reuse of software and expertise to accomplish this federal public access mandate. So, over to you, Lance. Lance Vowell: Thank you, Martin. As Cliff and Martin said, my name is Lance Vowell.

I am the Assistant Director for Applications Development and Operations with the Department of Energy's Office of Scientific and Technical Information (OSTI). 08:55 - Lance Vowell: We are hosted in Oak Ridge, and that's going to come into play pretty importantly in just a few minutes when I start diving into some of this technology. Most of the time when I do these sorts of presentations, I have to ask the facilitator for a little extra time so I can get through my whole title; I know it takes up quite a bit of time. But I do appreciate this opportunity. As Martin alluded to, this collaboration started back in 2013 or 2014, and I've been with the project since the very beginning, as the project coordinator and project manager on the Department of Energy side. After discussions with Amy Friedlander and others at NSF, NSF decided to use the same model for their public access repository as the Department of Energy did. We refer to ours as PAGES, the Public Access Gateway for Energy and Science. At OSTI, we have been providing public access since 1947. We've been all digital since around 2000, so 20 years, and I've been at the department since 2005, so I've been in this space for 15 years. 10:18 - Lance Vowell: We provide public access to a variety of scientific and technical information: accepted manuscripts, technical reports, data, software, patents, videos. We have a litany of different products that we host under the OSTI.gov domain. OSTI.gov is our umbrella product, and then we have niche products for each of these that allow users to drill down into more detail. The slides are lagging just a little bit. Here we go. OSTI is very familiar with STI management.

We have a specialized program that we refer to as STIP, the Scientific and Technical Information Program, that collects all of the R&D results for the department, from both our national labs (DOE sponsors 17 national labs) and our grantees, or financial award recipients. We have a corporate responsibility: it's not just the Office of Science, it's across all program offices and the entire department. We have a specialized electronic ingest tool for this that we refer to as E-Link, or Energy Link. Currently, we're processing over 50,000 incoming STI products annually, which goes back to that list of product types I mentioned earlier.

11:40 - Lance Vowell: Through STIP and E-Link, we were, and continue to be, well positioned to extend our existing infrastructure to accommodate accepted manuscripts. After the 2013 OSTP memo went out, we felt confident that we could extend our infrastructure and our knowledge to assist other federal agencies that might be interested, and NSF took us up on that offer. Our approach is extensible not only to NSF but to other agencies; we also work with DoD on the details of implementing their public access plan. Our submission infrastructure is well established; like I said, we've been doing this digitally for 20 years now.

12:21 - Lance Vowell: And it can be customized to meet the needs of other agencies; you'll see very shortly how we customized our infrastructure to allow for NSF public access. NSF saw that opportunity and reached out for the partnership, with the long-term goal being preservation and access, not just immediate or short-term access, and that will come into play in a few more minutes as well. Working with NSF, we developed what Martin referred to earlier as the NSF Public Access Repository, or NSF PAR. It has a submission tool that allows NSF PIs to directly deposit peer-reviewed, published journal articles and juried conference papers into a repository, and NSF PAR Public, a dissemination product specifically for these peer-reviewed journal articles and conference papers. So there are two sections to NSF PAR: the NSF PAR submission tool, which handles submission, and NSF PAR Public, which is the dissemination portion.

NSF PAR Public disseminates NSF-supplied metadata and the appropriate full text, along with metadata and links from CHORUS, which I'll explain in just a moment. We also have the potential to add other product types, including data, which NSF and DOE are currently working on together to see how NSF might want to approach it; I think Martin will touch on that briefly later. Just like DOE PAGES, it is a hybrid model: it consists of both NSF PI-supplied records and CHORUS-supplied records, which are used to supplement the collection. I'll get to that in more detail in a few moments. This was an integration at the technical level, and there was a lot of integration that had to go on to make this work. Here is how we did that.

14:29 - Lance Vowell: We started with a single sign-on, which was the most far-reaching interagency collaboration that we worked through. What we worked toward was a seamless handoff between research.gov and NSF PAR. The submission tool and the dissemination tool are actually hosted in Oak Ridge, whereas research.gov and all of the NSF-related products are hosted in the DC metro area. So we worked with their team to achieve this seamless handoff: a researcher logs into research.gov, where they have that portal.

15:00 - Lance Vowell: They click on the link to submit these manuscripts, and they're taken seamlessly to NSF PAR, hosted in Oak Ridge. They never know that they've left those servers; they're all government-hosted, government-funded, government-protected servers. Along with that, their SSO passes certain information, part of which is their unique user ID, so we know who that user is. Once they pass us that unique ID, we're able to turn around and, using an open NSF Award API (that next bullet there), auto-populate a listing of that PI's awards, so they can associate their publications directly with their awards through that API integration. So if I'm an NSF PI, I log into research.gov, I want to submit a public access manuscript, I click on the link, and I'm taken over to NSF PAR. I don't know that I've left NSF servers.

16:02 - Lance Vowell: Some information is passed along there. I enter a DOI, I enter the submission data, and I upload my full text, the accepted manuscript for my journal article; then I associate the award I'm submitting it on behalf of. It's four easy steps. We also have a REST API for integration between DOE and NSF data stores, so that all of this information, both the metadata and the links to the full text, can be passed seamlessly from DOE servers into NSF's data stores for project reporting. That's how the PIs then add those manuscripts to their projects, for their program officers to approve at the end of the reporting cycle. We also use this as a certificate-authenticated full-text service.
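The four submission steps just described (enter a DOI, enter or verify the metadata, upload the accepted manuscript, associate an award) can be pictured as a record plus a validation check. This is a minimal sketch, not PAR's implementation: the `Submission` class, its field names, and `validate_submission` are hypothetical, and the set of award IDs stands in for whatever an NSF award API lookup on the SSO user ID would return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    """Hypothetical PAR-style submission record; field names are illustrative."""
    pi_user_id: str            # unique ID passed along by the SSO handoff
    title: str                 # auto-populated from the DOI, or typed manually
    journal: str
    manuscript_filename: str   # the uploaded accepted manuscript
    award_id: str              # the award the PI associates the work with
    doi: Optional[str] = None  # manual submissions may not have a DOI


def validate_submission(sub: Submission, awards_for_pi: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the record can be filed.

    `awards_for_pi` stands in for the award IDs that an award API lookup on
    `sub.pi_user_id` would return (a hypothetical integration point).
    """
    problems = []
    if not sub.title.strip() or not sub.journal.strip():
        problems.append("metadata incomplete: title and journal are required")
    if not sub.manuscript_filename.strip():
        problems.append("no accepted manuscript uploaded")
    if sub.award_id not in awards_for_pi:
        problems.append(f"award {sub.award_id} is not one of this PI's awards")
    return problems
```

Restricting the award choice to the PI's own awards is what makes the SSO user ID useful here: the PI picks from a pre-populated list instead of typing an award number that might not belong to them.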

So many of you are probably familiar with the Holdren memo, the public access memo. What it calls for is that these peer-reviewed journal articles and juried conference papers be made publicly available after an administrative interval, or, as you may have heard it called, an embargo. NSF, along with DOE, chose a 12-month embargo, so all of these accepted manuscripts are made available 12 months after publication. However, NSF POs often need access to that embargoed full text beforehand, to make sure that what their PIs are submitting is accurate before they can approve the project report.

17:38 - Lance Vowell: So what we did is create a certificate-authenticated service for that. Inside the NSF management system, known as eJacket, these POs have direct links to DOE-hosted servers, giving them access to the embargoed full text before the public has it. They can view and read it and make sure that the PI has acknowledged NSF appropriately, that the science is tied to the award, and so forth. Next slide. We also integrated with some third-party services, and the first one I'll talk about today is Crossref. The reason we integrate with Crossref is, of course, its DOI services.
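The embargo and early-access rules above reduce to a small date calculation. The sketch below assumes that "made available 12 months after publication" means the same calendar day one year later, clamped to the end of the month (so a 29 February publication opens on 28 February); the talk does not specify this. It also collapses the certificate-authenticated eJacket check into a boolean, and the function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def public_release_date(publication_date: date, embargo_months: int = 12) -> date:
    """First day an embargoed accepted manuscript may be shown to the public."""
    return add_months(publication_date, embargo_months)


def can_view_full_text(today: date, publication_date: date,
                       is_authenticated_po: bool) -> bool:
    """Public readers wait out the embargo; certificate-authenticated POs do not."""
    if is_authenticated_po:
        return True
    return today >= public_release_date(publication_date)
```

For example, a manuscript published on 15 March 2020 would become public on 15 March 2021, while a program officer reviewing a project report in September 2020 could already read it.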

18:34 - Lance Vowell: Once PIs are ready to submit their publication, they don't have to sit there and hand-type all of that metadata: the title, the publication date, a list of authors, a list of organizations, ORCID iDs. If they have the digital object identifier, they can put it in a box and hit submit, and we'll auto-populate that metadata for them through integration with Crossref's REST APIs. If they don't have a DOI, they do have to type the metadata in, but the vast, vast majority of submissions are done through what we call auto-population. So they don't even have to type the metadata; they can enter the DOI and get it auto-populated. Once they verify that it is the correct information, they hit a button and are taken to a screen where they upload the accepted manuscript version. Then they hit another button, associate the correct award, verify that all the information is correct, and submit. They're done, and they're taken back to research.gov fairly seamlessly.
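DOI-driven auto-population can be sketched against Crossref's public REST API, which returns a work's metadata as JSON from `https://api.crossref.org/works/{DOI}`. The fields read below (`title`, `container-title`, `ISSN`, `issued`, and `author` with `given`, `family`, `ORCID`, and `affiliation`) are Crossref's own; the output form-field names are hypothetical, and this illustrates the idea rather than reproducing the PAR code.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref_work(doi: str) -> dict:
    """Fetch the Crossref metadata record for a DOI (performs a network request)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["message"]


def to_form_fields(work: dict) -> dict:
    """Map a Crossref work record onto hypothetical submission-form fields."""
    # Crossref dates are nested lists such as {"date-parts": [[2020, 5, 1]]};
    # the month and day may be missing.
    parts = work.get("issued", {}).get("date-parts", [[None]])[0]
    pub_date = "-".join(f"{p:02d}" for p in parts if p is not None)

    authors = []
    for a in work.get("author", []):
        name = " ".join(n for n in (a.get("given"), a.get("family")) if n)
        authors.append({"name": name, "orcid": a.get("ORCID")})

    organizations = sorted({
        aff["name"]
        for a in work.get("author", [])
        for aff in a.get("affiliation", [])
        if "name" in aff
    })

    return {
        "title": (work.get("title") or [""])[0],
        "journal": (work.get("container-title") or [""])[0],
        "issns": work.get("ISSN", []),
        "publication_date": pub_date,
        "authors": authors,
        "organizations": organizations,
    }
```

In use, `to_form_fields(fetch_crossref_work(doi))` would pre-fill the form, which the PI then verifies before moving on to the manuscript upload.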

19:31 - Lance Vowell: We've also integrated (and this is a big shout-out to Amy Friedlander for her part here) with the ISSN database, the International Standard Serial Number database. It provides lookup services for ISSNs and journal titles for records that are submitted manually: those that may not have a DOI, or where we weren't able to auto-populate the information for whatever reason. It was very important to NSF, for the metadata integrity of this repository, that those ISSNs and journal names be authoritative and authentic. So we integrated with the ISSN database, and we have type-aheads for the ISSN and/or the journal name.
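Alongside type-ahead lookups against the ISSN database, a repository can cheaply reject malformed ISSNs before any lookup. The sketch below adds a local check of the ISSN's mod-11 check digit, which is part of the ISSN standard (ISO 3297) but is not something the talk says PAR does, plus a simple prefix type-ahead over an authoritative title list; both function names are hypothetical.

```python
DIGITS = "0123456789"


def issn_is_valid(issn: str) -> bool:
    """Check the NNNN-NNNC layout and the mod-11 check digit C ('X' stands for 10)."""
    s = issn.strip().upper()
    if len(s) != 9 or s[4] != "-":
        return False
    body, check_char = s[:4] + s[5:8], s[8]
    if not all(c in DIGITS for c in body):
        return False
    # Weights 8, 7, ..., 2 applied to the first seven digits.
    total = sum(int(c) * w for c, w in zip(body, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return check_char == ("X" if check == 10 else str(check))


def type_ahead(prefix: str, titles: list[str], limit: int = 5) -> list[str]:
    """Case-insensitive prefix match against an authoritative journal-title list."""
    p = prefix.casefold()
    return [t for t in titles if t.casefold().startswith(p)][:limit]
```

A check digit only catches typos; it cannot tell whether the ISSN belongs to the journal being cited, which is why pairing the two against an authoritative source still matters.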

So all of those are authoritative journal titles and ISSNs paired up with the submission, so the metadata integrity is solid. We also integrated with CHORUS. CHORUS is a consortium of publishers that have worked with the federal government and agreed to support public access as it is defined in the Holdren memo. NSF and DOE both have agreements with CHORUS, and CHORUS is used, again, as a supplement. Neither DOE PAGES nor NSF PAR relies on the publishers for public access or for dark archiving of these records; the CHORUS records are treated as supplemental to the researcher-supplied records. We wanted our repository to be publisher-agnostic. In case a publisher decided to get out of the public access game, NSF and DOE knew it was important to have long-term preservation of, and access to, these accepted manuscripts.

21:20 - Lance Vowell: Through this process, CHORUS metadata is ingested, along with links to the VoR, the publisher's version of record, if and when that VoR is made publicly available. CHORUS also allows for full-text indexing of articles to enhance search precision. So every night we run a special REST query against Crossref's APIs and ingest all of the appropriate NSF-funded and related articles into PAR. Then we process those to ask: Are these publicly available? Will these be publicly available? What version has the publisher made available?

Is it the version of record, or is it a publisher's version of the accepted manuscript? All of that goes into the calculation of what we call the best available version, and there's a hierarchy that depends on exactly what the publisher makes available. The key objective, for DOE PAGES and for NSF PAR, is public access to that best available version without relying directly on the publishers for long-term access. An important part of this is that CHORUS allows for that full-text indexing. So when publishers make a version of their full text available, it's downloaded and the text is extracted from it.

And then that PDF is thrown away; it is not dark-archived. 22:44 - Lance Vowell: To be clear, the downloaded full text is used solely for full-text indexing, for search precision and accuracy. We don't keep it for any amount of time; once we get the text out, we throw the file away. Here is that best available version concept in a little more depth. What occurs inside NSF PAR and DOE PAGES is a comparison of NSF PI submissions with what we receive through the collaboration with publishers via CHORUS and Crossref, and the intersection of those yields the best available version.
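The nightly processing and the best available version hierarchy (publisher version of record first, then the publisher's accepted manuscript, then the NSF PI-supplied accepted manuscript) can be sketched over a Crossref-style work record. This is a simplified, hypothetical rule set, not the actual PAR calculation. It treats a `vor` or `am` entry in the record's `license` list whose `start` date has passed as meaning that version is publicly readable. It identifies NSF funding through NSF's Crossref Funder Registry ID, and it assumes the caller passes `pi_am_url` only after the PI's manuscript is past its embargo.

```python
from datetime import date
from typing import Optional

# NSF's identifier in the Crossref Funder Registry.
NSF_FUNDER_DOI = "10.13039/100000001"


def is_nsf_funded(work: dict) -> bool:
    """True if any funder entry in the Crossref record carries NSF's Funder Registry DOI."""
    return any(f.get("DOI") == NSF_FUNDER_DOI for f in work.get("funder", []))


def license_start(lic: dict) -> Optional[date]:
    """Read a Crossref-style license start date, defaulting a missing month/day to 1."""
    parts = lic.get("start", {}).get("date-parts", [[]])[0]
    if not parts or parts[0] is None:
        return None
    year, month, day = (list(parts) + [1, 1])[:3]
    return date(year, month, day)


def best_available_version(work: dict, today: date,
                           pi_am_url: Optional[str]) -> Optional[tuple[str, str]]:
    """Pick (label, url) following the hierarchy VoR > publisher AM > PI-supplied AM."""
    open_versions = set()
    for lic in work.get("license", []):
        start = license_start(lic)
        if start is not None and start <= today:
            open_versions.add(lic.get("content-version"))

    doi_link = "https://doi.org/" + work["DOI"]
    if "vor" in open_versions:
        return ("publisher version of record", doi_link)
    if "am" in open_versions:
        return ("publisher accepted manuscript", doi_link)
    if pi_am_url is not None:
        # Assumed to be passed only after the PI manuscript's embargo has lapsed.
        return ("PI-submitted accepted manuscript", pi_am_url)
    return None
```

Because the PI-supplied manuscript sits at the bottom of the hierarchy rather than being discarded, a record still resolves to a readable copy if a publisher later withdraws its version, which is the publisher-agnostic property the talk emphasizes.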

23:18 - Lance Vowell: CHORUS offers a single feed for all publishers, so we don't have to negotiate individual agreements. I remember back in the 2013 timeframe, or even before, DOE was seeking individual relationships with individual publishers; CHORUS packaged all of that together and lets us have one agreement with them. The publishers agree to a set of standards. The best available version, of course, is the publisher's version of record, when and if they make it available, followed by the publisher's accepted manuscript version.

And then comes the NSF PI-supplied version. 23:58 - Lance Vowell: Crossref, of course, allows standardized metadata, including the funding sources, the licenses, and the start dates for those licenses, and all of those go into allowing public access to happen. These are just a few screenshots of NSF PAR Public, the public version. You can see here a record that goes directly to the publisher's version of record: through our calculations on that metadata, we know this publisher has made the VoR available at that DOI, so that's what we make available to the public. On the bottom screenshot there's a little open lock, which typically indicates open access. So if a researcher or a member of the public searched for this record in NSF PAR, they would be taken directly to the publisher's version of record, regardless of whether there was an NSF-supplied AM for it. However, if the publisher decided to take this version offline, we would then be able to make the AM version available. Here is another version.

This is actually a publisher's accepted manuscript, the second tier of the best available version I was referring to earlier. You can see that this APS physics article directly acknowledges CHORUS on the accepted manuscript. So if a user found this record in NSF PAR and clicked on that DOI, they would be taken to a landing page with direct access to this publisher AM version. And this is a record where the best available version happens to be the free, publicly accessible AM that was submitted by an NSF PI.

What this tells me, when I look at it, is that the publisher didn't make either the version of record or the AM available on their platform, or we didn't recognize it in the metadata. 26:00 - Lance Vowell: So if a user found this record and clicked on that accepted manuscript link, they would be able to download the version that was submitted directly by the PI. You can see how the best available version hierarchy flows down from there. And, Martin, I'll turn it over to you. Martin Halbert: Sure. Just to bookend this: the system as it currently exists, with all the functionality that Lance went through, we are now calling PAR 1.0.

26:41 - Martin Halbert: We are now actively in the development planning process for what we are calling PAR 2.0, and it will feature a number of upgrades, including a set of improved workflows for award link management. What I mean by that is that our researchers can link publications to the specific awards that funded the activity, and we have some new capabilities to let them edit and manage those links more effectively. Another thing that is coming is perhaps a modest upgrade, but an important one.

27:30 - Martin Halbert: In many cases, awards have funded workshops, and the reports that come out of those are typically not juried papers but separate, freestanding workshop reports, and we think it's important to capture those as well. Finally, probably the biggest set of work we'll be undertaking over the coming months is upgrading NSF PAR so that research datasets can be recorded and submitted to the system in terms of metadata, DOIs, and other persistent identifiers. That is very much in the spirit of the 2013 Holdren memo, so it's very important to me to see PAR foster good or best practices in research data management. At NSF, we're particularly interested in research proposals that foster good research data management practices; we have two current Dear Colleague Letters, still active, regarding programs and projects that advance good research data management practices. All of this will collectively make up what we are terming the PAR 2.0 system. We don't have a specific timeline for implementation yet, but we're hoping to make a really significant amount of progress by the end of calendar 2021. With that, Lance and I would be happy to answer any questions people have.

29:24 - Martin Halbert: And I see there's a message: "What is NSF's protocol to ensure PIs comply with the NSF Public Access Policy in making research data and publications publicly available?" Well, if you're familiar with the NSF so-called PAPPG, the Proposal and Award Policies and Procedures Guide, it requires researchers to deposit... As is clear from the presentation on PAR, we've focused for the last few years on articles, but now we're really stepping up to the issue of data. Data is a much more complicated topic. A dataset can be anything from 20 kilobytes to 20 petabytes, and we've had to think through what sort of repository records we will ingest into the system to provide good access to datasets. Laying out that workflow is really the focus of the work that Lance and I and our teams are doing right now. Diane Goldenberg-Hart (CNI): That was really interesting. Thank you so much, Martin and Lance, for that presentation, and thank you to everyone who has made time out of your day to join us here at CNI's fall 2020 meeting. I'm Diane Goldenberg-Hart. At this time, I'm very pleased to keep the floor open for any other questions or comments for our speakers; we have a couple more minutes. I see that Martin has shared some links. Martin Halbert: Those are just the links to the two Dear Colleague Letters I mentioned, if people are interested in them. Terrific.

31:29 - Diane Goldenberg-Hart (CNI): Thanks, Martin. I was wondering: I think Lance mentioned other agencies also being brought into the system, and I think you mentioned DoD. Are there active plans to expand across federal agencies, or what's happening on that front? Lance Vowell: Good question. Right now we are not working directly with any other agencies to implement a PAGES- or PAR-like repository. We're continually open to that, and we do field many, many questions from across the various agencies about helping them instantiate their public access plans. For now, NSF and the Department of Energy remain steadfast partners, and we are all very appreciative of the partnership and what it means for both the federal government and the individual agencies. But we are not currently working with any other agencies, though we'd be happy to. Diane Goldenberg-Hart (CNI): Great.

32:44 - Diane Goldenberg-Hart (CNI): Okay, thank you for that. And we just saw a chat. I just chatted out the links that Martin shared to all attendees; if you still can't see them, let us know, but everyone should be able to see them now. All right, if there are no other questions, we are right at the half hour. I just want to take a quick minute to remind everyone that we'll be having another session in half an hour: "High Fidelity: Connecting Information for Better Research Reproducibility," with Terry Wheeler and Peter Oxley of Weill Cornell Medical College.

33:27 - Diane Goldenberg-Hart (CNI): In the meantime, I'm going to stop the recording on this session. But if you'd like to hang around and approach the podium, as it were, please feel free to do so; I can unmute you so you can ask live questions, make live comments, or have a little chat with our presenters. Thank you so much again for joining us here today, and I hope that you'll stay on for more sessions at CNI. Cliff Lynch: Thank you so much, Martin and Lance. That was really helpful. Martin Halbert: Thank you, Cliff, for having us, and Diane for facilitating. Cliff Lynch: I look forward to hearing about version two and the research dataset links as that develops; that's a wonderful direction. So, thank you.