The FAIR Signposting Profile

Dec 5, 2020

Cliff Lynch: Thanks for joining us, everyone, and welcome. We’ll be getting started in about 90 seconds. Cliff Lynch: Thanks for joining us, we’ll be getting started in about a minute. Cliff Lynch: Glad you could join us today, we’ll be getting started in about half a minute or so. Cliff Lynch: All right, why don’t we go ahead and get started. Welcome, everybody, and I’m glad you could join us today. I’m Cliff Lynch, the Director of the Coalition for Networked Information, and I will be introducing the session.

02:41 - Cliff Lynch: This is one of the breakout sessions from the last day of week three of the CNI fall virtual member meeting. Just to remind you, the third week of the meeting is focused on technology, standards, and infrastructure, and this certainly checks all of those boxes. I would remind you also that there are a number of pre-recorded sessions, as well as the scheduled breakout sessions, that are part of week three. Please enjoy those as your time permits. We are recording this session, and it will be available later for public access. Closed captioning is available; please make use of that if it’s helpful. There is a chat box, which you’re welcome to use throughout the session.

There’s also a Q&A tool at the bottom of your screen. You can use that to pose questions at any point during the session. We will address all the questions after the presentations are complete. Diane Goldenberg-Hart from CNI will moderate that Q&A session. And with that, let me introduce the session itself. We have two people with us who will be extremely familiar to the CNI community. They’ve been important contributors for many years now: Martin Klein from the Los Alamos National Labs, and Herbert Van de Sompel, who many of you will know from when he won the Paul Evan Peters Award a few years ago, as well as from his many, many contributions over the years to network technologies and infrastructure. The topic today is the FAIR Signposting Profile. Those of you who’ve been following this work for a while will be somewhat familiar, at least, with Signposting, which is an idea that was developed some years ago to help with the discoverability of networked scholarly resources, or indeed networked resources more broadly.

05:18 - Cliff Lynch: What’s really interesting here is the way they’ve coupled Signposting with the evolving ideas about the FAIR principles and their role in discovery. Certainly one of the great challenges of the last few years has been turning the FAIR principles into concrete practice, and they have worked out a way of recognizing the practices in a Signposting profile. So I think this is going to be very helpful for us. And with that, let me just give a really warm welcome back to Herbert and Martin and turn it over to Herbert to start the presentation. Herbert Van de Sompel: Thanks a lot, Cliff.

That was a really great summary of the motivation for the work that I’ve been doing with a number of colleagues, including Martin, over the past couple of months. Signposting itself is, as Cliff indicated, not new. 06:26 - Herbert Van de Sompel: But with the FAIR Signposting Profile we’ve basically gone from a couple of loose suggestions on how to help robots navigate the scholarly web to a true implementation guideline on how to do so, thereby indeed paying attention to the notion of FAIR. So in this cartoon that Martin will bring up now, you basically see the essence of what Signposting really is about. It is about making the scholarly web easier to navigate for machines, and doing this by literally putting signposts out, as shown in this cartoon: if you’re interested in information about the author of the scholarly asset, follow this author sign; if you’re interested in what the persistent identifier of this object is, follow the cite-as sign; and so forth.

07:29 - Herbert Van de Sompel: So, Signposting, as we started it around 2015, always had this notion of, let’s make the scholarly web easier to navigate for machines. It starts from the notion that landing pages are OK for human users. You know, they tell human users where to find the PDF or the data set, etc. But they’re absolutely not optimized for machine use, and this is where Signposting comes in. Now, Signposting from the outset has also aimed for absolute simplicity in trying to achieve meaningful interoperability. And this is based on, well, the big lesson learned from 20 years of doing interoperability work: when you don’t keep things simple, things will not be adopted. When you have simple things, then the chances for adoption grow significantly. So Signposting is all about trying to achieve meaningful interoperability for machines to navigate the scholarly web in a very easy way. It’s fully standards-based and follows the REST and HATEOAS, so Hypertext As The Engine Of Application State, architectural notions. And it makes use of typed web links.

Those are typed links that are provided in the HTTP Link header of landing pages as we know them, and of the content resources that belong to a scholarly object. The link types that are being used in these typed links are all registered in the IANA registry for link relation types, and all of those are defined in formal specifications, meaning robots can understand what each of these link types means. They are fully standardized via IETF specs and via the IANA registry. Now, HTTP links are really used. This is not some marginal technology; this is used all over the web. And here you see a DBpedia URI on which an HTTP HEAD is being issued, and you see the response to that HTTP HEAD.

09:54 - Herbert Van de Sompel: And in there, you see a Link header that actually contains three links, on these different lines there under Link. If we look at the first one, that basically says that the resource on which we issued the HTTP HEAD is available under this specific Creative Commons license. The second link shows us that when we access that DBpedia URI, then by default we will receive RDF/XML, as shown in the Content-Type there, but there’s also an alternative representation that is another RDF serialization. And then the third one there says that this URI on which we issued the HTTP HEAD describes the city of Reykjavik, because you see there the URI slash resource slash Reykjavik; that is DBpedia’s way to talk about the city of Reykjavik. Not only are these typed links used all over the web, they’re actually really interesting because they work with HTTP HEAD. They can be uniformly used for resources of all MIME types.
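As a rough illustration of the Link header described here, the following Python sketch splits a header value into typed links. The header string is an assumed reconstruction of the three links the speaker describes (a license, an alternate representation, and a describes link); the exact URIs on the slide are not in the transcript, so treat them, and the helper name `parse_link_header`, as illustrative only. The parser handles the common `<uri>; rel="..."` shape and is not a full implementation of the Link header grammar.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+(?:\s*=\s*(?:"[^"]*"|[^\s;,"]+))?)*)'
)
_PARAM = re.compile(r';\s*([\w*-]+)(?:\s*=\s*(?:"([^"]*)"|([^\s;,"]+)))?')

# Assumed example value, modelled on the three links described in the talk.
EXAMPLE_LINK_HEADER = (
    '<http://creativecommons.org/licenses/by-sa/3.0/>; rel="license", '
    '<http://dbpedia.org/data/Reykjavik.n3>; rel="alternate"; type="text/n3", '
    '<http://dbpedia.org/resource/Reykjavik>; rel="describes"'
)


def parse_link_header(value):
    """Return one dict per (target, rel) pair found in a Link header value."""
    links = []
    for target, raw_params in _LINK_VALUE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM.findall(raw_params):
            params[name.lower()] = quoted or bare
        # A single rel attribute may carry several space-separated relation types.
        for rel in params.pop("rel", "").split():
            links.append({"target": target, "rel": rel, **params})
    return links
```

Calling `parse_link_header(EXAMPLE_LINK_HEADER)` yields three entries whose `rel` values are `license`, `alternate`, and `describes`, which is the information a robot would act on without ever downloading the resource body.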

So, not only for HTML landing pages, but for all the content resources that are of different MIME types, you can use exactly this same approach. And again, because these links are accessible merely by using an HTTP HEAD, no content transfer is required to see these links. So that means when you have massive resources like big data sets, or you have restricted content, you still can use this technique. Now, links, as I mentioned, can definitely be provided by value, as shown there in the HTTP Link header. But there is also a by-reference approach, whereby a standalone document is published that contains a whole bunch of these links, and then that standalone document is made discoverable.

For example, it can be made discoverable from the landing page, from the content resources, or from both. Generally speaking, these typed links provide guidance to machines that are navigating the web and want to accomplish a certain task. In order to accomplish that task, they will follow specific typed links: this one if I need author information, this one if I need metadata, and so on and so forth. Now, as I mentioned, Signposting has been around since 2015. So what is new? What’s really new is that we have turned these loose ideas of using typed links to achieve some interoperability into a concrete implementation guideline. It’s really a manual on how you can implement Signposting, and it basically says which kind of links you should be providing for the landing page and which kind of links you should provide for each of the content resources.

13:27 - Herbert Van de Sompel: This implementation guideline is targeted at platforms that host all kinds of scholarly output. So it’s not only for data platforms, as the notion of FAIR might suggest; it’s also for institutional repositories and publisher platforms. So basically, it can be used across all kinds of scholarly platforms. And then the FAIR Signposting Profile, as Martin will show you later, has basically two levels. In level one, we provide a concise, or rather a limited, set of links by value in the HTTP Link header.

14:09 - Herbert Van de Sompel: And at level two, we provide a comprehensive, so complete, set of links by reference, meaning in such a standalone document. The reason that in level one we are not providing a comprehensive set, but rather a limited set of links, is that there is a risk, if you were to provide all the necessary links, that your Link header becomes too large and that the web server would suffocate on it. Okay, so you want to avoid that. And this is why in level two the links are provided by reference.

14:48 - Herbert Van de Sompel: So Signposting definitely contributes to FAIR. It contributes to findable, accessible, and reusable by informing machines what the persistent identifier of an object is, what the landing page is, where the content is, where metadata is available that describes the object, and where the persistent identifiers of the authors are. But it does not do this by means of a metadata format. It does this by means of these links, and hence by providing HTTP URIs that machines can visit to find further information pertaining to the object. Signposting also obviously contributes to the interoperability aspect of FAIR by providing all of this information in a uniform way.

15:46 - Herbert Van de Sompel: And it doesn’t do this in a way that only applies to the scholarly web. Quite to the contrary, it uses techniques that are used in the web at large. Now, Signposting clearly is something that requires an investment from the platforms that host scholarly content: some investment needs to be made to implement the FAIR Signposting Profile. But as I mentioned earlier, Signposting has been designed to be so simple that the investment required is really minimal, and this is confirmed by certain implementations that have been done in a minimal amount of time.

16:33 - Herbert Van de Sompel: Now, in order to create services that leverage this uniform Signposting interface, obviously, again, one will need to make some investment. But the good news here is that if repositories implement this, if platforms implement this, then it means that the services only have to interface with one type of interface, not with a bunch of heterogeneous interfaces, meaning the barrier to entry is lower, and it creates a level playing ground for the emergence of complementary and competing services. Compare that with another approach to creating services on top of repositories, which is basically saying: let the repositories just be what they are, with the APIs that they have, all different APIs, and the central service provider will just deal with all of that complexity. That is possible, and I see this approach being taken in a lot of European projects.

17:36 - Herbert Van de Sompel: Unfortunately, the barrier to entry there is so high, because the complexity of dealing with all these different interfaces is so high. And typically, what you see is that only one service provider can really have the resources to enter that kind of a market. So you have a monopoly and a sustainability problem in that case.

And with this, I’m going to hand it over to Martin, who will describe the FAIR Signposting Profile to you in some detail and actually even attempt a live demo. Martin Klein: Yeah, thank you very much, Herbert. What I would like to do in the next few minutes is give you an overview of how the FAIR Signposting Profile can really work in the real world. For the sake of this presentation, please bear with me for this abstract depiction of a scholarly object as it is often represented on the web. In the center of this graph, you have a landing page. For example, you have a persistent identifier, and if you resolve that in your web browser, you would most likely be redirected to a landing page of a scholarly object. On the bottom of this graph, you see a number of metadata files that describe the scholarly object.

On the top, you see a number of authors that ideally have persistent identifiers, such as an ORCID, for example. And you see on the right of this graph a number of content resources that belong to the scholarly object at large. So that’s a very abstract way of looking at it. For the sake of this presentation, at Los Alamos National Laboratory we went out and installed a local instance of the DSpace-CRIS system in our infrastructure, and we also implemented our Signposting, the FAIR Signposting Profile, in order to be able to give you a good idea of what this can actually look like in reality. So let me switch windows here and show you my browser.

If you navigate to dspace demo dot mementoweb dot org slash jspui, you’d be able to access this demo, this pilot implementation, of a very vanilla DSpace-CRIS instance. You see the entry page of the system and a number of items that we have ingested into the system in order to showcase how Signposting can work. If I now just click on this link to show you the scholarly object representing our recent paper on the persistence of persistent identifiers of the scholarly web, open this in a new tab, and switch to this tab, you’ll see the typical DSpace landing page describing the scholarly object. We see a number of pieces of information provided, such as the authors, their affiliations, and the abstract of the work. And we also see that there’s an item associated with this object, namely the PDF document.

20:45 - Martin Klein: That PDF actually represents the paper itself. I can also click on this button here that shows a full item record, and here I get basically an extended view of the landing page with more metadata provided about the item. I’ll see that there’s a license associated with the object. I also see that the object indeed has a persistent identifier, right? It has a DOI. But it also has a related data set hosted elsewhere that has its own DOI.

21:15 - Martin Klein: So all these bits and pieces of information are available through the DSpace user interface. And again, that’s something that a human can fairly easily digest; for a machine, that’s much harder to do. And of course I can also open this PDF document in a new window, just to give you an idea that this is a real thing, this PDF document really representing the paper that we wrote, right? So that’s the sort of setup that we have established for the demo. Going back to my slides here, let’s talk about level one of the Signposting profile. At level one, all links pertaining to the landing page and also to the content resources are conveyed by means of HTTP Link headers.

So if you request the landing page here, right, you will get a number of Link headers in return, and those convey information about the scholarly object. Some of those links are mandatory according to our profile; others are optional. The mandatory links here are indicated by the solid lines and the optional links by the dashed lines. We see, for example, that the landing page has a mandatory link to its persistent identifier with the link relation type cite-as, which was mentioned earlier.

22:33 - Martin Klein: Other links that are mandatory are links to metadata describing the scholarly object, with the describedby relation type. Also mandatory is the link that conveys information about the type of the object. In this case, the type would be a landing page, and we highly recommend using the terminology from schema.org to describe the type of the scholarly object there. The other links are optional. So, for example, the links to the persistent identifiers of the authors are optional at this level.

23:04 - Martin Klein: The reason is that we cannot guarantee that all authors, or even one author, of an object have a persistent identifier. Hence, we can’t really make this link mandatory. The links to the individual content resources are also optional at this level, and there are basically two reasons for this, some of which Herbert has already alluded to. One, it is entirely possible that your scholarly object has way too many content resources, and including each of them by means of an HTTP link in the response would just be too much, right? And the other reason is that it’s entirely possible that these content resources are hosted on platforms that the publisher of the scholarly object has no control over. Hence, we have no way of accessing the HTTP headers and modifying what is returned there.

23:53 - Martin Klein: And those are basically the reasons why we cannot, or do not want to, make those links mandatory. So those are the links that are contained in level one when we’re talking about the perspective of the landing page. How about the perspective of the content resources of this object? Well, there are two different kinds of links, which are also optional. One is basically the inverse of the item relation type. It’s a way for a content resource to convey where it belongs.

It says, I am actually part of this scholarly object, by conveying that I’m part of this collection. That’s a link of type collection. And the other optional link for a content resource would be to convey what type it is: am I, for example, a scholarly article or a data set? Those would be the other links there. You will notice that these sort of graphs also nicely represent the follow-your-nose approach. Recall that the landing page had a mandatory link to its persistent identifier, let’s say a DOI. Now say I am content resource one, for example, and I convey that I am part of this landing page by means of the collection link type.

25:02 - Martin Klein: A machine would then also know that that persistent identifier is what it should be using in order to reference content resource one, right? This is the hopping, following links, follow-your-nose approach that machines can easily do. So let’s see what this may look like in practice. I go back to the landing page for my scholarly object in my DSpace repository, and if I can find my mouse pointer... here we go. I copy the URL of the landing page. Come on. And I use a terminal in order to be able to use the command-line tool curl to send an HTTP HEAD request against the URL of the landing page.
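The same kind of HEAD request can be sketched in Python with the standard library. This is a minimal sketch, not the tooling used in the demo: the function names are made up for illustration, and the URL you pass in is whatever landing page you want to inspect. It does roughly what `curl -I <url>` does, keeping only the Link headers.

```python
import urllib.request


def build_head_request(url):
    """Create a request that asks for headers only (no response body)."""
    return urllib.request.Request(url, method="HEAD")


def fetch_link_headers(url, timeout=10):
    """Send the HEAD request and return all Link header values as a list."""
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as response:
        # A server may send several Link headers; get_all returns each one.
        return response.headers.get_all("Link") or []
```

Because only headers are requested, the approach works the same way for an HTML landing page, a large data set, or a PDF, which is the point made earlier about HTTP HEAD.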

26:02 - Martin Klein: And I’ll see I get my HTTP response, including the response headers representing information that is mandatory and optional for level one. So, for example, I see my cite-as link with a DOI. I see a describedby link here pointing at the metadata record. I see the link describing the type of the object; in this case, it’s an AboutPage, a landing page, right? And I also see a number of optional links that we have included. For example, I have the optional link for level one pointing at the PDF document with the relation type item, right?

This is the item of this scholarly object. Both authors have ORCIDs, so I can include those links here. And there are two more links that I want to highlight. One is the link with the relation type related, pointing at the data set, hosted at Figshare in this case. The other is a link that is not yet part of the profile, but could very well be part of it down the road: the link to the associated license document with the relation type license. The discussion is currently ongoing whether that should be included or not. So, I need to speed up a little bit.
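The level-one rules as described in the talk can be written down as a small check. This is a sketch under assumptions: the relation names follow the talk (cite-as, describedby, and type mandatory; author and item optional), the input is a list of dicts with `rel` and `target` keys, and every URL below is a placeholder rather than a value shown in the demo.

```python
# Relation types for a landing page at level one, as described in the talk.
LEVEL1_MANDATORY = ("cite-as", "describedby", "type")
LEVEL1_OPTIONAL = ("author", "item")

# Placeholder links standing in for what the demo landing page returned.
EXAMPLE_LANDING_PAGE_LINKS = [
    {"rel": "cite-as", "target": "https://doi.org/10.5555/placeholder"},
    {"rel": "describedby", "target": "https://example.org/handle/123/metadata.xml"},
    {"rel": "type", "target": "https://schema.org/AboutPage"},
    {"rel": "item", "target": "https://example.org/bitstream/123/paper.pdf"},
    {"rel": "author", "target": "https://orcid.org/0000-0000-0000-0000"},
    {"rel": "related", "target": "https://example.org/dataset/456"},
    {"rel": "license", "target": "https://example.org/licenses/placeholder"},
]


def check_level1_landing_page(links):
    """Report which level-one relation types are missing or present."""
    present = {link["rel"] for link in links}
    known = set(LEVEL1_MANDATORY) | set(LEVEL1_OPTIONAL)
    return {
        "missing": [rel for rel in LEVEL1_MANDATORY if rel not in present],
        "optional_present": [rel for rel in LEVEL1_OPTIONAL if rel in present],
        # Links such as related or license, which the talk shows but which
        # were not part of the profile at the time.
        "outside_profile": sorted(present - known),
    }
```

For the placeholder links above the report has an empty `missing` list, which is what a conforming landing page should produce; the `related` and `license` links end up under `outside_profile`.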

So I go to level two, which is where, as Herbert alluded to, we convey all our links in a link set document. All the links pertaining to the landing page and all the links pertaining to the content resources are included in the link set document, and that link set document is also discoverable by means of HTTP links. The notion of what is mandatory and what is optional changes slightly between level one and level two. For level two, we see that the describedby links to all available metadata records are mandatory, and the links to all available content resources are mandatory at that level as well.

27:59 - Martin Klein: From the perspective of the content resources, the previously optional links are now mandatory, meaning each content resource now has to convey the collection relationship and also has to convey what type it is, so whether it is a scholarly article, for example. You see new lines in this graph here, the dash-dotted lines, if you will. Those lines represent cases where you’re supposed to put those links into place if and only if the information conveyed from the perspective of the content resource is different from what was conveyed from the perspective of the landing page. So, for example, if content resource n has a different persistent identifier than the landing page has, then you should put in a new cite-as link from content resource n to its corresponding persistent identifier.
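To make the level-two rules concrete, here is a hedged sketch that checks a link set document against them. The talk does not reproduce the document itself, so the JSON below is an assumed, minimal example in the shape of the link set JSON serialization (a top-level `linkset` array with one object per link context, keyed by `anchor`); all URLs are placeholders and `check_level2` is a hypothetical helper. It also assumes that cite-as and type stay mandatory for the landing page at level two, which the talk implies but does not state outright.

```python
import json

LANDING_PAGE = "https://example.org/handle/123"  # placeholder URL

# Assumed minimal link set: one context for the landing page, one for the PDF.
LINKSET_JSON = """
{
  "linkset": [
    {
      "anchor": "https://example.org/handle/123",
      "cite-as": [{"href": "https://doi.org/10.5555/placeholder"}],
      "type": [{"href": "https://schema.org/AboutPage"}],
      "describedby": [{"href": "https://example.org/handle/123/metadata.xml",
                       "type": "application/xml"}],
      "item": [{"href": "https://example.org/bitstream/123/paper.pdf",
                "type": "application/pdf"}]
    },
    {
      "anchor": "https://example.org/bitstream/123/paper.pdf",
      "collection": [{"href": "https://example.org/handle/123",
                      "type": "text/html"}],
      "type": [{"href": "https://schema.org/ScholarlyArticle"}]
    }
  ]
}
"""


def check_level2(linkset_doc, landing_page):
    """Return a list of problems; an empty list means the checks passed."""
    contexts = {ctx["anchor"]: ctx for ctx in linkset_doc.get("linkset", [])}
    landing = contexts.get(landing_page)
    if landing is None:
        return [f"no link context for landing page {landing_page}"]
    problems = []
    for rel in ("cite-as", "type", "describedby", "item"):
        if not landing.get(rel):
            problems.append(f"landing page lacks {rel} link")
    # Each content resource must point back via collection and state its type.
    for item in landing.get("item", []):
        ctx = contexts.get(item["href"], {})
        back_links = [target["href"] for target in ctx.get("collection", [])]
        if landing_page not in back_links:
            problems.append(f"{item['href']} lacks collection link to landing page")
        if not ctx.get("type"):
            problems.append(f"{item['href']} lacks type link")
    return problems
```

Running `check_level2(json.loads(LINKSET_JSON), LANDING_PAGE)` on the example returns an empty list. The sketch does not cover the conditional dash-dotted links, such as a separate cite-as for a content resource with its own persistent identifier.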

28:52 - Martin Klein: So, a brief example of what this could look like. We go back to my Link headers, because here I had already seen that the link to my link set document is included in the HTTP response headers. I copy that URL and go back to my browser, because looking at JSON in my browser is a little bit more friendly. I increase the font a little bit, and you will see the link set document that was returned. I see my landing page and the type of my landing page conveyed, the cite-as link towards the persistent identifier, the two authors, my describedby links, my license link, my related link, and my item link pointing at the PDF document.

29:36 - Martin Klein: And also, as mentioned, the links from the perspective of the content resources are included in this document as well, here conveying that this item is actually part of that collection, pointing back to the landing page URL, and also, as is mandatory in level two, conveying that it is of type scholarly article, utilizing the schema.org terminology. With that, I go back to my slides here and again invite you to try it out yourself: dspace demo dot mementoweb dot org slash jspui, a very plain vanilla DSpace-CRIS instance. It’s not all that hard to set up. We ingested a number of scholarly objects that you will be kind of familiar with, in order to give you an environment to test out our FAIR Signposting Profile. And I’ll conclude this presentation with a quote from Luke.

30:32 - Martin Klein: The quote points at the simplicity of this approach. Again, here is a pointer to our FAIR Signposting Profile and another pointer to a GitHub repository, where we invite you to provide feedback and, for example, join the discussion about the potential inclusion of links to license documents to convey that additional information in our FAIR Signposting Profile. So with that, I’ll end. We are very appreciative of your time. Thanks again for listening; we’re most happy to foster a discussion and answer questions that you may have. Diane Goldenberg-Hart (CNI): Thank you. Thank you, Martin, and thank you, Herbert, for that wonderful presentation. Indeed, the possibilities with this approach seem very exciting.

31:21 - Diane Goldenberg-Hart (CNI): Indeed, and with that, the floor is now open for questions. I would like to invite our attendees to please enter your questions in the Q&A box, and our presenters will be happy to respond. So just a quick question while we’re waiting for attendees to type in their questions. Actually, Martin, I was wondering, would you mind dropping the URL to the demo site in the chat? I have a feeling folks would appreciate that and enjoy playing around with that site a little bit. And if I understand correctly, this approach is already in production, right?

I mean, are there organizations already making use of the protocol? Herbert Van de Sompel: So, there have been early adopters of Signposting in general. When you go to signposting.org, you actually will see a list of organizations and platforms that have implemented the early Signposting approach. When it comes to the FAIR Signposting Profile, there are no real implementations yet. But there are things ongoing: there is Martin’s experiment that he just demonstrated, and there is work at DANS.

At DANS, we are currently working, in the context of a European project called EOSC-hub, on an implementation of the FAIR Signposting Profile for the B2SHARE platform, which is based on Invenio, which is also the basis of Zenodo. And then there is enormous interest in implementing it for Dataverse platforms also. At DANS, we use Dataverse, but there are other Dataverse customers, let’s call them that, that are also really interested. So I wouldn’t be surprised if in the coming weeks we start implementation there. It’s still very early days. The spec has basically been out only for, I think, three months, since we started this, and it’s still been evolving quite a bit recently. Diane Goldenberg-Hart (CNI): Great.

Okay, thank you for... 33:32 - Martin Klein: That also kind of informed our decision to install, for the sake of this demo, a DSpace, a DSpace-CRIS instance, because that was kind of the missing link in some ways. We are very interested now, and we have colleagues that are very involved in the development of the core DSpace code. So we’re in touch with our friends and colleagues there to see how our lessons learned from implementing the FAIR Signposting Profile can be moved and merged into the core code of DSpace and other platforms that... Diane Goldenberg-Hart (CNI): Got it. That’s great. Thank you very much. I appreciate that clarification. I see that we are a little bit past time here.

34:14 - Diane Goldenberg-Hart (CNI): So, I’m going to go ahead and stop the recording of the presentation, but first I’d like to thank Martin and Herbert one final time for sharing this good work with us here at CNI. I really appreciate this presentation. Thanks also to our attendees for making time to be with us here at CNI’s fall 2020 meeting. We hope we’ll see you back. My understanding is that Martin, and perhaps Herbert for a little bit, will be able to hang around, so if people want to stay back and ask questions or make comments, please feel free to do so. To join the conversation, just raise your hand and I’ll be happy to unmute you. And with that, I will bid everyone a good rest of your day and hope to see you again soon. Bye bye.