IPT demo

May 6, 2021 15:11 · 2821 words · 14 minute read

I am logged in as an administrator on the ipt.gbif.org test website, and the first thing I’m going to do is create a resource. For the purposes of this demo I’m going to create a new resource, but then I’m going to switch to an existing resource that already has metadata in it, so that I am not using this time to type up metadata. The short name field, which you fill in when you’re creating a resource, is what becomes your URL in the IPT, so it’s best to make it descriptive of what you are doing.

For today’s demonstration I’m using an invert paleo data set from the Paleontological Research Institution in Ithaca, so I’m going to call this PRI invert paleo test, and it is an occurrence type. We could also do checklist, event, or metadata only.

01:09 - And then I’m just simply going to create it.

01:17 - So I’ve got a resource now that is ready for me to populate with source data, mappings, and metadata, and then publish. I’m now going to switch to the one that has the metadata, and I’m going to begin by adding the source data. As Carol mentioned in her presentation, there are a variety of data types that you can use as source data, or you can connect with a database connection. But for the purposes of today’s exercise I’m going to be using a text file and a csv file.

So the first file I’m going to add is occurrence data and it is in a text file format.

02:15 - Once you’ve selected the file you can add it, and it comes into a confirmation screen. I can see that I have 39 columns in my data set. I can see that there’s one header row. It comes in with the default encoding of UTF-8, which is typically what we use in the community to encode our data, but should it be something else, there is an information button that you can use to select one of the other encodings, or you can type it in by hand.

02:53 - It recognizes the field delimiters and there is a preview button that looks like an eye that allows you to get a preview of what that data in that text file looks like.
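The checks on this confirmation screen (encoding, delimiter, header row, column count, preview) can be approximated outside the IPT before uploading. The following is a minimal sketch using only the Python standard library; the function name `preview_source` and the three-column sample are hypothetical, and this is not how the IPT itself is implemented.

```python
import csv
import io

def preview_source(raw_bytes, encoding="utf-8", sample_rows=5):
    """Decode a source file, guess its delimiter, and return a small preview."""
    text = raw_bytes.decode(encoding)
    # Guess the field delimiter from the start of the file (tab or comma only).
    dialect = csv.Sniffer().sniff(text[:2048], delimiters="\t,")
    rows = list(csv.reader(io.StringIO(text), dialect))
    header, data = rows[0], rows[1:]  # assumes exactly one header row
    return {
        "delimiter": dialect.delimiter,
        "columns": len(header),
        "header": header,
        "preview": data[:sample_rows],
    }

# Hypothetical tab-delimited sample; the file in the demo has 39 columns.
sample = (
    "occurrenceID\tcatnum\tsciname\n"
    "urn:1\t1001\tTurritella\n"
    "urn:2\t1002\tConus\n"
).encode("utf-8")

info = preview_source(sample)
print(info["columns"], info["header"])
```

If decoding fails with a `UnicodeDecodeError`, that is a hint the file is not UTF-8 and a different encoding needs to be selected.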

03:09 - So it all looks good to me, so I’m going to hit save.

03:16 - And over here on the right you can see I’ve got a text file added.

03:21 - Now I’m going to add my second file which is a csv file and this one contains images.

03:31 - So I’ll do the same process by adding it. I review the delimiters and the header row. I’ll preview it, and you can see that I’ve got the image information, and I will save it.

04:09 - Next up I’m going to add the mappings and the first mapping I’m going to add will be for the occurrence data so I will choose the occurrence core.

04:24 - We’ll click add, and then I need to choose my source data. I have two different source files, so I want to choose the matching occurrence file and click save.

04:46 - And the IPT auto-maps what it can, so if the occurrence file contains fields that were already named with the Darwin Core terms, it auto-maps those. I can tell from the green message bar at the top that it auto-mapped 36 columns. But I had 39, so three fields still remain to be mapped. I also get a warning that basis of record is a required field. So the first thing I’m going to address is basis of record. I scroll down the list and find the basis of record field. I did not have that in my data, so I’m going to use the predefined vocabulary that is available for this field, and I’m going to choose fossil specimen. Then I will save, and every time you save it takes you back up to the top of the page.

So now my warning message has gone away, but I still remember that I’ve got three fields that I need to map. Over here on the left-hand side you can see hyperlinks, and these hyperlinks are based on the Darwin Core term categories, so they match up exactly to what you will see on the Darwin Core website. I’m going to click on the unmapped columns link, which will take me down quickly to the bottom of my page, and I can see which three fields didn’t map. So I’ve got cat num, collector, and sciname. Well, I know sciname would go into the taxon information, so I’m going to click into that section, find scientific name, click the drop-down, and map it to my sciname.

06:46 - Then I know the other two fields are in my occurrence section, so I’m going to click there and I will add cat num to catalog number. If you’re not as familiar with Darwin Core as I am, you may not know which field collector should go to, and in that case you can use the information buttons to pull up the definitions. It will give you the definition from Darwin Core, it will give you a link out to Darwin Core, and it will give you some examples.

So from here I can see that the primary collector or observer should go into recorded by. To remove the box, you just click on it again and it goes away. So I’m going to add collector here and then I will save again.
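The mapping steps above can be sketched in a few lines of Python. This assumes a simple case-insensitive name match for the automatic step; the IPT’s actual matching rules may differ. The term list is a small illustrative subset of Darwin Core, and the source column spellings (`catnum`, `collector`, `sciname`) are assumed from how they are spoken in the demo.

```python
# Small illustrative subset of Darwin Core term names (not the full standard).
DWC_TERMS = {
    "occurrenceID", "basisOfRecord", "catalogNumber", "recordedBy",
    "scientificName", "kingdom", "class",
}

def auto_map(columns, terms=DWC_TERMS):
    """Map source columns whose names match a term name, ignoring case."""
    by_lower = {term.lower(): term for term in terms}
    mapped, unmapped = {}, []
    for col in columns:
        term = by_lower.get(col.strip().lower())
        if term:
            mapped[term] = col
        else:
            unmapped.append(col)
    return mapped, unmapped

# Hypothetical source header: three columns already use Darwin Core names.
columns = ["occurrenceID", "kingdom", "class", "catnum", "collector", "sciname"]
mapped, unmapped = auto_map(columns)
print(unmapped)  # the columns left for manual mapping

# Manual choices made in the demo for the columns that did not auto-map.
mapped.update({
    "catalogNumber": "catnum",
    "recordedBy": "collector",
    "scientificName": "sciname",
})

# basisOfRecord was missing from the source, so it gets one constant value.
constants = {"basisOfRecord": "FossilSpecimen"}
```

Keeping constants separate from column mappings mirrors the demo, where basis of record is chosen from a vocabulary rather than read from the file.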

07:43 - There are several other fields with predefined vocabularies that you can add. Type is one of those, and this is an occurrence of physical objects. I’m also going to add the language of the data, which is English.

Then there are a few other really nice features in here. There is a filtering feature, and filtering allows you to specify which data to include. Say you have some sensitive data you need to remove; you could do that here. Or say you aren’t ready to publish all of your data and only want to publish part of your collection; you can make a filter here to do that. In this example we’re only going to publish the records with class equal to Gastropoda. We’ll save that.

The other nice feature is that you can see examples of five records in the data, so you can make sure that your fields are lining up correctly. I noticed when I was in the taxon section that in kingdom, my source sample says animals instead of Animalia, so there are a couple of different ways you could correct that.

Instead of using the kingdom field, you could unmap it and type in Animalia, or you could use the translation feature. The translation feature brings up a unique list of all the values in that field, so in this case it only shows animals. I can add what the translated value should be and save it, and then I’ll save once more here.

One other thing I should mention about the filter is that there is a condition here that asks whether you want to apply it before translation or after translation. If for some reason we were filtering on Animalia, then you would want to do it after translation. In this case it really wouldn’t matter whether I did it before or after.

So the mappings are all complete for the occurrence data. Now I’m going to go back to my manage page and add the second mapping. This time I have images, and I have a choice: I could use Simple Multimedia or I could use the Audubon Media Description. For today’s purposes I’m going to use the Audubon Media Description. I’m going to click add, and then I need to choose my source data again, and this time I’m going to use the image file. The Audubon Media Description can cover other media types besides images, so you might have sound or video, but this particular file only has images. So I hit save, and in this one all of my fields mapped, so I don’t have any unmapped fields to add or change.

I do want to point out that this file contains the occurrence IDs, just like the occurrence file did. This identifier here uniquely identifies each image, and then that links back to each unique occurrence. Occurrences can have more than one image associated with them, so you might have one occurrence number with two or three different images associated. The iDigBio website has some recommendations for their preferred identifiers. The other important piece is the access URI, which needs to be an online URL so that the images can then be uploaded to the image galleries on, say, the iDigBio portal or the GBIF portal. So I’m going to save this one now, and I’m going to use the hyperlink next to the resource title to go back to my manage page.

Now you can see I’ve got two data files and two sets of mappings, and next we’re going to look at the metadata. I mentioned that I pre-populated the metadata for this. Since I did some editing of the metadata and we’re only publishing the Gastropoda for this, I’m actually going to change this. The reason you want to fill in your metadata is so that the end users of the data will have a good idea of whether this data set is fit for their use. This is also where you set the publishing organization for this data set, and in this case we’re just using a test organization. This is also where you set the license or waiver for the data set. GBIF allows three licenses or waivers: we’ve got the Creative Commons CC BY, the Creative Commons CC BY non-commercial, and then there’s the public domain waiver. You can fill in a description, you have all of your contact information, and you can hit save.

As you work through the metadata pages in the IPT, you’ll find the IPT is a really lovely way to add all of your metadata. Having worked with VertNet for so long, and as an aggregator, I feel that with the IPT, metadata has gotten much better, because it’s a lot easier for people to do the data entry. So there’s a geographic coverage section and there’s taxonomic coverage. I’m going to add taxonomic coverage here for Gastropoda, select the rank of class, and save.

There are a lot of other sections where you can continue to add metadata, and one of those that I want to look at is citations. GBIF, here at the Secretariat, is beginning to do some really nice things with citations and the DOIs that Kyle mentioned earlier, and to be able to report on those kinds of things within the portal. You can type a citation by hand here, or you can use the auto-generation for a citation. This auto-generation uses DataCite’s preferred recommendation for how to cite a data set. I’m going to use that. Those are all the changes I’m going to make to the metadata right now, and I’m going to move back to the manage resources screen.

Now we’re ready to publish. Because I know that I want to register this data set with GBIF, I’m going to go ahead and make it public, but if I wasn’t ready to make it public yet, I could publish a few times before I make it public. I’m ready, so I’m going to hit the publish button. It pops up a box where you can put in some sort of summary about what you’re publishing, so I’m publishing a new data set for PRI Gastropoda. You don’t have to use this, but you can, especially if you’ve made some changes. Maybe you only updated your metadata, so you could put in here that the data set is republished but it was just for metadata changes. Or say you did a whole lot of new digitization; you could indicate that you’ve added 10,000 more records, or that this data set now contains images. So there’s a lot you could do with this to give some more information about this version of the data set.

So publishing is beginning, and it happens very quickly because this file is very small; there are only 42 Gastropoda records. Publishing completed successfully, and it gives me a series of log messages that we can review to see what happened. It says 70 lines did not match the filter criteria and it only wrote 42 records. It created 201 records for the media. It added the EML file and it added the meta file, both of which Carol talked about. Then it began the validation process. It validated that basis of record is always present, because that was the required field. It also validated that occurrence ID was present, which is another required field, that it was unique, and that every record had one. Then it did similar validation for the extension. Then it said it was completely validated, that it compressed it, and that it was generated successfully.

At this point I’m going to go back to the resource overview. Typically I would actually download the archive at this point and check it all before registering it with GBIF, but I’m ready to register it; we’ll say that I’ve already checked it all. So I’m going to click register, and it asks me to confirm that I understand the GBIF sharing agreement, and I say yes. At this point it’s contacting GBIF and registering the data set. It tells me that it successfully registered the data set in the registry, and if I go down here to the registration section, I now have a GBIF UUID with my test organization. It was endorsed in this case by DanBIF, which is the Danish node for GBIF.

I’m going to give it a little bit of time to index at GBIF, and in the meantime I’m going to download the Darwin Core Archive. I also want to show you what the metadata looks like in the IPT, so I’m going to go to the home page so you can see what an end user might encounter. From the home page you can open up the metadata page, and that’s the URL I originally created with that short name. You can see all the information. I can see how many records are in the occurrence file, I can see how many are in the extension file, and I can download the Darwin Core Archive. In the version section I can see that note that I added.

I’m going to download the file and then quickly open it. Within that archive, that compressed file, once it is decompressed you can see it’s got the EML, the meta file, the multimedia extension, and the occurrence file, just like that image that Carol showed in her slides.

Now I’m going to see if the data has indexed at GBIF yet. This is a test site. It has indexed: you can see all of the metadata, you can see the 42 records, and you can see all the records. I wanted to see some of the images, and you can see a record in detail with an image. So you can see that this is near real time. Now, this was a small data set, so it was pretty much simultaneous, but a larger data set may take more time, and you also might end up in line behind other large data sets. So it can vary how quickly the indexing happens, but for our example today it happened very quickly. That’s where I’m going to end my demonstration, and I’m going to hand it back to Kyle, and he’s going to finish up.
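The filter, translation, and validation steps described in this demo can be sketched as follows, mainly to show why the before or after translation choice matters. This is a minimal illustration with hypothetical function names and two made-up records. It is not the IPT’s own code, and it only covers an equality filter.

```python
def publish_rows(rows, translations, filter_field, filter_value,
                 filter_after_translation=True):
    """Apply value translations and an equality filter, in the chosen order."""
    def translate(row):
        # Replace a value only if a translation exists for that field and value.
        return {k: translations.get(k, {}).get(v, v) for k, v in row.items()}

    def keep(row):
        return row.get(filter_field) == filter_value

    if filter_after_translation:
        return [r for r in map(translate, rows) if keep(r)]
    return [translate(r) for r in rows if keep(r)]

def validate(rows):
    """Check the requirements named in the publishing log."""
    ids = [r.get("occurrenceID") for r in rows]
    return {
        "basisOfRecord_present": all(r.get("basisOfRecord") for r in rows),
        "occurrenceID_present": all(ids),
        "occurrenceID_unique": len(set(ids)) == len(ids),
    }

# Two made-up records standing in for the source file.
rows = [
    {"occurrenceID": "a1", "basisOfRecord": "FossilSpecimen",
     "kingdom": "animals", "class": "Gastropoda"},
    {"occurrenceID": "a2", "basisOfRecord": "FossilSpecimen",
     "kingdom": "animals", "class": "Bivalvia"},
]
translations = {"kingdom": {"animals": "Animalia"}}

# The demo's filter: class equal to Gastropoda (order does not matter here).
published = publish_rows(rows, translations, "class", "Gastropoda")

# Filtering on a translated value only works after translation.
before = publish_rows(rows, translations, "kingdom", "Animalia",
                      filter_after_translation=False)
after = publish_rows(rows, translations, "kingdom", "Animalia",
                     filter_after_translation=True)
print(len(published), len(before), len(after))
print(validate(published))
```

Filtering on Animalia before translation matches nothing, because the source still says animals at that point; filtering after translation keeps both records.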