Thank you to our Captioning Sponsors: balsamiq and MIT Libraries.

Hello everyone, welcome back from breakouts and break to today's lightning talks section. Lightning talks, just as a quick recap, are a segment where, if you have an idea, a project, or a practice that you would like to share with the community, this is your chance to do so. All of the lightning talk chat and Q&A will be in the same session listing within Whova, all eight of them that we have today, a little different from the talk blocks that we've had previously, where we moved from session to session. If you've got a question for one of the presenters, please enter their first or last name and then your question. They'll all be presenting live, so they likely won't get to address questions until their presentation is over, and that will let them find the questions that are for them. Our first talk is going to be from Hardy, titled Load Testing with locust.io, or How to Make Your Prod-like Server Cry. Take it away, Hardy.

All right, I'm going to push all these buttons here and try to share screen. All right, can you guys see that? All right, I'll start. Hi, my name is Hardy Pottinger, and I'm a publishing systems developer with California Digital Library, and this talk is all about load testing with locust.io, or how to make your production-like server cry. Hold up, I have to say this first: it is not okay to make a person cry. Servers or apps can't cry. In order to load test, you'll have to get past seeing your app as a person. Load testing is something you need to do when you find yourself asking: how fast can this thing go? This is a stage of any web development project: you work for weeks or months and are getting ready to deploy, and somebody finally asks the question, how fast can this thing go? Or maybe: have we given this thing enough iron, enough memory, enough CPU? Are the specs sufficient for the load we expect? Oh man, what kind of load do we expect? This one I can't help you with, but if you want to know the upper limits of what your web app can do given the kind of iron you fed it, Locust can help. The download and docs are available at the main site. I recommend reading the quick start; it's especially helpful for basic web app load testing, but not everything you want to load test falls exactly into that category. I'll talk a bit about API testing on the next slide, but for now, remember that Locust is written in Python, and it's organized so you define tasks, which can fit into TaskSets, which can be organized into sequences. Here's the process of working with Locust. First, you design your test: you figure out what question you're trying to answer (how fast? how many users can it support?), imagine the activity you want to replicate related to the question you're asking, make a checklist of each activity, and then pick one to focus on first. In the case of an API, it'll be a GET or a POST on an endpoint; you probably know which
endpoint will be the slowest, or at least the one that will see the most use. Finally, identify your environment: probably staging, but anything production-like will work, just not prod. Okay, then encode your test. I'm glossing over this a bit, but it might look something like this; it's Python, and there are lots of examples on the Locust site. Run it locally at first, and ask your teammates to review the results (screenshots or screen share, whatever works). And here's what the dashboard looks like. Notice at the top it tells you what host you're targeting and how many users, which you can change without stopping the test. You can stop the test at any point with the big red stop button, and you can reset the stats at any point. Then revise and repeat until you're sure your tests are testing what you need to test. Each new test can be added to a TaskSet; write as many TaskSets as you need, then organize your TaskSets into a sequence. Now keep in mind, Locust is going to create hundreds, thousands, any number of simultaneous connections, so start small. You might not want to include a user login in your initial testing plan, unless that's the thing you want to test. Okay, now you're ready to test your production-like server. I remember the first time I ran Locust against an API we were working on. I asked my teammates, I don't know what to put in here, how many users? I was thinking about our API as a person: how hard should I push our API? They said hard, so I did, and we kept increasing the numbers until we saw exactly how many simultaneous users the system could support before locking up. From the graphs and logs we were also able to see exactly when the system started struggling, which is a cool thing to know, and it's exciting when you realize what you found. But don't forget that the visualizations you see in the Locust dashboard aren't saved, so if you see something you want for later, screenshot it or print it to a PDF. Or if you do forget, just run the test over, then look
over your data and try again if you need to, and report your results, and have fun. Any questions, ping me on Slack or Whova.

Great, thank you, Hardy. It does look like a question for you did come in on Whova. Our next talk is going to be Jacob, with a presentation titled Beyond Passwords: A Login Server for Distributed Identification. All you, Jacob. It seems like we may have accidentally lost Jacob to technical issues, so let's just move on to the next presentation after that, which is going to be Ben, with Gitstat: Keeping Track of Local Git Repositories' Sync Status. If you're ready for that, Ben, you can take it away.

Yeah, sure, thanks. So hi, I'm Ben Companjen, and I work as a research software engineer slash digital scholarship librarian at Leiden University. And there we go. So in my work I create software and I look at lots of other people's software, so I can help researchers do their research more efficiently. Often that means that I create or clone a Git repository in the git folder in my home directory. I wouldn't call myself a hoarder, but I've got quite a lot of them. These folders contain either toy projects, or projects I checked out to work on or to provide bug fixes for, or production code that is distributed via GitHub or GitLab. Well, Git is, as you may know, a distributed version control system. I develop on my laptop, but I want to push the results, my changes, to a centrally hosted repository like GitHub or GitLab. And I say "shoot" because, well, I don't always do that. Then my laptop needed to get a new battery, and there was a good chance that my drive would be wiped. So, a slight bit of panic. Yes, I do keep backups with Time Machine, but I wasn't sure that that includes everything that I want, plus I just need to get it up to the centrally hosted system. So I wanted to have a tool to see which local repositories have changes that are not online, so I
created gitstat to help me with that. Gitstat is a shell script that goes through all the folders in my git root folder, and for each folder it outputs the folder name; the current git status in porcelain mode, which is very readable for computers (it has all the untracked files and even shows stashes), so I can see all the untracked and modified files; then it adds all the branches that I have; it lists all the remotes, if there are any; and then a demarcation, so that I know this is where this ends. And this all goes in a status file. This script is run every 10 minutes thanks to this plist launch agent; it's kind of like a systemd unit descriptor for macOS, but those are details. As I said, everything goes into status.txt, and right, I can see that this repository has an untracked readme file, and a few others. It's just a huge file, so I need something to monitor whether this is good, whether I'm getting to a better place. I need to summarize this so that I can also see the numbers and see that they go down over time. I had already been playing with Prometheus and Grafana for monitoring, so why not use that? I know that Prometheus can read text files if they're formatted correctly, so my first attempt was to create a Prometheus-style text file that lists the number of lines. This is a bit of ugly grepping and echoing, but it works, and I get a .prom file that Prometheus can read. So from here I see the total number of lines and the number of repositories; I can already see that there are two directories that are not git repositories, so that's kind of bad. But, to show that I can indeed see these files in Grafana: I haven't done anything in the past couple of hours, but eventually this should go

Okay, it looks like we had some more technical issues pop up there, losing audio, but we were also just about out of time. I hope everything is okay with Ben. Let's move on to the next talk, which is going to be Lynette, with Prioritizing and Organizing User Stories around Accessing Authoritative Data.

Yes, so this one is a video. I'm Lynette Rayle, a developer at Cornell University working on the Linked Data for Production grant. I facilitate the Best Practices for Authoritative Data Working Group. I'm going to talk about the outputs of the first charter of the working group. A core principle of the working group is to include members from different parts of the authoritative data pipeline. The group includes authoritative data providers, to improve understanding of how their data is being used. It includes data consumers, who represent the cataloging community and other users of authoritative data in libraries. It also includes
developers that access authoritative data APIs and create applications and tools that make that data available to consumers as part of the workflow. The first charter focused on a common understanding of needs by defining user stories from the perspective of each of these roles: 32 cataloger user stories, 44 application developer user stories (both from the UI and backend perspective), and 38 provider user stories. Once the user stories were defined, we engaged the broader community in helping us to prioritize the cataloger user stories. We reached out to the PCC community with a survey that allowed respondents to put the stories in buckets, ranking their importance based on their own work. From these we created four prioritization categories based on the ranked level of importance. The labels in the graph are highly abbreviated, but at the end of the presentation there is a link to the full analysis of the survey if you want to see the details of the user stories. To give you a sense of the user stories, these are the top five in the priority graph: include extended context, that is, additional information about each search result that helps the user make a more accurate selection; filtering search results by class type, for example when searching names, limit to person names, filtering out organizations and other types of names; when searching for an exact match, the user wants to know if that exact match doesn't exist; when editing a resource, like a work, that you want to connect to authoritative data, like a subject, the search needs to include the URI for each result, which allows the resource to link to the exact result that the user selects; and for authorities that are hierarchical, include the broader and narrower terms for each search result. With the full set of user stories defined and the results of the survey, we created a document that organizes the user stories. The primary focus is on cataloger user stories; developer and provider user stories are listed with the cataloger user
stories they support. Related user stories are gathered together: user stories related to searching are together (that is, search results for an exact match, left-anchored search, keyword search); user stories related to refining results through filtering are together; and user stories related to accurate selection, like additional context and relevancy rating, those are together. Each section also includes the priority that was assigned based on the survey. So this is an example from the document of the section that gathers user stories related to performance. Within this section, it further breaks into directly related user stories that impact performance. For example, the first subsection under performance is "search results return quickly"; since it rated as a level one in the survey, it is marked as having priority level one. There's a fuller definition of what we mean by "search results return quickly," then the cataloger user stories are listed, followed by supporting developer and provider user stories. The next subsection, which is "time out gracefully," again comes with a priority, definition, and user stories. This is the basic pattern that we follow in the document. The first charter is complete, and we will soon be starting the second charter, where we'll be focusing on change management for authoritative data. We'll work to produce documentation on common types of changes in authoritative data, create specifications on how best to represent those changes, and make some recommendations for tooling that providers can use to create change documents and cache maintainers can use to consume those documents. We're in the process of identifying members for the second charter, so please do express your interest if you would like to help move this work forward. I've included links to the output of the first charter and to the main page of the second charter. Feel free to reach out to me if you want to learn more about this work or become part of the working group.

Great, excellent presentation, Lynette, thank you. And our
next talk is going to be from Anna, titled Mining Labor Records and Collections as Data at the University of Utah. Anna, all you.

Okay, is everyone seeing my screen okay? All right. So I'm Anna Neatrour, I'm interim head of digital library services and digital initiatives librarian at the University of Utah, and I'm going to talk a little bit about a collections as data project that we have completed recently at the University of Utah. I'm really presenting on behalf of a team, so that's all of us, me and Jeremy and Rachel, and our contact information is right there. So when you're talking about collections as data, a basic definition of it is that you're trying to encourage the computational use of digitized and born-digital collections. At the University of Utah we were also really interested in developing historical datasets based on our digital library materials, because we have a newish digital scholarship center that's housed in the library, and not a lot of Utah-based historical datasets that might be interesting for students and faculty to work with in a digital humanities or digital scholarship context. And also, collections as data is just kind of cool. In Utah we have the Kennecott copper mine; it is the largest man-made excavation and deepest open-pit mine in the world. There's a photo from special collections of a miner working in the mine. We have records from the copper mine company, over forty thousand records from approximately 1900 to 1919, containing detailed information about the miners. We were able to get all of these digitized for free through a partnership with FamilySearch, but they weren't going to index it for us, so we needed to come up with a way to make the information in the records searchable. And the collection that we were working with is just a small fraction of a much larger Kennecott copper mine collection that we have in our special collections. Here's two extremes of sort of the difficulties in some of these
mining cards: we have one with incredibly faint text, and then we have one with almost too much text, as every single job for this miner repeatedly got annotated and added to on the same card. We did some initial trials, and we decided to focus on this area highlighted in yellow that contains demographic information about the miners, because it was more structured and also more likely to be present on most of the cards. There's a lot of interesting employment information on these cards, but it's also very inconsistent, and we decided to leave that for a later date. There's a little bit of tension between Dublin Core and standardized practices and what you might want to do if you're developing a collections as data project, so we had to create a very customized template for this project and introduce a lot of non-standard fields into our digital library repository, which I usually don't like when other people do it, but then when I do it I feel like I'm justified. And I think the results of this project were really interesting: we do have things like eye color, weight, and height now as fields in our digital library repository. So before the pandemic we had around seven thousand of these records transcribed, and I really thought that this would be a project that we'd just kind of keep in the background for folks to work on, and maybe in five to ten years we'd have all of them transcribed. But when the work-from-home orders went out across our university, we, like many places, decided to spin up additional transcription projects, and so our students, who would normally be working on scanning projects for us, pivoted to transcription instead. They transcribed these directly in our metadata management tool, and the records were then reviewed before we added them to our digital library. Once all the records were transcribed, we were able to explore these a little bit, and so here's some visualizations that my colleague Rachel made in Tableau. So we can see which
different countries the miners were coming from, and so that the U.S. would be the top country represented isn't really a surprise, but it's followed closely by Greece and Japan. We can also see employment by nationality by year, and kind of see different trends in the data based on World War One and based on things happening in the countries of origin for the miners. We were talking with demographers about the project, and they were super jazzed about being able to calculate BMI for miners. This doesn't look very dramatic to me, but I'm just putting this screen up there just in case anyone's interested. And a couple factoids: the youngest miner was 15 years old, and the oldest in that dataset was 77 years old. So this really encouraged us to develop new workflows, think about our digital collections in a new way, and really think beyond our digital library repository in terms of what we can do to make collections accessible to people. We have an article available open access that talks about this a little bit, and about our other collections as data projects. We have a GitHub site where we have all of our collections as data materials, as well as a separate GitHub repository just for the Kennecott miner records. And I'm really curious to hear more from the code4lib community about automated methods of dealing with transcription work that might help us in the next phase of our project. Thank you very much.

Thank you, Anna. Our next presenter is going to be Erin, with a presentation titled Designing for the Most, or A Bellwether Speaks. All you, Erin.

Hello. Okay, so can everybody see my smiling Bitmoji mug? Bitmoji is clear? Fantastic. So hi folks, just a visit from your future here; I'm the ram with the bell around its neck. I'm Erin White, erinrwhite on Twitter. This is my 11th code4lib. I'm head of digital engagement at VCU Libraries in Richmond, Virginia. I'm also the interim digital collections librarian, for the past five years or so. Shout out to everyone who's holding an interim appointment or who has absorbed a vacancy in your
area. I know many of y'all have been doing this math too. The past year in particular brought so much hardship across all vectors of our lives, and at work that likely included layoffs, retirements, and other departures. I'm in a relatively good position: I get to say how much of this work has to get done, and it turns out half-assing a job for a quarter of my time means projects move really slowly, or not at all. I'm sharing this with you not to complain, and it's also not an indictment of my library. I share it because I think this is where we're headed. The early aughts were a boom time for mass digitization and library investment in digital collections; it was a time of huge growth and excitement in digital libraries. But y'all, library budgets are not getting bigger anytime soon. It's not that we're in temporarily tough times; I think this is just how things are and will be. It sure seems to me that digital collections work, and other types of important and often invisibilized work in the library, will continue to be deprioritized when budget conversations inevitably get tough. I won't tell you not to hope and fight for the absolute best, but I will tell you to plan for the worst, or rather to plan for the most, because this is where most of us are headed, and it's not necessarily the worst, it's just way different. There are a lot of ripple effects of disinvestment that I could talk about, but I only have a few minutes, so I'll talk about the ones that haunt me the most. At code4lib 2014, Sumana Harihareswara gave a keynote that I'm still thinking about seven years later. She talked about the last mile problem: the largest hurdle we face in making things usable. She gave many good examples and even wrote it up into a Code4Lib Journal article. So the bottom line is that many people don't use services, even ones that are quote-unquote best for them, because they're simply not usable. Here's a picture of the most beautiful bus stop in Richmond, Virginia. It's my bus stop. It's not my house, though. While this
bus stop has the loveliest views, it has zero amenities, and it's inaccessible for many of my neighbors. It only works well for me because I have a smartphone and I can walk quickly and dodge traffic. If any of those things were to go away, or if the weather goes south, I can't use this service easily. This example is the very literal definition of the last mile problem. One of the ways the last mile problem has manifested in my work life has been that, even after a year and a half of using Islandora for our digital collections, we still haven't figured out a workflow to batch upload collections. We have added only one item to our digital collections since fall 2019.
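The bulk-upload process Erin goes on to describe is the kind of workflow that leans on small glue scripts. As a purely hypothetical sketch (the folder layout, file naming, and CSV field names here are invented for illustration, and this is not Islandora's actual batch format), pairing scanned images with metadata rows before ingest might look like this:

```python
import csv
import json
import tempfile
from pathlib import Path

def build_manifest(item_dir, metadata_csv):
    """Pair each metadata row with its scanned image on disk.

    Hypothetical layout: one <identifier>.tif per item. This
    illustrates pre-ingest glue scripting in general, not any
    specific repository's batch format.
    """
    manifest = []
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            image = item_dir / f"{row['identifier']}.tif"
            # Flag missing scans instead of failing the whole batch.
            manifest.append(
                {**row, "image": str(image) if image.exists() else None}
            )
    return manifest

# Demo with a throwaway directory standing in for a scanning share.
root = Path(tempfile.mkdtemp())
(root / "m001.tif").write_bytes(b"")  # scan present for item m001
with open(root / "metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["identifier", "title"])
    writer.writerow(["m001", "Item 1"])
    writer.writerow(["m002", "Item 2"])  # scan missing on purpose
manifest = build_manifest(root, root / "metadata.csv")
print(json.dumps(manifest, indent=2))
```

A real workflow would hand a manifest like this to whatever bulk-ingest tooling the repository provides; the point is that even this small preparatory step assumes exactly the scripting experience and troubleshooting time the talk is about.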
First of all, as I said a few slides back, this is a result of disinvestment; we've had a vacant position for years. This is also a documentation problem: to get our process sorted, we've been hanging on every word of a seven-year-old blog post that's only accessible through the Wayback Machine (shout out to the Wayback Machine). This is also, fundamentally, a last mile problem. This process assumes scripting experience, and staff time to troubleshoot each bulk upload. I'm actually ashamed to admit this; I feel this failure in my body. I know that if I carved out two solid days, I could probably get something working right. It seems so fundamental, it should be simple, if I just tried harder, if I just had more time. But this isn't about me, and this isn't really about Islandora either; I know a lot of this is fixed in version 8.
Again, this isn't about Islandora. This is about beautiful bus stops that only a few people in good circumstances can use. We can and must design more usable things for each other. So I ask you to think of this: how can we adjust the angle of our vision to set our sights on each other, instead of the distant horizon of another cutting-edge revolutionary technology that will solve all our problems? What if, instead of thinking of this as planning for the worst, we instead see it as planning for the most? Because most of us are pressed for time, for money, for the brain cells to rub together to create new workflows. By considering institutions that have fewer resources, we actually end up designing for everybody, because the center is not holding: the dividing line between have and have-not institutions is only getting stronger, with fewer institutions in between. As cultural heritage organizations, we'll continue to become interdependent on each other as time goes along. Consortial, collectively held platforms and communities are the way we need to go. code4lib itself is a model of how this can work. We can make this work. So consider this an invitation: let's keep building the future we need, together. And I hope you'll read this open access version of Design Justice, because it really got me thinking in this direction. So thank you.

Thank you a ton, Erin. Our next presenter is going to be Ash, with the presentation Using XPath 3.
m. Eastern or 3 p.m. Pacific, where you can sing your heart out with other code4libbers over the internet. It will be a fun time. If you have a song that you do want to sing, though, please use the Google form posted within the virtual meets thread on the Whova community board, so that we can queue up your song in the playlist in advance for smooth singing. Don't forget, also, that tomorrow we have our trivia night in the evening. Space is limited for that event, so if you can, please register in advance through the Zoom registration link in the virtual meets thread on the Whova community board. In addition, the community support volunteers for this evening's karaoke night are, and I do apologize for how badly I'm about to butcher these, Michelle Janowiecki, or m janowiecki on Slack, and Mike Giarlo, or mj giarlo on Slack. With that, thank you everybody for coming today, and we'll see you all tomorrow.