[OLS-2] Cohort call 3 - week 6 - Open Science I - Open software, hardware, data, and agile.

Oct 8, 2020 10:38 · 11373 words · 54 minute read essential reading different kinds repositories

Malvika: Cool. Welcome, everyone. This is week six of OLS-2, and this is our third cohort call. What we’re going to do today is talk about how we develop our projects, because there have been some discussions around how to start setting up your GitHub and how to start thinking about the different logistics you have around your project or community. The topics we will talk about are agile development, where we have Renato on the call, who will give a talk on that; open software, where we have Aleks, who will speak about that; open data, where Paula is on the call from Australia; and lastly open hardware, given by Juli, who is in Argentina, so we have a recorded video from her. As usual, please make sure that you have your name written in so that we can attribute your contributions in the HackMD properly. We also have a nice icebreaker so we can collect some useful software for all of you.

01:04 - So with that, I want to remind you that we have a code of conduct, which applies to this call as well. If for any reason you have to report something, you can send an email to team@openlifesci.org. Or if you need to talk about something with one of the organizers, you can also use our personal emails. And with that, I’m going to give it to Yo, who’s going to explain how we’re going to conduct our breakout rooms.

Yo Yehudi: Thanks, Malvika, and sorry for my computer malfunctions earlier. I ended up getting the computer I never use from downstairs, plugging that in, and hoping the Zoom version was up to date enough.

01:46 - So anyway, today I’m going to talk about the breakout rooms. In the past, we’ve mostly used spoken English breakout rooms. Then in the last quarter, we experimented a bit with having both spoken and written discussion breakout rooms. There are a few different reasons for this, but mostly it’s to make sure that everyone can actually feel included and participate.

02:10 - So we have a small link: if you look around line 60, you can see some more information about the different types of interactions we have, both written and spoken. You can also catch up on that later; it’s not essential reading for now. But if you look a bit further down in our roll call, you can see that you have the option to choose either written or spoken, or both. If you don’t have a strong preference, then please do select both. That just means that when we are selecting different rooms, we can mingle different groups between the two. So I would ask everyone who hasn’t yet added the emoji for their breakout room preference to go down to where your name is in the roll call.

02:59 - This is roughly from, I think, line 74 downwards. Just add one or both of those two emojis. And a little bit of a guide: if you’re in a spoken room, generally this means that you can use English-language speech to communicate about whatever the breakout room task may be. If it’s a written room, there are two ways you can do this. The first is the Zoom chat. When you are in a Zoom breakout room, the chat is private to that breakout room, so everyone in the room can follow your conversation. The downside of using the Zoom chat is that it can’t really thread very well.

03:39 - So there’ll only ever be one person talking at a time, and if you want to address something that’s three or four comments up, it can be a bit hard to follow. The other way you can do this is with the prompts we have in the HackMD, where you can use bullet points and sub-bullet points as a way of communicating and following threads. For example, we have a breakout room quite a bit further down. I don’t know what line it is right now; I’m just scrolling to try to find it myself.

04:12 - Malvika: Line 233.

Yo Yehudi: You’re a hero, Malvika. Thank you so much. I just scroll, scroll, scroll and try not to miss it. So yeah, at line 233 you can see some examples where we have prompts for the written and the spoken breakout rooms. Is that reasonably clear? Can I have some thumbs up or thumbs down for yes or no? Awesome. Okay, right, seems reasonably clear. Thank you very much, everyone. We are definitely iterating on these, and we will very much appreciate any feedback or suggestions for improvement that you may have in the future.

04:50 - So definitely, if there’s anything that doesn’t go quite right, don’t worry about sharing that with us. And okay, what’s up next, Malvika?

Malvika: We just wanted to say that you have finally chosen a name for your cohort: your cohort is called The Masked Cohort. We had 11 votes on that last time I checked, so I would say that we will close the poll. If you don’t like the name, we can also find out if you want to call it something else, but let’s just call it The Masked Cohort. Hooray, great job finding a name. And with that, we’re going to get started. I’m going to hand it over to Emmy for Open Science I.

Emmy Tsang: Hi, everyone.

05:34 - So, yeah, this is the first of two calls on open science. Open science is a very broad topic; it has many aspects and means different things for different people. So we’ll be covering some of the aspects of open science projects in two calls. This first call will cover iterative project management, which is really important for collaboration; then open source software for reproducible, peer-reviewed code; and open source hardware for affordable and maintainable equipment.

06:09 - And then finally, open data: sharing research outputs. In the next call, we’ll talk more about open dissemination: open access, preprints, open education, and citizen science. The first talk that we have today is from Renato. He is a mentor in this cohort and was a mentor and mentee in the last cohort. So I’ll hand it over to him to talk about agile and iterative project management.

Renato: Okay, hi, everyone. Let me just share my screen. Can I get some thumbs up from Emmy or Malvika? Yep. All right, awesome. So today I bring you agile and iterative project management methods as a topic. First of all, I would like to say that I am not an expert on these topics, but I do use some of the principles and ideas that this framework gives us. So, first of all, what actually is agile? Agile is a kind of combination of different techniques, and you might see some of these keywords popping up if you google these terms: Scrum, Kanban, waterfall, and so on.

07:29 - What these terms convey are just different strategies to organize and plan a project. The agile framework also has a manifesto, which you can find on the agilemanifesto.org website. The process is mostly focused on delivering content as quickly as possible and minimizing bureaucratic steps or things that are not directly going to produce an output. So, for instance, there is an emphasis on prototyping and working solutions rather than documentation.

08:18 - Or, for instance, on actually interacting between people as opposed to just establishing bureaucratic processes. The name itself, agile, comes in part from the need to respond to change, which is something you will see often in the business world. I’ll explain how this plays out in the next slide. If you’re familiar with the more traditional way of planning a project, you often have requirements set up from an early stage. Then you have a phase where you design, and you try to plan and think of all the possible situations where things can go wrong, to make sure that you cover all the cases.

09:05 - Eventually you get to a step where you actually implement the plan and put it into action. Afterwards you validate, and you have a delivery at the end. This approach means that the time between the start of the concept and the delivery of the product is very long. It’s therefore considered a high-risk approach, because you don’t really have anything until many months or many years later, depending on what you’re trying to build. To counter that, the agile approach is a little bit more dynamic.

09:46 - The focus is that you have these iterative cycles, where you focus on milestones: achieving small goals and small deliverables. This gives you a better notion of progress, and also a better notion of how you are succeeding in achieving those milestones. So it gives you both progress and something to show and get feedback on. From there, you can also learn what worked and what didn’t, and you can go back to planning and readjust if anything needs to change. So this is a cyclical process that goes back and forth. I didn’t specifically mention this, but this framework was designed primarily for software development, although there are a lot of things that we can learn and apply to other projects.

How do you actually take this to your project and apply it? You can think of it as a bit of a matryoshka situation, where the biggest matryoshka is your final product: this is what you want to deliver. But you’re going to have to break down all the tasks that you have to do. You cannot just think of a project and do it all at once; you have to identify small steps that you can do. The key aspect of the agile approach is to break this big project down into milestones, which are like the intermediate-level matryoshkas in this case, and then into tasks or even subtasks, or issues in the case of GitHub, for instance: smaller steps that take between one and two hours, or at most one day, of working time.

11:46 - So how could this be put into practice? Borrowing an example that Yo set up for the InterMine project, you can see this kind of visualization of several different issues. Each column represents a different version with an expected timeline and delivery date. Then there are these little bubbles, which you can also think of as Post-its; that’s where the Kanban concept comes from. Each has a bunch of labels describing what it is and a title, and you can assign it to someone or move it between the panels, and so on.

12:24 - The idea is that with this you have a very visual overview of where you are in the project and what needs to be done. You can also prioritize things by just moving them up and down in this visual list. As soon as something is complete, it can be hidden from this view or moved elsewhere. In this case, the board is focused on versions, so each of the columns is a version. But the more common approach is to have columns for the stages of progress.
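A minimal sketch of the board described above, assuming three stage columns named "To do", "In progress", and "Done". The `Card` and `Board` classes are hypothetical names used only for illustration; they are not part of GitHub or any other tool mentioned in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Card:
    """One card (issue) on the board. Hypothetical illustration only."""
    title: str
    labels: List[str] = field(default_factory=list)
    assignee: Optional[str] = None


class Board:
    """A toy Kanban board: an ordered list of cards per column."""

    def __init__(self, columns: List[str]) -> None:
        self.columns: Dict[str, List[Card]] = {name: [] for name in columns}

    def add(self, column: str, card: Card) -> None:
        self.columns[column].append(card)

    def move(self, title: str, source: str, target: str,
             position: Optional[int] = None) -> None:
        """Move a card between columns, or up and down within one column."""
        cards = self.columns[source]
        card = next(c for c in cards if c.title == title)
        cards.remove(card)
        destination = self.columns[target]
        index = len(destination) if position is None else position
        destination.insert(index, card)

    def summary(self) -> Dict[str, List[str]]:
        return {name: [c.title for c in cards]
                for name, cards in self.columns.items()}


board = Board(["To do", "In progress", "Done"])
board.add("To do", Card("Write README", labels=["documentation"]))
board.add("To do", Card("Choose a license", labels=["legal"]))

# Prioritise by moving a card to the top of its column...
board.move("Choose a license", "To do", "To do", position=0)
# ...then start working on it.
board.move("Choose a license", "To do", "In progress")

print(board.summary())
```

Keeping each column as an ordered list is what makes "moving things up and down" meaningful: the position in the list is the priority.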

12:54 - Here I’m borrowing a project, an open science Montreal project, where they already started using this approach. You can see that GitHub provides this standard To do, In progress, and Done layout, and how you can plan things and leave them in To do, while everything that is being worked on is in the In progress section, assigned to different people, and so on. One additional thing on this breaking down of tasks: with GitHub you can have not just issues, but within an issue you can have bullet points that you can make into checkboxes. These will be recognized as smaller steps within that issue, which you can then complete, and they show progress on the task. There’s also a way to automate this process somewhat; I have to be honest that I’m not super familiar with this.

14:03 - But if you want, there is a way to simplify this process further. In general terms, what are these issues, or what could you make into an issue? You can think of, for instance, wanting to make a website. You could break it down into creating content for the website, as well as (oops, sorry) finding a domain to have as the address for your website. There, you would have to decide what domain that would be, agree on it between everyone, eventually purchase that domain, and set it up as the redirect for the GitHub account. Similarly, you would have to create the content, with the different parts of that: how you would go about it, the different sections of your website, and so on. You can think of each of these bullet points as an issue.

15:00 - You could even think of these inner ones as the checkboxes on GitHub that I was mentioning. There’s also a live demo that has been set up by the OLS team before. I will make sure to add that link to the HackMD, and I’ll add some additional resources as well, if you want to read a bit more on that. With this, I think we can move to any questions and to the discussion. Thank you.
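The website example can be sketched as data: one milestone, broken into issues, each with checkbox subtasks. The snippet below is only an illustration under assumptions. The issue titles and subtasks are paraphrased from the talk, the helper names `render_issue` and `progress` are hypothetical, and the output uses the `- [ ]` / `- [x]` Markdown task-list syntax that GitHub renders as checkboxes.

```python
from typing import Dict, List, Tuple

# A subtask is (description, is_done).
Subtask = Tuple[str, bool]

# Milestone "Launch the website", broken into issues and subtasks.
# The contents are paraphrased from the talk; the done flags are made up.
milestone: Dict[str, List[Subtask]] = {
    "Set up the domain": [
        ("Agree on a domain name", True),
        ("Purchase the domain", False),
        ("Point the domain at the GitHub-hosted site", False),
    ],
    "Create the content": [
        ("Decide on the sections of the website", False),
        ("Write the text for each section", False),
    ],
}


def render_issue(title: str, subtasks: List[Subtask]) -> str:
    """Render one issue as Markdown with a task list of checkboxes."""
    lines = [f"### {title}"]
    for text, done in subtasks:
        box = "x" if done else " "
        lines.append(f"- [{box}] {text}")
    return "\n".join(lines)


def progress(subtasks: List[Subtask]) -> Tuple[int, int]:
    """Return (completed, total) for an issue."""
    completed = sum(1 for _, done in subtasks if done)
    return completed, len(subtasks)


for title, subtasks in milestone.items():
    done, total = progress(subtasks)
    print(render_issue(title, subtasks))
    print(f"({done}/{total} done)\n")
```

Each subtask is meant to be small, in line with the one-to-two-hour (at most one day) guideline from the talk; anything bigger is a hint that it should become its own issue.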

15:39 - Emmy Tsang: If you have any questions about agile, or about any of the things that were not mentioned, please feel free to put your questions into the chat. We’re on line 213 at the moment with questions. There are a lot of interesting conversations going on. Would anyone like to ask a question verbally?

Renato: Yeah, so maybe picking up on Sara’s point on collaboration. The Agile Manifesto does emphasize collaboration, or interaction between people, as a key point, but I guess this will depend a lot on what you want to do. Obviously, if it’s a personal project, this is entirely up to you.

16:35 - But if at any point you decide to involve someone else in that project, then I think it makes sense. And even as a general overview, I do use this project organization structure in my own personal projects, to have milestones and to break down tasks into smaller ones, and so on. This is why I was saying that I’m not really an expert, but I do use some of the concepts. They don’t really need to be rigid; you can make them as flexible as you want.

Emmy Tsang: Okay, there’s also a question from David on the HackMD, if I could just repeat that. He’s really interested in the psychology and motivation of assigning people to tasks. Do people self-assign, or get asked, or does someone just assign them without discussion? And in the real world, what has worked best for you?

Renato: Yeah, so there I’ve had mixed experiences. It depends a lot on the project.

17:33 - Renato: And it depends a lot on how dynamic the project is, and on how you actually want to run the project. In some projects, especially open source projects, you don’t really have clear responsibilities until you get to a high level of involvement with the team. So it’s kind of like: whatever is open as an issue, you could potentially contribute to. In a more structured project, different people are responsible for different parts of the project.

18:08 - So you could use, for instance, labels to mark your issues as belonging to these topics. I believe that in GitHub you can even have the labels give automatic feedback to some people, so that you can then assign yourself, or you can pick from teams. I’m not entirely sure if it’s possible to have more than one person assigned to the same issue. But usually, for efficiency reasons, it’s better to have one person responsible for coordinating all the activity within one issue, I’d say.

Aleks: Thank you.

Emmy Tsang: Thank you. There’s also a question from Kate concerning the frequency of attending to this. How often do you tend to log on? Is it every day, or is it once a week?

Renato: Yeah, so that also depends a lot on the project and on how much time you can dedicate to it.

19:12 - I have some projects where I go on an almost daily basis, even hourly; I have the tab always open. And I have other projects where I go once a week, or even once a month, depending on how much time I can dedicate to them. But I would say that if you want to use this properly, you can try to follow the Getting Things Done approach as well. I can add a link; there’s a very popular, kind of oldish book on this topic. There you also get a bit more of this breaking down into different phases.

19:49 - First you define all the tasks that need to be defined. But then at the end of every day, you also want to reassess what was achieved and what was not, and maybe plan the tasks that you want to do the next day, and so on. So you can also break this process down further into actionable steps, because inevitably, if the project is big, you’re going to have an overload of issues, and you want to prioritize some of them so that it doesn’t overwhelm you.

Emmy Tsang: Thank you very much. All right, we are just on time. Thank you very much, Renato, for all the insights and for sharing your experience with all these methodologies.

20:36 - And next, we have Aleks, who will be talking about software. Aleks is an expert in this cohort, as well as in the last one.

Aleks: Hi, everyone. It’s really great to be here. Can you see my slides at all, or not?

Emmy Tsang: Oh, yes, we can. We saw them before.

Aleks: Oh, sorry. Okay, because I couldn’t see the green frame around my window, I thought it didn’t work. It’s perfect. Okay, so my name is Aleks Nenadic. I’m the training team lead at the Software Sustainability Institute.

21:12 - I’m based in Manchester, and I’ll be talking to you a bit about open scientific code in research. A big thank you to Yo for letting me borrow her slides from her workshop on how to contribute to open source. There is a link to that workshop as well, if you want to visit it. She made my life much easier, so I just modified her slides slightly. Big, big thank you. Okay, so what do we mean when we say open source? The first thing to know is that open source is not the same thing as being free.

21:50 - You have some free software which isn’t open source. It’s free to download and free to use; however, you download some binary executable, and you don’t get to see, modify, or reuse the source code itself. On the other side, you also have some open source software which isn’t free. It doesn’t happen very often, but, for example, you have software which is made openly available, yet sometimes you have to pay for support from the team or for some extra help.

22:25 - So even though the software is open, you still have to pay for some services. However, there is an intersection between the two, where you have open source software that is also free to use. So what is open source? Essentially, it’s simply sharing the design of your work so that it can be reused and remixed by others. And it can be anything. We are talking about open source code; however, it can be a methodology that you make publicly open, an algorithm, data, your photos, anything that you’re working on.

23:03 - Why should you use and promote open source? Just a bit on motivation. The first thing is that once you make your work or your code public, it can be peer reviewed and used by others. So you get a second pair of eyes, and you get more users for your code. There is something that we call the hacker ethic, which means that nobody should have to reinvent the wheel. If you benefited from someone else’s work, you should give back to the community by making your work open as well. This also makes science advance faster, because people don’t have to redo the same thing all over again.

23:48 - It also helps you get more from the community, so you can collaborate with more people who might be distributed around the world. You share your work, others share back, and it’s a bit of a back-and-forth interaction. Ultimately, you want your work in research to be reproducible by others, and making your code, your data, and your methodology open is the first step on that road. Again, a bit of motivation on why you would want to make your code and your work open: there is a risk associated with closed code.

24:31 - You’ve probably seen the picture on the top recently, because it put a spotlight on research code during the pandemic. The picture is of the mathematical epidemiologist Neil Ferguson from Imperial College. His team had software for modeling pandemics, and the software had been in existence for at least the past 10 years. However, it was only made publicly available in March this year. And of course, because of the wider scrutiny, it was put under a magnifying glass, and loads of bugs were uncovered.

25:14 - And then the team, perhaps unfairly, received loads of criticism. Now, this is not a criticism of anyone’s code. The point I’m trying to make is that if you release your code early, and if you release often, you will get more eyes on it, and it will ultimately improve your code. We all make mistakes, and there is no code that is bug free. However, you shouldn’t be afraid to make those mistakes. Get it out there and have others take a look; it can only make your code better. Being open and publishing your code, software, and data also aids transparency in research.

26:02 - These TOP guidelines for transparency in research have been created by journals, funders, and societies. They cover not only transparency of your analytical methods and software, but anything related to your research: your data, your research plan, all the research materials, everything. If possible, always try to make these open. There are obviously some constraints; sometimes you can’t publish your data due to privacy restrictions. Over 5000 journals have now signed up to the TOP guidelines, it’s getting loads of traction, and you should follow it too. So how do you know if software is open source? Even if the source code is available and viewable online, so it’s published somewhere you can find it and you can technically access it, that is not enough by itself.

26:53 - If there is no license associated with the code, it’s not legal to reuse it. So that’s one of the most important things: if you want to make your software public, you need to have a license file stating that it is in fact legal for your code to be reused or remixed in any way. There are different license types, and it’s potentially a minefield. However, there is loads of help on how to choose your license. Once you do, you need to specify it in a file named LICENSE, LICENSE.txt, or LICENSE.md, stored in the root of your open source repository. So make sure that you always add a license to your software, data, photos, or whatever it is that you publish, if you want everyone else to use it.

27:44 - So once again, no one can reuse it unless you give them explicit rights to do so. Even if in your mind you’re thinking, “Oh, I want to make this open,” until you have a license file it’s not legal to do so. Okay, so how do you go about publishing and sharing your open code? The first step is version control. Version control is a system that records changes to a set of files within a folder over time, so you can recall a specific version later. It gives you basically two things: it enables you to publish and share your code, and it gives you a way to back up and version your code. You can always jump back in time to an earlier version; it’s like a time machine for your code. It’s a must-have these days, and it’s just universally useful. So what is it useful for? It’s going to help you to never mess up again.

28:52 - Well, okay, it’s not that you’re never going to mess up, but it’s going to make it easier for you to recover from your mistakes if you overwrite your code, overwrite your data, or make some blunder. It makes collaborating on code so much easier. As I said, it also offers off-site backups: because you’re putting your code on a central server, you don’t actually have to back up your code on your external drive anymore.

29:22 - So, for example, if you work on your desktop at work and then you have your laptop at home, you don’t have to keep copying, backing up, and transferring your code between the two. You can just sync with the central repository, and you’re good to go. So what’s not to like? Well, apart from learning how to actually use the version control system, which we recommend that you do. So, Git and GitHub. Git is one type of version control software. It’s not the only one, but it’s one that is widely used today. And then there is GitHub, which is a central site that hosts Git repositories and also gives you a nice user interface to Git, which is under the hood.
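The "time machine" idea described above can be illustrated with a toy in-memory history. This is only a conceptual sketch: the `TinyHistory` class is a hypothetical name, it keeps full copies of every file at every commit, and it is not how Git is implemented or how you would use Git in practice.

```python
import hashlib
from typing import Dict, List, Tuple


class TinyHistory:
    """A toy version store: each commit is a full snapshot of the files.

    Hypothetical illustration of "record changes over time so you can
    recall a specific version later". Not a replacement for Git.
    """

    def __init__(self) -> None:
        # Each entry is (version_id, message, snapshot_of_files).
        self._snapshots: List[Tuple[str, str, Dict[str, str]]] = []

    def commit(self, files: Dict[str, str], message: str) -> str:
        snapshot = dict(files)  # copy, so later edits do not alter history
        raw = repr((len(self._snapshots), message, sorted(snapshot.items())))
        version_id = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:7]
        self._snapshots.append((version_id, message, snapshot))
        return version_id

    def checkout(self, version_id: str) -> Dict[str, str]:
        """Return a copy of the files as they were at the given version."""
        for vid, _, snapshot in self._snapshots:
            if vid == version_id:
                return dict(snapshot)
        raise KeyError(f"unknown version: {version_id}")

    def log(self) -> List[Tuple[str, str]]:
        return [(vid, message) for vid, message, _ in self._snapshots]


history = TinyHistory()
files = {"analysis.py": "print('v1')"}
first = history.commit(files, "Initial analysis")

# A blunder: the working copy is overwritten and committed.
files["analysis.py"] = "print('oops')"
history.commit(files, "Broken edit")

# Jump back in time to the first version.
restored = history.checkout(first)
print(restored["analysis.py"])
```

A real version control system adds what this sketch leaves out: storing history efficiently, merging work from several people, and syncing with a remote server for off-site backup.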

30:11 - There are other tools similar to Git and GitHub: you have Mercurial and Bitbucket. Mercurial is a version control system similar to Git, and Bitbucket is similar to GitHub, a user interface where you can host your repositories. There is also GitLab, which is just a slightly different user interface from GitHub and uses Git underneath. At the bottom of the slide, you have a link to Yo’s Git workshop, if you want to have a look at that. We strongly recommend learning some version control system; it doesn’t have to be Git.

30:45 - But as I said, everyone seems to be using Git and GitHub these days, so it’s worthwhile, and it will pay off tenfold. There are loads of other workshops to help you learn Git or to get you started with GitHub. So, just a couple of tips to make your work open source. The first thing I’ve already mentioned: add a license file, because unless you have it, it’s not legal to reuse your code. Try to avoid jargon, or at least try to explain it in plain English, if possible. Add a README file. It’s also one of the must-haves, to explain what your project is about, how it is used, how to contribute, and how to report bugs. Consider having a roadmap, allowing others to see what is in your plans and how you set your priorities. A contributing guide: if you want others to contribute, make a file called CONTRIBUTING.md in the root of your repository and set up some guidelines for contributors, so that they know what they should do if they want to help you out. Issues: GitHub has an issue tracker, and other platforms also have issue trackers, to record your bugs and propose new features.

32:04 - You can also use tags to make it easy to sort your issues. Have a code of conduct; all good projects have it, to make sure that everyone who contributes and is part of the project is treated well, and to state how violations will be handled. GitHub has a code of conduct wizard to make it easy for you to add one. Citation: if you use other people’s software, you should cite it, just as you cite their research papers. Similarly, if you write code, you want some credit for it. So in order to help people cite your software, add a CITATION.md file to your repository with the suggested citation.

32:43 - Consider getting a DOI for your data or code. It’s now possible using either Zenodo or figshare; you get a DOI, and you can then use that to cite your software project. Also add how to contact you and the team, so make sure you have your contact details there as well. So this is just a quick cookbook to get you started with open source development. The ultimate goal is full reproducibility. You have many open computational tools around, such as R and Python, the Jupyter notebook or lab, RStudio, and GitHub, which I’ve mentioned already. And then you have Binder. So consider containerizing your software, code, and data to allow for full reproducibility.

33:35 - Binder integrates with all these other software tools mentioned above on this slide. So, just some pointers for further reading and some activities to get you started or to follow up, and where to go from here: how to contribute to open source software. Think about some software that you like or use, and consider checking some of its issues to see if you can contribute. There is also the first-timers-only tag on GitHub to help you select projects that are suitable for someone who is new to open source development. Or there is a website called firsttimersonly.com.

34:25 - You can try to participate in Hacktoberfest, the Mozilla Global Sprint, or the 24 Pull Requests project. They all happen throughout the year. There is the Journal of Open Source Software: you can either submit your software to it or apply to be a reviewer. And finally, a great training resource is The Turing Way, a handbook for reproducible data science. It covers loads of the aspects of open source, but it also gives you the bigger picture on how to make your research reproducible. And with that, I just wanted to thank you, everyone. I think I might have been slightly over time; I’m really sorry about that.
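The repository "cookbook" from this talk (license, README, contributing guide, code of conduct, citation file) can be turned into a small self-check. This is a hedged sketch, not an official tool: the function names are hypothetical, and apart from the LICENSE variants, CONTRIBUTING.md, and CITATION.md named in the talk, the accepted file names (for example `README.md` and `CODE_OF_CONDUCT.md`) are assumptions based on common convention.

```python
from pathlib import Path
from typing import Dict, List

# Item -> file names that count as satisfying it.
# README and CODE_OF_CONDUCT variants are assumed conventions.
CHECKLIST: Dict[str, List[str]] = {
    "license": ["LICENSE", "LICENSE.txt", "LICENSE.md"],
    "readme": ["README", "README.md", "README.txt"],
    "contributing guide": ["CONTRIBUTING.md"],
    "code of conduct": ["CODE_OF_CONDUCT.md"],
    "citation": ["CITATION.md"],
}


def check_repository(root: str) -> Dict[str, bool]:
    """Report which checklist items exist in the root of a repository."""
    base = Path(root)
    return {
        item: any((base / name).is_file() for name in names)
        for item, names in CHECKLIST.items()
    }


def report(results: Dict[str, bool]) -> str:
    """Format the results as a plain-text checklist."""
    return "\n".join(
        f"[{'x' if present else ' '}] {item}"
        for item, present in results.items()
    )


if __name__ == "__main__":
    # Check the current directory; pass another path to check a clone.
    print(report(check_repository(".")))
```

A check like this only confirms that the files exist. Whether the license is the right one, or whether the README avoids jargon, still needs a human reader.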

35:08 - Any questions?

Emmy Tsang: Thank you so much, Aleks, for the very informative and insightful talk, with useful tips and practices to follow in open source software development. I think there is a question from Arielle: is there much of a difference between Git/GitHub and Mercurial/Bitbucket?

Aleks: I think the principles are the same; they’re both distributed version control systems. Obviously, the commands that you use will be different, but I think many concepts are the same. I’ve only used Mercurial a little bit, to be honest. I switched to Git at some point and just remained there. I think you should probably go with what your community or group is using, to make it easier for you.

Emmy Tsang: That’s a very good tip. Thanks, Aleks. Folks, do we have any other questions, quickly? You can still add them to the HackMD, around line 277 now, and I’m sure we can help answer them as you think about what Aleks was just talking about. For now, I’ll hand over to Yo to begin talking about open data.

Yo Yehudi: Awesome. Thank you so much, Aleks and Emmy. The next talk that we’re going to have is about open data and different ways to share data thoughtfully and responsibly.

36:42 - So Paula, are you available? Paula: Hello, can you hear me? Yo Yehudi: We can. Paula: Give me a second, please. So I thought I was the last speaker. Unknown: I’m sorry? You’re very slightly fuzzy. I don’t know if there’s anything you can do about the microphone or anything? Maybe not, and that’s okay. Yeah. Paula: I’ll put on my headphones, one minute please. Can you hear me better? Yo Yehudi: That is much better. Paula: Wait a second. Emmy Tsang: Share. Paula: Can you see my screen? Yo Yehudi: We can, yes. Paula: Maybe I should share again. Okay, so while I do this, can every one of you please write one or two words about what open data means to you? And I’ll fill in the gaps in my 10 minutes. Yo Yehudi: Beautiful.

38:11 - That is, everyone, if we start at line 291. Paula: Thank you. Again, I think I’m ready, I might share that screen. You are. Good. My name is Paula Andrea Martinez. You can find me on Twitter, my handle is orcid00. This presentation is open to you under a Creative Commons CC-BY 4.0 International license and uploaded to figshare. So I think I tick all the boxes that Aleks just mentioned. Great. So from the words you just added to the HackMD, I hope some of these are reproduced in this slide. It comes from the World Bank; it was an analysis they did with the Open Data Handbook and the World Bank survey. And they mention that open data is many things for different people. So it’s an opportunity to share digital resources, to make the community participate and empower them by giving them access, giving them transparency on how the data was collected, and then to share with everyone by having no restrictions and making it available. So just to expand on that.

39:30 - I’d like to give you the definition of what open data means from opendatahandbook.org. They say that open data is data that can be freely used, reused and redistributed by anyone. And on this little cartoon on the right side, it says it’s an invitation for you or for anyone to use, modify and enhance your work. That’s the way that we can build on top of other people’s work and keep improving it. The concept of open data is actually nothing new.

40:10 - It comes from when the internet started. There are some guidelines that people should be following that Tim Berners-Lee put together, trying to make it a very simple step-by-step process to make your data as open as possible. So the first thing is, whatever you want to share, you can put it online, in whatever format you have, but under an open license. So as Aleks just mentioned, there are different kinds of licenses, depending on whether you have code or text; in this case, I’m using Creative Commons for the license. Then if you want to go a step further, you make that data available in a structured way. So for example, you can put together all your collected data in a way that nobody can understand, or you can categorize your data and put it into different sections.

41:08 - How was your data collected? How did you clean your data, and what’s the output of that data? The third step is, once this data is out there on the internet and you have put it together in, for example, a spreadsheet, you can opt for a non-proprietary or open format. For example, instead of having an Excel file that not everyone can open, you can have that as a text file or a comma-separated values file, so that other people who don’t have access to the Excel spreadsheet program can open it with open source software that exists for other operating systems. The fourth step is that you add a link, or URI, to denote what you are sharing, and that is the URL.

42:01 - And we would like to have this URL as a permanent resource, something that is not going to change over time and is always going to point to the original data that you’re sharing. Once you have that, you have made very good progress. And the last step you can take to contribute to open data is to link your data. So it’s very unusual that you have data that stands alone and links to nothing. So you have to try to put some extra information about how this data links to other data. If you’ve collected it from other sources, if you’ve cleaned it from other sources, if you collaborated with other people, all those are things that you can link to your data, and you provide context for what the result is. So with these steps, you are helping everyone and making it possible for others to reuse your data in a much more meaningful way. To continue with this presentation, I’ll touch on three points that I think are very important when you share your data. The first is ethics. The second one is how you link your data, in a little more detail, and then the FAIR data principles.
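As an illustration of the later steps described above (an open format, a stable link, and links to related data), here is a minimal Python sketch. It is not taken from the talk: the rows, the field names in the metadata record, and the `example.org` addresses are all placeholders made up for this example, and a real repository such as figshare or Zenodo would define its own metadata fields.

```python
import csv
import io
import json

# Hypothetical example rows; a real dataset would come from your own collection.
ROWS = [
    {"site": "A", "date": "2020-10-01", "temperature_c": 18.5},
    {"site": "B", "date": "2020-10-01", "temperature_c": 21.0},
]


def to_csv(rows):
    """Serialise rows as comma-separated text, a non-proprietary format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def describe(title, licence, identifier, related):
    """Build a small, illustrative metadata record for the shared file."""
    return {
        "title": title,
        "license": licence,        # step 1: an open licence
        "format": "text/csv",      # steps 2-3: structured data in an open format
        "identifier": identifier,  # step 4: a stable URI that denotes the data
        "related": related,        # step 5: links to other data for context
    }


csv_text = to_csv(ROWS)
metadata = describe(
    title="Example temperature readings",
    licence="CC-BY-4.0",
    identifier="https://example.org/datasets/temperature-2020",  # placeholder
    related=["https://example.org/datasets/site-descriptions"],  # placeholder
)

print(csv_text)
print(json.dumps(metadata, indent=2))
```

Keeping the metadata record next to the data file is one simple way to carry the licence and the links along with the data; where exactly it lives depends on the repository or journal you publish with.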

43:21 - So first, for ethics: a lot of people think that ethics comes just at the end, so that when you try to communicate and distribute your data, you should be ethical about what you’re sharing. But instead, I want you to remind yourself that ethics starts from the moment that you apply for funding. Someone is paying for this research; do they want this research to be public or not? What’s the motivation for that? How are you designing your project? Are you being biased about your data collection? How are you sourcing your data? How is your analysis being done? Are you influencing the results of your analysis by any process you’re taking? How are you interpreting the data? Are you skipping some steps? Are you trying to do whatever it takes for a better p-value on your results? All of that is part of data ethics. So remember that, and think that ethics runs all the way through your project life cycle.

44:21 - The other important thing, which I think is why open data exists, is to create knowledge. And to create knowledge, we have to understand each other. So for that we use standards; also, we use the syntax. So for example, I’ve heard a presentation about the human genome, where people were uploading a lot of human genomes, but some people name it “Homo”, other people name it “human”, other people name it “man”. There are so many words that relate to the same thing.
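The naming problem just described can be sketched in a few lines of Python. This is only an illustration, not something shown in the talk: the synonym table is hand-written for the example, and a real project would look terms up in a published vocabulary or ontology instead. The one external fact used is that 9606 is the NCBI Taxonomy identifier for Homo sapiens.

```python
# Illustrative synonym table; a real project would resolve labels against a
# published ontology rather than maintain its own list.
SYNONYMS = {
    "human": "NCBITaxon:9606",
    "man": "NCBITaxon:9606",
    "homo sapiens": "NCBITaxon:9606",
}


def normalise_species(label):
    """Return the shared identifier for a free-text label, or None if unknown."""
    return SYNONYMS.get(label.strip().lower())


labels = ["Human", "homo sapiens", " man ", "mouse"]
resolved = {label: normalise_species(label) for label in labels}
for label, identifier in resolved.items():
    print(repr(label), "->", identifier)
```

Once every record carries the same identifier, a machine can tell that the three labels refer to the same thing, and unknown labels (here `"mouse"`) are flagged instead of being silently kept as free text.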

44:59 - But a machine will not know this if they’re not part of a vocabulary or ontology. To see the difference, I’d recommend you go to w3.org to see how you can build an ontology, where you can find ontologies that already exist, how you can contribute to those, and how to name your data in a reasonable way that has relationships, that has context, so that you provide the connection to what you’re sharing. Last but not least, the FAIR Guiding Principles for data. This is also not a new concept; it was published as a paper in 2016, and it is about thinking of all the things you do as a researcher. We usually have this little nice picture of our paper that is two or three pages long, that we spend two or three years collecting and cleaning and putting together, right? So there’s a lot of work performed that is not visible to the public.

46:01 - And we want to emphasize all of that. We want to leverage the work that you’re doing, and we want you to have credit for the work that you’re doing. And so that’s why the principles were put together as a guideline. It has some steps and recommendations that were mentioned before. And in this presentation, you have some links that I recommend you follow; they go further or explain the principles in simple English terms. Then there’s the FORCE11 FAIR data principles group; if you want to contribute, you can go there and help. And a new initiative, where I’m also part of the steering committee, is FAIR for Research Software. So initially, the guiding principles were only for data. And now we want to have software as a first-class citizen, having the same principles apply to software. They relate a lot to the data principles, but there are some modifications that we need to make.

47:01 - And you can also read the paper “Towards FAIR principles for research software”, which was published last year. Some communities that I have been part of, or am still part of, where you can go and look up more information: the Open Data Handbook by the Open Knowledge Foundation; the last OLS had a presentation by one of the members. There’s a lot of information at the Digital Curation Centre, not only about open data, but also licensing and many other topics. Or you can look at FOSTER Open Science, which has a lot of other materials that you can reuse. It also has tutorials, and you can share them; they’re all attribution-based, they are open, as you might have heard many times already. Then there’s also the OpenCon global conference, which has satellite conferences around the world. Now everything is online. So you can also start one of those in your locality.

48:11 - And for the topic of open data, the Research Data Alliance, I think, is a very valuable resource. It’s a group of volunteers around the globe that enables the open sharing and reuse of data. There are so many different topics, and I’d welcome you to be part of them. And as Aleks mentioned, if you want to start using open data, what better way to get hands-on than a hackathon, for example? There are a lot of social, economic, and environmental challenges that you can be solving where open data can help. Paula: With that, I’d like to thank you for your attention. This is also an open image by padri hostenback.

48:59 - And open data is just part of the road to open science that I think all of you are following. If you have questions, you can contact me by Twitter. So thank you. Yo Yehudi: Thank you so much, Paula, that was a really, really interesting presentation. I definitely learned things that I didn’t know before there. So thank you so much for that. So if anyone has any questions right now, that is around about line 324, or you can also type them in the chat.

49:33 - In fact, I see one here from Muhammad: “I understand using and accessing the data, but when the data is open to modify, does that not make it less reliable? I understand it can get improved, but on the other hand, it might just get ripped.” Any comments, Paula? Paula: Yeah, sure. In open data, there’s an important concept that is called data transformation. I think it relates to this question in that way. When you use data that was previously collected for a different purpose, you transform that data into serving another purpose. You might be improving it. I particularly don’t know how that would make it less valuable. Yo Yehudi: Okay, we have another question from Emma: “Where do you suggest putting links to other data to make your data five star? In a separate file of your repo, or in a research paper/data paper?” Paula: I’d probably recommend putting it in a different file. But then you also have to follow the guidelines of whoever is publishing your data. So some journals have specific places where you can put your data. But remember, linking is the key.

50:56 - Right? So wherever you have it, have these forward links from one resource to the other. Yo Yehudi: I’ll actually add one comment of my own on this, which is that small datasets are probably okay to put on GitHub, but for anything big, you might want to consider putting it somewhere else, like perhaps Figshare or Dryad, or Zenodo? Yeah. Unknown: Some of these are for pay. And another question from Peter: “data benefit sharing is a big worry for researchers in Africa, where often we’re the sample collectors and analysis happens elsewhere. How can benefit sharing be maintained with open data?” That sounds like a tough one. Paula: I understand the point, but having open data doesn’t mean that it’s free, same as open software. So if you put open data out there with a license, and the license requires attribution, you’re also getting something back.

52:00 - So even if people are doing the analysis somewhere else, they’re still going to cite where the data source comes from. And they’ll give you citations that will give you recognition; you can also add the process of the collection. There are other benefits of sharing them. Yo Yehudi: Okay, we have one more from Kate Simpson. I think probably once we’re done with this question we need to move on. But feel free to add more questions, and then we can put them in the HackMD and follow up later. So Kate says: “If data is collected from homes, like temperature, humidity, building performance, and was not originally planned to be made open, do you think opening it up anonymously, so the householders cannot be traced, is okay, or is it unethical?” Paula: I think you are touching on another important point that I haven’t mentioned, which is sensitive data.

52:58 - Paula: Sensitive data has some guidelines. And yeah, making it anonymous or de-identifying the data is one of those. Usually, if you have to deal with sensitive data, you have to go through an ethics process. So that’s part of the data ethics. And that depends on your institution, on the kind of data that you’re collecting, and also on whether you have the consent of the people you are collecting the data from.
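To make the de-identification step a little more concrete, here is a minimal Python sketch for household sensor records like the ones in the question. It is an assumption-laden illustration, not a method from the talk: the field names (`household_id`, `name`, `address`) and the key are hypothetical, and replacing an identifier with a keyed hash is pseudonymisation, which on its own may not be enough to make data anonymous. It does not replace the ethics process or consent mentioned above.

```python
import hashlib
import hmac

# Hypothetical field names for household sensor records.
DIRECT_IDENTIFIERS = {"name", "address"}


def pseudonymise(record, secret_key):
    """Drop direct identifiers and replace the household id with a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(
        secret_key.encode("utf-8"),
        str(record["household_id"]).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # A shortened digest keeps records from the same household linkable
    # without exposing the original identifier.
    cleaned["household_id"] = digest[:12]
    return cleaned


record = {
    "household_id": 17,
    "name": "Example Resident",
    "address": "1 Example Street",
    "temperature_c": 19.2,
    "humidity_pct": 48,
}

# The key must stay private and must not be published with the data.
shared = pseudonymise(record, secret_key="keep-this-private")
print(shared)
```

Measurements themselves can still be identifying in combination (for example, detailed time series from a single home), so whether a dataset like this can be opened remains a question for the ethics review.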

53:26 - Yo Yehudi: Thanks, Kate and Paula, for that. I do agree that thinking critically about whether the data is something you have the right to share, and whether it’s okay to share, is important, and that we shouldn’t be radically open if it’s something that other people don’t want shared. Then open data is not a good thing. Okay, thank you so much, Paula. I think we will now move on to our breakouts. There are some nice notes in the HackMD about the Turing Way guide to ethical research; this may address things like making sure that your data is shared ethically. But moving on to the breakout: this is where we get to try out the new written and spoken breakout rooms that we’ve been working on.

54:12 - And I’ve done my very best to sort everyone correctly. I’ve already noticed that scrolling up to find the emojis and then scrolling back down to the activity is a little bit challenging. So we may have to iterate on this further later. But for now, the task for our breakout room is actually to think back to when we were, oh gosh, there we go, talking about our iterative project management, as this is the first talk that we had, from Renato.

54:41 - So think about the milestones that you have in your projects and try to think what you would break them down into. So think: okay, I know what my really big goal is, but what are some smaller achievable chunks that I can do step by step, that I can add to my roadmap? So for those breakout rooms, we have about four or five people per breakout room, and we’ve done our best to slot you into written or spoken. If at any point we have made a mistake or assigned you to the wrong room, remember you can ask for help. I think you may be able to leave a room or sort yourself into new rooms if you have the newest version of Zoom, but I’m not sure about that yet. So you can ask for help if needed, or also hop over to Slack to ask for help if we have sorted you wrong at any point or if you have any questions.

55:28 - But as a reminder, the task is to try and discuss with others breaking your milestone down into achievable chunks. Is that reasonably clear? And can I get some thumbs up, if so? We have thumbs. Okay, I’m going to send you all off. We have about 10 minutes to discuss: spend five minutes silently working on this and then five minutes sharing with your group. And we will be going in any moment. Amazing. Okay. Are we all back? I think we are. Right. Okay. So hopefully you all had some good discussions in the breakout rooms. So we just have a few quick minutes of sharing how this went.

56:23 - So this could be talking specifically about the discussion of breaking stuff down with agile or, also, if you want to discuss your experiences, especially with the written rooms, either is fine. So there are some bullet points on line 410. I am literally going to mute for a minute or two and just let y’all write down some thoughts. And then once we’ve had a minute or two to write down thoughts about what you found interesting and challenging about this process, I may run through some of the answers. Those are lines 410 and 419 right now. Yo Yehudi: Just a tiny reminder, if you’re not speaking, it’s usually best to make sure that your microphone is on mute, just so that background noise doesn’t come through. I will mute a couple of people.

58:44 - Okay, so I’m gonna just read through some of the responses. So there’s still a whole bunch of people who are adding notes to the room-specific groups, and I can see some people talking about some of the things that they’ve broken down that they’re planning to do in the future, including Twitch streaming of your group meetings, which sounds amazing. Unknown: Talk about open. That is incredibly open. Yo Yehudi: And then I also see some other discussions where people are saying that they didn’t realize about the time. That one’s on me, sorry, I should have been sending reminders about how long you had left, and we will try and do better with it. I’m really sorry. And Kate shares that it’s really satisfying to break it down into really small chunks, but often something that you rarely do. Other people are saying that watching each other’s solutions to different problems can be really interesting and stimulating, which I absolutely agree with: someone whose project is completely different to yours, seeing what they do, or just an external perspective. All of those can be really nice.

59:52 - Does anyone have a comment, or anything they’d like to share out loud? Okay, we will move on. Feel free to continue adding comments of your own if you wish. And so our final talk is a talk by Julieta. She’s actually based in, I think, Latin America. And right now it is quite early, so she was kind enough to pre-record her talk. So Malvika, over to you. Malvika: Yeah, so to give a bit of background, Juli is running a project called Open hardware leadership, which is a similar program to Open Life Science, but for the folks who are developing open hardware, and I have met her in different contexts where she had been promoting open hardware in Argentina, where she’s based.

00:58 - And we really wanted her to talk to our cohort. And that’s why we requested a video. I’m going to try to share, and please let me know if the volume is all right. Juli: Hi, everyone. Today, I will be sharing some of the work I do as part of the open science hardware community, what open science hardware is, and why I think it’s important to discuss this in the context of the open movement in general and the Open Life Science program in particular. So first of all, thank you, Open Life Science, for inviting me to share what we do. I consider our work super complementary, and I’m really impressed with the results of the program. About myself: I’m Julieta Arancio. I’m from Buenos Aires, Argentina. I’m currently living in Switzerland.

01:47 - I’m finishing my PhD in open science hardware and its contribution to democratizing knowledge production in the global south. And as with most people in open science, I wear many hats. So I’m also co-organizing the Latin American community of open science hardware. And I have co-founded, with Alex Kutschera and Andre Chagas, a program that is a sibling of Open Life Science. It’s called Open Hardware Makers, where we give support to new open hardware projects.

02:17 - Juli: So, but what is open science hardware? Yeah, what is it all about? Okay, so it basically comes down to this: people who want to make science need the tools to make science. And nowadays those tools are what we call a black box. And they’re a black box because we just input samples or information or some kind of input, and we get an output out of them. But we’re not a hundred percent sure of the processes that produce these outputs. Of course, we know about the principles, but we cannot inspect the tools. And the fact that we cannot do so brings many problems to researchers in general. First of all, as we don’t know how they work, these tools are very difficult to modify and customize. And I don’t know if you are aware, but scientists have been studied to be one of the groups that customize their tools the most, which makes sense, right, because you’re always trying to pursue new questions and to see a bit more, so you need your tools to be able to change with your ideas. Second, being proprietary, these are hard to maintain. So we have a horrible example, unfortunately, with the COVID crisis in the US, where some hospitals don’t want to repair ventilators for fear of breaking patent laws.

03:46 - Another problem is that science tools are usually produced by a small number of companies, and these are therefore able to set, in general, very expensive prices. So many universities around the world cannot access equipment. So as you can see, these are consequences that are faced by almost everyone in academia. But the impact is certainly bigger in those countries that have low investment in science and technology, on one side because science tools are more expensive in the global south, but also when science equipment breaks down.

04:19 - Let’s say in the UK, it’s a very different experience than it is in Ecuador, Cameroon or Argentina. So, why? Because you depend on the supplier, who charges you a lot for sending specialist professionals or replacement parts. This generates huge delays, in the case that it is possible at all. And you also have restrictions on imports and exports, which make access very difficult.

04:44 - So all this very practical stuff translates into questions that are not pursued, translates into limitations for research and less powerful essays, and at the end of the day, less diversity of perspectives in science. And the main takeaway here is that we are reinforcing the pattern where most science is produced by those who already have money to produce it. So some people started thinking: why not open the designs of science tools? And some changes in access to hardware design during the last years make us think that it’s possible. So for example: the availability of free open source software for designing and testing hardware; the rise of 3D printing, which is ideal for prototyping and for low-volume distributed production; the access to cheaper electronic components, and projects like Arduino that democratized electronic design; the massification of the internet as a way to share experiences and learn from each other.

05:42 - And the work of an amazing community that’s been doing a lot to improve the ways in which hardware designs are shared. The good thing is that it’s been a while and the results are showing. So here, I will just show three very good examples of open science hardware in action. But there are many, many others inside and outside academia. And the good thing is that because they’re open, they can learn from each other.

06:11 - So on the left, in Tanzania, OpenFlexure is a project that has enabled short-circuit production of microscopes that are being used in education, research and clinical diagnosis of malaria. OpenFlexure is a UK project originally, but they are now producing microscopes at a local makerspace. And these can be easily sourced and repaired without dealing with imports and huge costs and delays. Another example: AudioMoth is a success in conservation biology. You can imagine it as a very big ear that logs sound all around it. It has a very big community of users improving and customizing it, and it’s leading to the development of new methods to address research questions that before were considered untestable.

06:53 - Finally, here in Peru, in epidemiology, researchers have designed and built an open source device to track malaria spread in Amazonian indigenous populations, and they were able to do so in a way that respects the preferences of these populations and that also tolerates the very difficult weather conditions of the Amazon. There is such growth in open science hardware that we also see, for example, national policy recommendations for Finland, which suggest that the country adopt an open science hardware strategy at a national level. So that was this year. So, open science hardware is any piece of hardware that is used for scientific investigations and that anyone can obtain, assemble, use, study, modify, share, and sell, because you can of course build hardware, but you may also want to just buy it and make sure that you can modify it or do anything you want with it later. The word hardware in some languages may be biased towards electronics, but open science hardware refers also, besides standard lab equipment, to auxiliary materials, sensors, biological reagents, analog and digital electronics, and also mechanical tools.

08:17 - And this huge circle in the middle reminds us that the heart of open science hardware is good documentation. So user guides, but also contribution guides, sharing files in editable formats, and many other good practices that the community has defined to ensure that everyone has access to your hardware design, with the idea of improving projects instead of constantly reinventing the wheel. So one of the main ideas here from the community is that if we really want open science to be transformative, we cannot limit open science practice to open data and open access publications; that’s a minimum threshold. But if we really want to open science to other actors inside and outside academia, we need to be able to share the tools we use to work. And this, as I mentioned, has benefits in terms of efficiency and, of course, reproducibility in science, but also in opening up who is able to produce scientific knowledge and whose questions matter. So there you see the logo at the bottom: the GOSH community is the community gathering people pushing for open science hardware to be ubiquitous by 2025, and I really invite you to check the manifesto and sign it if you agree with our values. Finally, how can you get involved? How can you participate? Okay, so the entry point, I would say, is the GOSH forum; you see the URL there. You can drop a question, people are super friendly, and you can search for the regional communities in the forum and connect to the people that are near you.

09:54 - You can also use the Open Know-How project, which is a findability protocol that was established for open hardware in general, not only for science, because you will see that open hardware projects are on many different platforms. Some of them are on GitHub, GitLab, Wikifactory, Instructables, Hackaday; there are many platforms, but Open Know-How is a protocol that unifies findability. Whatever platform your project is on, you can search for keywords there. You can also read, or submit your design to, the Journal of Open Hardware. The cool stuff is that your documentation will also be peer reviewed, which is super interesting.

10:30 - And you can follow Open Hardware Makers on Twitter, where we will announce our next cohort after this year, in which we ran a super big community and expert consultation to improve our curriculum and program. So that’s all, thank you so much for listening. Those are the ways you can contact me, and feel free to ask me any questions. Thank you. Malvika: So that was Juli. Juli is on the Slack, so you can ask questions there. We have also left. Emmy Tsang: Sorry. Malvika: Yes, tiny glitch there. Unknown: So that was it for today. Before we close out, we want you to know that we also have homework for today. So let me find the HackMD.

11:30 - Just so you know, the homework frequency is reducing now. What you have done in the past, which was about creating a README file, starting your contribution file, starting your code of conduct, and roadmapping, these are things that you will continue to advance. So it’s not that those weeks are over; you are going to keep working on this. So please continue working on those aspects. In the next week, please start to share your project online. We have demonstrated GitHub Pages; you can also use Google Sites, which could be simpler for a lot of people who don’t want to use GitHub. You can also use WordPress, which is free, or any other platform. If you want to just strictly be on GitHub, that’s also fine. But please have a detailed README, please choose a license, and start working on your code of conduct. Create a project development plan; we have created a small assignment that can help you use agile methods, which is linked at line number 466. And after this assignment, think about putting your roadmap up.

12:42 - But this is something that we will also talk about next week, mostly because we will talk about science dissemination. So our project is not just about developing but sharing as well. So this is something we want to start working on from next week onwards. With that, we are done on time. That’s amazing. Any questions that you have, we can take in the last one minute, but I’ll stop recording. Thank you for joining today. Yo Yehudi: Thanks, Malvika.