GTN Smörgåsbord - Day 5 - Climate FATES (fixed)

Mar 18, 2021 09:35 · 9397 words · 45 minute read

Hi, I’m Anne. I work at the University of Oslo in Norway, and I will guide you through this Galaxy tutorial on the Functionally Assembled Terrestrial Ecosystem Simulator, FATES. This is part of the Galaxy climate workbench, and I will be very happy to answer any question on this tutorial or any question related to the usage of climate data in Galaxy. So what will we learn in this tutorial? First, how to run the FATES model using the Galaxy tool and how to upload the input data for running the model. How to prepare the input data is out of scope of this tutorial. We will also learn how to customize your run, how to analyze your model outputs, and finally how to create a workflow to make your research fully reproducible.

00:56 - A strong warning before you start this tutorial: it can take quite a lot of time to run the model. We are running five years to have some scientifically significant results, and it takes about three to four hours on the Galaxy climate instance, the European instance. So, to make sure you can also analyze the results and finish the tutorial on time, I have also prepared some pre-made simulations that you can upload into your history; I will show that in a separate video.

Thank you. Let’s start this tutorial. First, make sure you are logged in on the Galaxy climate instance. This is the European Galaxy instance, usegalaxy.eu, and in front you add "climate.", so the full address is https://climate.usegalaxy.eu. Make sure you are logged in; here I’m logged in with my username. The tutorial we do together is this one, in the Climate category: Functionally Assembled Terrestrial Ecosystem Simulator (FATES).

02:24 - So we take the hands-on here, and we can start: first create a new history and then upload the data. As we will be using this Galaxy climate instance in Europe, the data is already available on this Galaxy instance, so we will not upload the data from Zenodo but import it directly from a data library, and I will show you how to do that. But first let’s create a new history: I click here on this plus to create a new history and I rename it; I will call it FATES for this tutorial. Then, to upload the data, I go to Shared Data and Data Libraries, and you should see this Earth system community modeling library, then CTSM, which stands for Community Terrestrial Systems Model. This is where you can find the input data and the restart file. I will explain just after what they are about, but first let’s select them: the input data, which is really the forcing for the FATES model, is here, and the restart file is here.

Let’s import them. I export them to my history as datasets, click Import, and that’s it. Let’s go to Analyze Data, and because the files are already on the system, they are already green, so everything is all right. Let’s first discuss these two input files, because this is quite important, so I go back to the tutorial. Preparing this input data is completely out of scope of this tutorial; we will probably write a separate tutorial in the future to explain how to prepare the input data.

What we provide to the model is a tar file in which you have at least these four folders: ATM, which is for the atmospheric component, so the atmospheric forcing for the FATES model; CPL, which is for the coupling, which says how the atmosphere and the land interact with each other; LND, which is for the land, so this is where the FATES component is; and share, a shared folder where we have for instance the topography and other information that is not specific to one component.
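As a rough illustration of the layout just described, here is a minimal Python sketch that checks whether a list of archive member names contains the four folders. The helper names, the case-insensitive matching, and the example paths are assumptions made for this sketch; they are not part of the Galaxy tool, and the real archive may nest the folders differently.

```python
import tarfile

# The four folders named in the transcript (compared case-insensitively here).
REQUIRED_FOLDERS = ("atm", "cpl", "lnd", "share")


def missing_folders(member_names, required=REQUIRED_FOLDERS):
    """Return the required folder names that appear in none of the member paths."""
    present = set()
    for name in member_names:
        # Collect every path component, so nesting depth does not matter.
        for part in name.strip("/").split("/"):
            present.add(part.lower())
    return [folder for folder in required if folder not in present]


def check_tar(path):
    """Open a tar file and report which required folders are missing."""
    with tarfile.open(path) as archive:
        return missing_folders(archive.getnames())


# Hypothetical member paths, only to show the behaviour of the helper.
example = ["inputdata/atm/forcing.nc", "inputdata/lnd/surface.nc"]
print(missing_folders(example))
```

Matching on any path component is a deliberately loose check: it only says that a folder with that name exists somewhere in the archive.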

The model we will be running is what we call single point: we have only one point, a location with a latitude and longitude, and this is a point in Norway, so it represents a Norwegian alpine tundra ecosystem. So this is the latitude, the longitude and the elevation. Why do we use this particular location? Mostly because we can then compare with observations: this is a location where measurements are made on a regular basis. The inputs have been prepared for you and they are ready to use.

This is a site included in the modeling platform developed under the EMERALD project, which is a project at the University of Oslo in Norway. So we have uploaded this dataset, the input data, whose name starts with input underscore version. For each version of the model, you have to be careful to use the matching version of the input data.

07:07 - And we have another file, which is a restart file. This is mostly because any climate model, if you start from scratch, so without any restart file, will be very unstable. So to have a stable starting point, what we usually do is first run the model for a very long time; here we ran it for 2300 years so that it can stabilize, and then we can make further simulations. This is usually what you need to do with a climate model; this is the difference between a cold start and a restart.

So we will now set up the CLM-FATES simulation. We will be using this CTSM/FATES-EMERALD Galaxy tool, which is specific to the Norwegian ecosystems; it is based on the general CLM-FATES model and has been adapted for these Norwegian locations. You can find more information if you click for instance here. An easy way to find the tool is to click here, and then it will appear in the middle part. So what do we need to start the model? First, we need to make sure we have the right input data, so this input file, a tar. Second, in the customize part, we don’t want a startup run (startup is what I called a cold start before); we want to start from this restart file, for which we have run these 2300 years, to make sure the model results are scientifically meaningful. So I change it from startup to hybrid, and I need to give a reference case. This is the name of the experiment that produced the restart: we give a case name to each experiment, and when I ran this one I gave it the name ALP1 underscore refcase, so ALP1_refcase.

09:48 - I also need to specify the date I want to start from for this reference, so I start from year 2300 and 01-01, which is the first of January.

10:08 - And here is a start date, which is used for storing the model output. I want to start from the very first date, so that it goes forward from there: year 0001 and the first of January of that year. This is mostly a reference for yourself; it doesn’t have any significance in terms of scientific results.

10:40 - What else do we need to do? Here we can select different model resolutions, so different locations in Norway, but we will take ALP1 for this case. And here is the name of your experiment, which is a string; you can usually choose whatever you want, but it should be meaningful. For this one I will call it ALP1 underscore exp, exp for experiment. It is very classical in climate modeling to name experiments like this: first the model resolution, and then exp for experiment.

11:21 - What else do we need to change? Make sure that here, for the restart, we don’t take the input data file but the restart file, and that here we take the input data. We will run it for five years, so here we have five, but we need to change the unit from ndays to nyears; you can type nyears and then select it.

11:50 - What else do we want to change? I think this is it for this one. Here there is some advanced customization, which we will see later on when we do some different simulations. So you are ready to start: you can click on Execute.

12:10 - You will get many different output files, which I will explain once this is done. It usually takes a bit of time, so you can take a break, and we’ll come back later when it is done. While it is running it will be orange, and then green when it’s finished.

12:45 - As you can see, it is now running: it’s all orange, and it will still take some time before it finishes. Let’s have a look at what we will get as output. We have many of what we call log files, like atm.log here, which is for the atmosphere component; this is because this is a kind of coupled model between the atmosphere and the land, so we have one log for the atmosphere and one for the land, this lnd.log. A log file gives you some information about the run and what has been performed for this component.

So this is interesting to see if everything was all right. Then you have a log file for cpl, the coupler, because the land and the atmosphere need to exchange information between the surface and the atmosphere, and then you have the cesm log file, which is a log file for the whole CLM-FATES run. What else do we have? We have this rof log file, which is for the runoff, which we always need to have when running a land and an atmosphere component. Usually I don’t really look into detail for this component; it is not very important for us.

14:17 - Then we have a case info file, with information about the case. It’s mostly a text file giving some very basic information about your experiment and reminding you what settings you have used. And you get this restart info file, which is also a text file and which we use for instance if we want to continue the run. So you always have a restart info file, and somewhere you should have the restart file, which is a tar. These two files go together, and we only use them if we want to continue the run: here we are running five years, for instance, and we would use them if we wanted to run a longer simulation after the five years.

15:04 - We’ll see later how to do that. I haven’t explained this work output here: this is where the model is running, and I look at this tar and download it on my laptop if I want to check something, especially when the run was not successful or when the results are not as expected. Otherwise the most important output for us is what we call the history file, and it will be in this one. It’s a collection, because we can have more than one output file, depending on how long you are running the model.

So we’ll wait for the simulation to finish, and then we’ll look at this file to check the model output.

16:02 - So let’s go back to our simulation. It is completed now: as we can see, all the tasks are green and the run has successfully completed. We can look at some of the files, for instance the log files, like here for the atmospheric component. It’s quite large, so it only shows part of it, but this run was successful. We also have some other information, like the restart info, which tells you how many years have been run: for instance, we know here that we started from year one and ran five years, so the next restart will be at year six, January first.

Now what we want to do is to analyze the model output, which is in netCDF data format. All the data will be in this history file collection. If we click on it, we have only one file here, because we have run a short simulation; sometimes we can have more. If we click on it, we can see the format, and we see it is shown as a binary format. The first thing we will do is change the format to netcdf, because for now the tool didn’t manage to detect the netCDF format.

So we click here on Edit attributes, go to the datatype tab, and here I switch to netcdf: I select it and click on Change datatype. It will spin for a few seconds, normally not very long.
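If you want to convince yourself outside Galaxy that a file shown as generic binary really is netCDF, one option is to look at its first bytes. The sketch below is only an illustration and is not what Galaxy does internally: classic netCDF files begin with the characters CDF followed by a version byte, while netCDF-4 files are HDF5 containers and begin with the HDF5 signature. The function names are made up for this example, and an HDF5 signature alone does not prove the content follows the netCDF-4 data model.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
CLASSIC_VERSION_BYTES = (b"\x01", b"\x02", b"\x05")


def detect_netcdf_kind(first_bytes):
    """Guess the netCDF flavour from the first bytes of a file, or return None."""
    if first_bytes[:8] == HDF5_SIGNATURE:
        # netCDF-4 is stored in an HDF5 container.
        return "netcdf4/hdf5"
    if first_bytes[:3] == b"CDF" and first_bytes[3:4] in CLASSIC_VERSION_BYTES:
        return "netcdf3"
    return None


def detect_file(path):
    """Read the first 8 bytes of a file on disk and classify them."""
    with open(path, "rb") as handle:
        return detect_netcdf_kind(handle.read(8))


print(detect_netcdf_kind(b"CDF\x01\x00\x00\x00\x00"))
print(detect_netcdf_kind(b"plain text"))
```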

18:07 - The next step will be to change the name of this file, which is in this collection here. The datatype change should be done by now... no, not yet. We will change the name because it contains some dots and some special characters that may not be handled correctly by some tools in Galaxy, so the best is always to change to a short and meaningful name. Then we can use this netCDF file, for instance, in Panoply, which is an interactive tool to visualize netCDF data.

So here I edit the attributes again, and I call this file ALP1 underscore exp dot nc, so ALP1_exp.nc. And I save.

19:15 - So now the format is netcdf and the name is ALP1_exp.nc. Let’s go back to the tutorial.

19:33 - In the Climate category, it is this one, the Functionally Assembled Terrestrial Ecosystem Simulator. So far we have uploaded the data and set up the CLM-FATES simulation, this step here for creating a new case, and we have also changed the datatype and renamed the dataset. Now we will use this NetCDF xarray Metadata Info tool to get some information on the names and the dimensions of all the different variables, so that we can later extract some meaningful information, for instance if we want to visualize some of the variables.

So we will use NetCDF xarray Metadata Info to generate two files containing information on the metadata, and we will try to answer these questions: what are the short names of the relevant variables, and which one would you pick if you want a result in millimeters per second? This is to get some information about the canopy transpiration.

21:01 - So let’s click on this tool here, and it is presented here. We check that we are taking the right input file, which is ALP1_exp.nc, and we can execute.

21:22 - It will generate two files: one containing some short information about the variables and their dimensions, which we use as input for other xarray tools, and the other one with all the metadata contained in this netCDF file.

21:55 - And then we’ll search for the canopy transpiration variable. It’s still running, so let’s wait.

22:08 - Yeah, so it’s done. Let’s have a look at the two files. This one contains, oops, there is an error, all the information about the variables but also about the dimensions. This is netCDF data, so we have the dimensions at the top, with the length of each dimension as an integer here, and then all the different variables. For each of them we know its dimensions, and we also get some more metadata, like the long name here and the units, so this is quite standard.

23:00 - We are following the Climate and Forecast (CF) conventions for the netCDF output. You can see you have lots of variables, and at the very bottom you get all the global attributes, which usually tell you where and by whom the file was created; we know which convention is used, which is CF-1.0, and some other global information about this dataset. Now let’s look at the second metadata file. This one is usually the one we use as input for other tools, and what we get is a list of variable names and the dimensions of each variable.

23:59 - So the info file is the one you can use to get metadata about all the different variables, and in particular the long names, which is usually where we find names we can relate to scientific variables. Let’s go back to the question, which was: identify which variables would give you some insight about the canopy transpiration.

24:35 - Usually what I do is go here and press Ctrl+F to search for canopy transpiration. Here I already find one variable, FCTR, which is a function of time and lndgrid. lndgrid is the number of points in the grid, and since this is a single-location simulation, this dimension is equal to 1, which we can check at the very top: lndgrid, as you can see here, is one. If we go back to this variable, the first one, FCTR, the units are watt per square meter, so this is one of the variables. And for time it says mean, which means that the variable is averaged over each period of time.

So the time step of the model is different from the model output frequency: outputs are averaged over a month, so each value is an average over the entire month.

26:14 - Let’s look at the next one. We have another one, which is QVEGT, and this is exactly the same quantity but the units are different: this is the canopy transpiration in millimeters per second. So to answer the question of which variables would give you some insight about the canopy transpiration: with the FATES model we have two variables, one called FCTR, in watt per square meter, and the other one QVEGT, in millimeters per second.

So if you want a variable in millimeters per second, you need to take QVEGT. The second question was: what are the dimensions of this variable? This is what I briefly mentioned before: it is a function of time, for one single location. As for how many time steps we have: if you remember, we ran five years with monthly output, so the time dimension has 60 values.
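The Ctrl+F search above can also be written down as a small filter. This is a minimal sketch over a hand-typed list, not over the real metadata file: the record layout, the function names, and the unit spellings (`W/m^2`, `mm/s`, `indiv/ha/yr`) are assumptions made for illustration.

```python
# Hand-typed stand-ins for a few metadata entries (not read from the real file).
VARIABLES = [
    {"name": "FCTR", "long_name": "canopy transpiration", "units": "W/m^2"},
    {"name": "QVEGT", "long_name": "canopy transpiration", "units": "mm/s"},
    {"name": "MORTALITY", "long_name": "Rate of total mortality by PFT", "units": "indiv/ha/yr"},
]


def find_variables(records, keyword, units=None):
    """Return short names whose long name contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    hits = [r for r in records if keyword in r["long_name"].lower()]
    if units is not None:
        # Optionally keep only the variables expressed in the requested units.
        hits = [r for r in hits if r["units"] == units]
    return [r["name"] for r in hits]


def monthly_steps(n_years):
    """Number of monthly averages produced by a run of n_years."""
    return n_years * 12


print(find_variables(VARIABLES, "canopy transpiration"))
print(find_variables(VARIABLES, "canopy transpiration", units="mm/s"))
print(monthly_steps(5))
```

The last call reproduces the length of the time dimension for a five-year run with monthly output.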

27:36 - So we have answered the first questions. Let’s go back now to the tutorial.

27:52 - So we have done this metadata info step, and what we will do next is quickly visualize the data with Panoply.

28:06 - So how do we start Panoply? Panoply is an interactive tool, and the easiest is to start it from here, as I will show, and then to use the live Galaxy. If you look here, we suggest using live.usegalaxy.eu, mostly to avoid problems when opening the application. So let’s do that.

28:37 - Here I go to the interactive tools, and you will find Panoply here.

28:46 - Make sure you take the right input, which is this ALP1_exp.nc, the netCDF history file, exactly the same file we got the metadata info from. Now we will really visualize the data over the five years of simulation.

29:06 - I execute. And here, to avoid problems when I start this Panoply interactive tool from the active interactive tools panel, I switch to live.usegalaxy.eu. This is the same portal but a different view, where we mostly see the interactive tools.

29:46 - What I do here is first log in, so I’m logging in, and I make sure I’m in the right history.

29:58 - I need to be in the history where my Panoply tool has been started, so I switch to this one, then Analyze Data, and I go to User, Active Interactive Tools, and click here to get started.

30:26 - And Panoply starts. This is the panel you get initially; you have to select this input data, ALP1: click on it so that it appears here, and then Open.

30:44 - As you can see, you have all the different variables here, similar to what we have seen previously with the xarray metadata info tool. Here is the dimension, which says whether this is a 1D variable, a 2D variable, a 3D variable, etc.; we have mostly 1D and 2D variables because this is a single-location simulation. And here is all the metadata, similar to the metadata information from the Galaxy tool. If we look at what is asked in the tutorial, we want to search for some variables; for instance, we would like to find the long name of MORTALITY. What is quite nice in this tool is that everything is in alphabetical order, so if I go to L, M, it shows up very quickly: MORTALITY.

So this is MORTALITY here. Oops, if I double-click it will show you how to visualize it, but let’s first look at some metadata. When you click on the variable here with the left button of your mouse, it appears here, and you see the variable name and the different dimensions: we have 60 values for time, we have a level dimension with 12 values, which are the different PFTs (we will come back to these plant functional types), and we still have one single point, so lndgrid equals one.

The long name is rate of total mortality by PFT, PFT being plant functional type.

32:54 - So we have just answered the first question: the long name of the MORTALITY variable is the rate of total mortality by PFT.

33:07 - What is its physical unit? We can see the units just below here, and they are indiv per ha per year, so individuals per hectare per year.

33:22 - The next question is to plot the total carbon in the live plant leaves.

33:30 - The short name of this variable is LEAFC. So we search under L for LEAFC, which is the total carbon in live plant leaves. Let’s click here, and if you double-click it shows you how to make a plot. We want to plot the variable as a function of time, so with time along the x-axis, and this is the plot we get. Every time we get a plot, we want to save it in Panoply: you go to File and Save Image As, you can change the name if you want to, and you make sure you put the image in the output folder, so that when we quit Panoply it is saved back into our Galaxy history.

34:53 - So do you observe any pattern in this plot, and does it make sense from a scientific point of view? What I can briefly say is that if we look at the time axis here, with the different months and years, we can clearly see a seasonal cycle, which is quite normal because the carbon in live plant leaves changes with time, depending on whether it is winter or summer. Okay, so we have saved our plot in the output folder.

35:42 - Now we can also plot the rate of total mortality per PFT, which is the second variable, MORTALITY.

35:52 - What you can do once your plot is finished and you have saved the image is go to File and Close. Don’t quit Panoply, because if you quit you will leave Panoply completely and you will not be able to plot any other variables. So let’s go to MORTALITY, which is here, and again I double-click to get this panel for plotting; you can also get it here.

36:35 - And here we select. This is a 2D variable, as you can see, so Panoply offers you the possibility to create a 2D plot, and then you have to choose whether you want the PFT or the time on the x-axis. Usually we like to have the time on the x-axis, to see the evolution of each PFT, and on the vertical, the y-axis, all the different PFTs. So I create this plot. By default it uses a color scale that is not very user-friendly, so we’ll first change the colors to see the pattern a bit better and be able to answer the question, which is again: can you observe any pattern, and does it make sense to you? So, how to customize your plot in Panoply.

37:47 - The first thing we can do is change this color map; you can choose any color map you find appropriate for your visualization. I’m sorry, it didn’t select.

38:13 - There are so many different ones that sometimes it’s difficult to choose.

38:21 - Yes, these ones are usually quite nice for making plots. I don’t know if there is much difference; let’s take that one, for instance. The second thing we will do is change the grid here. There are many things you can customize, like the minimum and maximum values, this min and max, but that’s okay, we will not change them. And you can change the grid for your plot so that you can read the plot a bit better.

So let’s go to this solid one... I don’t remember exactly... no, that’s not this one.

39:33 - Oh, this is not the axis we want; the y, sorry. Yes, so this allows us to highlight a bit more the pattern at this level. One thing we can change now is to make sure the data fills the whole plot, so instead of 0.5 here we can put 1, for instance. Maybe five is a bit too big; I think it was 10. We can keep one. And here I can see it doesn’t start at zero, because we only get some output at the end of the first month, so instead of starting at zero I put one month, which is 31 days, to fill it.

40:56 - And then what we can see is that we clearly have a seasonal cycle again if we look at the time here, again with these winter and summer values, and this is essentially for this one PFT, PFT 2.

41:22 - Again I save this plot with Save Image As, and again I make sure I’m in the output folder. If you want, you can play a bit with this and change for instance the color scale or the values here, so that you can see a bit more and adjust your plot for your publication. Usually I try a few things to make sure I highlight the seasonality well; for instance, it looks like this color scale really highlights what I would like to show, which is this seasonality, so I save this image again in this folder.

I need to give it a new name, so I add col, for color. So now I have some plots saved.

42:35 - Now you can make several plots and try different variables. I close this plot here. Feel free to experiment with other variables that can be of interest to you, and save all the outputs in the output folder. Once you are done you can quit Panoply, and then we can close this, and this one, and go back here. You will see that this Panoply output here terminates in a few seconds, and we get plots that we can download to our laptop.

43:29 - Yeah, so we have this plot here, which I can look at.

43:36 - And this one tells you the version of the Panoply tool. So we have made these three plots, and you can choose the one that is most appropriate, for instance for your publication. You can download your file to your laptop for later usage, but you also have it in your history. So then I go back to FATES here.

44:05 - So we have already done the first two steps, and we’ll now use another tool, not an interactive one, for making some visualization.

44:26 - So we have used Panoply for plotting and analyzing our results; why do we want to use a Galaxy tool for analyzing and plotting the results instead? Why not simply always use Panoply? The thing is, we want to make a workflow at the end: we want to automate our research, and running an interactive tool like Panoply is not the best way to automate it. It’s really good for exploring your data and making new discoveries, but the next step is usually to automate your process.

So we go back to this training here. So far we have run the model, we have used Panoply to analyze, we have inspected the metadata, and now we will do this part: use a Galaxy tool for analyzing the CLM-FATES simulation, so create some visualization, but directly from a Galaxy tool, so that we can incorporate this step into our workflow. If you remember, we have already used some xarray tools before; with this one we will now make a selection. We will select the values of this LEAFC variable, and we will get the values as tabular text data, so that we can then use for instance ggplot to make a plot in Galaxy.

So let’s take this one. As you can see, we have to make sure we take the right input file, always this history file, and we have this metadata info for the variables as another input, which is the result of one of the previous steps. Then we will run it, and at the end we will rename the dataset for future usage, which is always good practice; I can already copy the name. We will extract the LEAFC variable, so I click here to get the tool.

46:50 - I have ALP1_exp.nc as the input, which is a netCDF file, and it has already chosen this tabular file, which is the list of variables and dimensions and is the result of the xarray metadata tool we used previously.

47:14 - For the variable name we select LEAFC, which is here. You can choose to select coordinates manually, but it’s not necessary: we want everything and we have only one point, so there is no need to select and reduce the amount of data. Then we can execute.

47:49 - So it will take some time. Okay, it starts running.

48:23 - It’s quite a small tool, so it shouldn’t take too long because we don’t have much data: only five years of simulation. If you have looked at the introduction to climate data, you may remember that to be scientifically meaningful in climate we usually take periods of at least 20 to 30 years. So this is a very short simulation, which is still interesting because we can see some seasonality, but it is not something we can use for assessing the climate, or the change in climate, at this location.

If we look here and click on view data, we can see that we have now extracted this LEAFC variable, which we already looked at before with Panoply. Here we have one single location, so this column is not very meaningful, and here are all the different times. For the first year, at the beginning, the time is in month 02, because we started the simulation on the 1st of January, and the model outputs results every month, as an average over the previous month.
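To make the labelling of the monthly averages concrete, here is a small sketch that generates the time stamps for a run starting on 1 January of year 1, with each monthly mean stamped at the start of the following month. The function name and the exact string format are assumptions for illustration; the real file may print dates differently.

```python
def monthly_output_stamps(start_year, n_years):
    """List the stamps of monthly means for a run starting on 1 January of start_year."""
    stamps = []
    year, month = start_year, 1
    for _ in range(n_years * 12):
        # Each average is written at the start of the month that follows it.
        month += 1
        if month > 12:
            year, month = year + 1, 1
        stamps.append(f"{year:04d}-{month:02d}-01 00:00:00")
    return stamps


stamps = monthly_output_stamps(start_year=1, n_years=5)
print(len(stamps))
print(stamps[0])
print(stamps[-1])
```

Plain integer arithmetic is used instead of a date library so that the sketch does not depend on any particular calendar.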

So here we have values up to January of year six, which is the average of the previous month, and here are the values. What we want now is to prepare this tabular file for a scatter plot with ggplot. The first step is to put this first column, the time, in a form that ggplot can understand. One problem is that after the year, the month and the day we have a space and then the time of day, but the time of day is completely irrelevant for this climate model output because we have only zeros everywhere, which corresponds to midnight. So we will use a tool to remove this pattern and only keep the date.

50:51 - If we go back to the tutorial, we see we have done this first part, which was to select and get tabular values of LEAFC, and now we will clean the dates using this Replace parts of text tool. We take the result of the previous step, which we should first rename. I forgot to do it, so I do it now: I go here and rename the file, always very good practice, I paste the name with Ctrl+V, and I save.

So the name is changed, good, and I can go back to my tutorial and to this tool here, Replace parts of text. I select as input the netCDF data that has been converted to tabular by extracting only one variable, and I look for this pattern, which is the time of day. I replace all the occurrences of this pattern, so 00 colon 00 colon 00, and remove these values entirely so that we have a clean tabular output, and we find and replace text in the entire line.

So let’s do that. First I make sure I have the right tabular file: the file to process is this NetCDF xarray Selection output. The pattern we want to find is 00:00:00, which corresponds to hours, minutes and seconds. We want to replace it with nothing, because we want to remove it; it is not meaningful for us. Is the find pattern a regular expression? No, it is not a regular expression, it is only the literal pattern we want to remove. Replace all occurrences of the pattern? Yes, we want to replace all the occurrences.
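As a rough illustration of what this find-and-replace step does, the following Python sketch removes every literal (non-regex) occurrence of the pattern from a line. The function name and the sample row are hypothetical; only the pattern 00:00:00 comes from the walkthrough.

```python
def remove_pattern(line, pattern="00:00:00"):
    """Remove every literal occurrence of `pattern` from `line` (no regex involved)."""
    return line.replace(pattern, "")


# Hypothetical tab-separated row: row number, time stamp, location index, value
row = "0\t0001-02-01 00:00:00\t0\t12.5"
print(repr(remove_pattern(row)))
```

Note that removing only 00:00:00 leaves the space that separated the date from the time; in this sketch it stays at the end of the date field.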

53:29 - Is it case sensitive? It doesn’t really matter because these are numbers. And do we want to find whole words? We can say yes: we want to make sure we find all the zero zero colon zero zero colon zero zero, and we want to replace it in the entire line, which is the default. As you can see, we can always get an email notification at the end of the tool, which is not really necessary here but can be useful when running the CLM-FATES model. So I can execute.

54:27 - Waiting for execution. Okay so it starts running. It shouldn’t be long. The only very long step is the first one, when we run the model. So here we can look at the results and click.

55:02 - So we have removed the time from the dates here, so we only have the year, the month and the day, and then the value of LEAFC. Now this is quite clean and we can use, for instance, the scatter plot with ggplot to plot the LEAFC value as a function of time. Here you can search for ggplot; it usually comes up very quickly, so there is no need to type more. Yes, and you can see for instance that this one would do very well.

55:40 - Yeah, so we haven’t renamed the output, and it is probably best to rename it before we go on. So sorry, I go back, I forgot to rename, and I will give it a meaningful name, which is LEAFC_clean.tabular, so these are really the clean values of my LEAFC extracted from the netCDF file. And sorry again, now I can finally plot. It’s not mandatory to change the name, but it’s usually good practice for yourself, so you remember exactly what you have done.

56:21 - What do we want to do? We want to plot the values in this LEAFC clean tabular. The first column is the row number, the second one is the time, and we also have this lndgrid, which is not what we want. So, sorry: the row number is column one, the time is column two, lndgrid is column three, as we can see here, and the fourth is LEAFC. So what we want to plot is, for instance, the fourth column, LEAFC, as a function of time.
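The column layout just described (row number, time, lndgrid, LEAFC) can be sketched in Python as follows. This is only an illustration of picking the time and value columns from tab-separated text; the function name and the sample rows are hypothetical, and the sketch assumes the text contains data rows only, with no header line.

```python
import csv
import io


def select_columns(tabular_text, x_col=2, y_col=4):
    """Return (x, y) pairs from tab-separated text, using 1-based column numbers."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    pairs = []
    for row in reader:
        if not row:
            continue  # skip empty lines
        # Strip stray spaces around the date, convert the value to a float
        pairs.append((row[x_col - 1].strip(), float(row[y_col - 1])))
    return pairs


# Hypothetical rows: row number, date, lndgrid index, value
sample = "0\t0001-02-01 \t0\t1.5\n1\t0001-03-01 \t0\t2.0\n"
for when, value in select_columns(sample):
    print(when, value)
```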

57:11 - We can give a meaningful title, which here is the total carbon in live plant leaves. The label for the x-axis is time, and we could even put some more information, but time is sufficient; we mostly want to see the evolution. And for the y label we can say this is LEAFC, and we can put the unit, which is kgC ha-1.

57:53 - Oh, and I made a typo; it should be kilograms of carbon. Then we have some more advanced options, with which we can customize the plot. For instance we can say here that we want points and lines. Then we can close the advanced options. And in the output options we can change, for instance, the width and the height of the plot, because it is a square by default but here we have five years of data. What we would like to have is a plot that is wider than it is high, so we can really see the different years.

So I will put for instance 19.0 for the width, and here the height is five. And why nineteen? Because I have five years; I could put twenty if I wanted. It is approximately to give some more width to each year. The rest I don’t really need to change; you can change the format, but you can keep it in PNG if you want.

59:35 - So again, here we have only plotted LEAFC. Feel free to try to plot any other variable. Maybe there are some other tools in Galaxy for plotting, so if you make nice plots and you want to report it, please share what you have done.

60:24 - And again this step shouldn’t be too long since it’s mostly taking your tabular and making a ggplot.

60:48 - Yes, this is done and here you have your plot. It is quite large because it has a good resolution, so to visualize it better you can download it and, for instance, open it. Oops. So if we look at what it looks like, this is what you should see.

61:22 - Yeah, so this is really the same plot as before, but with ggplot.

61:29 - So this can be fully automated in your workflow.

61:43 - So now it starts to be quite exciting, because we have run a simulation with FATES and we can create some plots automatically. We will now create a workflow from the history we have, so we can rerun our simulation and plots, to make it fully reproducible, and it will allow us to run new kinds of simulations, for instance by changing the input data or some parameters of the simulation. So it’s quite interesting.

So let’s do this conversion. We will extract the workflow from the history, so we go to the history menu and use the Extract Workflow option.

62:41 - And we will have to remove any unwanted steps. For instance, if you remember, we have used Panoply, an interactive tool, and in a workflow we don’t really want to keep this interactive tool because we want fully automated tasks. So we will remove this step, and we will rename the workflow to something more meaningful. For instance we’ll take this CLM-FATES_ALP1 simulation five years, so we can copy the string here, and I will create a new workflow that we can edit before we run it.

So let’s do that. We go to the menu options in the history. Oops, where’s my mouse? I can’t see it here. In this menu here you click and you see all the different options for what you can do with your history, and the first thing we will do is extract the workflow. Here we can give a new name to the workflow.

64:06 - And before we create this workflow we will check the different steps and remove the ones we don’t want. So here we see the inputs, the FATES model and all the different outputs generated, and the xarray metadata. This one I don’t really want, so I will remove it. This is Panoply; it is not needed for automated workflows.

64:35 - I keep the NetCDF xarray Selection because this is where we select LEAFC to create a plot, and I keep this replace step and the scatter plot. So then I’m ready and I will create the workflow. I can click on edit here and check the workflow. We need to do a few things before we can run it: as you can see, some tasks are not connected, so we’ll have to look at it a bit more. We can reorganize the inputs: the input data is here, and then this is the restart file, if you remember.

65:26 - Then I can look at this, which is my CLM-FATES model. And here is the main issue, as I see it. I can reorganize a bit.

65:46 - So these are the different steps up to the ggplot here, which is the final plot, but there is no connection here. So why, what is happening? Here we are creating a workflow, so it is generic to any kind of simulation you would do. And the thing is, if you remember, I briefly mentioned it: when we run the CLM-FATES EMERALD model, the output history file is a collection. In our case we have only one file, but sometimes, when we run long simulations, the model will generate more than one history file.

So we definitely need to extract one of the files, the one we want to plot. Or we would need to merge them or do something more complex, but here we have only one because we are running five years. So what we will do is add a new task in between, to select which history file from the collection we want to use for plotting. The tool we will be using is called Extract Dataset. You can get it from here.

67:11 - Yeah, this one. So you can select it from here, or you can go back to your tutorial, go to the Extract Dataset step and click on the tool name, and it will automatically open the tool.

67:47 - It takes a bit of time to get the tool. If it doesn’t work or is a bit too slow, I suggest you just take it from here. So we’ll connect this one to that one, and when we click here we’ll extract the first dataset; yes, this is fine because we have only one. Then here you click on configure output, because we want to make sure it has the right type. So we’ll change the type and set it to netcdf; this is mostly to make sure the output is netCDF.

68:40 - For the rest I think we can leave it this way. And for this one you have to click here first. If you remember, just before it was red, so I click on this connector so that now I can connect these two here. So we have a full workflow.

69:08 - And you can rearrange your workflow as you wish, so it is a bit easier to understand the different steps.

69:24 - Okay. Now once this is done you can save it here.

69:44 - So here we have created a workflow, and now what we want to do is reuse this workflow. You can also share this workflow with others. If you look here you have the workflow options: you can download it to your laptop if you want, and it will be a .ga file, which is just a Galaxy workflow. Or you can run it. All your workflows will be, sorry, in the workflow panel.

70:32 - Yeah, and the latest will be at the top. It has been created one minute ago. If you want to run it, you click on this, and you have to customize the different steps of your workflow. What we would like to do is rerun a simulation, but instead of running this exact same simulation we will change the atmospheric CO2 value, so that we can assess the change in behaviour of the plants in this FATES model when the CO2 in the atmosphere is increased.

71:27 - And this is really typical in climate change. We know that we have more and more CO2 in the atmosphere, and the conditions for the plants are changing, which impacts how they grow and how they die, so the fate of the plants. So this is quite an interesting simulation to do.

71:50 - So we have to edit this workflow. Make sure you have the right inputs and outputs. Make sure you select here the input data, which is this one, and the restart file, because otherwise we want to do exactly the same simulation.

72:11 - So this is the input data, this is netCDF, sorry. But this time we want to update some of the steps.

72:32 - Which is... I can’t see where it is. Oh sorry, it was this one. I need to customize the model. This is still five years, so it is not that one. I think it is in the advanced options, where we can customize the CO2 value. Yeah, so here I can edit, and instead of 367 micromoles per mole I will quadruple this value. I will put 1468, which is a lot more, so then we will really see a significant change in the behaviour of the plants, and so in the model output.

And for the rest, I can leave everything as is. Make sure you check everything, including the different extraction steps, and then we can run the workflow. Again it will take some time because, as you remember, it takes about three to four hours to run CLM-FATES. All the tasks will appear here, and once this is done you will be able to compare and see whether your results are different or not, and what we see in terms of response when we significantly increase the atmospheric CO2.
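The arithmetic behind the new value is simply four times the default. A tiny Python sketch, where the constant and function names are hypothetical and only the numbers 367 and 1468 come from the walkthrough:

```python
DEFAULT_CO2 = 367.0  # default atmospheric CO2 value mentioned in the walkthrough


def scaled_co2(baseline, factor):
    """Return the baseline CO2 value multiplied by a scaling factor."""
    return baseline * factor


quadrupled = scaled_co2(DEFAULT_CO2, 4)
print(quadrupled)  # 1468.0
```

Changing the factor gives other scenarios, for instance a doubling experiment, without retyping the value by hand.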

74:16 - We can really see some significant changes after running five years. So if we look at the plot we had before, for instance, let’s go back to the Panoply output because those are usually nicer to look at.

74:36 - Like this one. No, these are not the right values. Here, yes, this is the total carbon in live plant leaves, so there is no big increase. When we run these five years with quadrupled CO2, the values will be shifted here on the y-axis because we have a lot more CO2. The run takes far too long to wait for, so you can look at your own results, for instance, tomorrow. But I can show you: here, this is what we had before.

So it’s more or less flat in terms of values. I mean, it’s slightly changing, but not that much after several years. But when you look at the same variable under conditions where we have quadrupled the CO2, you will see that the change in behaviour for this variable is quite significant, which is totally expected. So it means the model works quite well. At the end of this tutorial we show you how to share your work, so how to share your history, which I strongly encourage you to do.

So you can share your history with a link, or you can even publish it and give anyone permission to access your history.

76:31 - And you can also use WorkflowHub, which is a new way to exchange your workflows. This is particularly interesting if you want to exchange your workflow and run it on a different Galaxy instance. You can of course download it, but here it will be publicly available. If we copy this link here, I can show you: we have already put the workflow for this tutorial there. You click on Galaxy climate, and we have three workflows, and this is this one, the CLM-FATES_ALP1 simulation five years.

And what is nice is that you can generate this image of the workflow, where you can see all the different steps of your workflow.

77:38 - So for this I strongly encourage you to look at the documentation, and there are a few steps to fulfil to get this workflow image. For instance, it will convert the Galaxy workflow to a Common Workflow Language workflow. If you have any questions you can ask me on Slack. So I hope you enjoy this tutorial. It’s of course quite long if you do it in a day, but as you can see there are many stages where you have to wait for quite a while before getting the results.

78:26 - So for this I will make sure you can see the results in my history. Thank you.