Hello, and welcome to the QIIME 2 online workshop lecture for longitudinal analyses. I am Dr. Zachary Burcham, and I’m here with my colleague Dr. Alexandra Evans. In this lecture we’ll describe what a longitudinal study looks like, why you might want to consider this type of study design, and the pros and cons of this study design, and we’ll discuss how longitudinal microbiome studies can ultimately be analyzed using QIIME 2.
00:32 - Longitudinal studies are designed to employ continuous, repeated measures from an initial sample or samples over time. This can be following a patient’s gut microbiome composition from before they take a certain medication and over the course of treatment, or it could be measuring the temporal impact on soil microbial communities in the Amazon rainforest. The key is that you are sampling the same individual or location over a prolonged period of time. Traditionally, longitudinal studies have been observational in nature, with quantitative and/or qualitative measures being taken during any combination of exposures and outcomes, but the small visual scales and culturing difficulties of microbial communities make pure observation close to impossible, highlighting the need for increased measurements to detect complex patterns of change.

In this simple example of a longitudinal study, we have a bacterial community in its initial state, and as time progresses we take repeated measurements from the same community until we reach the final measurement and the end of our study.

Longitudinal studies have their advantages and disadvantages. Some of the advantages include: one, the ability to identify and relate events to exposures or treatments. For example, what types of microbes are affected by taking a medication? Two, you can then relate these exposures to time. How long does this medication take to impact the gut microbiome? How long does this impact last? This leads to three, being able to establish a sequence of events. For example, when this microbial community is exposed to the medication, is there an immediate response? You might see that within 24 hours Firmicutes are diminished but then return to initial levels within 48 hours, or they may not recover at all. Four, you will also be able to measure variability between your subjects against the full cohort. What I mean by this is that there will always be some variability between each subject; not every subject will react exactly the same way to a treatment,
but with repeated measures you will be able to capture this variability and then compare it across treatment groups, for example, comparing the variability within the treatment group versus comparing the variability of the treatment group to the control group. The last advantage I’ll mention, five, is that longitudinal studies provide information on microbial community development, stability, response, and recovery that otherwise might be missed in a single-measurement study design.

Some of the disadvantages include: one, the loss of participants over time. Depending on the nature of your study and time frame, the loss of some of your participants might be inevitable. Volunteers in a human cohort can change their mind, move away, or drop out for any number of reasons. If you are sampling locations, they may become destroyed or inaccessible in the future. This loss can reduce the representative nature of your cohort if you do not lose subjects evenly across groups, and it can reduce your statistical power. Data analysis can become extremely challenging if, for example, you lose 75 percent of your control group and none of your treatment group. Typically, the longer your study, the higher the risk of participant loss. Two, it can become difficult to assess the impact of exposure on the observed outcome. This can arise from highly complex systems having many known and unknown variables, and remember that correlation does not equal causation. Be careful not to overreach in your conclusions. Three, a deeper understanding of statistical testing is needed to avoid inaccurate conclusions. Using tests that are not designed for longitudinal data can lead to misrepresentation and underutilization of your data. Lastly, four, longitudinal studies typically have increased time commitments and higher financial demands.

Let’s take a look at another example which shows the benefit of using a longitudinal study design. In this example we begin with our initial community structure on the left, with the pathogen present in
red, before a treatment takes place. Immediately after sampling the initial community, a treatment occurs, and the community is sampled again after 24 hours. This is an example of the simplest form of a longitudinal study, commonly called a pre-post study because sampling only occurs pre- and post-treatment. While this type of sampling is great for gaining exploratory data for a pilot study, it doesn’t typically give you enough information to make strong conclusions. Initial conclusions may suggest that the treatment clears the pathogen from the patient. However, you may be missing what takes place immediately after treatment, before the 24-hour post-treatment measurement, and you are likely missing what changes take place after that 24-hour post-treatment measurement. For example, after 48 hours the pathogen could return or not return, there could be the emergence of a new pathogen previously lying dormant, or there could be a negative or positive effect on the commensals. Each of these scenarios can drastically change your conclusion on the effectiveness of that treatment against the pathogen.

In 2016 a study was published in Science Translational Medicine tracking the 16S rRNA gene microbiota compositions of 43 infants in the United States. Infants were sampled at regular intervals from birth to two years of age, and associations were investigated between antibiotic use, delivery mode, and predominant diet on microbiota development and composition. Early childhood is a critical stage for the foundation and development of both the microbiome and the host. Early-life antibiotic exposures, cesarean section, and formula feeding could disrupt microbiome establishment and adversely affect health later in life. These exposures contributed to altered establishment of maternal bacteria, delayed microbiome development, and altered alpha diversity. These findings illustrate the complexity of early-life microbiome development and its sensitivity to perturbation. Let’s take a look at the data from this paper and see the
benefits of using a longitudinal study design. Here you can see we have four babies, two of which were born by vaginal delivery and two of which were born by cesarean delivery. The first major microbial exposure for a vaginally born infant is in the birth canal, a potentially important event for establishing a healthy microbiome early in life. Cesarean section bypasses this exposure, altering the initial pool of microbes to which the neonate is exposed. A fecal sample is obtained from each baby immediately following birth and the microbial composition analyzed. This is an example of taking a single, non-repeated measurement. Compared to vaginally born infants, cesarean-delivered infants showed significantly greater phylogenetic diversity, richness, and evenness at baseline. With this method you may suspect, from the relative abundances on the right, that delivery mode has statistically significant but rather small effects on the infant gut microbiome.

If we extended the sampling to collect fecal samples each month following birth from the same infant, this would represent a longitudinal study, as you can see here. Now we can see a clearer picture of how cesarean delivery alters the infant gut microbiome through early development. You can see that delivery mode has a small effect on the initial fecal microbiome, but this effect amplifies within the first month of life. In fact, alpha diversity declined significantly in cesarean-born infants during the first month after birth, and cesarean-born children subsequently displayed lower diversity and richness up to two years of age, especially after eight months of age. For the baby’s first bowel movement, fecal beta diversity was not significantly different, suggesting that the microbiota colonizing infants was of similar complexity. Thereafter, cesarean section significantly altered microbial beta diversity compared to vaginal delivery. Had the study ended with just the initial post-birth measurement, they would have missed significant perturbations
later in life caused by different delivery modes.

Let’s go back to our study design. Now we’re going to add another variable to the mix, where one baby from each delivery mode is breastfed while the other is formula-fed. They compared two major dietary groups that best describe dietary variation in this cohort: infants who were predominantly (meaning more than 50 percent of feedings) breastfed or predominantly formula-fed for the first three months of life. What type of impact might this cause? They found that the phylogenetic diversity and bacterial richness growth rates were significantly decreased in formula-fed children during 12 to 24 months of life. Formula feeding also altered beta diversity and decreased microbiota maturation during 12 to 24 months of life. The impact of cesarean delivery can also be seen in both diet types. This study demonstrates the plasticity of the infant gut microbiome and the importance of longitudinal study design.

So how should you tackle the design and implementation of a longitudinal study? One, it is important that your methods of data collection and recording are standardized and consistent over the course of the study. There are multiple tools available that can help you standardize your data collection. One of the primary methods is through CONSORT, which provides an easy-to-implement method to be transparent in your data reporting by providing protocols and checklists to ensure studies collect appropriate metadata. These will help you collect reliable data and metadata, which will ease future analyses. Two, the frequency and degree of sampling should vary according to your specific research goals and questions. In some cases sampling every hour may be necessary, while in others sampling every year may be enough. Carefully think about this ahead of time; you do not want to miss important measurements due to poor planning. Three,
while it’s not completely avoidable, every possible effort should be made to ensure maximal retention of participants. As we have already discussed, participant dropout can significantly hurt your study design. It is important to also remember that, four, longitudinal studies can require an extensive amount of time to complete. Plan accordingly for the future. Lastly, extreme caution should be taken when choosing a statistical analysis to use. Mishandling your data will lead to misrepresentation. If any doubt arises about your statistical approach, consult your local statistician.

How should you tackle the statistical analysis of a longitudinal study? There are multiple factors to consider when performing these analyses. One of them is how you will account for the intra-individual correlation and variability of measures. Data from the same individual will also need to be paired, since they are no longer independent. You will need to consider what your fixed and dynamic, or random, effects will be. Deciding whether a factor is a fixed effect or a random effect can be complicated. In general, a factor should be a fixed effect if the different factor levels represent all possible discrete values. For example, in our infant study, delivery mode, sex, and diet are designated as fixed effects, since this encompasses all the possible values. Conversely, a factor should be a random effect if its values represent random samples from a population. For example, we could imagine having variables like body weight or daily calories from breast milk. Such values would represent random samples from within a population and are unlikely to capture all possible values representative of the whole population. If you’re not sure about the factors in your experiment, consult your local statistician. It is also important to consider differences between time intervals: are they consistent or unevenly spaced, and do you have any missing data to account for?

Commonly applied statistical approaches are: one, univariate and
multivariate repeated-measures ANOVAs. Note in both cases the assumption of equal interval lengths and normal distribution in all groups, and that only means are compared, sacrificing individual-specific data. Two, mixed-effects regression models, which focus specifically on individual change over time whilst accounting for variation in the timing of repeated measures and for missing or unequal data instances. And three, generalized estimating equation models, which rely on the independence of individuals within the population to focus primarily on regression data. Some of these we will discuss later on in the lecture. Remember, if tests commonly used for cross-sectional studies are used, the data will be underutilized, variability will be underestimated, and the likelihood of false negatives will increase.

Now I will turn it over to Dr.
Evans to discuss the QIIME 2 plugin for longitudinal and paired-sample analysis of microbiome data.

Thank you, Dr. Burcham. Now that you have learned the advantages and disadvantages of longitudinal analyses and a bit on how to design these studies, let’s talk about the tools that are available to you through the QIIME 2 platform. Fortunately, there’s a plugin specific to longitudinal analyses. The QIIME 2 longitudinal plugin supports tools for the visual and statistical evaluation of longitudinal or paired data, which helps us overcome the issue of data dependence. Available features include interactive plotting by way of volatility plots and box plots, linear mixed effects models, paired differences and distances, non-parametric microbial interdependence tests, first differences and distances, and longitudinal feature identification.

The QIIME 2 longitudinal plugin was first introduced in a publication in the journal mSystems. In that paper, the authors used the Early Childhood Antibiotics and Microbiome study, or ECAM, introduced previously by Dr.
Burcham, to demonstrate the application of the QIIME 2 longitudinal plugin to longitudinal data. I used data and figures from the mSystems paper to guide this discussion. As a reminder, the ECAM study was published in 2016 in Science Translational Medicine. This study tracked the 16S rRNA gene microbiota composition of 43 infants in the United States from birth to two years of age. By doing so, the authors identified multiple disturbance events which were associated with antibiotic exposure, birth mode, and diet.

Back to the QIIME 2 longitudinal plugin. One of the primary visualization tools available in this plugin is the volatility visualizer. This visualizer combines control charts and spaghetti plots to create an interactive volatility chart. Volatility in this context refers to the temporal stability of a metric over time and between subjects, with microbial volatility representing the variance in microbial abundance, diversity, or other metrics over time. Why do we care about volatility? Well, it could provide an indication of disturbance, disease, or another event. With the volatility visualizer we can track how stable a metric is over time in one or more groups.

So here in the bottom right-hand corner we can see an example of a volatility control chart. Volatility charts allow users the flexibility to control the metric plotted on the y-axis, seen here where it says metric column, and the grouping factor by which subjects will be combined to determine and visualize group means, which in this example is delivery mode, for which we have vaginal or cesarean. The control limits are at two and three standard deviations from the global mean, which is this solid line here, which is helpful for visualizing potential outlier data points or identifying a significant disturbance event.

Another cool aspect of the QIIME 2 longitudinal plugin is the feature volatility action. The feature volatility method is an exploratory method that wraps the QIIME 2 sample classifier, which you should have already talked
about, and uses supervised random forest regression, at least by default. This identifies features that are predictive of a state or time point. This method is advantageous because it’s not biased towards dominant taxa. The feature volatility action provides interactive volatility plots of feature abundances across time, as well as feature importance values and descriptive statistics, including global mean, global variance, net average change, and cumulative average change.

Let’s take a closer look at the utility of the feature volatility action as applied to the ECAM data. This tool can be used to further examine which bacterial genera are associated with early gut microbiome development in delivery mode. Using 71 features to train the model, only a handful of features account for a majority of the total feature importance. So in this column here we have our importance, and as we can see, only a handful of taxa are actually contributing high importance values. After evaluating the model accuracy and assessing feature importance, we can use this information to focus our analysis on interesting and important features. Here the authors chose to focus on Bifidobacterium and Faecalibacterium, seen in a and b respectively, due to their high importance values, mean abundance, and cumulative average change, which we can see here as our descriptive statistics.

So let’s take a closer look at these volatility plots. What we can see in these plots is the relative abundance on the y-axis and time in months on the x-axis. In our Bifidobacterium plot we see that there’s an increase in Bifidobacterium relative abundance from zero to six months and a subsequent decline from six to 18 months, whereas in Faecalibacterium we can see that there’s not really an increase between 0 and 6 months, but then an increase does occur between 6 and 12 months. Another thing to keep in mind in these plots is the individual variation. Each of these smaller, thinner lines represent the
spaghetti lines, each of which stands for a single subject, so you can see that there’s a high degree of variation occurring in these plots.

Because the feature volatility action is largely exploratory, we can use other actions in the QIIME 2 longitudinal plugin to support our results. Statistical tests supported by the QIIME 2 longitudinal plugin include linear mixed effects modeling, which we’ll focus on; ANOVA; and various pairwise statistical tests, including the Wilcoxon signed-rank test for dependent samples, Kruskal-Wallis for independent samples, and Mann-Whitney U for independent samples. The pairwise statistical tests are implemented in the pairwise distances or pairwise differences actions. Pairwise differences and distances should be used to determine differences between dependent or paired samples. This could be pre- or post-treatment samples, or samples that were processed in two different ways. Pairwise differences can handle microbial feature abundance data or other metrics from a metadata file. In contrast, pairwise distances acts directly on a distance matrix to examine distances between paired samples and whether these distances differ significantly between groups.

Paired differences and distances are only useful for paired data, so what if we have multiple time points, like in the ECAM study? In this case we can use linear mixed effects modeling. An LME model tests the relationship between a single response variable and one or more independent variables, where observations are made across dependent samples. These models are able to account for both fixed and random effects. As Dr.
Burcham pointed out, you can think of fixed effects as factors with all levels of interest accounted for. In the ECAM study, the fixed effects include variables like diet, delivery mode, and sex. On the other hand, random effects include a number of possible measurements that encompass a degree of variation. In our example in the ECAM study, random intercepts for subject ID and random slopes for month of life are included as random effects. So, in answering our question from the previous slide: are relative abundances of Bifidobacterium and Faecalibacterium impacted by time and subject? The random intercept by subject ID suggests that baseline values for these abundances can vary by subject, which will now be accounted for in the LME model. Here a random slope by month of life suggests that the relationship between microbial abundance and time is not the same for each individual, which again can be accounted for in the model.

The output of these models will look something like what is in this table here to the right. As we can see in this table, we have our z-scores and our p-values indicating significance. So based on our LME model, we can see that Bifidobacterium relative abundance was impacted by diet at six months of life, and there’s a significant interaction between diet and delivery mode, which you can see here. If you do decide to use LME models, keep in mind that there are assumptions that need to be met regarding the residuals.
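To make the random intercept and random slope idea more concrete, here is a minimal Python sketch. It is not a mixed effects fit and not the plugin’s implementation: it simply fits a separate least-squares line per subject to show what "baseline varies by subject" and "change over time varies by subject" look like in numbers. The data, the subject IDs, and the function names `fit_line` and `per_subject_lines` are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical toy data (not from the ECAM study):
# (subject_id, month_of_life, relative_abundance)
observations = [
    ("s1", 0, 0.10), ("s1", 6, 0.40), ("s1", 12, 0.70),
    ("s2", 0, 0.30), ("s2", 6, 0.35), ("s2", 12, 0.40),
    ("s3", 0, 0.05), ("s3", 6, 0.20), ("s3", 12, 0.35),
]


def fit_line(points):
    """Ordinary least-squares (intercept, slope) for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def per_subject_lines(rows):
    """Group rows by subject and fit one line per subject."""
    by_subject = {}
    for subject, month, value in rows:
        by_subject.setdefault(subject, []).append((month, value))
    return {subject: fit_line(points) for subject, points in by_subject.items()}


lines = per_subject_lines(observations)
for subject, (intercept, slope) in sorted(lines.items()):
    # Different intercepts: baselines vary by subject (random intercept idea).
    # Different slopes: change per month varies by subject (random slope idea).
    print(f"{subject}: baseline={intercept:.3f}, change per month={slope:.4f}")
```

A real LME model estimates these subject-level deviations jointly with the fixed effects rather than fitting each subject in isolation, which is why the plugin’s LME action, or a statistician, remains the right route for actual inference.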
Residuals can be checked using the LME action, or the residuals can be downloaded and checked outside of the QIIME 2 platform. Ultimately, if you are unsure what statistical tests should be applied to your data set, you should check with a statistician.

What are some other ways we can assess longitudinal data? One way is to track the rate of change over time. The first differences action in the QIIME 2 longitudinal plugin allows us to assess the magnitude of change in some metadata value of interest, such as Shannon’s diversity, between successive fixed intervals. A similar method, first distances, allows us to calculate the beta diversity distance between successive samples. This is useful for identifying trends in beta diversity over time. Both of these methods can then be visualized using volatility plots or further assessed using linear mixed effects models. Both also have an optional baseline parameter, so that each successive interval will track differences in some metric, or calculate distance, from a baseline. The baseline could be prior to applying some type of treatment, or it could be a different subject or reference set of samples. This could be useful in several microbiome experimental contexts, for example in comparing between fermentations and their inocula, or between intact and disturbed environments during recovery from disturbance.

In the QIIME 2 longitudinal paper, the authors used first distances to examine how beta diversity between successive samples collected from the same subject changed over time in each subject in the ECAM study. Let’s take a look at some of this output. So what are we actually looking at here? We can see that we’ve piped our first distances analysis into a volatility plot for visualization. Let’s focus here on a. Here we can see UniFrac distances tracked across time for two different groups, one where the mode of delivery was vaginal and the other where the delivery mode was cesarean. So hopefully you can see that UniFrac
distance to the previous time point is here on the y-axis and month is here on the x-axis. You can see that both delivery modes show large beta diversity distances in the first month, so you have very high distances here. This signifies a dramatic shift in the gut microbiome, but we can see that this shift was slightly greater in cesarean individuals, here in red. Again, these results can be piped through an LME model to see if changes are significant. After one month we can see that the gut microbiome stabilizes over time for both groups. Remember, here you are seeing the distance between successive time points, so one month here is compared to two months here, two months is compared to three months, three months is compared to four months, et cetera.

Now let’s look at b. In b we can see the application of the baseline parameter using the first distances action. Each distance at each time point is calculated between that time point and the baseline, rather than comparing that time point to a successive time point. So here we can see that cesarean-delivered children show greater phylogenetic change from the baseline in the first three months of life. This is represented by the fact that these distances from the baseline are much higher than the distances from the baseline for children that were born vaginally.

Now let’s look at c. In c we can see how altering the distance metric of interest can aid our interpretations. In the last two examples, a and b, the distance that we were looking at was UniFrac, and in c we’re looking at Jaccard. Jaccard distances are a measure of dissimilarity and indicate the proportion of features not shared, so by using this method with first distances we can track the proportion of features not shared across time or from a baseline. In this case the baseline was the mother’s gut microbiome. Early on there aren’t many features shared between the mother and infant gut microbiome, but the proportion of shared features increases
after the first year of life; you can see that here.

Let’s talk about some additional actions that are available in the QIIME 2 longitudinal plugin that may be useful for your particular study design. These include the maturity index prediction and the non-parametric microbial interdependence test. Maturity index prediction was originally designed to predict between-group differences in intestinal microbiome development by age using supervised regression. Some things to keep in mind if you plan to use this method are that it generally requires both treatment and control groups and large sample sizes, and it is not tolerant of missing data. The non-parametric microbial interdependence test, on the other hand, is robust to missing samples. This method focuses on the interdependence of features (taxa, ASVs, OTUs, what have you) and how this changes over time. Pairwise correlations are calculated between each pair of features within a subject to create a microbial interdependence correlation matrix. Then a distance matrix is computed between subjects, measuring the temporal correlation distance. Between- and within-group distances can then be evaluated using methods such as PERMANOVA. If you do use this method, make sure you have greater than five time points. You can also find out more information about both of these methods in these publications here.

What have we learned? Not only are longitudinal studies important for unveiling trends of microbiological activity over time, but they are invaluable to our understanding of the existing heterogeneity and temporal patterns, which can be tracked within and between experiments. If or when you decide to perform a longitudinal study, you now know that there are a variety of methods available in the QIIME 2 longitudinal plugin to ease your analysis. Thank you for attending this segment on longitudinal methods. You can use the link provided on this slide to navigate to the QIIME 2 longitudinal tutorial.
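As a companion to the lecture, three of the ideas described above (first differences with an optional baseline, control limits at two and three standard deviations from the global mean, and Jaccard distance as the proportion of features not shared) can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the plugin’s code: the values are made up, the function names are hypothetical, and the use of the population standard deviation is an arbitrary choice for the sketch.

```python
from statistics import mean, pstdev

# Hypothetical Shannon diversity values for one subject, keyed by month.
shannon_by_month = {0: 2.0, 1: 3.5, 2: 3.8, 3: 4.0, 4: 4.1}


def first_differences(series, baseline=None):
    """Change in a metric between successive states, or from a baseline state."""
    states = sorted(series)
    if baseline is not None:
        # Every other state is compared against the same baseline state.
        return {s: series[s] - series[baseline] for s in states if s != baseline}
    # Each state is compared against the state immediately before it.
    return {cur: series[cur] - series[prev] for prev, cur in zip(states, states[1:])}


def control_limits(values):
    """Global mean with limits at 2 and 3 standard deviations (population SD assumed)."""
    m, sd = mean(values), pstdev(values)
    return {"mean": m, "2sd": (m - 2 * sd, m + 2 * sd), "3sd": (m - 3 * sd, m + 3 * sd)}


def jaccard_distance(features_a, features_b):
    """Proportion of features not shared between two samples (presence/absence)."""
    union = features_a | features_b
    if not union:
        return 0.0
    return 1 - len(features_a & features_b) / len(union)


successive = first_differences(shannon_by_month)
from_baseline = first_differences(shannon_by_month, baseline=0)
limits = control_limits(list(shannon_by_month.values()))
mother_vs_infant = jaccard_distance({"a", "b", "c"}, {"b", "c", "d"})

print("successive:", successive)
print("from baseline:", from_baseline)
print("control limits:", limits)
print("Jaccard distance:", mother_vs_infant)
```

In this toy series the successive differences shrink month over month while the differences from the month 0 baseline keep growing, which mirrors the contrast between panels a and b discussed in the lecture.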