Bayesian Hierarchical Models
Apr 23, 2020 20:26 · 7960 words · 38 minute read
- Hierarchical Bayes. So what I kinda wanna do here is build on what Shannon talked about yesterday afternoon, which I think really laid the foundation for, in my mind, the idea that ecological data are complex. They have a lot of idiosyncrasies, and it’s important for us to bring appropriate statistical tools to deal with that complexity of data. We often construct models individually for specific types of analyses, based on the characteristics of those data and their challenges, rather than taking data and trying to twist them, sometimes inappropriately, to fit some long-standing canned test. The classic tests we learned were derived quite a while ago, at a time when we couldn’t solve problems computationally, and so you had to solve things analytically. So think about, like, why regression has so many assumptions.
01:06 - It has so many assumptions because they needed to be able to solve it analytically. If you didn’t have to solve it analytically, you wouldn’t have made those assumptions, and we no longer have to solve it analytically, so we should not feel bound to those assumptions. In terms of hierarchical models, I think they deal with one of the things I was trying to highlight in my lecture yesterday morning, which to me is one of the real characteristics of ecological systems: their heterogeneity and their variability, and the fact that we can often account for that variability even if we can’t always explain exactly what’s causing it. And so it’s important to be able to account for that unexplained variability; or, a better way of thinking of it, the as-yet-unexplained variability.
01:55 - Because sometimes we can use hierarchical models to characterize variability, and that helps point us to which scales in space, in time, in organization, in phylogeny, in whatever, are the most important in driving our system, and to where there might be additional process understanding to gain. That said, we deal with systems with just incredible complexity, and we can never measure everything. And so hierarchical models are a way of helping us account for the fact that we can’t ever measure everything. There’s always going to be unexplained variability. So I’m gonna start with a very simple case, to hopefully give you the idea behind it.
02:38 - So imagine I have data, and I’m sampling this data over different observational units. These may be different individuals, different plots, different watersheds, different lakes, streams, islands; however we’re doing the sampling, we ecologists are frequently sampling the world. And so imagine I’ve got data sets coming from a bunch of different sampling units. And these data sets themselves might not just be individual numbers, but whole vectors or matrices of observations, whole tables of data. But they’re the data I collected on a particular observational unit.
03:15 - It could be a plot, it could be a year, it could be a species. And when I’m making the same measurements over different observational units, I’m often confronted with a conundrum of how to analyze that data, in terms of how I think about the independence of that data. So one thing I can do is take data from a bunch of different plots, lump all of that data together, and fit one model to that aggregated data. As the simplest case, imagine I’ve got observations from a bunch of different plots, and I’m just gonna fit a mean and ask, what’s the mean abundance of a specific species. Well, I’m gonna aggregate the data from all the different plots and calculate an overall mean.
03:59 - At the other extreme, I might say, well, the observations within each plot are not equivalent to each other. You know, things going on in one plot might be slightly different than things going on in another plot. And so I might wanna calculate independent means for each of these plots separately. So these represent the extremes of what I think of as a continuum: from all the observations being completely identical, to all the observations being completely independent. And often what we have in reality is something in between, where things are different from plot to plot, unit to unit, river to river, but they’re not completely independent.
04:41 - What’s going on here is not just utterly, wildly different from what’s going on over there. There’s shared information about the process. And this is where hierarchical models are handy: they provide this intermediate case, to represent this continuum. So in a simple hierarchical model for, say, calculating a mean, I would have a mean that I’m calculating at each site, which I’m calling here theta one, theta two, theta three; it’s telling me what the mean of this data is. But then I’m also gonna fit a hierarchical, across-site mean. So it’s called hierarchical because, from the perspective of the parameters in the model, I’ve added another layer.
05:27 - What’s going on here is described by a model at a higher level. This is a fairly simple model, ‘cause all it’s saying is, at this level I just have an overall mean, and I have some site-to-site variability. Which is actually a pretty powerful concept, though a simple one, because once I’ve done this, I’ve now actually partitioned my variability. So before, the variance here is just the variance of the whole data set. Here, I have variability within each data set, but I’m fitting them independently.
06:04 - And here I’ve taken the overall variance and partitioned it into the variability within each of these sites and the variability across sites. So I’ve taken the overall variability and partitioned it into an across-site component versus a within-site component. Like I said, this represents a continuum between these two extremes, because where you sit on that continuum is driven by what you see in the data. If the data tell you that your sites are very similar, then your across-site variance is going to be very small, and this model is going to behave more like this one. If the data tell you that there’s a lot of variability from site to site, then this across-site mean isn’t providing much constraint, and it’s gonna behave more like these independent means.
06:50 - So the actual amount of variability kind of informs to what extent the system behaves more like this end or that end, and represents that whole continuum in between. From a practical perspective, what we do when we fit hierarchical models is not fit these site means, take them as knowns, and then fit that overall mean. What we’re actually doing is fitting them all at the same time. So from the perspective of doing this in the context of a Bayesian model, when I fit this model, I have a mean, and I have to have a prior on that. And it’s just a prior with some parameters I give it.
07:31 - Here, I have priors on each of these independent means. Here, when I’m fitting this hierarchical mean, it’s essentially acting as a prior on what’s going on in this particular site, but it’s itself unknown, so I have to have priors on it. So I’ve pushed the priors up to a higher level in the model. And so it’s kind of like, from the perspective of each site, you’re fitting it given an informative prior, but you’re calculating that informative prior from what you’re seeing at all the other sites simultaneously. And that leads to something I’ll reinforce as I go through this some more: this idea of borrowing strength. The inference you make at each of the sites borrows information from the other sites.
08:22 - Because the cross-site level tells us what the overall mean and variability are, and that then acts as the prior for what you’re seeing site by site. So I’m gonna go from the high-level concept and dive into some of the math of how you would do this for a simple case. And then from there I’m gonna dive into kind of the JAGS code of how you might implement these things, but then explore some of the complexities as we go along. And then again highlight some of the strengths of thinking about things hierarchically.
09:00 - So what I’ve done here is take what we looked at graphically earlier and write it down in terms of some probability distributions. So here I’m saying Y sub k, where k could be any one of those data sets from each of those sites, is distributed normally with some site-specific mean and some variance. I’m starting with a model where I’m saying, this is data, these two are unknown parameters, and since these two are unknown parameters, I’m just gonna put priors on them. So these are the priors on those. So this is a model where I’m fitting a mean independently to every single data set; but just to keep things simpler, I’ve fit one variance for all of them.
09:50 - So I’m assuming the variability within the sites is similar. I could easily have fit a site-specific variance as well. So this is just the basic Bayesian model for fitting a mean: a likelihood of the data given a mean and variance, and priors on the mean and the variance. Okay, so here I’m fitting each data set independently, but assuming they each have the same prior.
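To make that concrete, the independent-means model can be written out roughly as follows (this is a sketch in my own notation; the symbols on the slide may differ):

$$
\begin{aligned}
y_{i,k} &\sim N(\theta_k,\ \sigma^2) && \text{observation } i \text{ in site } k\\
\theta_k &\sim N(\mu_0,\ \tau_0^2) && \text{prior on each site mean}\\
\sigma^2 &\sim IG(s_1,\ s_2) && \text{prior on the shared within-site variance}
\end{aligned}
$$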
10:17 - So when I write down this model, these mu-naughts, tau-naughts, s1’s, and s2’s are actual specific numbers that would need to be plugged in for your choice of prior. I’ll just say it explicitly here: the IG is an inverse gamma. It’s not the only possible choice of a prior on a variance; it’s one that I use frequently because it’s conjugate. Okay, so if we want to extend this from independent means with a common variance to a hierarchical model, we instead assume the prior contains unknown model parameters. So if we wanna move this to a hierarchical model, we want this not to be a specific number we put in; we want it to be an unknown that we’re fitting based on the data.
11:01 - So if that becomes an unknown that we’re fitting based on the data, then we need priors on these two things: what’s our prior on the mean, and what’s our prior on the tau. That tau, instead of being a fixed prior variance on each site mean, now becomes our across-site variance. Hierarchical mean, across-site variance: we’re just gonna put priors on each of these. That’s an unknown, so it needs a prior; that’s an unknown, so it needs a prior. Those of you who’ve done Bayes a lot know this, and for those of you who are relatively new to it: in Bayesian inference, everything that is an unknown needs a prior.
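Putting that together, a sketch of the hierarchical version, in the same assumed notation, looks like:

$$
\begin{aligned}
y_{i,k} &\sim N(\theta_k,\ \sigma^2) && \text{data model}\\
\theta_k &\sim N(\mu,\ \tau^2) && \text{across-site model}\\
\mu \sim N(\mu_0, V_0), \quad \tau^2 &\sim IG(t_1, t_2), \quad \sigma^2 \sim IG(s_1, s_2) && \text{priors on all unknowns}
\end{aligned}
$$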
11:46 - Anything you’re trying to estimate has a prior. So once we make the cross-site mean and variance unknowns, we need to put priors on them. I put this slide in orange because it highlights some of the key take-home messages about hierarchical models, and why I find them particularly useful. So first of all, what this is allowing us to do is model the variability in the parameters of a model. In the simple case, the model has two parameters, a mean and a variance.
12:17 - So we’re modeling the variability in the mean. Any model of any complexity, from a simple mean, to a linear model, to a nonlinear model, to some complex process-based model, can be fit in a hierarchical context; and in that context we use this additional hierarchical layer to model the variability in the parameters. So in some of the work one of my postdocs, Istem, is doing right now, she’s fitting process-based terrestrial carbon cycle models in a hierarchical framework, where she’s asking: what is the site-to-site variability in these mechanistic models with mechanistic parameters? The concept is completely identical. You have some model of what’s going on at each site, and then you have a statistical model above that, which describes the site-to-site variability in the parameters.
13:13 - And there’s lots of reason to expect that, in real ecological systems, the parameters of models are not always identical for every site. But at the same time, there’s reason to expect that they’re not completely independent of each other. If we think there are common processes, we don’t think that the variability from one site to another is unbounded. There’s some range of variability we would expect to find, and nature kind of tells us what that range of variability is. And then again, we can try to understand what might be causing it.
13:52 - So it’s allowing us to take the overall unexplained variability in the system and partition it out into multiple terms. In the simple example, we partitioned the variability in the mean across sites. But we can partition variability in hierarchical models at multiple layers and in multiple ways. Often the things to partition are things that represent your sampling strategy, things that would be different if you repeated the experiment again. So if I were to repeat some study a second time, I would put out different plots.
14:27 - I would make the measurements in different years, I would measure different individuals. Those are the sorts of things you would put in a hierarchical model. If I repeated the experiment again, I might use the same experimental treatment levels. So the treatment levels are not random effects; those are your covariates. Things that you think explain the variability mechanistically, or that you think are tied to hypotheses, are the things that you would put in your process model.
14:55 - But the things that are random, things that are variable, things like plots and individuals, are often things having to do with your sampling scheme. And in a lot of the work we do, they often point to specific spatial scales and temporal scales that help us actually partition the variability explicitly. And again, there’s this idea of borrowing strength across data sets. In a perfect world, which may happen if you’re an agronomist where everything is laid out on plots that are as perfectly homogenized as possible and everything is perfectly balanced, you don’t need to borrow strength. But it’s often true that you have, for example, unbalanced data sets, where you have more information at these sites and less information at these other sites, and you can actually borrow constraint through the hierarchical level from the data-rich parts of your analysis to help better constrain some of your more data-poor parts.
16:02 - So in that sense, the well-constrained parts are constraining the hierarchical level, and that then acts as an informative prior on the poorly constrained parts. Taken to the extreme, you would be borrowing strength to a site that has zero data. That would be the limit, and you can’t get less than zero data. If someone knows how to make negative data, I don’t wanna hear about it. (laughs) Actually that’s not true, I’ve had a few undergraduates generate negative data. (laughter) ‘Cause it turns out they broke the data we already had. (laughs) So along with this idea of zero data, one of the strengths of hierarchical models is that they also let you formally think about distinguishing between within-sample and out-of-sample inference. So if I am trying to predict what I think is gonna happen, and this brings us back to forecasting: if I’m trying to predict what’s gonna happen at this site, I use this model. If I’m trying to predict at this site, I use this model; if I predict at this site, I use this model.
17:22 - If I have site number four, site number four is theta four; theta four has no observations, it’s out of sample. I’ve never made measurements there before. How do I predict that? Well, I have this hierarchical layer that provides an informative prior on what I expect to see here, and on what the degree of site-to-site variability is in this system. So I can formally say, I’m using this model for in-sample inference, and I’m essentially using this hierarchical layer to provide a constraint on the out-of-sample inference. And because you have to integrate over this site-to-site variability, you get a very natural result, which is that out-of-sample predictions are more uncertain than in-sample predictions; which is what should happen. That makes perfect sense. This gives you a formal way of doing it.
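In symbols, one way to write that out-of-sample predictive distribution (a sketch consistent with the notation above) is:

$$
p(\tilde{y} \mid \mathbf{y}) = \int N(\tilde{y} \mid \theta_{\text{new}}, \sigma^2)\; N(\theta_{\text{new}} \mid \mu, \tau^2)\; p(\mu, \tau^2, \sigma^2 \mid \mathbf{y})\; d\theta_{\text{new}}\, d\mu\, d\tau^2\, d\sigma^2
$$

The extra integral over the unobserved site mean, drawn from the across-site model, is exactly the site-to-site variability being integrated over; that is why the out-of-sample interval comes out wider than the in-sample one.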
18:18 - By contrast, when you fit models independently; statisticians mean something very specific when they say independent, and they really mean it. If I am fitting models independently and I have site four, then from a statistical perspective, the fact that I fit models at these three sites gives me zero basis for making a prediction at site four. I have no idea. If I assume that these things are completely independent, that means I’m assuming that if I had a new site, it’s independent of everything I’ve seen before. So I’m not learning anything that’s useful for making a prediction at that new site, ‘cause I’m assuming it’s independent of the sites I’ve already seen.
19:05 - This gives me a way of predicting it, but it says that I predict the same thing everywhere, which might not make sense. And even if it does make sense, the hierarchical model would tell you that, because it would tell you it’s not seeing any site-to-site variability. Okay, so that’s this idea of partitioning variability and of borrowing strength. The details of hierarchical models are usually hidden in the subscripts. This is the thing that I think kills people when they’re trying to read hierarchical models and they haven’t seen them before; usually we just blow past the equations.
(students laugh) Let’s be honest: you’re reading through some stats textbook or some really mathy paper, you’re following the words, suddenly there’s some equation and you just jump over it, (laughs) and you read the next words. In hierarchical models, not only is the equation important, but a lot of what’s important is hidden in the subscripts. Coming back here, the detail of what’s random, of what the hierarchical effect is, is hidden in that subscript k, and that really matters: I’m fitting this for each site, and then I have this across-site level. And if I have a site effect and a year effect, then I have a k and I have a t, and I have to pay attention, ‘cause all of the layers of your hierarchy usually end up expressed in terms of your subscripts. So pay attention to them. And also remember that, from a statistical perspective, hierarchical models are hierarchical with respect to parameters.
20:33 - You’re making a model where there are multiple layers of parameters. It’s like with linear models: linear models can have polynomial terms in them; they’re linear with respect to the parameters. Whenever statisticians classify anything, they’re always thinking about the parameters, not the equations, not the data. So you can have hierarchical models on data sets that are not themselves hierarchical. That said, you can also write models that are hierarchical in the data.
20:59 - So a very simple hierarchical model might just be: I have an across-site mean, and that across-site level tells me about the variability from site to site. But there’s nothing that says you can’t add process at that hierarchical level. There’s nothing that says I can’t put explanatory variables at that hierarchical level. So I have site-to-site variability, I’m capturing that, and I can write a model at the hierarchical level that says, here’s what I think explains that site-to-site variability. I can add process; I can add a mechanistic model at the hierarchical level to say this is what explains it. But whatever you’re writing at the hierarchical level is trying to explain why the parameters in the lower-level models are different.
21:45 - So you’re always writing models about the parameters, and why the parameters would change. And in fact, one of the things you’ll see discussed at the end of the hands-on activity, though you don’t have to implement it, is that a not-uncommon thing to do for, say, a random time effect or a random spatial effect is to move from a simple independent random effect to one where there’s autocorrelation. So you might have a random year effect, but you might assume those random year effects are themselves autocorrelated. So you build an autocorrelation model at the hierarchical level; or you might build a spatial model at the hierarchical level saying the parameters are varying from site to site, but they’re not varying independently. I’ll also give an example later of putting covariates at the hierarchical level.
22:36 - It’s the modeling of the parameters that makes it hierarchical. Okay, so one of the most common special cases of hierarchical models is what are called random effects models. So take what we had before: a model that says each site has a mean, and there’s an across-site mean and variance; so I have a within-site variance, an across-site variance, and an across-site mean as the unknown parameters that I’m fitting. I can rewrite this model of the mean at each site in terms of the global mean across all sites plus alpha k, which is how this particular site differs from the global mean.
23:23 - So I’m just refactoring this term in this way. When I do that, instead of a model for the mean at each site, I have a model at the hierarchical level for how each site differs from the global mean. Now I’m gonna assume that the differences from the global mean are unbiased, so these should be zero-centered: how every site differs should be centered around the global mean. It would be kind of a weird model to write one where the differences from the global mean are not centered around that mean.
24:01 - You probably wouldn’t wanna make that assumption. (laughter) So this is always zero, and again this tau stays as the description of site-to-site variability, or plot-to-plot variability, or unit-to-unit, whatever. I have the same prior on the within-site variability, I have the same prior on the global across-site mean, and the same prior on the site-to-site variance. The only real difference is that I’ve re-expressed this in terms of having the global mean at this level and the deviation from it. That’s how we rewrite hierarchical models in terms of random effects models.
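As a sketch in the same assumed notation, the random effects version of the same model is just a reparameterization:

$$
\begin{aligned}
\theta_k &= \mu + \alpha_k, \qquad \alpha_k \sim N(0,\ \tau^2)\\
y_{i,k} &\sim N(\mu + \alpha_k,\ \sigma^2)
\end{aligned}
$$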
24:42 - For a simple mean model, that gives you the exact same result. It can be handy, though, when you write more complex models that have multiple sources of randomness to them. So again, highlighting that random effects always have a mean of zero, and that random effects are again partitioning: each random effect variance attributes a portion of the uncertainty to a specific source. So we’re partitioning the overall variability into the within-site and the across-site components in this case. The random effects framework, I think, lets us more easily write models that have multiple sources of uncertainty.
25:21 - So I alluded to this earlier, but I just wanted to come back to it more formally. What things can be random effects; what do we put in that hierarchical level? Random effects are things that would be different if the experiment were replicated. So like I said before, the plots, the blocks, the years, the individuals, the study sites; they would be different if you repeated the study. And we’re often using random effects as one way of accounting for the lack of independence of observations within a block, or within a year; observations made on the same individual are not independent of each other. Our treatments, our covariates, the things we have hypotheses about, are fixed effects. So in terms of linear models or generalized linear models, the things that were always in those models are fixed effects, and once you add random effects you end up with what’s called a mixed effects model.
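For example (illustrative notation, not from the slides), a simple mixed effects model with one fixed covariate x and a random block effect might be written:

$$
y_{i,b} = \beta_0 + \beta_1 x_{i,b} + \alpha_b + \varepsilon_{i,b}, \qquad \alpha_b \sim N(0,\ \tau_b^2), \quad \varepsilon_{i,b} \sim N(0,\ \sigma^2)
$$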
26:24 - Okay, so things to watch for when writing down random effects models. They’re not magic: they can’t estimate variability across plots, units, individuals, lakes, rivers, islands, watersheds, whatever, if you don’t have replication. They can do things that sometimes seem like magic; like, I’ve written random effects models with random effects on individual plants where I’m technically fitting more parameters in the model than there are individuals in the study. Which seems like magic, but it only works because I’ve been following those individuals over time, and therefore I have some basis for saying how this individual’s behavior differs from other individuals’ behavior. If I just measured that individual once, I would have no basis for saying how this guy is different; I’d have no within-individual replication.
27:26 - Likewise, if I have a study that has one plot, I can’t put a plot random effect on it, because my estimate of plot-to-plot variability will literally just be my prior. I mean, technically you can do it, but what you get back is what you put in. I have a prior understanding of what site-to-site variability is, I see one plot, and I have not updated my understanding of plot-to-plot variability. If I fit a model with two plots; I mean, Bayes lets you fit a variance with N equals one, but you just get the prior back. You can fit it with N equals two, and you get a little bit of constraint, but not much.
28:09 - The degree to which this can blow up on you has a lot to do with what your priors are. So if you’re starting with really uninformative priors on, say, your site-to-site variability, and then you throw a really small number of sites at it, you will get very little constraint. The chains may converge, but they’re gonna be all over the place. I’ve written down models where it can be very difficult to distinguish the residual noise term from the random effects if you don’t have enough replication. That said, there are a few places, particularly if you’re updating on previous work, where you might actually have an informed prior on, say, what your site-to-site, individual-to-individual, or watershed-to-watershed variability actually is.
28:53 - So if you do have informative priors, you can get by with much lower replication. But I find it not uncommon for folks to be like, I’ve got big data sets with lots and lots of rows in them, why is nothing converging? Well, because those big data sets represent three sites, and you’re fitting a site effect, and you don’t actually have much constraint on what your site-to-site variability is. You know, I can set up a data logger to measure data every second at a site, but if I only set up two data loggers, I don’t know much about their variability; an infinite number of observations at those sites doesn’t tell me about the variability across them. I had this slide yesterday to reinforce, again, this idea that the partitioning of things into random effects actually affects the inferences and the predictions we make. So now you can actually think about what we were doing when I showed this slide yesterday.
29:50 - So I was fitting a simple model that just has a random year effect and a random site effect. And so here, when I’m making a prediction for a new year at a known site, I’m doing an in-sample prediction where the site effect is known, but the year effect is not. So I have to integrate over the uncertainty in the year effect. But in this case, there’s not much observed year-to-year variability. So that’s where that prediction comes from.
30:23 - By contrast, if I’m making a prediction in year 10 for a new site, I have a known year effect, but I have an unknown site effect. I have to integrate over that site-to-site variability, which happens to be large. And if I’m making a prediction for a new site in a new year, I have to integrate over both of those sources of uncertainty; and then likewise over here. So this just highlights, again, some of those points that I made earlier about how hierarchical models affect prediction, because they allow us to make predictions about unobserved sites, species, years, et cetera; whatever we’ve made a random effect. When we make the out-of-sample predictions, we’re integrating over that uncertainty, versus in-sample, where we’re not. But it also points to something I didn’t say before, which is that because the hierarchical model provides these informative priors, when making out-of-sample predictions you can often improve your predictive ability at those new sites with a relatively small number of observations, because they have the benefit of the hierarchical constraint.
31:31 - And I’ll give an example of that in a little bit with some work we did with allometry, where, you know, if I have just a few observations on a rare species, I can get much more constraint in a hierarchical model, because I’m borrowing strength, than I can if I was just trying to fit that rare species by itself without learning something from, say, similar species. And this is an example that goes back to my own graduate work, where we were working with forest models; unfortunately it ended up low resolution, but here it is. These are timesteps in years, so this is a 1000-year simulation of adult tree density for two species that were put into competition with each other. The parameters describing the means are identical between these two simulations; what was different was whether you accounted for the random effects on those parameters. And it’s actually showing that, in this simulation, we qualitatively changed the coexistence criteria depending on whether we just stuck in every parameter at its mean, or accounted for the fact that there was structure to the variability.
32:48 - So here, when we don’t include the random effects, the overall variance is the same, because that variability gets pushed into the residual; we’re including it, but we’re not partitioning it. And here we’re showing that by partitioning it, that structure allowed a species to coexist, and actually changed which one was dominant. Okay, another general rule of thumb with any statistical modeling, but definitely with hierarchical modeling: start simple and progressively add complexity. That’ll be a good rule of thumb as you dive into your own projects this week. Write down the absolute simplest model you can for something, and then add random effects one at a time.
33:34 - Add some of the stuff Shannon talked about in terms of dealing with latent variables, and errors in variables, and observation error; add these things in a little bit at a time. If you add everything in at once, if you write down the uber-model that you think is gonna be the one that accounts for everything, and it doesn’t work, you don’t know why. (laughs) If you add things one at a time, and it stops working, you’re like, aha, that last thing I added, that was the thing that didn’t work. If you add two factors at a time and it doesn’t work, you have to go take them out individually and figure out which one caused the problem. So to kick off a simple example, imagine a spaghetti plot of biomass through time, but the data are structured; the color here represents observations coming from the same observational block.
34:23 - So there’s some spatial structure, in that they’re from the same location, and then obviously there’s time. What I wanna do here is dive into what this would look like in terms of some JAGS code. Some of you use other languages like Stan and such, but the code looks very similar, and we’re using JAGS this week. So, first model: fitting a global mean. I have a prior on the mean, I have a prior on the variance, and I have a likelihood for the data; a normal likelihood with some mean and variance.
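A minimal JAGS sketch of that first model might look something like this (the variable names, array dimensions, and prior values are my assumptions, not the original course code):

```
model{
  mu   ~ dnorm(0, 0.001)      ## prior on the global mean
  prec ~ dgamma(0.1, 0.1)     ## prior on the precision = 1/variance (JAGS uses precision)
  for(t in 1:NT){             ## time
    for(b in 1:NB){           ## block
      for(i in 1:NI){         ## individual
        x[t, b, i] ~ dnorm(mu, prec)   ## likelihood
      }
    }
  }
}
```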
34:59 - And here the data happen to be stored in an array, so the code loops over time, block, and individual. So I’m just looking up each individual, and if it weren’t for the fact that the data are structured in an array, this is just fitting a mean, which is about the simplest Bayesian model you can write down. The confidence interval is very tight, the predictive interval is very wide. Now it doesn’t make a particularly interesting prediction, (laughs) ‘cause it’s fitting a mean, and it’s the same every year and every individual. Let’s try to explain some of that year-to-year effect.
35:33 - So we’re gonna write down a model that says the expected value of X at some specific time depends on that global mean and my alpha t, my random effect for time at that specific time. That then goes into my likelihood. And then, because I have a random effect alpha, I have to write down its model. So I have a loop over all of my years, where each alpha t is sampled from a normal, again with mean zero and tau t being my year-to-year variability. Tau t is an unknown, so I need to put a prior on it. So I’ve tried to highlight in red the things that I added in order to add this hierarchical random effect.
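A comparable sketch of that year-effect model (again with assumed names; the lines marked "new" correspond to the red-highlighted additions on the slide):

```
model{
  mu    ~ dnorm(0, 0.001)        ## global mean
  prec  ~ dgamma(0.1, 0.1)       ## residual precision
  tau.t ~ dgamma(0.1, 0.1)       ## prior on year-to-year precision   (new)
  for(t in 1:NT){
    alpha.t[t] ~ dnorm(0, tau.t) ## random year effect                (new)
    Ex[t] <- mu + alpha.t[t]     ## expected value in year t          (new)
    for(b in 1:NB){
      for(i in 1:NI){
        x[t, b, i] ~ dnorm(Ex[t], prec)
      }
    }
  }
}
```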
36:27 - I added the alpha, I needed to add the estimation of the alpha in a loop, and then that year-to-year variability needed a prior. And so now I get a slightly wider confidence interval, because I’m now estimating more parameters, but my predictive intervals are much more interesting, because I’m now capturing some of the year-to-year variability. I’m not explaining why there’s year-to-year variability, but I’m capturing that it exists. I could then take that output for all of those random year effects and ask, in kind of an exploratory mode, what explains those year effects. That’s actually something that I would very often do: take the posterior estimates of a random effect, and then quickly do a bunch of exploratory analyses.
37:22 - Plot the random year effects against temperature, plot them against precipitation, against wind, or humidity, or nutrient deposition, or whatever it is that I think is explaining that year-to-year variability; get a feel for what is likely to be important, and then go back and add those effects into the model, without going and refitting a bajillion models. We can likewise write, very similarly, the model that accounts for a random block effect. So instead of a random effect of time, we have a random effect of block; we loop over the blocks, and I have a variance on the block, which needs a prior. It’s essentially the exact same code, except I’m looping over a different factor; it’s just how the data are structured.
38:16 - And then I didn’t put the confidence interval on this, ‘cause it makes it messy, but we can see that the posteriors don’t vary from year to year, because I didn’t put a year effect in; but I am seeing that there are consistent differences. You know, the black and the cyan tend to be high, the blue and the green tend to be low; if you’re color blind, you have no idea what’s going on anymore. (students laugh) And then if you wanted to put a random effect on both, we just say that the expected value of X depends on both time and block, with an overall mean, a random effect of block, and a random effect of time. So this is actually a good example of why refactoring hierarchical models in terms of random effects is handy, because it’s easy to then write down a model like this where the different random effects are just combined additively. If you move on to more complex models, you can likewise put random effects on parameters.
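Before moving on to random effects on parameters, here’s a rough sketch of that combined model with both additive random effects (names and prior choices are assumptions):

```
model{
  mu    ~ dnorm(0, 0.001)       ## overall mean
  prec  ~ dgamma(0.1, 0.1)      ## residual precision
  tau.t ~ dgamma(0.1, 0.1)      ## year-to-year precision
  tau.b ~ dgamma(0.1, 0.1)      ## block-to-block precision
  for(t in 1:NT){ alpha.t[t] ~ dnorm(0, tau.t) }   ## random year effects
  for(b in 1:NB){ alpha.b[b] ~ dnorm(0, tau.b) }   ## random block effects
  for(t in 1:NT){
    for(b in 1:NB){
      Ex[t, b] <- mu + alpha.t[t] + alpha.b[b]     ## effects combine additively
      for(i in 1:NI){
        x[t, b, i] ~ dnorm(Ex[t, b], prec)
      }
    }
  }
}
```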
39:15 - So if I have, say, a linear regression, and I just have an additive random effect on the end, that is essentially a random effect on the intercept. But I could likewise write a model where I put a random effect on the slope, to say that the slope might be different from site to site, not just the intercept. And so I would write down a model that says beta two is the beta two mean plus a beta two random effect, for whatever is causing that random effect, and then I just plug that into the regression. Cool; and I don’t actually plot this, because it’d end up really noisy.
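A hedged sketch of that random-slope regression in the same spirit (covariate, index, and prior choices are illustrative, not the original code):

```
model{
  beta1     ~ dnorm(0, 0.001)    ## intercept
  mu.beta2  ~ dnorm(0, 0.001)    ## across-site mean slope
  tau.beta2 ~ dgamma(0.1, 0.1)   ## site-to-site precision of the slope
  prec      ~ dgamma(0.1, 0.1)   ## residual precision
  for(k in 1:NS){
    eps2[k]  ~ dnorm(0, tau.beta2)    ## random slope effect for site k
    beta2[k] <- mu.beta2 + eps2[k]    ## slope = mean slope + random effect
  }
  for(i in 1:n){
    Ey[i] <- beta1 + beta2[site[i]] * x[i]   ## site-specific slope in the regression
    y[i]  ~ dnorm(Ey[i], prec)
  }
}
```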
39:54 - What I actually show here is just an overall summary table of the models that were fit. Every model has a mean, and every model has this residual uncertainty. We can see that as we added random effects, we started explaining more and more of that unexplained residual uncertainty. So the overall variability is here, but we’ve now partitioned that out; the random effects don’t have process to them, but they still explain over 2⁄3 of the variability. We can see our DIC score, which is our model selection criterion, does support including those random effects.
40:34 - And the other thing that we do see, which is kind of a classic thing that goes on in model selection: when we increase the number of parameters, the uncertainty in our parameters does increase. So when we’re only fitting the mean, we’re very confident about the mean. When we’re fitting this model, we’re considerably less confident about the mean, because we’re fitting the mean and we’re also fitting a variance. But then here we’re fitting the residual variance, the temporal variance, and the across-block variance, and then we’re fitting 10 year effects and five block effects. So this is where you can get hierarchical models that suddenly seem like they have huge numbers of parameters; but an important thing to remember about those alphas is that they’re not independent.
41:18 - So this is one of the things that’s challenging about hierarchical models: what is the number of parameters in the model? Well, it’s not the number of alphas, because those alphas are not independent. So the effective number of parameters is often much smaller, and maybe even ill-defined just by the structure of the model, because it really depends on what the estimates of those random effect variances actually are. If the random effect variances are very tight, the effective number of parameters is much smaller; if the random effect variances are very large, you effectively are fitting an alpha for every site, because you’re effectively fitting every site independently. So it’s something that emerges from the data set, which makes things a little unintuitive.
42:08 - But the idea is that, effectively, the number of parameters in the model depends on what data it’s fit to. Okay, I showed this earlier, but I wanted to come back to it again, to show that going beyond fitting a mean, we can fit models that have both fixed and random effects. I kind of went through this earlier; some high-level points here: we’re using random effects to try to capture the unexplained variability, we can use this to point to things that we need to explain, and then we can add covariates to try to explain it. So this is exactly what we were talking about earlier: starting with random effects, figuring out the scales that need additional explanation, trying to put process understanding on the things that aren’t understood, to try to explain the unexplained variability; but noting that sometimes those additional fixed effects are not justified.
43:09 - So that’s where model selection is helpful. They might explain it, but it might not be parsimonious. So, a simple year effect: consider the number of new young produced per adult female in a population of birds. Let’s say that we fit a year effect and it shows that there is coherent year-to-year variability in that reproductive output through the whole population. Based on that, you could use those year effects to look at additional covariates; and like I said, you can do the exploratory analysis on that without having to go back and refit a bajillion models.
43:53 - If you have simple data, you can go ahead and fit a bajillion models. But if you have large data and complex models, you might not want to; you know, if it takes a week for the MCMC to run, you don’t wanna try a bajillion candidate models. You wanna have some idea of which ones are likely, and then you can refine the model to add these additional covariates. So one take-home message is that it is just as important to account for the variability as it is to account for the covariates. And this is again something I tried to emphasize yesterday: when we’re doing forecasts, how we characterize the uncertainties, how we partition them into different sources; the driver uncertainty, the initial condition uncertainty, the parameter uncertainty, the random effects.
44:44 - Understanding the variability can be just as important to our ability to make predictions as what the mean model is predicting. And then we use these random effects to account for these unmeasured, often unmeasurable covariates, these things we can’t yet explain. The slides, like I said, are all online, so you can come back to them. So this is an example from my own dissertation work, where we were trying to fit relationships between tree diameter and tree height. And one of the take-homes is this idea of borrowing strength.
45:15 - So here is a common species: I have measurements of diameter and height across the full range of sizes, and lots of data. And so the black line is the Bayesian estimate, the red line is just the linear regression, that line is the maximum tree height from the literature, and they’re the same. I have lots of data, so the hierarchical model gives me the same estimate as if I just fit these things independently, because the data at that scale dominate. Here, I have tree species that are predominantly just in the canopy. There isn’t recruitment for these species, and so some of my linear regression fits say, you know, yeah, they’re right at their maximum height, and if the diameter were smaller, they would have been taller.
46:13 - (laughs) Because that’s the maximum likelihood fit. By contrast, here’s what we’re getting from the hierarchical model: because we see consistent across-species variability in the relationship between size and height, it says, well, no, trees have to start small and get big. This is the pattern by which trees tend to start small and get big, and it makes sure the fit goes through the data. So I have an intercept that’s consistent with the data, and I have a slope that’s largely coming from the hierarchical model. And likewise over here, we have rare species that tend to be more in the understory, where, you know, sometimes I get no relationship with height, sometimes it looks like they’re growing super fast.
46:59 - The hierarchical model gives much more biologically plausible fits, because I am borrowing strength from the common species to constrain the rare species; or species that aren’t necessarily rare, but aren’t observed over a range of variability that lets me constrain the parameters fully. And then these other examples just go on to show that you can also apply random effects to a nonlinear model. So here’s a simple Beverton-Holt density-dependent population model: recruits are a nonlinear function of spawners. The alpha parameter and the r parameter were treated as random effects, treated hierarchically, so they were fit with an overall mean on each of those, and then variability on each of those.
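As a rough sketch of what that kind of hierarchical nonlinear fit can look like in JAGS (assuming a standard Beverton-Holt parameterization R = aS/(1 + aS/r) and lognormal random effects; this is not the original stock-recruitment analysis):

```
model{
  ## hierarchical means and across-site precisions for the two parameters
  mu.a  ~ dnorm(0, 0.001)
  tau.a ~ dgamma(0.1, 0.1)
  mu.r  ~ dnorm(0, 0.001)
  tau.r ~ dgamma(0.1, 0.1)
  prec  ~ dgamma(0.1, 0.1)            ## residual precision
  for(k in 1:NS){
    log.a[k] ~ dnorm(mu.a, tau.a)     ## site-level parameters drawn from the across-site model
    log.r[k] ~ dnorm(mu.r, tau.r)
    a[k] <- exp(log.a[k])             ## log scale keeps a and r positive
    r[k] <- exp(log.r[k])
  }
  for(i in 1:n){
    ER[i] <- a[site[i]] * S[i] / (1 + a[site[i]] * S[i] / r[site[i]])  ## Beverton-Holt
    R[i]  ~ dnorm(ER[i], prec)        ## could equally be a lognormal or Poisson data model
  }
}
```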
47:46 - And so you can fit them hierarchically; I believe the hierarchical fits are the dashed lines and the site-level fits are the solid lines, and the site-level fits sometimes do somewhat nonsensical things. And then here’s another example of a nonlinear model; this is coming out of one of my grad students’ dissertations, fitting the Farquhar, von Caemmerer, and Berry photosynthesis model, which is nonlinear with a change-point transition between its two parts. And here we decided that we wanted to explain the variability in two of the parameters: the Vcmax, which controls the maximum photosynthetic capacity, and this alpha, the quantum yield, which kind of controls photosynthesis at low light. And we actually wrote down models for those parameters in this nonlinear process-based model, as a function of fixed effects and random effects.
48:40 - So we actually had a random leaf effect, a fixed effect for month, a fixed effect for leaf nitrogen, and then here we had fixed effects for leaf chlorophyll and specific leaf area. And so we ended up with a model that’s got a lot going on, (laughs) but quickly, here’s how it behaved: predicted versus observed before, and then after, we accounted for the leaf-to-leaf variability.