State Space Models
Apr 23, 2020 20:27 · 4202 words · 20 minute read
- So, state space models. Again, to remind you of the kind of inferential world we've been in: we generally think about statistical modeling as signal versus noise, and we have some model of that signal. In this case it's a linear model, with some slope and intercept, and we often put biological or ecological relevance on the parameters we want to assign signal to. And then we describe the uncertainty in the ways we've discussed. Yesterday we went through all the assumptions we generally make when using this type of modeling approach, where you're fitting a linear model through a cloud of data.
00:49 - And sometimes those data are missing, and what do you do when they're missing? You have now all done confidence intervals and predictive intervals, so you know that confidence intervals are based on how tightly you estimated the parameter values, the precision of those parameters, right? And you can always make the confidence intervals smaller by adding more data, because you get better parameter estimates. But when you build your predictive interval, you want to include more of the uncertainty, because you're predicting a new observation. So if you're predicting at a given spot on the x axis, you would start at the mean, so your prediction would be at the mean, but you would have some interval around it based on both the observation error and how well you were able to characterize the model. Okay, so this is all background, what we've done the last couple of days.
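As a reminder of what that difference looks like on paper, here is the standard ordinary least squares version (not on the slide, just for reference): for a simple linear regression evaluated at a new point $x_0$,

$$
\text{CI: } \hat{y}_0 \pm t^{*}\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}},
\qquad
\text{PI: } \hat{y}_0 \pm t^{*}\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}.
$$

The terms under the square root that shrink as $n$ grows are the parameter-uncertainty part; the extra 1 in the predictive interval is the observation error, which does not go away no matter how much data you add.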
01:40 - And when we're forecasting, we're generally thinking out beyond the end of when we have data, out of the range of the data that we have. So you can imagine that the confidence intervals, and certainly the predictive intervals, get bigger when you don't have data, and that's again a standard approach: you fit a line, you predict out, and you assume that the line represents a bigger population than the one you sampled. So if this is some atmospheric measure, and this is time, and we extended the kind of inference we've been talking about, where you fit a linear model through a cloud of data and try to predict out in time, you can imagine that all the reasons you have uncertainty in the data come into play at once, right.
02:27 - So we're outside the range of our data, we still have observation error, and oftentimes, certainly with climate or with a lot of ecological quantities, we're also making predictions from predictor variables that aren't perfect themselves. They haven't been measured completely, and maybe this atmospheric measure is actually a compilation of multiple measurements. All of that makes prediction difficult. Ecologists have not had a ton of success making forecasts or predictions that are then validated five to ten years later, even when you do wait five to ten years. And a lot of it is that if you predict on this mean line, even within your data range, you're almost never right.
03:14 - If you were to make a prediction on the line and then go back and collect data, you don't expect to hit the line. In fact, you almost never expect to hit the line, right? So understanding how far off the line you are, and why you might not be on the line, is super important. And it only gets more important when you're trying to do it outside the range of your data, and when you have a bunch of latent variables. So again, say you're measuring something like NPP.
03:44 - Really, you're measuring something else, but you're interpreting it as NPP. And the measurements are connected in time, because you want to know how that ecosystem quantity changes over time. That's when you start to get into the realm of state space models, right? One thing I didn't point out: in the last few slides I put time on the x axis and then showed you a linear model, which obviously violates the idea of independence. You could just jam all of that into correlations in the error, but that's not necessarily going to help if you're forecasting. It may help you get a better p-value on the slope, or be more confident about that p-value, but it's not necessarily going to help you forecast outside of your data better.
04:29 - So a state space model is what you reach for when you have latent variables that are connected in time or space, that is, when they're not independent. A state space model is also a dynamic model: a model in which a future state depends on the current state. In most of what we're going to talk about, the future state is just the next one, a lag of one, but it could also involve different lags. And if the process you're describing is linear, it's also called a dynamic linear model. I'm putting all of this up because you may hear these terms used in ways that seem kind of interchangeable, and they often are: a dynamic model is a state space model.
05:14 - And if it has a linear process, then it's a dynamic linear model. So when I say that the current state depends on the last state, I mean that X at time t plus one depends on X at time t: knowing something about what happened last tells you about what the state should be now. "State" in this vocabulary is the variable of interest, right, the state of the system. And if you remember, three slides ago I said this was about latent variables that are connected, right.
05:45 - So the other part of this is that you have the data model: we're out there measuring Y and then interpreting X, right? And you have your process model, which is the X's. So we have now moved away from X always representing your predictor variable; here it is the latent variable of interest, your response variable, but not the one you measured. And in this case, I'm showing that this is where the Markov process is, right? That's where you've got that dependence structure.
06:20 - One state depends on the last, but your observations, the Y's, are still your data model. They're what you measure, and then you put a model in there to describe how they relate to X. And, importantly, you get a process error and an observation error, right? So you can partition the error out. And because this is going to be a Bayesian analysis, you then have distributions on those errors that you're going to estimate. So the state space model is really just a very flexible framework for describing how states of interest transition through time or space.
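Written out, using normal distributions only as one common special case (the next point is exactly that they don't have to be normal), the two pieces look like

$$
X_t \sim N\!\big(f(X_{t-1}),\ \sigma^2_{\text{proc}}\big) \quad \text{(process model)},
\qquad
Y_t \sim N\!\big(X_t,\ \sigma^2_{\text{obs}}\big) \quad \text{(data model)},
$$

with priors on $\sigma^2_{\text{proc}}$, $\sigma^2_{\text{obs}}$, any parameters inside $f$, and the initial state $X_1$.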
06:57 - Neither X nor Y needs to be normal, and they don't need to be the same type of data. You can have multiple types of data, multiple Y's informing an X. Yesterday I showed an example of fecundity that was informed by cones and seeds collected at different scales, and sometimes in different years; that's two Y's informing one X, the fecundity. They don't need to be on the same timescale, and missing data are fine.
07:21 - Again, this is probability based, so everything unknown, including an NA in your data set, is treated as a random variable, right? It gets drawn from that data model. And it handles multiple data sources. So it's a really powerful, flexible framework for forecasting, and for ecological inference more broadly. What we've been talking about up to now is a process where knowing the last state, at time t minus one, tells you everything you need to know about the current state, plus error, right? So it's a random walk: you're basically adding a little error each time, to move away from where you were the last time. And that's the same thing I showed before.
08:09 - But now I've specified that this is a random walk model, and you'll hear that term come up a lot. It's used as a null model: this model says there are no predictor variables in here, right? There's no explanatory process. It's a null model that sometimes does better than your best mechanistic, explanatory model, even though, in your gut, it really shouldn't if you've captured the explanatory process well. So this is often where you start, right? You want to get this working first.
08:41 - And to break this apart again in the graphic notation: we've got these X's, and these are our process model. This is the true state, not what we're actually observing, but the true state. In the example I did yesterday, this would be fecundity, the actual fecundity of a tree at the last time, the current time, and the next time. There are no other explanatory variables in this process; knowing the current state tells you what the next one will be. We have a parameter model, and you have some error that you specify and fit on that process, right? So each time, you know where you are, a little error gets added, and that's how you get to the next time. And there's no real structure to that error, other than what you specify. Then the data model: these are what you observe.
09:30 - And so, importantly, what I want you to get from this slow, tedious process of building up the graphic notation is that the X's are dependent on each other and the Y's are not: the Y's are independent, conditioned on the process, on X. And everyone should go "ah", because that's a big deal, right? You've gotten away from independence. The dependence lives in the process model, so the error you put on those Y's, the data model error, only needs to be independent conditional on the X's; your raw observations are no longer assumed to be independent of one another. So what does this look like in code, now that you've all had some time with JAGS? You've got your data model, and this should look really familiar: your Y_i's are normally distributed with some mean X, the true state. And the true state is normally distributed around the last true state.
10:35 - And then you have your priors, because you're estimating both of the errors, your process error and your observation error. And there's a new line here: your initial X. When you start, you don't have a previous X, so you need to put a prior on that, your initial condition. Now, these data are cases of flu in New York City in 2015. The months are across the x axis; I should have asked what kind of epidemic happens at this timescale, but yes, it's flu. And there's some missing data here.
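Here is roughly what that model looks like written out, a minimal JAGS sketch; the names `y`, `n`, `x_ic`, and `tau_ic` and the gamma priors are illustrative choices, not the exact course code:

```
model{
  ## data model: each observation is centered on the latent true state
  for(t in 1:n){
    y[t] ~ dnorm(x[t], tau_obs)
  }
  ## process model: random walk, each state centered on the previous state
  for(t in 2:n){
    x[t] ~ dnorm(x[t-1], tau_proc)
  }
  ## priors: initial condition plus both precisions (JAGS uses precision = 1/variance)
  x[1] ~ dnorm(x_ic, tau_ic)
  tau_obs ~ dgamma(0.01, 0.01)
  tau_proc ~ dgamma(0.01, 0.01)
}
```

Because JAGS treats the NAs in `y` as unknowns, the missing months get sampled right along with the states and the errors.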
11:16 - And those missing data are kind of important, right? Because you can imagine, if you're in the public health department, or you work in a hospital, or you're selling Kleenex, you kind of want to know: has this peaked? Am I going to sell a ton more Kleenex? Should I stock the shelves? There are reasons that knowing these data matters. It's obviously nonlinear, right? And it's the flu, so one person with the flu gives it to two people, who give it to four, right? So you know there's some structure in the actual spread of flu, meaning non-independent cases. So obviously we're not going to fit a line through these data and do anything useful, and so: state space approach. And so I ran the state space model on that, the random walk.
12:14 - And in this case, it did a pretty decent job: the process and observation errors converged. There's a lot of data here, and this is one of the strengths of a state space model, that you can partition observation error and process error. But it can also be really difficult to partition them; you can get real trade-offs between the two, and you can see that in the kind of bivariate posterior densities that come out. In general, if you have any way to constrain one of them, usually the observation error, if you know something about how far off your measurements might be, then you can get a better estimate of the process error.
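As a sketch of what that constraint might look like in the model above, assuming you know roughly how noisy the observations are (the quantity `sd_obs` and the gamma shape here are illustrative assumptions, not anything from the lecture), you could swap the vague prior on `tau_obs` for an informative one:

```
## informative prior centered on a roughly known observation standard deviation sd_obs
## (prior mean of tau_obs is 1/sd_obs^2; a larger shape means a stronger constraint)
tau_obs ~ dgamma(10, 10 * sd_obs^2)
```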
13:13 - So the random walk was run, things converged, yay, and then you make the credible intervals around the latent states. These are the 95% credible intervals, and, surprisingly, it didn't do a great job at the initial state, though it did an okay job at the later states. And you have the biggest uncertainty where you don't have data. In fact, the uncertainty gets bigger the further away you are from data, right? If you remember the graphic notation, the best estimates you're going to get for X at time t are always where you have both t plus one and t minus one. If you don't have either of those, then you're going to be less and less certain about X at time t, which should make intuitive sense. And that's what happens here, right? You're not super certain.
13:55 - And you do get some information: the flu could peak at around 3,000 cases or 6,000 cases is roughly what it's telling you, and that's better than no information. It may look less certain than if I had fit a model without characterizing and propagating the uncertainty, but that doesn't make it wrong. You're more likely to actually have the truth fall within that range. Even if it seems less satisfying to have a bigger potential range, it's the right bigger potential range. Of course, in this case we do have the rest of the data, and when you put it back in, yay, it did a good job.
14:39 - And it actually looks like the error is almost symmetrical. But there are a lot of data here, and so knowing last month's flu turns out to be a pretty good predictor of next month's flu. But what if you only have the first half, or you don't even know whether it's the first half? What if you only have the initial ramp-up of the flu, and you've run out of vaccines, and you want to know whether maybe you've started to make a difference, or things are actually going to get worse, right? Again, this is an important question. And right now you've got lots of data in the t-minus-one kind of realm.
15:16 - But as you get into the t-plus-one realm, you're getting into places where you're going to have to estimate a bunch of X's with no future information. And so again, you would expect the uncertainty to get bigger as you get further away from the data. It doesn't matter if you can fit a spline or something to this perfectly and just extend it out; you should still get less certain about where you are as you get further away from the last data point you had. Okay, so hopefully that makes intuitive sense. And in fact you can plot that out, and even though the first model, where you had some later data back here to constrain it, made it look like knowing last month is a really powerful way to predict this month.
16:08 - That's only true up to a point, right? And this is good, in a sense: there's more information out there, we're not completely helpless. Beyond knowing the current state as a predictor of the future state, there has to be more information out there that is important. So this is a random walk, and even though it looks like it does really well, it only does really well if you have a lot of data, and data on both sides of the stretch you're trying to forecast, and oftentimes we're not trying to forecast things for which we already have some future data to bound the estimate. So how do we go from boundless possibilities in the future to actually putting some bounds on that? Because this forecast here is almost useless for management of any sort. And obviously, you would want to pull in some predictor variables that help constrain it beyond the random walk, the null model.
17:11 - And so for a dynamic linear state space model, you actually just add the linear model in, right? The true state is still a function of the last true state, but also of a linear process. There are many special cases where you move beyond the linear process, and I'll show one today. But you've all been doing this. In this case I'm putting temperature in: there's some biological reason to think temperature is predictive of flu, but mostly I chose it because flu peaks in the winter. And then you still have your data model. I didn't fit this version, but you can imagine that temperatures were lower at the beginning of the flu season, and as they got lower again, temperature should have some ability to pull that uncertainty in, right? Okay.
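As a sketch, assuming a temperature covariate `temp[t]` supplied as data (the names and priors here are illustrative), the only change from the random walk above is in the process model:

```
## process model with a linear covariate; data model and error priors unchanged
for(t in 2:n){
  mu[t] <- x[t-1] + beta0 + beta1 * temp[t]
  x[t] ~ dnorm(mu[t], tau_proc)
}
beta0 ~ dnorm(0, 0.001)   ## intercept of the linear process
beta1 ~ dnorm(0, 0.001)   ## effect of temperature
```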
18:05 - And there are lots of other things you would want to put in if you were doing this for some public health purpose. So, you can also do a nonlinear state space model, and the example I'm going to cover here is a mark-recapture example. There are a lot of people out here who think about animals and how you sample animals. In this sampling design, you go out and capture animals, tag them, and let them go; then you go out again and capture them again.
18:36 - And you assume that the animals you recapture are a random sample of the total population. You marked some proportion the first time, and when you go back out and recapture, the proportion of marked animals in that second sample should match the proportion of the total population you marked initially; it's a ratio, okay? So the more you do that, presumably the more information you get. There are some assumptions: the sampling is random, and the capture probability isn't 100%, which usually isn't a hard assumption to meet. And obviously, don't do this with plants. Okay, so the data look very different from the flu counts. Here's an individual record: the individual is i, and the record itself is a vector. In this case somebody went out five times, and they captured it, then they didn't, then they captured it, then they didn't, then they didn't.
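That ratio argument, formalized, is the classic Lincoln-Petersen estimator (not the model we fit below, just the intuition being described): if you marked $n_1$ animals the first time, caught $n_2$ the second time, and $m_2$ of those were already marked, then

$$
\frac{m_2}{n_2} \approx \frac{n_1}{N}
\quad\Longrightarrow\quad
\hat{N} = \frac{n_1 n_2}{m_2}.
$$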
19:37 - And so there are different true-state realities that are consistent with an observed capture history like that. The capture history is a Y, an observed record; these are the potential true states, so here's Z as the true state. For the first part of the record, being alive is pretty much the only state that works, because the animal shows up alive again later: all of those true states involve being alive but not captured, since it's captured again afterwards. And you kind of assume that if you capture it, it's alive, unless the way you marked it was somehow ambiguous.
20:13 - If you capture it and it's been marked, it's alive, and by definition it survived until you captured it. But you don't know about these last two occasions, right? You know your capture probability isn't 100%, because you got a zero and then it showed up alive again, and that's actually useful information. So you don't know about these two, and the different possibilities are that it died and you didn't capture it, or that it survived one more sampling period, or that it survived all of the sampling periods, right. And so now we have a true state that isn't a continuous distribution; we're not going to describe it that way. We're actually going to define the discrete possible states and assign them probabilities.
20:56 - From this, we don't know when it died, but we do know that we don't capture animals with 100% probability, and so we get some information about survival even without a recapture. So this is the graphic notation: you've got your observations, which are independent, right, there are no arrows connecting them; you've got your process model, which is dependent through time; and you've got your parameter model, which in this case describes error that also varies through time. You could just have one p and one s, and you could say that all X's have the same constant variance.
21:33 - Or you could imagine that the variance might change throughout the season, and your ability to observe might change throughout the season. So in this example, we're assuming that our probability of observing an animal, given that it's alive, and its probability of being alive, both vary over the course of the sampling occasions. For survival, we're going to use a Bernoulli probability. And in this case, again, we're setting up these discrete probability descriptions.
22:08 - So you've got the probability that X, the true state right now, is alive, given that it was alive last time; we're defining that as s_t. The probability that it's alive, given that it was not alive last time, is zero. And the probability that it's not alive, given that it was alive last time, is just one minus the alternative, one minus s_t, right? There are two options there and they add up to one. And again, there are two options if it wasn't alive last time: either it's not alive this time, or it is alive, and we already said that probability is zero, so the other one is one. So you've got two different ways of getting to a probability of one: the probabilities across your possible states, which are zero and one, have to sum to one, because we're in probability world.
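Written as a little table of transition probabilities, with each row summing to one:

$$
\begin{aligned}
&\Pr(X_t = \text{alive} \mid X_{t-1} = \text{alive}) = s_t,
&&\Pr(X_t = \text{dead} \mid X_{t-1} = \text{alive}) = 1 - s_t,\\
&\Pr(X_t = \text{alive} \mid X_{t-1} = \text{dead}) = 0,
&&\Pr(X_t = \text{dead} \mid X_{t-1} = \text{dead}) = 1.
\end{aligned}
$$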
23:04 - And then you do the same thing with the observation model, but remember that the Y's aren't dependent on each other: it's the probability that you observe it or don't observe it, conditioned on the true state at that time, on whether it's actually alive. And then you put your priors on those probabilities, which play the role of the process and observation errors in this model. I will also point out, now that I've gone through this slide, that sometimes I'm using t plus one and t, and sometimes t and t minus one.
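Putting the process model, observation model, and priors together, here is a minimal JAGS sketch of this kind of mark-recapture state space model, in the Cormack-Jolly-Seber style; it sticks to t and t minus one throughout, and the variable names (`y`, `z`, `first`, `nind`, `nocc`) are illustrative rather than the exact course code:

```
model{
  for(i in 1:nind){
    z[i, first[i]] <- 1                    ## known alive at first capture
    for(t in (first[i]+1):nocc){
      ## process model: survival depends on the previous true state
      z[i, t] ~ dbern(s[t-1] * z[i, t-1])
      ## data model: detection is conditional on being alive now
      y[i, t] ~ dbern(p[t-1] * z[i, t])
    }
  }
  ## time-varying survival and detection probabilities
  for(t in 1:(nocc-1)){
    s[t] ~ dunif(0, 1)
    p[t] ~ dunif(0, 1)
  }
}
```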
23:37 - Just remember the graphic notation, where X at time t is in the middle and both neighbors are important. So, if that wasn't bothering anyone, forget I just said it. (laughing) And this is actually data from our own John Foster, who is working on a mouse recapture study; that's 11 years of data. I wanted a final figure to show that, even with this nonlinear approach, you get very similar credible intervals and they get wider, right? These parts are where the data are, but those parts are where there aren't data, and he's estimating this on a daily basis, whether they're alive on a given day. So there are lots of days in between sampling bouts, and the longer that gap goes, the wider the uncertainty gets, which is what you would expect.
24:32 - And these solid lines are the minimum number known alive: if you caught 10, you know there are at least 10 out there that are alive. We plot that to make sure the estimates aren't dipping below it, which would be an indication that the model is effectively calling an animal dead when it showed up in the trap. But it's also just really cool data, and he's doing a good job with it. And that's the end of the introduction to state space models that I had prepared.