Expert Elicitation

Apr 23, 2020 20:51 · 5777 words · 28 minute read

  • So, expert elicitation is a pretty powerful process. It’s a rigorous approach for systematically describing relationships, whether those are probability distributions or the connections we expect between variables, and for making our scientific knowledge transparent. It’s been used in a number of cases where there simply isn’t data to make assessments, but experts have knowledge that needs to be included in decisions. For example, some pretty famous, very rigorous expert elicitations were employed in choosing between different locations for storing nuclear waste in the United States. That requires thinking about tens of thousands of years from now, the implications of failure of the storage containers, and when you would expect those to degrade. It involves quantifying probabilities related to how much seepage you would expect, and when you would expect it to reach groundwater sources or other things that would impact humans and human health.

01:30 - And so, these are really useful techniques to employ within forecasting. In some cases, we have a lot of data to describe the things we care about; sometimes we don’t have as much data as we would like, and we can use clever statistical or other mathematical approaches to fill in the gaps; and in other cases, we really need to use experts. The paper that I put in the reading list was done by Granger Morgan, who has run a number of expert elicitation exercises within the U.S., primarily focused on public-sector decision making, so a lot of things relevant to environmental management, and he says in his paper, “Done well, expert elicitation can make a valuable contribution to informed decision making. Done poorly, it can lead to useless or even misleading results that can lead decision makers astray, alienate experts, and wrongly discredit the entire approach.” And so, similar to anything we do, there’s probably a way of doing it right and many ways of doing it wrong.

02:45 - So, if you think an approach like this is useful, it’s probably worthwhile to make sure it’s designed right, and potentially to involve experts in helping to navigate some of those processes. Just to give you a sense of the things that need to be considered when designing this: this is not to discourage you from using it. It’s more to say that we use it a lot, individually, within modeling processes, because we are experts ourselves, but when we go out and seek outside expert opinion, there’s just a different level of rigor that has to be applied. So, you wanna make sure that, if you’re asking a question, it’s relevant to someone’s expertise and that they will be able to make some kind of, in this case, probabilistic or predictive judgment. You don’t want to use qualitative uncertainty language: a number of studies have shown that if you use words like “likely” or “with high certainty” (some of the language used in the IPCC, where they define the probabilities associated with each term) but you don’t define them, what you find is that a term like “likely” gets interpreted as anywhere between 30 and 70%, so what do you do with that when you’ve structured something qualitatively? There are also heuristics, mental shortcuts that we use a lot in decision-making, which can rear their heads in really unpleasant ways if you’ve structured your elicitation to more or less play to them, so you wanna be aware of the ones that typically arise when working with people, and design to minimize, or potentially mathematically correct for, biases like overconfidence. I can give you a paper of ours where we did that after collecting elicitations. And experts: you wanna choose them carefully.

04:46 - You wanna think about the protocols of how you’re setting this up and how you’re iteratively refining it, so, similar to your data, you wanna have calibration checks and ways of validating it. You wanna think carefully about the uncertainty and its functional form, and about the diversity of experts, given what you wanna say, so that you’re not creating a representation that’s too narrow relative to the entire field of experts. And you wanna think carefully about combining expert judgments, because a lot of times we might wanna collapse probability distributions across multiple experts. There are cases where that works perfectly well (I’ve done it a reasonable number of times), and there are cases where it doesn’t make sense. It’s similar to ecosystems: some you can aggregate and think about in terms of more general characteristics; some are so distinct you really need to treat them independently. So, I wanna give you an example, because we keep coming back to weather forecasting as an analogy.

05:52 - Decision analysts have been in the mix from the very beginning of weather forecasting. We went from a situation where we weren’t very good at it to models that can predict the thunderstorms that are gonna occur, and the hail they bring down, at a pretty reasonable level. But there was also a natural experiment in weather forecasting that we don’t necessarily have with ecological forecasting, and it gives us insight into how we might think about experts in ecological forecasting. With weather forecasts, from the very beginning, even before we had models, we had local weather forecasters, the local experts who would predict weather conditions. Now, you couldn’t do that very well without some of the more sophisticated technology, but you had a situation where you had tons of these people, plus your model predictions, and that allowed you to actually compare experts versus models.

07:04 - So starting back in 1965, when a lot of this started, you could look at an expert, a weather forecaster, versus a weather model, and see which performed better using scoring rules. Experts did pretty well for a while. That’s because the models weren’t able to predict as much as what the experts knew, and weren’t necessarily able to do it at the local level. A lot of local knowledge was being incorporated into those probabilistic assessments that couldn’t be captured by the model. But over two decades or so, the models caught up, and I think it was around the mid-’80s that the models started regularly outperforming most weather forecasters. And I think this gives some insight into how we think about forecasting, because we’re still using expert knowledge.

08:11 - So this is one that pulls from multiple different data sources, but the way they aggregate is not necessarily with models. Even though the individual sources are based on models and data, they actually have an expert process that developed these qualitative scales for whether drought persistence is likely, drought improvement is likely, or things in between. Now, I would argue there could be greater transparency in the process they use to make those judgments, but it is an example of expert judgment operationalized into forecasts that are used for a lot of decision-making processes. Which means, I would argue, especially at this stage where ecological forecasts are really nascent, that if we think about how we combine ecological forecasts with formalized expert judgment, really understanding the experts’ assessment of the particular problem we’re trying to understand and solve, we can actually get better ecological forecasts, and that might be something worth operationalizing as we work to improve the skill of the models over time. So we don’t have to wait for models that perform only slightly better than 50-50; we can get predictions and forecasts on decision-relevant timescales by thinking about how we combine these more effectively. There are three major ways I see expert elicitation being particularly useful within forecasting. One is priors: we don’t need to default to uninformative priors, because there are a lot of cases where we have existing knowledge and can construct an informative prior.

10:11 - And expert judgment, even your own expert judgment, brought into those kinds of modeling contexts in a formalized way, can be a really powerful input to these Bayesian forecasting approaches. The second is likelihoods. This is not dissimilar from the weather forecasting situation, where they used a model as a prior, combined it with local information, and got a posterior distribution. In other cases, and a lot of times in the models that I’ve constructed, we have questions where there simply exists no data on the phenomenon we care about. Expert elicitation can be a really useful way of bridging that gap and constructing the likelihood when you have an uninformative prior and you really need data. Those first two are about constructing probability distributions, and we’re gonna do that in a second. The third is the relationships between variables and the causal structure we’re thinking about. There are approaches (conceptual models, influence diagrams, mental modeling, structural equation modeling) where what you’re doing is capturing your expert knowledge, formally representing it in the causal structure of a model, and formalizing the choices that you’re making, because there are trade-offs with any of these modeling choices.
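
The prior-plus-likelihood combination described above can be sketched with a minimal conjugate normal-normal update. All of the numbers here are hypothetical, and the known-variance assumption is a simplification for illustration:

```python
import numpy as np

# Conjugate normal-normal update: an elicited informative prior on a mean,
# combined with observed data of assumed-known variance.
prior_mean, prior_var = 30.0, 10.0 ** 2   # expert's prior: N(30, 100)
obs = np.array([42.0, 38.0, 45.0])        # hypothetical observations
obs_var = 15.0 ** 2                       # assumed known observation variance

n = len(obs)
# Posterior precision is the sum of prior and data precisions
post_var = 1.0 / (1.0 / prior_var + n / obs_var)
# Posterior mean is a precision-weighted average of prior mean and data
post_mean = post_var * (prior_mean / prior_var + obs.sum() / obs_var)

print(post_mean, post_var)  # posterior pulled from 30 toward the data mean
```

The posterior mean lands between the elicited prior mean and the sample mean, which is exactly the expert-plus-data compromise being described.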

11:57 - We’re not able to represent absolutely everything, and there’s a reason model selection penalizes complexity. Those are some entry points, but what we wanted to do here is actually construct a probability distribution. So, I’m gonna invite Mike up to talk about constructing a prior for the travel time from Boston University to Logan Airport. - I’ve done this example in both my graduate classes, the forecasting class and my Bayesian class, and I’ve run into students years later who remember nothing about Bayes’ Theorem, but never miss flights anymore. (everyone laughs) And I know most of you have not lived in Boston, so I’m gonna rely on some of the folks in the back who do.

12:48 - I’m also feeling a little nervous doing this in front of Melissa, who is a bona fide decision scientist - [Melissa] I’m totally judging you. - because she started with that quote about how, done poorly, and I was like, (everyone laughs) I’ve never gotten formal training in this; I’ve just been reading up on it. I’m like, “She’s gonna just tell me I’ve been doing it wrong for years.” My understanding, which all started as book knowledge, really reiterates the point Melissa made about cognitive biases and the importance of doing elicitation in a way that tries to avoid them. And the place I have used elicitation most often has been in the construction of priors.

13:32 - I feel like if you read the literature when you’re trying to learn Bayes, there’s an abundance of uninformative priors used all over the place, and if you think about that, well, statisticians don’t actually know how anything works in the world. (laughs) They don’t know how your system works; they don’t have any expert judgment. So of course they’re gonna use uninformative priors. But you do. You know a lot about how your systems work. And yet so much of the ecological Bayes literature is rife with misapplication of very, very broad priors on things that we do know a lot more about.

14:15 - On the flip side, I’ve seen, particularly in the process-based modeling literature, a lot of examples of one of the most important cognitive biases in elicitation, which is anchoring. If you have a process-based model, that model has default parameters. If you ask someone who’s been using that model for a few years, or developed it for decades, to construct a prior, they will center their prior on the default parameters. They will then set fairly narrow bounds around that, and often they do it uncritically. I’ve seen tons of papers where people’s priors were the default parameters plus or minus 20%, which is, yeah, wa-ay overconfident.

14:59 - For some of those parameters, if you actually sit back and ask what the biological range of variability for that process is, it’s not plus or minus 20%. (everyone laughs) You don’t actually know that. But I’ve been through this exercise with folks that build these global or system models, saying, okay, let’s step back and think about what values the parameters can take on. One of the ways I’ve done this has always focused on building a probability distribution, but building it in terms of the CDF, the Cumulative Distribution Function. For example, if we have a standard normal distribution, centered on zero with variance one, that’s our PDF, but the corresponding CDF, the integral of the PDF, is an S-shaped curve that looks a lot like a logistic function, rising from zero to one.

16:05 - And for various reasons, it’s often easier to work through the logic of how the CDF might be structured, and then, having constructed that CDF, to essentially take the derivative of it to get back to the PDF. If anchoring is one of the key cognitive biases, and anchoring starts by thinking about the middle, then what we often want to do when constructing priors is instead think about the extremes. So what we’re gonna do in constructing a CDF is work from the outside in. Now, in a few days, a lot of you will be leaving Boston and need to get to the airport.
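
The standard-normal example above, and the CDF-to-PDF step, can be sketched numerically; this is just an illustration of the relationship, not part of the elicitation itself:

```python
import numpy as np
from scipy.stats import norm

# Grid over the standard normal's support
x = np.linspace(-4, 4, 801)
pdf = norm.pdf(x)   # bell curve centered on zero
cdf = norm.cdf(x)   # S-shaped curve rising from 0 to 1, the integral of the PDF

# Differentiating the CDF numerically recovers the PDF
pdf_from_cdf = np.gradient(cdf, x)

# Maximum discrepancy is just discretization error
print(np.max(np.abs(pdf_from_cdf - pdf)))
```

The same derivative step is what turns an elicited CDF back into the PDF used as a prior.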

16:48 - You’ll be starting here on campus or fairly close to campus. What’s the minimum time it can physically take to get from here to Logan? Under the absolute best possible case, disregarding all state and federal law (laughs), and with no traffic. - [Melissa] And any form of transportation. - Yeah, no limitations. Well, we know for one thing it’s positive. We’re not assuming we have a time portal and can get to the airport before we left. Though there are many days when I wish I did.

(laughs) 17:29 - But for reference, if you don’t drive on the roads, it’s a little under five miles from campus to the airport. So maybe you hop on your private helicopter, an Apache attack helicopter, and it’s cruising. It’s gonna take, yeah, a minute or two. So let’s say the lower bound might be around one minute. Is there an upper bound on how long it takes to get to the airport? (everyone laughs) Given that Boston traffic can sometimes suck, and there is a background mortality rate, you might argue there is not (laughs), that you might not ever make it to Logan. - [Melissa] What about the maximum amount of time walking there? If you were walking five miles, is that an hour-and-a-half? - But it could be raining. It could be snowing.

18:36 - I’ve tried to walk down the street out there when there was literally 10 feet of snow. So assuming that you did not die en route, it still could take, say, tens of hours at an extreme. That is your worst, worst nightmare of a day. But that worst nightmare doesn’t actually happen very often; it’s the asymptote of the probability distribution, out in the tails. - [Student] And it’s really useful to get out of that really scary worst-case framing and get to a realistic model or understanding of what the system is. - That bias has a name. (laughs) The Availability Bias. (laughs) - [Student] ’Cause I think sometimes it can be really helpful to just step outside of emotions and biases and things like that, and I was wondering if– - Yeah, Availability Bias.

19:36 - So it also depends on your framing. If what you’re trying to do is blue-skies planning (say you’re trying to preemptively think: we’re managing for various species, and we’re considering introducing feral pigs; would this be a good idea?) and you talk with different experts, I would approach it two ways, and this is usually how I construct my elicitations. The first thing I do when I meet with an expert is sit down and talk with them about their system, and get them to literally explain it to me. I construct a conceptual model. How many of you think in box-and-arrow diagrams? Okay, yeah, so there’s a couple of you. If you think in box-and-arrow diagrams, you have a really powerful skillset, because when people are talking, you’re literally going ch-ch-ch-ch and drawing these. What you’re doing during that phase is having someone explain the relationships for a phenomenon you care about, sketching it out, and then stopping and checking.

20:46 - This is what I’m hearing; is this the way you conceptualize it? Which of these are the dominant drivers, which are minor, and are there other factors you would expect to influence it? The whole idea behind that is to do exactly what you were talking about, Jen: to get people thinking both about the dominant things that are likely to affect that system, what they would typically model, and about these other factors that would influence it, environmental but also human and social factors, which may not be included in an ecological forecast but absolutely, in some cases, affect it when we’re doing projections. And we need to set boundary conditions. Then, after we’ve done that and I’ve constructed a conceptual model, and they’ve had time to reflect on it (I’ve maybe even sent them a summary so they can add other ideas), usually in a separate session, that’s when I do a probabilistic assessment. Because when you do that, ideally with the conceptual modeling and other things, what you want them to do is also connect the dots. So, we’re thinking about the effect of the ground snake on the… (everyone laughs) - Crow.

22:08 - - On everything, but the crow in particular. And I was trying to think of the Mariana crow. What are some of the other invasive species examples? What do we think might have similarities or differences? Which invasive species introductions have worked out well, to create some balance? Which ones have worked really poorly? What have been some of the unanticipated effects? So that then, if you’re constructing some type of probability elicitation over something you care about, you’re getting them to think about it both in terms of those extremes and in terms of what is most likely to happen. That is easier to do in blue-skies planning, when something hasn’t happened yet.

22:55 - Once something happens, you know something. Oftentimes it’s bad. (everyone laughs) That constrains things a bit. (laughs) - [Student] Well, you’re no longer trying to forecast an event that’s gonna happen. - Well, they’re not trying to forecast whether it’s gonna happen; they’re trying to forecast the impact, what would happen, and that’s a slightly different situation. And if you’re doing these iterative forecasts, in some cases it can be… there are not many examples of long-term, systematic, expert-focused forecasting.

23:33 - The drought monitor outlooks are one example that uses experts regularly, on a weekly basis. It’s really expensive and time-consuming, and large federal agencies, maybe the Fish and Wildlife Service, (laughs) are the ones that really have the capacity to operationalize those kinds of approaches. - In the interest of time, I’m gonna have to nip that in the bud, because I thought we were gonna spend half an hour on this and we’ve spent closer to an hour. But to get us back to the airport example: we might instead say, well, realistically, let’s think about, say, 99% of the time. We’ll get rid of that last 1% of truly abysmal cases and ask, 99% of the time, what are the odds?

24:26 - Well, 99% of the time your car’s not gonna break down and leave you stuck walking. But there are gonna be some pretty bad weather days and pretty awful rush-hour traffic where you’re crawling along. So let’s say 60 minutes. And so we’d say at this 99%– - [Melissa] 60 minutes seems low for a 99th percentile. I would think maybe more like an hour-and-a-half or two hours if you ran into, say, an accident that shut down all lanes of traffic driving toward Logan. - Two hours? It was slow. (everyone laughs) - [Student] Yeah, it was an hour-and-a-half for me.

25:16 - - Yeah, or hopping on the T, and it breaks down. I’ve tried to get across Boston when you walk out here and a Sox game has just let out. Or you’re trying to get out and, yeah. It could be bad. So one of the things you might do during elicitation is ask what these extremes are, but then also ask, okay, what could cause something to be worse than that extreme? It’s useful to have checks: to be skeptical of where you start, to revisit the points you’ve put down, and to think, what am I missing? I’m not just talking about a bad day; I’m talking about a really bad day. So the process we go through is to march inwards, from 99 down to, say, 95%.

26:10 - Say, one in 20; that’s a pretty everyday bad day. (laughs) That happens once in a while. - So one of the things I’ve found useful in doing these kinds of elicitations, especially with this airport example, is: you saw me go straight to an hour-and-a-half, because that’s the worst experience I personally have had. And usually experts think of what they’re familiar with, so you might get something like an hour-and-a-half, and Mike’s absolutely right, you then push them to talk about the extremes. Tens of hours (laughs) is probably your true extreme bound if you were creating brackets; maybe not a realistic bound, but a true extreme bound, because I think most people would switch modes of transportation before it got to that. But when I do this, you might start here, but instead of stepping them through one side of the distribution, I actually do this.

27:31 - - Yeah, I was going to switch to the other extreme after that interval. - Yeah, switch to the other one. And the reason is that you then force them to think about it in a different way. You’re flipping it, so they don’t end up focusing too much on one part of the distribution without thinking about the best-case scenario, and so that when you ideally get to the middle, it really does match their expectation. - So, 95%: I could easily see that being somewhere in the hour-and-15-minutes to hour-and-a-half range. So maybe we’ll say that’s about 80 minutes.

28:19 - And then, like Melissa said, we might now switch, because this other end is sitting at 0%. We might choose, say, the 5% fastest time. We don’t have our Apache attack helicopter anymore. - [Melissa] We’re just a crazy driver. - We’re on the road, we’re driving, but it’s two in the morning, there’s no traffic, and you are not regarding laws particularly well. (laughs) - [Melissa] 15 minutes? - Yeah, I mean you could make it in 15 minutes, easy. You could probably make it a little faster. - [Melissa] Okay, you’re more of a risk-seeker than I am. (laughs) - Yeah, so maybe that takes me, let’s say, 12 minutes. And I start connecting the dots. From there, in the interest of getting this done, I might jump up to, say, 25%.

29:19 - That’s now thinking about the interquartile range; that’s a pretty typical day. I’m gonna hit some traffic, so I might say, well, along the road, if I’m driving, it’s more like six, six-and-a-half miles, and on a typical but better-than-average day I might be averaging about 40 miles an hour. What does that work out to, 20 minutes? Something like that, so maybe that’s a 20-minute day. Then I might come back to the other extreme and ask, well, what’s the other side of that interquartile range, worse than average? It’s a typical Boston rush hour, I’m going 20 miles an hour. So, four minutes per mile, five miles, six miles, seven miles, whatever.

30:16 - Not doing good math off the top of my head; it’s late in the day. - [Student] It’s 50 minutes. - Yeah, so maybe 40 minutes. And then, I guess, one of the really important things here is that we’re estimating the central tendency last. We’re not estimating it first: if anchoring is a problem, that idea of starting with what you think is most typical and moving away from it, we’ve done the opposite. We’ve come from the extremes back in, so the last thing we estimate is the middle. So maybe that’s 30 minutes. And now we’ve got a set of points that describe a cumulative distribution function; before collecting any data, just using our personal experience, this is what we think the probability distribution describing getting to the airport looks like.
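
The quantile points arrived at in this walkthrough can be assembled into a CDF. A minimal sketch using monotone interpolation follows; the exact probability levels, and the 120-minute value at the 99th percentile, are assumptions smoothing over the numbers floated in the discussion:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Elicited (travel time in minutes, cumulative probability) pairs,
# gathered from the outside in, with the central tendency last
times = np.array([1, 12, 20, 30, 40, 80, 120])
probs = np.array([0.0, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99])

# Monotone (PCHIP) interpolation keeps the CDF non-decreasing between points
cdf = PchipInterpolator(times, probs)

print(float(cdf(30)))   # 0.5 by construction: 30 minutes is the median
print(float(cdf(55)))   # interpolated probability of arriving within 55 min
```

Connecting the dots this way gives the hand-drawn CDF a usable numerical form before any distribution fitting.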

31:21 - Like I said, we convert this to a PDF, and if I’m using this for some sort of formal Bayesian inference, one of the things I’ll then recognize is that all of our Bayesian tools like named distributions. So what I might do is start thinking about the candidate set of probability distributions capable of capturing these shapes. Here, I definitely wanna deal with zero-bound distributions. I’m not gonna fit a Gaussian to this; I’m gonna wanna deal with something like a Gamma, a Log-normal, or a Weibull, something that’s zero-bound. I might have a set of four, five, probably six distributions that have this property of being zero-bound.

32:09 - I haven’t done this formally, but I can tell you, if I sketch this out, you’re gonna end up with a skewed, zero-bound distribution. You can just see it, because this tail is much longer than that tail. So I have these points, and I might fit all five or six candidate distributions to them, and see which does best. I might then end up translating this into something like a Gamma with specific parameters that I can use as a prior when I’m constructing my model. - And just to think about this: this is basically Mike’s expert PDF for the amount of time he thinks it would take to get to Logan.
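
The fit-and-compare step just described can be sketched as a least-squares fit of candidate zero-bound CDFs to the elicited quantile points. The starting values and the choice of squared error in CDF space are assumptions for illustration:

```python
import numpy as np
from scipy import stats, optimize

# Elicited quantile points (minutes, cumulative probability)
times = np.array([12, 20, 30, 40, 80])
probs = np.array([0.05, 0.25, 0.50, 0.75, 0.95])

# Candidate zero-bound families, each with (shape, scale) starting guesses
candidates = {
    "gamma":   (stats.gamma,       [2.0, 15.0]),
    "lognorm": (stats.lognorm,     [0.5, 30.0]),
    "weibull": (stats.weibull_min, [1.5, 35.0]),
}

def sse(params, dist):
    # Squared error between the candidate CDF and the elicited points
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    return np.sum((dist.cdf(times, shape, scale=scale) - probs) ** 2)

fits = {}
for name, (dist, x0) in candidates.items():
    res = optimize.minimize(sse, x0, args=(dist,), method="Nelder-Mead")
    fits[name] = (res.x, res.fun)

# Keep whichever named distribution matches the elicited points best
best = min(fits, key=lambda k: fits[k][1])
print(best, fits[best])
```

The winning distribution, with its fitted parameters, is what would then be written down as the prior.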

33:01 - If I were looking at my own central tendency, given the embarrassing number of times that I’ve traveled to Boston University, (laughs) my average travel time is about 50 minutes, because I usually take the T, since I’m coming during rush hour. So part of it is expert-dependent. What you will likely get for a question like this is that the distributional form is likely to be the same, but experts might have slightly different central tendencies, or construct that distributional form slightly differently. If you’re seeing that kind of similarity, that’s potentially a good case for expert aggregation. If we asked five different people and constructed their PDFs for this, and then aggregated, we’d be more likely to get a distribution covering the true, broader range, with a central tendency closer to the average condition. - To come back to a couple of my experiences working with modelers: I’ve found that things like this are remarkably common when I go through the process of talking with them about their parameters.
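
One common way to do the aggregation just mentioned is a linear opinion pool: a weighted average of the experts’ densities. The two expert distributions and the equal weights below are hypothetical, chosen to mirror the 30-minute versus 50-minute central tendencies in the example:

```python
import numpy as np
from scipy.stats import lognorm

# Two hypothetical experts' travel-time priors: same distributional form,
# different central tendencies (medians of roughly 30 and 50 minutes)
expert_a = lognorm(s=0.5, scale=30)
expert_b = lognorm(s=0.5, scale=50)

x = np.linspace(0.1, 180, 1800)
weights = [0.5, 0.5]  # equal-weight linear opinion pool

pooled_pdf = weights[0] * expert_a.pdf(x) + weights[1] * expert_b.pdf(x)

# The pool is still a valid density: its mass on this grid is ~1
dx = x[1] - x[0]
print(pooled_pdf.sum() * dx)
```

The pooled density is broader than either expert’s alone, which is the point: it covers the range of credible opinion rather than any one person’s experience.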

34:27 - There are lots of ecological models where the parameters and processes we deal with are zero-bound and skewed. And then you look and ask, well, why is the whole literature using symmetric Gaussian priors? And why are the variances so wide? If you went to the literature right now and asked what the statistician would have done: a statistician would have done normal, mean zero, variance of 10 to the sixth. (laughs) And you’re like, really? (laughs) There’s a 50% probability that it’s negative one million minutes (laughs) to get to the airport? That’s the absurdity of some of the default priors we use. When you actually start thinking about a problem, you realize you really do have a lot more constraint than the default priors you slap on if you’re not stopping to think. That said, given enough data you will always overwhelm your priors, but it can be really handy to spend even just five minutes internally eliciting what you think about a problem; it can often give you much more reasonable constraints on what parameter values you would believe if you got them back.
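
The absurdity of that default prior is easy to quantify; a quick check of how a Normal(0, 10^6) prior spreads its mass over a travel-time parameter:

```python
from scipy.stats import norm

# A "default" vague prior: Normal(mean=0, variance=1e6), i.e. sd = 1000
vague = norm(loc=0, scale=1000)

# Half its mass sits on negative travel times, which are impossible
print(vague.cdf(0))          # 0.5

# And it treats absurd magnitudes as routine: probability of a
# travel time over a thousand minutes
print(1 - vague.cdf(1000))   # ~0.16
```

Even five minutes of self-elicitation rules out most of that mass before any data arrive.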

35:53 - So if you run a statistical model, and it comes back with an estimate of a slope or an intercept and you go, “That’s impossible!”, the question would be: then why did you assign it prior probability? If you a priori didn’t believe that result was possible, why did you give it non-zero probability? (laughs) That can help constrain models. And if we come back to thinking about forecasting, and the types of forecasting problems we’re gonna face, one of the things I said yesterday is that I do expect there are gonna be certain types of forecasting problems that will be chronically data-limited: things like emerging infectious disease and invasive species, problems where there’s no way we could have enough information yet. Those are the ones where elicitation’s gonna be really essential to producing that first-order initial forecast, which we can then update as data does become available. - Yeah, so when you think about this, there are the two things we’ve talked about. One is: how can I, as an expert in my own right, or with my team of modelers, think about this in a rigorous way? Both to add information a priori that helps us build better models, and so that when you give output, you understand when it might be too narrowly constrained, because the way the model is constructed only looks at a piece of the larger system you care about. Then, as you think about that output, you can contextualize what the results mean, because if you aren’t doing it, there are people who are going to be using those results, and potentially using them incorrectly. Even if you’re doing a full uncertainty characterization, it’s looking at a very narrow piece of the larger system that’s being informed for a decision.

37:54 - You have to contextualize those model results, and this kind of process can give you some of the language to put them in context. And in some cases, when you’re in data-limited situations, bringing in someone who can rigorously do expert elicitation, so you have some data to rigorously parametrize your model and predict those phenomena, has been proven time and time again to be an effective way of filling in the gaps until we can collect the data. And in some cases, we will never be able to collect the data, because it would be unethical; there’s literally no way we will be able to do some of the things that we want to predict unless we use elicitation. - And to pick up my last thoughts on this idea of contextualization: having pushed process modelers through elicitation on their own models, I’ve found, on more than one occasion, that it led to the discovery of bugs in the model. I’ve seen them have to go look up the units on the parameters.

39:06 - I’ve seen them have to dive into the code and make sure they understood how the parameter was used, and then come to the realization that the way they’d written things down didn’t actually make sense to them once they stopped to try to put constraints on the problem. I’ve also seen that the process of eliciting what they believe to be plausible parameters a priori really made them stop and think about the process. So even if the prior gets overwhelmed, you’ve stopped and understood the process better; you understand your model better. This isn’t always terribly critical with a simple linear model, but anytime you’re dealing with a non-linear, process-based, mechanistic model, stopping to do that elicitation will make you really understand the model you’re working with, which has been a really valuable exercise.