Bland-Altman

Mar 9, 2021 09:00 · 2232 words · 11 minute read

The Bland-Altman plot and its accompanying analysis are used to compare two measurements of the same variable. In a nutshell, it is a method-comparison technique suited to studies that compare a new measurement method or piece of equipment, typically cheaper, faster, safer, or smaller, with the so-called gold standard or reference method, which may or may not provide the true value. Martin Bland and Douglas Altman developed the technique in a series of papers, starting with their first paper in 1983.

00:47 - Imagine that you want to compare a newly developed method for measuring body temperature in degrees Celsius against a gold standard. Here, when we say “compare” we mean assessing how well the two methods agree. This can only be done by taking at least one measurement with each method on the same subject or sample, and repeating this for many subjects or samples. In that sense, a t-test on its own is not appropriate, because we are not interested only in the overall difference between the two methods.

01:28 - Okay then, how about correlation? Can we assume that two methods agree with each other if their measurements of the same outcome correlate well? Here are fictitious temperature data from the old and new methods; let’s draw a scatterplot and calculate the correlation coefficient using Jamovi. The first column holds the temperatures measured with the old method, and the second column holds the temperatures measured with the new method.

Let’s make a scatterplot with the old temperatures on the x-axis and the new temperatures on the y-axis, and add a linear regression line to see what it looks like. The resulting scatterplot suggests quite a strong relationship between the two measurements.

02:53 - But before calculating the correlation, let’s run the descriptive statistics to check the means and the normality of each variable.

03:13 - The mean temperature is 36.4 degrees Celsius for the old method and 36.5 for the new method, so the difference is small, about 0.1 degrees. Normality looks fine: for both variables, the p-value of the Shapiro-Wilk test is greater than alpha = 0.05, so we can use Pearson’s correlation coefficient. The relationship looks strong because the dots lie quite closely along the straight line; let’s also check whether the correlation is statistically significant.

To do so, we run the correlation analysis and tick the options to flag significant correlations and to show the confidence interval.

04:33 - Move both variables into the analysis. As expected, the correlation coefficient is very high: Pearson’s r is almost 0.98, and the correlation is highly significant, with p < 0.00001. So the two measurements are highly correlated.
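If you want to reproduce this step outside Jamovi, a minimal Python sketch with SciPy might look like the following. The readings are made-up placeholder values, not the data from the video, and the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Hypothetical paired readings in degrees Celsius (NOT the video's data)
old = np.array([35.8, 36.0, 36.1, 36.2, 36.3, 36.4, 36.5, 36.6, 36.7, 36.9])
new = np.array([35.9, 36.2, 36.2, 36.3, 36.5, 36.5, 36.5, 36.8, 36.7, 37.0])

# Descriptives: mean of each method and a Shapiro-Wilk normality test
for name, x in (("old", old), ("new", new)):
    w, p = stats.shapiro(x)
    print(f"{name}: mean = {x.mean():.2f}, Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

# Pearson correlation coefficient with its two-sided p-value
r, p_value = stats.pearsonr(old, new)
print(f"Pearson r = {r:.3f}, p = {p_value:.2g}")
```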

05:07 - The r-value is very high, meaning that the two measurements are closely related. However, a high correlation, although necessary, is not sufficient to say that the two methods agree, for several reasons. First, r measures the strength of the relationship between two variables, not the agreement between them. We have perfect agreement only if the points in the figure lie along what is called the identity line, or 1:1 line, shown here as a red broken line.

In fact, we get a perfect correlation as long as the points lie along any straight line. For example, imagine a line like this one, with all the dots lying exactly on it.

06:40 - Now r is 1, because the points are perfectly correlated. But do the two methods agree perfectly just because they are perfectly correlated? No. For temperatures below 36.5 degrees Celsius, the readings from the new method are always higher than those from the old method; for example, a low old-method reading can correspond to a new-method reading slightly above 36 degrees Celsius. The pattern is reversed for temperatures above 36.5 degrees Celsius, where the readings from the new method are always lower than their old-method counterparts; for example, an old-method reading of 37 degrees Celsius corresponds to a new-method reading that does not reach 37.
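To make this concrete, here is a small sketch assuming a hypothetical line new = 36.5 + 0.5 × (old − 36.5), which crosses the identity line at 36.5 degrees Celsius but has slope 0.5 rather than 1. The points give a perfect correlation, yet the differences change sign on either side of 36.5.

```python
import numpy as np

# Hypothetical "green dots": every point lies exactly on the line
# new = 36.5 + 0.5 * (old - 36.5), which crosses the identity line at 36.5
old = np.linspace(35.5, 37.5, 9)
new = 36.5 + 0.5 * (old - 36.5)

r = np.corrcoef(old, new)[0, 1]  # Pearson r
diff = old - new                 # old minus new, as in the Jamovi example

print(f"r = {r:.6f}")                    # perfect correlation
print("old - new:", np.round(diff, 3))  # negative below 36.5, positive above
```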

08:18 - So even though these green dots are perfectly correlated, the two methods do not agree. In addition, as we learned previously, changing the scale of measurement of either or both variables does not affect the correlation, but it certainly affects the agreement. It is therefore quite possible for a set of measurement pairs to produce a high correlation with poor agreement.
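A quick way to see the scale argument is to imagine, hypothetically, that the new device reported Fahrenheit instead of Celsius. The sketch below uses made-up readings from two methods that agree closely; the linear change of scale leaves Pearson’s r untouched but destroys the agreement.

```python
import numpy as np

# Hypothetical readings (degrees Celsius) from two methods that agree closely
old = np.array([36.0, 36.3, 36.5, 36.8, 37.1])
new = old + np.array([0.1, -0.1, 0.0, 0.1, -0.1])

# Change of scale: suppose the new device reports Fahrenheit instead
new_f = new * 9 / 5 + 32

r_c = np.corrcoef(old, new)[0, 1]
r_f = np.corrcoef(old, new_f)[0, 1]

print(f"r with Celsius: {r_c:.4f}, r with Fahrenheit: {r_f:.4f}")  # identical
print(f"mean difference: {np.mean(old - new):.2f} vs {np.mean(old - new_f):.2f}")
```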

09:01 - Correlation analysis is therefore not a good choice when comparing two methods. Instead, Bland and Altman (1983) suggested a different type of scatterplot, in which the x-axis represents the mean of the two methods’ measurements and the y-axis represents the difference between them. Let’s calculate these differences and averages and draw the scatterplot in Jamovi, where the difference and average columns are easy to create with the Compute function.

First, the difference: create a new computed variable, name it difference, and define it as the old temperature minus the new temperature. Then the average: create another computed variable, name it average, and define it as the sum of the two temperatures divided by two. Now draw a scatterplot with the difference on the y-axis and the average on the x-axis.
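The two computed columns can be sketched in Python as follows, again with hypothetical readings standing in for the Jamovi data. The difference is taken as old minus new, matching the Jamovi definition above.

```python
import numpy as np

# Hypothetical readings (degrees Celsius), standing in for the two Jamovi columns
old = np.array([35.8, 36.0, 36.1, 36.2, 36.3, 36.4, 36.5, 36.6, 36.7, 36.9])
new = np.array([35.9, 36.2, 36.2, 36.3, 36.5, 36.5, 36.5, 36.8, 36.7, 37.0])

# The two computed variables from the Jamovi example
difference = old - new        # plotted on the y-axis
average = (old + new) / 2     # plotted on the x-axis

for a, d in zip(average, difference):
    print(f"average = {a:.2f}, difference = {d:+.2f}")
```

If matplotlib is installed, `plt.scatter(average, difference)` followed by `plt.axhline(0)` would draw the basic plot, with the horizontal line marking zero difference.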

11:11 - This is basically the Bland-Altman plot, without the regression line. As you can see from the slide, though, the full Bland-Altman plot contains much more than this, and Jamovi has a dedicated module for Bland-Altman analysis, so let’s use it to create the plot. Open the Jamovi library, scroll down, and you will find the Bland-Altman Method Comparison module.

Click Install, and the module is ready to use. Select Bland-Altman and then Bland-Altman Analysis. With this module you do not need to calculate the difference and average yourself: just move the two measurements into Method 1 and Method 2, and it generates the statistics and the plot for you, including the bias. Let’s go back to the slide to explain what these are.

13:05 - Bland and Altman stress the need to assess two aspects of agreement: how well the methods agree on average, and how well the measurements agree for individuals. If one method reads lower than the other for half of the subjects but higher for the other half, the differences between measurements on the same subject may cancel each other out, so the average discrepancy is close to zero even though the discrepancy for individual subjects is large.

The average difference is called the bias and is written d̄: d stands for the difference and the bar for the mean, so the bias is the mean difference. Here d is the individual difference for each subject, that is, the y-position of each dot; add all of these up and divide by the number of samples, and you get the mean difference, which is the bias.
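Written as a formula, with d_i taken as the old reading minus the new reading for subject i (the same convention as the Jamovi difference column) and n the number of subjects, the bias is:

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad d_i = x_{\mathrm{old},i} - x_{\mathrm{new},i}
```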

14:20 - You can run a one-sample t-test to find out whether there is a statistically significant bias. The black line here is the line of no difference, where the difference is zero. The mean bias in our case is -0.107, roughly 0.1 degrees Celsius in magnitude, and we would like to know whether this offset from no difference is just random or consistent, that is, statistically significant.

The two dotted lines and the blue shaded region around the mean bias represent the 95% confidence interval of the bias. As we can see, the line of no difference is not included in this 95% confidence interval, which means that the bias of about 0.1 degrees Celsius is statistically significant, something you cannot just ignore. So the Bland-Altman plot tells you how large the bias of the new method is relative to the old, or reference, method, and whether that bias is statistically significant.
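The confidence interval around the bias can be sketched as the usual t-based interval for a mean, d̄ ± t(0.975, n − 1) × s/√n. That the shaded band in the module is computed exactly this way is an assumption, and the readings are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical paired readings (degrees Celsius), NOT the video's data
old = np.array([35.8, 36.0, 36.1, 36.2, 36.3, 36.4, 36.5, 36.6, 36.7, 36.9])
new = np.array([35.9, 36.2, 36.2, 36.3, 36.5, 36.5, 36.5, 36.8, 36.7, 37.0])

diff = old - new
n = len(diff)
bias = diff.mean()
se = diff.std(ddof=1) / np.sqrt(n)        # standard error of the mean difference
t_crit = stats.t.ppf(0.975, df=n - 1)     # two-sided 95% critical value

ci_low, ci_high = bias - t_crit * se, bias + t_crit * se
zero_inside = ci_low <= 0 <= ci_high      # is the line of no difference inside?

print(f"bias = {bias:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print("no significant bias" if zero_inside else "significant bias")
```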

16:12 - The bias is negative, about -0.1, while the new method’s mean (36.5 degrees Celsius) is higher than the old method’s (36.4), so the difference here is old minus new, and the new method consistently reads higher than, that is, overestimates the temperature relative to, the old method. Back in Jamovi, let me show you how to run the t-test on this bias to see whether it is statistically significant. In the Bland-Altman menu, scroll all the way down to Bland-Altman Raw Statistics to open its options.

17:00 - Move Method 1 and Method 2 in, and it runs the one-sample t-test for you. Here t is -3.2 and the p-value is 0.008, well below 0.05, so the bias is statistically significant. The module even states the decision for you: the alternative hypothesis is supported, meaning the true bias is not equal to zero. So there is a statistically significant bias, although whether a bias of this size, -0.107, is large enough to be of practical concern is a matter of domain-specific knowledge.
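Outside Jamovi, the same one-sample t-test of the differences against zero can be sketched with SciPy. The readings are hypothetical, so the numbers will not match the video’s t = -3.2 and p = 0.008.

```python
import numpy as np
from scipy import stats

# Hypothetical paired readings (degrees Celsius), NOT the video's data
old = np.array([35.8, 36.0, 36.1, 36.2, 36.3, 36.4, 36.5, 36.6, 36.7, 36.9])
new = np.array([35.9, 36.2, 36.2, 36.3, 36.5, 36.5, 36.5, 36.8, 36.7, 37.0])
diff = old - new

# One-sample t-test of the differences against a true bias of zero
t_stat, p_value = stats.ttest_1samp(diff, popmean=0.0)

# The same statistic by hand: mean difference divided by its standard error
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))

print(f"t = {t_stat:.2f} (by hand: {t_manual:.2f}), p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the true bias is not zero")
```

Testing the differences against zero is equivalent to a paired t-test, so `stats.ttest_rel(old, new)` gives the same t statistic.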

18:16 - Agreement for individuals is summarised by the limits of agreement, which are based on the variability of the differences. If the differences on the y-axis are reasonably normally distributed, and provided that the size of the discrepancy does not depend on the level of the quantity being measured, the 95% limits of agreement can be computed as the mean difference plus or minus 1.96 (roughly 2) standard deviations of the differences.
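As a formula, with d_i the difference for subject i, d̄ the bias (mean difference), n the number of subjects, and s_d the sample standard deviation of the differences:

```latex
\mathrm{LoA} = \bar{d} \pm 1.96\, s_d, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
```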

These 95% limits of agreement give the range within which about 95% of the differences between the two methods are expected to fall, which guides the clinician in deciding whether the methods agree well enough for the alternative method to be used in actual clinical practice. In our example, all the differences lie within these limits, which is a good sign to begin with. The two limits are about half a degree Celsius apart.
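A minimal sketch of the limits of agreement, their width, and the check that the differences fall inside them, using hypothetical readings rather than the video’s data:

```python
import numpy as np

# Hypothetical paired readings (degrees Celsius), NOT the video's data
old = np.array([35.8, 36.0, 36.1, 36.2, 36.3, 36.4, 36.5, 36.6, 36.7, 36.9])
new = np.array([35.9, 36.2, 36.2, 36.3, 36.5, 36.5, 36.5, 36.8, 36.7, 37.0])

diff = old - new
bias = diff.mean()
sd = diff.std(ddof=1)                 # SD of the individual differences

lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
width = upper - lower                 # distance between the two limits
inside = (diff >= lower) & (diff <= upper)

print(f"LoA = ({lower:.3f}, {upper:.3f}), width = {width:.3f}")
print(f"{inside.sum()} of {len(diff)} differences lie within the limits")
```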

19:50 - As the table shows, the distance between the two limits of agreement is indeed about half a degree Celsius, and whether this range of discrepancy between the two methods is acceptable has to be judged from expert experience in the field. Again, how narrow the limits of agreement must be to conclude that the methods agree sufficiently is a clinical decision, not a statistical one.

Therefore, the Bland-Altman analysis provides no null hypothesis significance test of agreement, and there is no formulaic way to classify agreement statistically as good or poor, or to decide which method to use when the disagreement is considerable. This is domain-specific and depends heavily on the particular purpose and context for which the measurements are being made.

The main question to consider is whether the largest likely differences are small enough for the particular purpose for which measurements are used and the decision criteria need to be set in advance of the analysis if at all possible.

21:38 - Oh, and the green band represents the 95% confidence interval for the upper limit of agreement, and the red band represents the 95% confidence interval for the lower limit of agreement. The limits of agreement (LoAs) themselves are not confidence intervals: they are 95% limits of agreement, computed from the standard deviation of the differences, not from the standard error of the mean. That is pretty much everything about Bland-Altman analysis; for those of you who want more detail, I have included some of the original references.
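The bands around each limit can be approximated with the standard error given by Bland and Altman (1986), SE ≈ √(3s²/n), combined with a t critical value. That the Jamovi module draws its bands exactly this way is an assumption, and the readings below are hypothetical. The sketch also shows the contrast made above: the limits use the SD of the differences, while their confidence intervals use a standard error.

```python
import numpy as np
from scipy import stats

# Hypothetical paired readings (degrees Celsius), NOT the video's data
old = np.array([35.8, 36.0, 36.1, 36.2, 36.3, 36.4, 36.5, 36.6, 36.7, 36.9])
new = np.array([35.9, 36.2, 36.2, 36.3, 36.5, 36.5, 36.5, 36.8, 36.7, 37.0])

diff = old - new
n = len(diff)
bias, sd = diff.mean(), diff.std(ddof=1)
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd   # LoAs use the SD

# Approximate standard error of each limit of agreement (Bland & Altman, 1986)
se_loa = np.sqrt(3 * sd**2 / n)
t_crit = stats.t.ppf(0.975, df=n - 1)

upper_ci = (upper - t_crit * se_loa, upper + t_crit * se_loa)
lower_ci = (lower - t_crit * se_loa, lower + t_crit * se_loa)

print(f"upper LoA {upper:.3f}, 95% CI ({upper_ci[0]:.3f}, {upper_ci[1]:.3f})")
print(f"lower LoA {lower:.3f}, 95% CI ({lower_ci[0]:.3f}, {lower_ci[1]:.3f})")
```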

Thanks for listening.