We recommend removing subjectivity and using blind procedures as ways of combatting concerns about observer bias. Some texts recommend another approach: using multiple observers and demonstrating a high degree of agreement between them. We agree, but only if you can then argue that, if the observers are biased, there is no reason to think that they will be biased in the same way.

Imagine that Graeme and Nick sat on opposite sides of the same lecture theatre and graded a set of student presentations of research projects according to the consideration given to experimental design issues. If Graeme has biases in how he scores (perhaps he is particularly obsessed with blind procedures but pays little heed to sample size issues), the danger is that Nick shares them, because Nick and Graeme have been talking to each other about experimental design for 20 years. So they might agree on the scores they give, yet both fail to properly reward students who think carefully about sample size. If you replaced Graeme with someone else keen on experimental design whom Nick had never met or read anything by, then good agreement between Nick and this new scorer would be stronger evidence that observer bias was not a serious issue. This is a case where blinding would not really help us to avoid concerns about bias, but we could increase the objectivity of scoring by agreeing clear marking criteria before any of the talks are given.
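The lecture theatre scenario can be made concrete with a small numerical sketch. Everything in it is invented for illustration: the five talks, their marks out of 10 on two criteria, and the weights each scorer places on those criteria. The "fair" scorer is a hypothetical benchmark that weights both criteria equally; in a real study no such benchmark is available, which is exactly why agreement between observers is tempting as a substitute.

```python
from statistics import mean

# Hypothetical talks: marks out of 10 for how well each student handled
# blinding and sample size. These numbers are invented for illustration.
talks = [
    {"blinding": 8, "sample_size": 2},
    {"blinding": 3, "sample_size": 9},
    {"blinding": 6, "sample_size": 6},
    {"blinding": 9, "sample_size": 4},
    {"blinding": 2, "sample_size": 8},
]


def score(talk, w_blinding, w_sample_size):
    """Weighted mark for one talk; the weights encode a scorer's priorities."""
    return w_blinding * talk["blinding"] + w_sample_size * talk["sample_size"]


def mean_abs_diff(xs, ys):
    """Average absolute difference between two sets of marks (0 = identical)."""
    return mean(abs(x - y) for x, y in zip(xs, ys))


# Assumed weights: Graeme and Nick both over-weight blinding (a shared bias),
# while the stranger's priorities were formed independently.
fair = [score(t, 0.5, 0.5) for t in talks]      # hypothetical unbiased benchmark
graeme = [score(t, 0.9, 0.1) for t in talks]
nick = [score(t, 0.8, 0.2) for t in talks]
stranger = [score(t, 0.4, 0.6) for t in talks]

print("Graeme vs Nick:    ", round(mean_abs_diff(graeme, nick), 2))
print("Graeme vs fair:    ", round(mean_abs_diff(graeme, fair), 2))
print("Nick vs fair:      ", round(mean_abs_diff(nick, fair), 2))
print("Nick vs stranger:  ", round(mean_abs_diff(nick, stranger), 2))
```

Under these assumed weights, Graeme and Nick differ from each other far less than either differs from the benchmark, so their agreement says little about bias. Pairing Nick with the independent scorer produces a much larger disagreement, which is the signal that something is wrong with at least one set of marks.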

Further reading

The 2012 open access paper ‘Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors’ (Hróbjartsson et al., British Medical Journal, 344, e1119) is a thorough review demonstrating that clinical trials where assessors are not blinded tend to find much stronger differences between treatment groups than studies that do use blind assessors. Of course, we remember that correlation is not evidence of causation, and it would be worth considering whether the two types of trial differ systematically in other ways. For example, it might be the case that study designs that include blinding are also more careful to randomize patients between groups than those that do not.
