A recent paper in AMPPS points out that many textbooks for introduction to psychology courses incorrectly explain p-values. There are dozens, if not hundreds, of papers that point out problems in how people understand p-values. If we don’t do anything about it, there will be dozens of articles like this in the next decades as well. So let’s do something about it.
When I made my first MOOC three years ago I spent some time thinking about how to explain what a p-value is clearly (you can see my video here). Some years later I realized that if you want to prevent misunderstandings of p-values, you should also explicitly train people about what p-values are not. Now, I think that training away misconceptions is just as important as explaining the correct interpretation of a p-value. Based on a blog post I made a new assignment for my MOOC. In the last year Arianne Herrera-Bennett (@ariannechb) performed an A/B test in my MOOC ‘Improving Your Statistical Inferences’. Half of the learners received this new assignment, explicitly aimed at training away misconceptions. The results are in her PhD thesis that she will defend on the 27th of September, 2019, but one of the main conclusions in the study is that it is possible to substantially reduce common misconceptions about p-values by educating people about them. This is a hopeful message.
I tried to keep the assignment as short as possible, and therefore it is 20 pages. Let that sink in for a moment. How much space does education about p-values take up in your study material? How much space would you need to prevent misunderstandings? And how often would you need to repeat the same material across the years? If we honestly believe misunderstandings of p-values are a problem, then why don’t we educate people well enough to prevent them? The fact that people do not understand p-values is not their mistake – it is ours.
In my own MOOC I needed 7 pages to explain what p-value distributions look like, how they are a function of power, why p-values are uniformly distributed when the null is true, and what Lindley’s paradox is. But when I tried to clearly explain common misconceptions, I needed a lot more words. Before you blame that poor p-value, let me tell you that I strongly believe the problem of misconceptions is not limited to p-values: Probability is just not intuitive. It might always take more time to explain the ways you can misunderstand something than to teach the correct way to understand it.
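To make the point about p-value distributions concrete, here is a minimal simulation sketch. It is not taken from the MOOC assignment. It assumes a simplified two-group z-test with a known standard deviation of 1 (a real study would typically use a t-test), and the function names, sample size, and effect size are illustrative choices. It shows that p-values are roughly uniform when the null is true, and that the share of p-values below alpha when there is a true effect is the power of the test.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_values(n_studies, n_per_group, effect_size, seed=1):
    """Simulate many two-group studies and return one p-value per study.

    Assumption: both groups have a known SD of 1, so a z-test applies.
    effect_size is the true mean difference in SD units (0 means the null is true).
    """
    rng = random.Random(seed)
    standard_error = math.sqrt(2 / n_per_group)
    p_values = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        difference = sum(treatment) / n_per_group - sum(control) / n_per_group
        p_values.append(two_sided_p(difference / standard_error))
    return p_values


def proportion_below(p_values, alpha=0.05):
    """Share of simulated studies with p < alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def bin_shares(p_values, n_bins=10):
    """Share of p-values in each equal-width bin between 0 and 1."""
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [count / len(p_values) for count in counts]


if __name__ == "__main__":
    null_p = simulate_p_values(2000, 64, effect_size=0.0, seed=1)
    effect_p = simulate_p_values(2000, 64, effect_size=0.5, seed=2)
    # Null true: each bin should hold roughly 10% of the p-values,
    # and roughly 5% of studies should give p < 0.05.
    print("null true, bin shares:", [round(s, 2) for s in bin_shares(null_p)])
    print("null true, share p < .05:", proportion_below(null_p))
    # True effect: the share of p < 0.05 estimates the power of the test.
    print("true effect, share p < .05:", proportion_below(effect_p))
```

Changing `n_per_group` or `effect_size` and rerunning shows how the distribution of p-values piles up near zero as power increases, while it stays flat whenever the true effect is zero.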
In a recent pre-print I wrote on p-values, I reflect on the bad job we have been doing at teaching others about p-values. I write:
If anyone seriously believes the misunderstanding of p-values lies at the heart of reproducibility issues in science, why are we not investing more effort to make sure misunderstandings of p-values are resolved before young scholars perform their first research project? Although I am sympathetic to statisticians who think all the information researchers need to educate themselves on this topic is already available, as an experimental psychologist who works at a Human-Technology Interaction department this reminds me too much of the engineer who argues all the information to understand the copy machine is available in the user manual. In essence, the problems we have with how p-values are used is a human factors problem (Tryon, 2001). The challenge is to get researchers to improve the way they work.
Looking at the deluge of papers published in the last half century that point out how researchers have consistently misunderstood p-values, I am left to wonder: Where is the innovative coordinated effort to create world class educational materials that can freely be used in statistical training to prevent such misunderstandings? It is nowadays relatively straightforward to create online apps where people can simulate studies and see the behavior of p-values across studies, which can easily be combined with exercises that fit the knowledge level of bachelor and master students. The second point I want to make in this article is that a dedicated attempt to develop evidence based educational material in a cross-disciplinary team of statisticians, educational scientists, cognitive psychologists, and designers seems worth the effort if we really believe young scholars should understand p-values. I do not think that the effort statisticians have made to complain about p-values is matched with a similar effort to improve the way researchers use p-values and hypothesis tests. We really have not tried hard enough.
So how about we get serious about solving this problem? Let’s get together and make a dent in this decades-old problem. Let’s try hard enough.
A good place to start might be to take stock of good ways to educate people about p-values that already exist, and then all together see how we can improve them.
I have uploaded my lecture about p-values to YouTube, and my assignment to train away misconceptions is available online as a Google Doc (the answers and feedback are here).
This is just my current approach to teaching p-values. I am sure there are many other approaches (and it might turn out that watching several videos, each explaining p-values in slightly different ways, is an even better way to educate people than having only one video). If anyone wants to improve this material (or replace it with better material) I am willing to open up my MOOC for anyone who wants to do an A/B test of any good idea, so you can collect data from hundreds of students each year. I’m more than happy to collect best practices in p-value education – if you have anything you think (or have empirically shown) works well, send it my way – and make it openly available. Educators, pedagogists, statisticians, cognitive psychologists, software engineers, and designers interested in improving educational materials should find a place to come together. I know there are organizations that exist to improve statistics education (but I have no good information about what they do, or which one would be best to join given my goals), and if you work for such an organization and are interested in taking p-value education to the next level, I’m more than happy to spread this message in my network and work with you.
If we really consider the misinterpretation of p-values to be one of the more serious problems underlying the lack of replicability of scientific findings, we need to seriously reflect on whether we have done enough to prevent misunderstandings. Treating it as a human factors problem might illuminate ways in which statistics education and statistical software can be improved. Let’s beat swords into ploughshares, and turn papers complaining about how people misunderstand p-values into papers that examine how we can improve education about p-values.