We all saw the headlines in the lead-up to the election: “Party X is up two percent!” Invariably, stashed away as an afterthought was “margin of error is 3.1 percent”.
With a margin of error of 3.1 percent, a two percent shift falls inside the range we would expect from sampling alone, so there is no statistically significant change. The problem is that margins of error don’t make for good headlines. “No change in polls!” is hardly clickbait.
At DOT, we love data of all shapes and sizes, and we’re troubled when the margin of error is treated like that sad, lonely uncle at a family gathering, just quietly ignored out the back. So why is it so important? It comes down to our attempts to understand uncertainty. Polls are a sample of behaviour. We get a much better indication from a census, much like a general election, where we ask everyone, but that is time-consuming and costly. As a consequence, we take samples. Even with the best intentions, we are going to find natural variation from sample to sample. So we use techniques that allow us to calculate what that natural variation is. This allows us to work out if there really is a change between polls.
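To make that concrete, here is a minimal sketch of the arithmetic in Python. It assumes a simple random sample and a 95 percent confidence level (z = 1.96). The sample size of 1,000 and the 40 and 42 percent vote shares are illustrative assumptions, not figures from any actual poll; a sample of about 1,000 happens to give the familiar 3.1 percent margin when support is near 50 percent. The function names are made up for this example.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple
    random sample of size n (z = 1.96 gives roughly 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

def change_is_significant(p1, n1, p2, n2, z=1.96):
    """Rough check of whether the shift between two independent polls
    is larger than sampling variation alone would explain."""
    # Standard error of the difference between two independent proportions
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p2 - p1) > z * se_diff

# Assumed sample size of 1,000, with support near 50 percent (worst case)
print(round(margin_of_error(0.5, 1000) * 100, 1))

# Hypothetical shares: Party X moves from 40 to 42 percent between polls
print(change_is_significant(0.40, 1000, 0.42, 1000))
```

Under these assumptions the two-point move does not clear the bar. Note that the uncertainty on a difference between two polls is wider than the margin quoted for either poll on its own, which makes a small shift even harder to distinguish from noise.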
This is where it gets exciting. To help us understand the variation, we try to explain as much of it as we can with the data we have available. A statistical model tells us what we expect to happen, on average. The model also explains as much of the variation in the data as possible. Here, think of variation as plausible scenarios. Obviously what we expect to happen doesn’t always pan out – we can end up with unexpected scenarios.
But, like a jigsaw, if you put enough of the pieces together you can see the picture. Or, more importantly, the departure from the picture. This is one area that people consuming statistical models often struggle with. It’s not just about celebrating when a model prediction gets it right, but also delving into the model to understand why the projection was wrong. This is equally valuable. More importantly, this step helps build understanding and trust about the data and modelling process.
The deviation from the average in our data allows us to paint a richer picture. Think of the streams of data we are all exposed to in our daily lives that we look to make decisions with. We are immersed in a world of uncertainty and risk:
• Who is going to win this weekend?
• How much will that house be worth next year?
• Which TV shall I buy?
• Which customers are going to be interested in my new business?
Dr Paul Bracewell.
What we love about stats is the problem solving that goes into trying to understand how different elements combine to explain what is likely to happen. For example: on average, what is an extra bedroom going to do for the valuation on that property? This richer perspective is derived from the modelling process. Data and analytical techniques are used to derive insight. But, insights are only part of the process.
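As a rough illustration of the bedroom question, the sketch below fits a straight line to a handful of made-up sale prices. The numbers are purely hypothetical and the single-variable least-squares fit is an assumption chosen for simplicity; a real valuation model would use many more variables and far more data. The slope is the average effect of one extra bedroom, and the residuals are the variation the model has not explained.

```python
# Hypothetical data: bedrooms and sale price in thousands of dollars
bedrooms = [2, 3, 3, 4, 5]
prices = [400, 450, 470, 520, 560]

n = len(bedrooms)
mean_x = sum(bedrooms) / n
mean_y = sum(prices) / n

# Ordinary least squares for one predictor: slope = covariance / variance
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bedrooms, prices))
sxx = sum((x - mean_x) ** 2 for x in bedrooms)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residuals: how far each sale departs from what the model expects
residuals = [y - (intercept + slope * x) for x, y in zip(bedrooms, prices)]

print(f"Average effect of an extra bedroom: {slope:.1f} thousand")
print("Departures from the expected price:", [round(r, 1) for r in residuals])
```

The slope answers the "on average" question, while the residuals point to the sales that deserve a closer look.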
In my last article, I spoke about action from insight. Part of the requirement for action is trusting the model predictions. But, when it comes to polls, quantitative evidence isn’t used to explain the drivers of the difference from poll to poll. Part of this process is also ensuring that samples are representative of the population of interest. Attempts to drive action from insights can fall down if there is no trust. And a key component of building trust is communicating results, including a discussion about variation. Let’s drag the margin of error away from the corner.
1. It’s not about the average. When we are looking to act on data-driven insight, what is of real interest is understanding what drives the variation. Use the margin of error to understand if there really is a difference.
2. Understanding variation is like putting together a jigsaw, except there are no edge pieces. You may need to keep trying different variables, in a different order, to find a combination that works for you. This process should align with what you are trying to achieve as a business.
3. Building trust in predictive models requires explanations of why the model was both right and not so right. This comes from interpreting the underlying elements in the model.
Need some help to make your data-driven dreams a reality? Get in touch: [email protected].