
# What can we really tell from a graph?

A break from the World Cup posts today, because I’ve just come across an amusing website.

Correlation – or the lack of it – is one of the most interesting things we can find from statistics.  For years, whether smoking causes cancer was ‘controversial’, but when cancer cases were plotted against smoking, a direct correlation was found.

OK, so there is a small lag in time, but that is explainable.

And that is the point about correlation on a graph. A pattern we can see only tells us that two things are genuinely connected if we can explain why that connection might exist.

It’s an important lesson in how statistics must be handled!

And that brings me back to the amusing website, which gives a few comical examples where it LOOKS like there is a connection... but can there be, really?

Here is an example!

And here is a link to some more!

http://www.tylervigen.com/spurious-correlations
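If you want to see how easily this happens, here is a rough sketch in Python. The figures are completely made up by me (they are not from the website above), and `pearson` is just a plain hand-written version of the usual correlation coefficient. The only thing the two lists have in common is that both go up over time.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two lists vary together, and how each varies on its own.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented figures, for illustration only: two quantities that both
# happen to rise year on year but have nothing to do with each other.
ice_cream_sales = [20, 23, 27, 30, 36, 41]
broadband_subscribers = [5, 7, 8, 11, 13, 14]

r = pearson(ice_cream_sales, broadband_subscribers)
print(round(r, 2))  # a value close to 1 means a strong correlation
```

The result comes out very close to 1, yet nobody would claim that ice cream sells broadband. The number measures the pattern; it is up to us to ask whether there is any sensible explanation behind it.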

There is a certain kind of question on Maths GCSE papers that confuses some students – What’s this question doing on a MATHS EXAM???

On the other hand, for the student nurses and midwives I taught to GCSE a few years ago, these were their favourite questions.

The reasons were the same – What’s this got to do with Maths!

Let’s look at an example, taken from exam papers, so we can see what I mean.

So this is a question about language, right?  Yeah, but it’s also a Maths question, about how we can get meaningful data.

Can you see what’s wrong with the first survey?

Well, suppose you have my answer to the survey.  “I bought 1 CD last week for £7. Last year I spent £50”
So which answer do you take?  And how do you record £50?

One problem is easy to spot.  £50 could go into £30-£50 OR £50-£70

First Problem: Overlapping categories

We could make this answer better by noting that a spend less than £10 is not covered.
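For anyone who likes to check this sort of thing on a computer, here is a small Python sketch of the tick boxes. The £30-£50 and £50-£70 bands are the ones mentioned above; the £10-£30 band is my assumption about what the remaining box looks like, and `matching_bands` is just a name I have made up for the helper.

```python
# Bands in pounds, both ends inclusive, as they would read on the form.
# The (10, 30) band is an assumption; the other two are the ones discussed.
BANDS = [(10, 30), (30, 50), (50, 70)]

def matching_bands(amount, bands=BANDS):
    """Return every band that a given spend could be ticked under."""
    return [(low, high) for (low, high) in bands if low <= amount <= high]

for spend in (7, 40, 50):
    print(spend, matching_bands(spend))
```

A good set of categories gives exactly one match for every possible answer. Here £40 fits one box, £50 fits two, and £7 fits none at all.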

The second problem is that the question gives no idea of the time scale. Is Paula asking what I spent last week, or last year?

Second Problem: Question is not time specific.

Although the question doesn’t ask for it, we can give an improved question here.

The survey is now clear on what it is looking for and what data we will get.  Do you think this will give an accurate picture of sales through the year?  OK maybe not. We could ask about the whole of 2017?  Why might that not work?

Now we have a good question, but does that mean we will get a good survey? The next part shows us a possible extra problem.  Who will we ask?  Can you think of a problem with only asking people in a CD shop? How will that skew our results?
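To get a feel for the size of the skew, here is a tiny Python sketch. The ten people and their spending are invented purely for illustration, so the exact numbers mean nothing; only the direction of the difference matters.

```python
# Invented data for illustration: (spend on CDs in pounds over the survey
# period, whether the person was found in a CD shop).
people = [
    (0, False), (0, False), (0, False), (5, False), (0, False),
    (10, False), (25, True), (40, True), (0, False), (30, True),
]

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Average over everybody versus average over CD-shop customers only.
everyone = mean([spend for spend, _ in people])
shop_only = mean([spend for spend, in_shop in people if in_shop])

print(everyone, round(shop_only, 2))
```

The people in the shop are exactly the people who buy CDs, so their average is much higher than the average for everybody. A survey carried out only in the shop would overestimate how much people spend.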