The Nature of Correlation – Part II

Now that we understand some basics about correlation, I think it would be interesting to ask, “How do we initially hypothesize a connection between variables?”  Consider something simple like Newton’s second law, which says that the net force on an object equals its mass times its acceleration.  How might such a relationship have been devised initially?  Well, this is something that can be easily observed.  You push an object like a cart and it accelerates.  If you push it harder, it accelerates faster.  If the cart is full of objects it becomes heavier and requires more force to get the same acceleration.  Such observations are the basis for an experiment which can show us the nature of the relationship.  Our experiment might end up being only mostly successful, as we might be confused about why the force required changes depending on the surface across which we move the object.  Until we understand friction we might not be fully aware of the other forces that are working against us when we make our measurements.  Newton was able to explain the frictional force as well, but the fact that the Earth is turning on its axis revealed Newton’s second law to be incomplete for Earth and strictly true only in a reference frame that is not accelerating (an inertial frame).  Newton’s description of force came first, and it was only through further experimentation and testing that we came to understand the limitations associated with his relationship between force, mass and acceleration.  So the basis for a correlation comes through a lot of trial and error after some initial observations.  Proper application of the scientific method (adequate sampling in particular) along with result repeatability can demonstrate the correlation between two variables.

Finally, it’s important to understand the complexity of the relationships that exist.  Just as force depends on both mass and acceleration, most things in this world are not as simple as one cause and one effect.  Most things are complex systems in which one variable may change as a result of several other variables.  The global average surface temperature is a function of the amount of solar radiation hitting the Earth, the concentration and location of greenhouse gases in the atmosphere, the amount of geothermal energy released at the surface, and the amount of energy released through radioactivity.  The last two tend to be fairly small.  Solar radiation is the most important factor, followed by the greenhouse effect.  So if we want to look at how CO2 varies with global temperature, we are never going to get a perfect correlation, but we are going to see a correlation.  And if we understand the role the sun plays in heating, we can distinguish which part of the heating is due to changes in solar radiation and which part is due to changes in greenhouse gases.  Just like with our example of force, we can determine whether a given force comes from a light object with a large acceleration or a heavy object with a small acceleration.  Our knowledge of the relationship allows us to make that determination.

In social sciences the variables impacting a system can be numerous.  As I’ve argued before, gun control is an extremely complex issue.  The number of gun deaths depends on the types of laws we have, the number of guns available, the quality of mental health care, attitudes towards mental health care, income inequality, education (both general education and education about the use of guns), the role of the media and politicians in fear mongering, cultural attitudes towards violence and death, and probably more than that.  With these types of issues it’s easy to point to all the other things that could be causes to try to show that changing one variable isn’t going to have an impact.  But neither side is completely valid here, because the argument should really be about which factors are more important and which are less important.  Just because one variable is more important, however, doesn’t eliminate all other variables from having an influence.  Just as in a coalition government, where the dominant party can lose power if the other parties combined outweigh it in votes, a dominant variable in a complex system may be outweighed by the combined importance of the other variables.  For instance, when it comes to the greenhouse effect, carbon dioxide is not the only important greenhouse gas.  Numerous greenhouse gases like CFCs, nitrous oxide, ozone, and methane are released as pollutants, and if industrialization continues at its current pace, the combined impact of those other gases may become similar to the impact of CO2, even though individually those gases have a very small effect.  It is important to understand all the variables that are involved and address them, especially when harm is being caused to people, because even a small variable that we can fix might reduce that harm.

As the complexity of a system increases, the direct correlations between one variable and another generally decrease.  A correlation of 0.2 might be significant if there are many variables all impacting the state of a system, especially if all those variables are of similar importance.  People like to keep relationships simple, but by doing so they fail to solve problems that are usually far more complex.  This is also why complex systems are some of the easiest subjects on which those who don’t really understand them can mislead others.  Climate change is a great example.  A change to the climate system depends on many factors, which makes it easy for someone to emphasize one part to make their argument.  Take the oft-used “Carbon dioxide is necessary for plants to grow, how can more be a bad thing?”  This ignores the role carbon dioxide plays in the greenhouse effect, ocean acidification, and what happens when plants decompose.
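To put a rough number on the 0.2 figure, here is a minimal Python sketch. It assumes an idealized system that is not taken from any real data set: the outcome is simply the sum of several independent, equally important drivers. Under that assumption the correlation between the outcome and any single driver works out to 1/sqrt(number of drivers), so 25 equal drivers give about 0.2. The function names are my own, chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_with_one_driver(n_drivers, n_samples=5000, seed=1):
    """Correlate one driver with an outcome built from n_drivers equal drivers.

    Assumption for illustration only: the outcome is the plain sum of
    independent, identically distributed drivers.
    """
    rng = random.Random(seed)
    first_driver, outcome = [], []
    for _ in range(n_samples):
        drivers = [rng.gauss(0, 1) for _ in range(n_drivers)]
        first_driver.append(drivers[0])
        outcome.append(sum(drivers))
    return pearson(first_driver, outcome)


if __name__ == "__main__":
    for k in (1, 4, 25):
        simulated = correlation_with_one_driver(k)
        expected = 1 / math.sqrt(k)
        print(f"{k:2d} drivers: simulated r = {simulated:.2f}, 1/sqrt(k) = {expected:.2f}")
```

In this toy system every driver genuinely matters, yet none of them correlates strongly with the outcome on its own once there are many of them.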

Happiness is often brought about by the simple things in life, but it is also important to remember that there are lots of things happening out there that aren’t so simple.  We are a part of a complex universe.  Part of why we continue to survive better is that we continue to break down the complexities of the universe into things that we can understand.  Also remember that just because things are complex doesn’t mean that there aren’t those who truly understand the problem, and that with patience and effort you can too if you choose.

The Nature of Correlations – Part I

 

The fact that the universe is complex shouldn’t surprise anyone, yet one of the things I find surprising is how vehemently people try to argue that it’s not. I have already written a blog post about correlation vs. causation, but I’d like to talk about something a little broader than that. Correlations try to demonstrate the relationship between one variable and another. It would be a sensible decision to try to correlate the amount of gun ownership with gun deaths. I don’t think anybody would dispute that there might be a relationship. And even if we do find a correlation, what does it mean? What is a good correlation, or a bad correlation?

So what can we expect out of a correlation? First we might ask, “Can any two things ever be perfectly correlated, even if we have a mathematical relationship?” The answer is no. Quite simply, for data to be perfectly correlated it would require perfect measurements. Since every measurement always has some error, there will always be a less than perfect correlation. The more difficult the measurement, the less likely we are to get a good correlation at all. In social sciences measurements may depend on a survey, qualitative and/or subjective observations, and complex sampling techniques. This will all impact the correlation we calculate. The strength of a correlation ranges from 0 (no correlation) to 1 (perfect correlation), and it depends on the quality of the data and the nature of the problem to really determine what correlation is high enough for us to be confident of a relationship, but generally anything above 0.5 is significant. Correlations can also be negative, down to -1. A negative correlation implies that as one variable increases, the other decreases, and those are important relationships as well.
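The effect of measurement error can be sketched in a few lines of Python. This is a toy example under assumptions of my own, not data from any real study: the true relationship is exactly y = 2x, and random error of a chosen size is added to both measurements before the Pearson correlation is calculated. The names `pearson` and `noisy_correlation` are hypothetical helpers.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def noisy_correlation(noise_sd, n=2000, seed=0):
    """Correlate x with y = 2x after adding measurement error to both."""
    rng = random.Random(seed)
    true_x = [rng.uniform(0, 10) for _ in range(n)]
    true_y = [2 * x for x in true_x]  # exact mathematical relationship (assumed)
    measured_x = [x + rng.gauss(0, noise_sd) for x in true_x]
    measured_y = [y + rng.gauss(0, noise_sd) for y in true_y]
    return pearson(measured_x, measured_y)


if __name__ == "__main__":
    for sd in (0.0, 1.0, 3.0):
        print(f"measurement error sd = {sd}: r = {noisy_correlation(sd):.3f}")
```

With no error the correlation is 1 to within rounding, and it drops as the error grows, even though the underlying relationship never changes.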

From http://www.venganza.org

Now, what happens when we try to correlate things that have no relationship to each other? It turns out we can get all sorts of results. The graphic to the right is often cited to show the difference between correlation and causation.  Two variables may appear to be correlated but have no relationship at all, even if the results are repeatable.  Correlations turn out to be even a little trickier than that. Some types of data show natural variability that is higher and lower than the average. A couple of examples would be blood sugar level or atmospheric pressure. Now let’s say we correlated those two things. Let’s say I have two years of daily air pressure values. And let’s say I have sampled my blood sugar every day for 30 straight days. The smart thing to do is select the atmospheric pressure values that correspond to the days that I took the samples and correlate them. But what might I find? Well, it could be that in those 30 days a slow moving high pressure system moved in as we approached the end of the period, meaning that in general pressure increased. And if my diet was not so good, or I ate infrequently or irregularly, or had a few big meals one day and then light snacking the next, my blood sugar will fluctuate. I might find that there is a negative correlation between pressure and blood sugar. But what would happen if I did the experiment for another 30 day period? Depending on the specific conditions I might find a positive correlation. What I’d find over time, after doing many of these repeated tests, is that I would have no correlation. The average of all the correlation calculations I perform will be essentially zero. This speaks to the importance of having a larger sample size and of checking whether results are repeatable for the same experiment. Had I taken two years’ worth of pressure data and two years’ worth of blood sugar data, I would have found essentially zero correlation.
Furthermore, even if I did find some correlation, my results would not be repeatable over many studies. The media often exaggerates findings from experiments that try to, say, link certain foods or cell phones to cancer. Variables that naturally fluctuate may randomly show a correlation, but further experiments reveal no correlation. But the media usually just picks up on the one study that found a correlation.
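The 30-day thought experiment can be simulated. The Python sketch below rests on assumptions of my own: both "pressure" and "blood sugar" are synthetic series that wander above and below a long-run average (a simple autoregressive process with a made-up persistence of 0.9), they are generated independently of each other, and I run 200 back-to-back 30-day windows, which is far more than the two years mentioned above, so that the average settles down. None of the numbers come from real measurements.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fluctuating_series(n, rng, persistence=0.9):
    """Synthetic daily values that drift around a long-run average of zero."""
    value, out = 0.0, []
    for _ in range(n):
        value = persistence * value + rng.gauss(0, 1)
        out.append(value)
    return out


def run_experiment(n_windows=200, window=30, seed=42):
    """Return (correlation in each window, correlation over the whole record)."""
    rng = random.Random(seed)
    n_days = n_windows * window
    pressure = fluctuating_series(n_days, rng)
    blood_sugar = fluctuating_series(n_days, rng)  # independent of pressure
    per_window = [
        pearson(pressure[i:i + window], blood_sugar[i:i + window])
        for i in range(0, n_days, window)
    ]
    return per_window, pearson(pressure, blood_sugar)


if __name__ == "__main__":
    per_window, overall = run_experiment()
    print(f"most negative 30-day r: {min(per_window):.2f}")
    print(f"most positive 30-day r: {max(per_window):.2f}")
    print(f"average of all 30-day r: {sum(per_window) / len(per_window):.2f}")
    print(f"r over the full record:  {overall:.2f}")
```

Individual 30-day windows can show fairly strong positive or negative correlations purely by chance, while the average over many windows and the correlation over the full record both stay close to zero.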

To further illustrate this point, take a look at the following two images.  The one on the left is a large sample of raw data trying to correlate two variables.  If these two variables were correlated we would see the data points approximate a line.  Maybe a straight one, maybe a curved one, but definitely some sort of trend would be apparent.  You are right if you think that whatever these two variables are, they have no relationship to each other.  Next to it is the same data set except that some of the data points are missing.  Now it looks like there might be a correlation between these two variables.  Sure, I have selectively chosen in this case which ones to eliminate, but it is possible that a random selection of some of the data points could turn out to be these same data points on the right.  Thus it is possible that we might think we’ve found a correlation when we didn’t.  Once again, the more data we take and the more we try to repeat the experiment, the more clearly we would show no correlation.

[Images: the full data set (left) and the same data set with some points removed (right)]
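The same trick can be reproduced numerically. This Python sketch uses made-up data, not the points in the images: it generates 500 pairs of unrelated random values, then deliberately throws away every point that does not lie close to the diagonal. The function names and the 0.2 cut-off are my own choices for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def uncorrelated_points(n=500, seed=7):
    """Pairs of independent random values, so there is no real relationship."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def drop_points_off_diagonal(points, width=0.2):
    """Selectively keep only the points where y happens to be close to x."""
    return [(x, y) for x, y in points if abs(y - x) < width]


def correlation(points):
    xs, ys = zip(*points)
    return pearson(xs, ys)


if __name__ == "__main__":
    everything = uncorrelated_points()
    selected = drop_points_off_diagonal(everything)
    print(f"all {len(everything)} points: r = {correlation(everything):.2f}")
    print(f"{len(selected)} selected points: r = {correlation(selected):.2f}")
```

The full set shows essentially no correlation, while the hand-picked subset shows a strong one, even though nothing about the underlying data has changed.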

In the next part we will investigate how we hypothesize about correlations, and how complex most relationships are.

Correlation vs. Causation

I decided to write a response to one of the many excellent posts written by a fellow blogger.  It became long enough, and I thought worthy enough, to be a blog post of its own!  If you are interested in the idea of correlation vs. causation you can read his blog here first.

In your last paragraph I was reminded of Dawkins’ argument in The God Delusion when he is talking about miracles.  Since miracles are by definition unique and rare events, there is no way to really disprove a divine explanation.  This is of course only if the same thing doesn’t keep happening again and again; if it does, you really don’t have a miracle on your hands anymore.  He uses the example of the one documented miracle in Catholicism in which some 100,000 witnesses near Fatima, Portugal reported the Sun doing some odd things, including zigzagging towards them and crashing to the Earth.  Dawkins argues that in looking for a natural explanation for the event, all of them, including the possibility that all 100,000 people are lying, are actually more probable than the laws of physics being thwarted for a group of people in one part of the world (no other people reported seeing anything other than those at the event).  So I think that you are very correct that the “correlation does not mean causation” argument does not negate a particular postulation for why a correlation exists.  However, I would go a step further and say that it is not even an argument in and of itself.

It is of course the responsibility of anybody who poses a correlation to provide a reason why such a correlation exists.  Provided you have done that, the “correlation does not mean causation” response isn’t a logical argument in response to yours.  The person on the other side of the debate must either address why your reasons are not valid, or must present something else that correlates better and explain why their reason for x causing y is more probable.  So I think you might be giving a little too much weight to the argument in how much it actually negates a correlation between two variables.

In many areas of science we can say why pretty easily, because there are usually physical laws that explain why, and those things are testable and repeatable.  In social science this may be harder to do, especially since it is not always clear what all the variables are.  For instance, it is clear that there is a positive correlation between gun deaths (accidents, homicides, and suicides) and the number of guns per capita in a population.  There are plenty of psychological factors of course to consider here, such as why a person would own a gun or why someone would choose to kill themselves.  There are practical questions, like how do we get people to be more responsible about locking up their guns so their kid doesn’t pick one up, how do we make sure that more people remember to store their guns unloaded, how can we make guns safer from accidental misfires, and how can we make sure that people who buy a gun are well trained in how to use it?  There are likely even bigger questions, like how does income disparity lead to increased crime in general?  What are other ways that don’t involve firearms in which people can be made safe?  All of these and plenty more are likely part and parcel of explaining gun violence, but that doesn’t change the fact that reducing access to guns would result in a lowering of the number of gun deaths.  So making some laws that create a national gun registry, that do better background checks, and that limit the types of weapons the general public could buy would make some sense, even though it clearly won’t eliminate gun deaths completely.  If as a counter-argument someone says “correlation does not mean causation”, they haven’t actually addressed the argument being made.  They actually have to find an example of another country with all other variables relatively constant between the U.S. and that country, except gun control laws, and show that an opposite correlation exists, i.e. restrictive gun control laws and increased gun deaths, or high gun ownership and low gun deaths.  And that would have to be a country with a similar economy, a democratic government, and a high standard of living, and one where high gun ownership isn’t simply the result of mandatory military service after which people keep the weapon issued to them (Switzerland is the example always used here).

The classic humorous example that has been around for a while is that graph of global temperature against the number of pirates.  I can’t just show that graph and say, “See, look how the number of pirates is impacting global temperature.”  I actually have to provide a reason why pirates might impact temperatures.  I can say there is less plundering and razing of towns, so the urban heat island effect has increased, thus raising global temperatures.  Obviously this is a silly argument, but a response of “correlation and causation are different”, while a true statement, does not negate my assertion.  There are many ways to disprove my assertion, but pointing out that correlation is not causation is not one of them.  The truth is that “correlation does not always mean causation”, so one has to go past this statement to further argue one’s point.  This is true for many arguments that contain logical fallacies.  You could take the classic argument used against gay marriage: if we let gays marry, pretty soon we’ll have to let people marry their pets.  This is of course the slippery slope logical fallacy.  Slippery slope arguments are not necessarily incorrect, but they are very often wrong.  So it’s not enough for me to counter your slippery slope argument with “Hey, that’s a slippery slope argument”.  I would be quite wrong to think the argument was done, because you could actually be right.  Some events do lead to a chain of events that ends up far from where things started.  To win the argument I would actually need to argue that there has never been a push for legislation to marry a pet, that if anybody has tried this they were a crazy person, that this is not a psychological drive of human beings as a species, etc.  I could also point to many other marriage-related laws, or other laws, that have not led to a hyperbolic slippery slope situation.

To say that “correlation and causation are not necessarily the same thing” is actually a Straw Man argument (which is fallacious) because it assumes a position that you have not taken in the argument.  Correlating variables is a valid method for discovering relationships, and by presenting that correlation, one’s assertion is not that correlation is a valid method, but rather that two variables are related to each other.  And to say two things are correlated doesn’t imply that this is the only important variable, or even that it is the primary or secondary cause of a particular event.  One has simply said there is a relationship, and a counter argument must challenge the relationship.  A correlation must be presented along with some sound reasons why there is a correlation, and an argument in response must challenge those reasons.  The art of argumentation isn’t easy and few people can actually argue well. 🙂