# Correlation and Causation

Correlation indicates that two variables share a relationship. A scatterplot is commonly used to visualize the pattern and describe that relationship. The variables are often denoted by x and y. Correlation is symmetrical: if x is correlated with y, then y is correlated with x. Correlation only denotes the association between variables and is often used to reduce uncertainty (Connelly, 2012). Additionally, it can be used to predict the value of one variable from the other and to measure the degree of relationship between the variables.

There are several types of correlation coefficients, including Kendall, Pearson, and Spearman. The Pearson correlation coefficient, denoted by the symbol r, is the one most commonly used in research studies. It ranges from -1 to +1 and indicates the direction and strength of the relationship. A negative value denotes a negative relationship (as x increases, y decreases), while a positive value indicates a positive relationship (as x increases, y increases). A value of 0 suggests that there is no linear association between the variables. It is important to note that a relationship between variables does not signify an underlying cause (Sharma, 2005). One limitation of a correlation coefficient is that it describes only two variables at a time and does not control for a third variable.
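To make the coefficient concrete, here is a minimal sketch that computes Pearson's r from its definition and Spearman's coefficient as Pearson's r applied to ranks. The function names and the small data sets (study hours and test scores) are invented for illustration and do not come from the cited sources. The last data set is a perfect U-shape, included to show that an r of 0 rules out only a straight-line relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data, invented for this example.
hours = [1, 2, 3, 4, 5]
scores_up = [52, 58, 61, 70, 74]      # rises with hours
scores_down = [90, 80, 72, 65, 50]    # falls with hours
u_shape = [4, 1, 0, 1, 4]             # y = (x - 3)^2, a nonlinear pattern

print("positive r:", round(pearson_r(hours, scores_up), 3))
print("negative r:", round(pearson_r(hours, scores_down), 3))
print("U-shape r: ", round(pearson_r(hours, u_shape), 3))
print("Spearman:  ", round(spearman_rho(hours, scores_up), 3))
```

Swapping the two arguments of `pearson_r` gives the same value, which is the symmetry described earlier.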

On the other hand, causation means that one event causes another to occur. Causation is deemed to occur if variation in the dependent variable follows variation in the independent variable (Brown, 2018). Causation is inferred from a well-designed experiment, not from individual statistical tests. A good example is the use of randomization in a controlled experiment, where the treatment is randomly assigned to one of the groups. The distinction between causation and correlation lies in distribution and relationship. Causation cannot be defined using distribution alone, while correlation can be defined by observing the joint distribution of variables. Examples of causal concepts include confounding, randomization, exogeneity, and disturbance. Correlation concepts include regression, dependence, and likelihood.
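The role of randomization can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real study: the treatment is given a true effect of zero, and a hypothetical confounder (think of general health) drives the outcome. When the confounder also influences who receives the treatment, treatment and outcome end up correlated; when a coin flip assigns the treatment, the correlation should fall to roughly zero.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n, randomize, seed=0):
    """Correlation between treatment and outcome in a simulated study.

    Assumption for illustration: the treatment has NO effect on the outcome.
    """
    rng = random.Random(seed)
    treated, outcome = [], []
    for _ in range(n):
        confounder = rng.gauss(0, 1)  # hypothetical third variable
        if randomize:
            # Controlled experiment: a coin flip decides who is treated.
            x = 1 if rng.random() < 0.5 else 0
        else:
            # Observational data: the confounder influences who is treated.
            x = 1 if confounder + rng.gauss(0, 1) > 0 else 0
        # The outcome depends on the confounder only, never on x.
        y = 2.0 * confounder + rng.gauss(0, 1)
        treated.append(x)
        outcome.append(y)
    return pearson_r(treated, outcome)


observational_r = simulate(5000, randomize=False)
randomized_r = simulate(5000, randomize=True)
print(f"observational r: {observational_r:.2f}")
print(f"randomized r:    {randomized_r:.2f}")
```

Randomization breaks the link between the confounder and the treatment, which is why the spurious association should disappear in the second run.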

## Significance Level

The first step when conducting research is to formulate a null and an alternative hypothesis. It is important to note that the main objective is not to prove a statement but to try to disprove the null hypothesis. Hence the need to formulate a question that attempts to disprove it. Since there is never complete assurance that the null hypothesis has been correctly rejected, the researcher must set a significance level, denoted by the letter alpha. It is expressed as a decimal that fixes the level of uncertainty the researcher is willing to accept. A significance level of 0.05 means there is a 5% chance of rejecting the null hypothesis when it is actually true.
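One way to see what alpha controls is to simulate many studies in which the null hypothesis is true by construction and count how often it is rejected anyway. The sketch below assumes a two-sided z-test with a known standard deviation of 1; the sample size, number of experiments, and function names are arbitrary choices made for illustration.

```python
import math
import random


def two_sided_p(sample, null_mean=0.0, sigma=1.0):
    """Two-sided p-value of a z-test, assuming a known standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - null_mean) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(alpha=0.05, experiments=4000, n=30, seed=1):
    """Share of experiments that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(experiments):
        # Data are drawn with mean 0, so the null hypothesis really holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample) < alpha:
            rejections += 1
    return rejections / experiments


rate = false_positive_rate()
print(f"share of true null hypotheses rejected: {rate:.3f}")
```

With alpha set to 0.05, the printed share should land close to 0.05, up to simulation noise.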

If the p-value is less than the significance level, the result is deemed statistically significant. For instance, a p-value of 0.03 means that, if the null hypothesis were true, there would be a 3% probability of observing a result at least as extreme as the one obtained; it is not the probability that the null hypothesis is true. Because 0.03 is less than an alpha of 0.05, the null hypothesis is rejected. On the other hand, a p-value that is equal to or greater than alpha is deemed not statistically significant.

The p-value is calculated from a test statistic, such as a t-test or z-test statistic. The null hypothesis is used to determine the sampling distribution of the test statistic. Different formulae apply for upper-tailed, lower-tailed, and two-tailed tests.
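As a sketch of how the three kinds of test differ, the code below computes a z statistic and converts it into an upper-tailed, lower-tailed, or two-tailed p-value using the standard normal distribution. It assumes the population standard deviation is known; a t-test would use the t distribution instead. The sample numbers and function names are invented for illustration.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def z_statistic(sample_mean, null_mean, sigma, n):
    """z = (sample mean - mean under the null) / standard error."""
    return (sample_mean - null_mean) / (sigma / math.sqrt(n))


def p_value(z, tail="two"):
    """p-value of a z statistic for the chosen alternative hypothesis."""
    if tail == "upper":   # alternative: mean is greater than the null value
        return 1.0 - normal_cdf(z)
    if tail == "lower":   # alternative: mean is less than the null value
        return normal_cdf(z)
    if tail == "two":     # alternative: mean differs in either direction
        return 2.0 * (1.0 - normal_cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


# Hypothetical study: 100 observations with mean 103, tested against a
# null mean of 100 with an assumed known standard deviation of 15.
alpha = 0.05
z = z_statistic(sample_mean=103, null_mean=100, sigma=15, n=100)
p_two = p_value(z, "two")
significant = p_two < alpha

print(f"z = {z:.2f}")
print(f"upper-tailed p = {p_value(z, 'upper'):.4f}")
print(f"lower-tailed p = {p_value(z, 'lower'):.4f}")
print(f"two-tailed p   = {p_two:.4f}")
print("reject the null hypothesis" if significant else "fail to reject the null hypothesis")
```

Here z is 2.0, so the two-tailed p-value is roughly 0.046, which is below an alpha of 0.05 and leads to rejecting the null hypothesis.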

## Applications

Causation can inform policy suggestions when an intervention is used to alter the value of the independent variable so that it changes the other variable. Examples of a causal relationship include medicine that causes improvement in health or studying that affects test scores (Sharma, 2005).

Understanding causation helps predict the future by identifying the variables that can affect others. Predictions tend to be more accurate when causal reasoning is combined with existing knowledge. For instance, causation has been actively used in statistics, machine learning, and economics to make informed decisions. Algorithms make use of existing knowledge to evaluate the utility of causes and predict an outcome. Other practical applications include identifying risk factors for diseases and identifying changes in sentiment in e-commerce.

## References

Sharma, A. K. (2005). Text book of correlations and regression. Discovery Publishing House.

Brown, H. (2018). Correlations versus causation. In The Economics of Public Health (pp. 41-55). Palgrave Pivot, Cham.

Connelly, L. M. (2012). Correlations. Medsurg Nursing, 21(3), 171.

## By Hanna Robinson

Hanna has won numerous writing awards. She specializes in academic writing, copywriting, business plans, and resumes. After graduating from Comosun College's journalism program, she went on to work at community newspapers throughout Atlantic Canada before embarking on her freelancing journey.