If the correlation between two variables x and y is equal to -0.90, which of the following is true?

As mentioned above, the correlation coefficient theoretically assumes values in the interval between +1 and −1, including the end values +1 and −1. (An interval that includes its end values is called a closed interval and is denoted with left and right square brackets, [ and ], respectively. Accordingly, the correlation coefficient assumes values in the closed interval [−1, +1].) However, it is not well known that this closed interval is restricted by the shapes (distributions) of the individual X data and the individual Y data. The extent to which the shapes of the individual X and individual Y data differ affects the length of the realised correlation coefficient closed interval, which is often shorter than the theoretical interval. A shorter realised interval necessitates the calculation of the adjusted correlation coefficient (to be discussed below).

The length of the realised correlation coefficient closed interval is determined by the process of ‘rematching’. Rematching takes the original (X, Y) paired data to create new (X, Y) ‘rematched-paired’ data such that all the rematched-paired data produce the strongest positive and strongest negative relationships. The correlation coefficients of the strongest positive and strongest negative relationships yield the length of the realised correlation coefficient closed interval. The rematching process is as follows:

  1. The strongest positive relationship comes about when the highest X-value is paired with the highest Y-value; the second highest X-value is paired with the second highest Y-value, and so on until the lowest X-value is paired with the lowest Y-value.

  2. The strongest negative relationship comes about when the highest X-value is paired with the lowest Y-value; the second highest X-value is paired with the second lowest Y-value, and so on until the lowest X-value is paired with the highest Y-value.

Continuing with the data in Table 1, I rematch the X, Y data in Table 2. The rematching produces a strongest positive correlation of +0.90 and a strongest negative correlation of −0.99.

Table 2 Rematched (X, Y) data of Table 1

So, just as there is an adjustment for R², there is an adjustment for the correlation coefficient due to the individual shapes of the X and Y data. Thus, the restricted, realised correlation coefficient closed interval is [−0.99, +0.90], and the adjusted correlation coefficient can now be calculated.
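The rematching procedure can be sketched in a few lines of Python. This is an illustrative sketch rather than the author's own code: the function names `pearson` and `realised_interval` are made up for this example, and the X and Y values are the five observations listed in the data table of this article.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def realised_interval(xs, ys):
    """Return (r_min, r_max) found by rematching the X and Y values."""
    x_sorted = sorted(xs)
    # Strongest positive relationship: lowest X with lowest Y, ..., highest with highest.
    r_max = pearson(x_sorted, sorted(ys))
    # Strongest negative relationship: lowest X with highest Y, ..., highest with lowest.
    r_min = pearson(x_sorted, sorted(ys, reverse=True))
    return r_min, r_max


x = [12, 15, 17, 23, 26]
y = [77, 98, 75, 93, 92]

r_observed = pearson(x, y)
r_min, r_max = realised_interval(x, y)
print(f"observed r        = {r_observed:.2f}")
print(f"realised interval = [{r_min:.2f}, {r_max:.2f}]")
```

Sorting both variables in the same direction gives the upper end of the realised interval, and sorting them in opposite directions gives the lower end; for these five observations the ends round to +0.90 and −0.99.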



From: The correlation coefficient: Its values range between +1/−1, or do they?

Obs    X      Y       zX       zY       zX × zY
1      12     77      −1.14    −0.96     1.11
2      15     98      −0.62     1.07    −0.66
3      17     75      −0.27    −1.16     0.32
4      23     93       0.76     0.58     0.44
5      26     92       1.28     0.48     0.62
Mean   18.6   87.0                       Sum = 1.83
s.d.   5.77   10.32
n = 5                                    r = 0.46
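The arithmetic in the table can be reproduced with a short Python sketch. It assumes, consistent with the tabulated values, that the standard deviations are sample standard deviations (divisor n − 1) and that r is the sum of the standardised products divided by n − 1; the variable names are illustrative only.

```python
from statistics import mean, stdev

x = [12, 15, 17, 23, 26]
y = [77, 98, 75, 93, 92]
n = len(x)

# Standardise each variable: subtract its mean, divide by its sample s.d.
zx = [(v - mean(x)) / stdev(x) for v in x]
zy = [(v - mean(y)) / stdev(y) for v in y]

# Multiply the paired z-scores and sum the products.
products = [a * b for a, b in zip(zx, zy)]
total = sum(products)

# The correlation coefficient is the sum of products divided by n - 1.
r = total / (n - 1)

print(f"mean X = {mean(x):.1f}, s.d. X = {stdev(x):.2f}")
print(f"mean Y = {mean(y):.1f}, s.d. Y = {stdev(y):.2f}")
print(f"sum of zX * zY = {total:.2f}")
print(f"r = {r:.2f}")
```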


Correlation and Causation

What are correlation and causation and how are they different?


Two or more variables are considered to be related, in a statistical context, if their values change so that as the value of one variable increases or decreases so does the value of the other variable (although it may be in the opposite direction).

For example, for the two variables "hours worked" and "income earned" there is a relationship between the two if the increase in hours worked is associated with an increase in income earned. If we consider the two variables "price" and "purchasing power", as the price of goods increases a person's ability to buy these goods decreases (assuming a constant income).

Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable.

Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect.

Theoretically, the difference between the two types of relationship is easy to identify: an action or occurrence can cause another (e.g. smoking causes an increase in the risk of developing lung cancer), or it can correlate with another (e.g. smoking is correlated with alcoholism, but it does not cause alcoholism). In practice, however, it remains difficult to clearly establish cause and effect, compared with establishing correlation.

Why are correlation and causation important?

The objective of much research or scientific analysis is to identify the extent to which one variable relates to another variable. For example:

  • Is there a relationship between a person's education level and their health?
  • Is pet ownership associated with living longer?
  • Did a company's marketing campaign increase their product sales?

These and other questions explore whether a correlation exists between the two variables; if there is a correlation, this may guide further research into whether one action causes the other. Understanding correlation and causality allows policies and programs that aim to bring about a desired outcome to be better targeted.

How is correlation measured?


For two variables, a statistical correlation is measured by a correlation coefficient, represented by the symbol r, which is a single number that describes the degree of relationship between the two variables.

The coefficient's numerical value ranges from +1.0 to –1.0, which provides an indication of the strength and direction of the relationship.

If the correlation coefficient has a negative value (below 0) it indicates a negative relationship between the variables. This means that the variables move in opposite directions (i.e. when one increases the other decreases, or when one decreases the other increases).

If the correlation coefficient has a positive value (above 0) it indicates a positive relationship between the variables meaning that both variables move in tandem, i.e. as one variable decreases the other also decreases, or when one variable increases the other also increases.

Where the correlation coefficient is 0, this indicates there is no linear relationship between the variables (one variable can remain constant while the other increases or decreases).

While the correlation coefficient is a useful measure, it has its limitations:

Correlation coefficients are usually associated with measuring a linear relationship.


For example, if you compare hours worked and income earned for a tradesperson who charges an hourly rate for their work, there is a linear (or straight line) relationship since with each additional hour worked the income will increase by a consistent amount.

If, however, the tradesperson charges based on an initial call out fee and an hourly fee which progressively decreases the longer the job goes for, the relationship between hours worked and income would be non-linear, where the correlation coefficient may be closer to 0.
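The limitation can be illustrated with a small Python sketch. The hourly rate of 50 and the U-shaped data set are hypothetical values chosen for this example; the U-shape is a deliberately extreme non-linear case, not the tradesperson's fee schedule, and the helper name `pearson` is likewise an assumption of the sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Linear case: a fixed (hypothetical) hourly rate of 50.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
income = [50 * h for h in hours]
r_linear = pearson(hours, income)

# Non-linear case: y depends entirely on x, but through a symmetric U-shape.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_u_shape = pearson(x, y)

print(f"fixed hourly rate:   r = {r_linear:.2f}")
print(f"U-shaped dependence: r = {r_u_shape:.2f}")
```

In the second case y is completely determined by x, yet r is zero because the rising and falling halves of the curve cancel out. A value of r near 0 therefore rules out a straight-line relationship, not every kind of relationship.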

Care is needed when interpreting the value of 'r'. It is possible to find correlations between many variables; however, the relationships can be due to other factors and have nothing to do with the two variables being considered. For example, sales of ice cream and sales of sunscreen can increase and decrease across a year in a systematic manner, but the relationship would be due to the effects of the season (i.e. hotter weather sees an increase in people wearing sunscreen as well as eating ice cream) rather than to any direct relationship between sales of sunscreen and ice cream. The correlation coefficient should not be used to say anything about a cause and effect relationship. By examining the value of 'r', we may conclude that two variables are related, but that 'r' value does not tell us if one variable was the cause of the change in the other.
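The ice cream and sunscreen example can be simulated as a rough sketch. All numbers below are invented for illustration: daily temperature is drawn at random, and each sales series is assumed to depend on temperature plus independent noise, with no direct link between the two products. The helper names `pearson` and `residuals` are also assumptions of this sketch.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: both sales series are driven only by temperature.
temperature = [rng.uniform(10, 35) for _ in range(365)]
ice_cream = [5.0 * t + rng.gauss(0, 5) for t in temperature]
sunscreen = [3.0 * t + rng.gauss(0, 5) for t in temperature]

r_raw = pearson(ice_cream, sunscreen)
r_adjusted = pearson(
    residuals(ice_cream, temperature), residuals(sunscreen, temperature)
)

print(f"ice cream vs sunscreen:              r = {r_raw:.2f}")
print(f"after removing the temperature effect: r = {r_adjusted:.2f}")
```

The raw correlation is strong even though neither product influences the other; once the shared effect of temperature is removed from both series, little correlation remains.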

How can causation be established?

Causality is an area of statistics that is commonly misunderstood and misused, in the mistaken belief that because the data show a correlation there is necessarily an underlying causal relationship.

The use of a controlled study is the most effective way of establishing causality between variables. In a controlled study, the sample or population is split in two, with both groups being comparable in almost every way. The two groups then receive different treatments, and the outcomes of each group are assessed.

For example, in medical research, one group may receive a placebo while the other group is given a new type of medication. If the two groups have noticeably different outcomes, the different experiences may have caused the different outcomes.

Due to ethical reasons, there are limits to the use of controlled studies; it would not be appropriate to use two comparable groups and have one of them undergo a harmful activity while the other does not. To overcome this situation, observational studies are often used to investigate correlation and causation for the population of interest. The studies can look at the groups' behaviours and outcomes and observe any changes over time.

The objective of these studies is to provide statistical information to add to the other sources of information that would be required for the process of establishing whether or not causality exists between two variables.


The correlation coefficient is a statistical measure of the strength of the relationship between the relative movements of two variables. The values range between -1.0 and 1.0. A calculated number greater than 1.0 or less than -1.0 means that there was an error in the correlation measurement. A correlation of -1.0 shows a perfect negative correlation, while a correlation of 1.0 shows a perfect positive correlation. A correlation of 0.0 shows no linear relationship between the movement of the two variables.

Correlation statistics can be used in finance and investing. For example, a correlation coefficient could be calculated to determine the level of correlation between the price of crude oil and the stock price of an oil-producing company, such as Exxon Mobil Corporation. Since oil companies earn greater profits as oil prices rise, the correlation between the two variables is highly positive.

  • Correlation coefficients are used to measure the strength of the relationship between two variables.
  • Pearson correlation is the one most commonly used in statistics. This measures the strength and direction of a linear relationship between two variables.
  • Values always range between -1 (strong negative relationship) and +1 (strong positive relationship). Values at or close to zero imply a weak or no linear relationship.
  • In some fields of study, correlation coefficients with an absolute value below 0.8 are not considered important.

There are several types of correlation coefficients, but the one that is most common is the Pearson correlation (r). This measures the strength and direction of the linear relationship between two variables. It cannot capture nonlinear relationships between two variables and cannot differentiate between dependent and independent variables.

A value of exactly 1.0 means there is a perfect positive relationship between the two variables. For a positive increase in one variable, there is also a positive increase in the second variable. A value of -1.0 means there is a perfect negative relationship between the two variables. This shows that the variables move in opposite directions—for a positive increase in one variable, there is a decrease in the second variable. If the correlation between two variables is 0, there is no linear relationship between them.

The strength of the relationship varies in degree based on the value of the correlation coefficient. For example, a value of 0.2 shows there is a positive correlation between two variables, but it is weak and likely unimportant. Analysts in some fields of study do not consider correlations important until the value surpasses at least 0.8. However, a correlation coefficient with an absolute value of 0.9 or greater would represent a very strong relationship.

Investors can use changes in correlation statistics to identify new trends in the financial markets, the economy, and stock prices.

The correlation between two variables is particularly helpful when investing in the financial markets. For example, a correlation can be helpful in determining how well a mutual fund performs relative to its benchmark index, or another fund or asset class. By adding a low or negatively correlated mutual fund to an existing portfolio, the investor gains diversification benefits.

In other words, investors can use negatively correlated assets or securities to hedge their portfolios and reduce market risk due to volatility or wild price fluctuations. Many investors hedge the price risk of a portfolio because they mainly want the dividend income or yield from the stock or security; the hedge effectively reduces any capital gains or losses.
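The diversification effect can be made concrete with the standard two-asset portfolio variance formula, σp² = w1²σ1² + w2²σ2² + 2·w1·w2·ρ·σ1·σ2. The sketch below is illustrative only: the 50/50 weights and the 20% volatilities are hypothetical inputs, and `portfolio_volatility` is a name chosen for this example.

```python
from math import sqrt


def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return sqrt(max(variance, 0.0))


# Hypothetical inputs: equal weights, each asset with 20% volatility.
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vol = portfolio_volatility(0.5, 0.20, 0.20, rho)
    print(f"correlation {rho:+.1f} -> portfolio volatility {vol:.3f}")
```

With these inputs the portfolio is exactly as volatile as either asset when the correlation is +1, and its volatility falls steadily as the correlation decreases, reaching zero at a correlation of −1.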

Correlation statistics also allow investors to determine when the correlation between two variables changes. For example, bank stocks typically have a highly positive correlation to interest rates, since loan rates are often calculated based on market interest rates. If the stock price of a certain bank is falling while interest rates are rising, investors can glean that something's askew with that particular bank. If the stock prices of other banks in the sector are also rising, investors can conclude that the decline of the outlier bank's stock is not due to interest rates. Instead, the poorly performing bank is likely dealing with an internal, fundamental issue.

To calculate the Pearson product-moment correlation, one must first determine the covariance of the two variables in question. Next, one must calculate each variable's standard deviation. The correlation coefficient is determined by dividing the covariance by the product of the two variables' standard deviations.

$$
\begin{aligned}
&\rho_{xy} = \frac{\text{Cov}(x, y)}{\sigma_x \sigma_y} \\
&\textbf{where:} \\
&\rho_{xy} = \text{Pearson product-moment correlation coefficient} \\
&\text{Cov}(x, y) = \text{covariance of variables } x \text{ and } y \\
&\sigma_x = \text{standard deviation of } x \\
&\sigma_y = \text{standard deviation of } y
\end{aligned}
$$

Standard deviation is a measure of the dispersion of data from its average. Covariance is a measure of how two variables change together, but its magnitude is unbounded, so it is difficult to interpret. By dividing covariance by the product of the two standard deviations, one can calculate the normalized version of the statistic. This is the correlation coefficient.
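The three steps (covariance, standard deviations, division) can be sketched directly in Python. The two short data series are hypothetical values chosen so the arithmetic is easy to check by hand, and the function names are illustrative. The sketch uses the sample versions (divisor n − 1) of both covariance and standard deviation; using the population versions for both would give the same ratio.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def std_dev(xs):
    """Sample standard deviation."""
    n = len(xs)
    mx = sum(xs) / n
    return sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(f"Cov(x, y) = {covariance(x, y):.2f}")
print(f"sigma_x   = {std_dev(x):.3f}")
print(f"sigma_y   = {std_dev(y):.3f}")
print(f"rho_xy    = {correlation(x, y):.3f}")
```

Here the covariance is 1.5, but that number alone says little because it depends on the units of x and y. Dividing by the two standard deviations rescales it to roughly 0.775, a value that is guaranteed to fall between −1 and +1.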

The correlation coefficient describes how one variable moves in relation to another. A positive correlation indicates that the two move in the same direction, with a +1.0 correlation when they move in perfect tandem. A negative correlation coefficient tells you that they instead move in opposite directions. A correlation of zero suggests no linear relationship at all.

The correlation coefficient is calculated by first determining the covariance of the variables and then dividing that quantity by the product of those variables’ standard deviations.

Correlation coefficients are a widely used statistical measure in investing. They play a very important role in areas such as portfolio composition, quantitative trading, and performance evaluation. For example, some portfolio managers will monitor the correlation coefficients of individual assets in their portfolios in order to ensure that the total volatility of their portfolios is maintained within acceptable limits.

Similarly, analysts will sometimes use correlation coefficients to predict how a particular asset will be impacted by a change to an external factor, such as the price of a commodity or an interest rate.