Use the Spearman correlation coefficient to examine the strength and direction of the monotonic relationship between two continuous or ordinal variables. In a monotonic relationship, the variables tend to move in the same relative direction, but not necessarily at a constant rate. To calculate the Spearman correlation, Minitab ranks the raw data. Then, Minitab calculates the correlation coefficient on the ranked data.
Strength
The correlation coefficient can range in value from −1 to +1. The larger the absolute value of the coefficient, the stronger the relationship between the variables. For the Spearman correlation, an absolute value of 1 indicates that the rank-ordered data are perfectly linear. For example, a Spearman correlation of −1 means that the highest value for Variable A is associated with the lowest value for Variable B, the second highest value for Variable A is associated with the second lowest value for Variable B, and so on.
Direction
The sign of the coefficient indicates the direction of the relationship. If both variables tend to increase or decrease together, the coefficient is positive, and the line that represents the correlation slopes upward. If one variable tends to increase as the other decreases, the coefficient is negative, and the line that represents the correlation slopes downward.
The following plots show data with specific Spearman correlation coefficient values to illustrate different patterns in the strength and direction of the relationships between variables.
No relationship: The points fall randomly on the plot, which indicates that there is no relationship between the variables.
Strong positive relationship: The points fall close to the line, which indicates that there is a strong relationship between the variables. The relationship is positive because the variables increase concurrently.
Strong negative relationship: The points fall close to the line, which indicates that there is a strong relationship between the variables. The relationship is negative because as one variable increases, the other variable decreases.
It is never appropriate to conclude that changes in one variable cause changes in another based on correlation alone. Only properly controlled experiments enable you to determine whether a relationship is causal.
In these results, the Spearman correlation between porosity and hydrogen is 0.590058, which indicates that there is a positive relationship between the variables. The Spearman correlation between strength and hydrogen is -0.858728 and between strength and porosity is -0.675468. The relationship between these variables is negative, which indicates that as hydrogen and porosity increase, strength decreases.
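The rank-then-correlate computation described above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation: the function names `average_ranks`, `pearson`, and `spearman` are hypothetical, and giving tied values the average of their ranks is an assumption (it is a common convention, but the text does not say how ties are handled).

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranked data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y = x cubed is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("Spearman:", spearman(x, y))
print("Pearson: ", pearson(x, y))
```

With this made-up data the ranks of x and y are identical, so `spearman` returns 1 even though `pearson` on the raw values is below 1. That difference is the "not necessarily at a constant rate" point in the definition of a monotonic relationship.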
Use Simple Regression to plot and model the relationship between one continuous predictor and a response. You can fit a linear, quadratic, or cubic model to the data. For example, an engineer at a manufacturing site wants to examine the relationship between energy consumption and the setting of a machine used in the manufacturing process. The engineer believes the relationship between these variables is curvilinear. Therefore, the engineer performs a simple regression analysis and fits a quadratic model to the data.
Linear Regression and Correlation
OpenStax College
The correlation coefficient, r, tells us about the strength and direction of the linear relationship between x and y. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n, together. We perform a hypothesis test of the “significance of the correlation coefficient” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population. The sample data are used to compute r, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient, ρ. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r, is our estimate of the unknown population correlation coefficient.
The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is “close to zero” or “significantly different from zero”. We decide this based on the sample correlation coefficient r and the sample size n. If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”
Conclusion: “There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.” What the conclusion means: There is a significant linear relationship between x and y. We can use the regression line to model the linear relationship between x and y in the population. If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is “not significant.”
Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.” What the conclusion means: There is not a significant linear relationship between x and y. Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
Note
WHAT THE HYPOTHESES MEAN IN WORDS:
DRAWING A CONCLUSION: There are two methods of making the decision: the p-value method and the critical value method, which uses a table of critical values. The two methods are equivalent and give the same result.
In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.
Note
Using the p-value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)
To calculate the p-value using LinRegTTest: On the LinRegTTest input screen, on the line prompt for β or ρ, highlight “≠ 0”. The output screen shows the p-value on the line that reads “p =”. (Most computer statistical software can calculate the p-value.)
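If the p-value is less than the significance level (α = 0.05): Reject the null hypothesis. There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
If the p-value is NOT less than the significance level (α = 0.05): Do not reject the null hypothesis. There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.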
Calculation Notes:
An alternative way to calculate the p-value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
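For readers without a calculator, the same two-tailed p-value can be approximated in plain Python. The sketch below rests on two assumptions that are not stated in this section: the test statistic is taken to be the standard t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom, and the calculator's tcdf is replaced by Simpson's-rule integration of the t density. The function names are hypothetical, and the inputs r = 0.801 and n = 10 are borrowed from the critical value example in this chapter.

```python
import math


def t_statistic(r, n):
    """t statistic for H0: rho = 0 (assumed standard formula; requires |r| < 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_tailed_p(t, df, steps=2000):
    """Approximate 2 * P(T > |t|), the quantity computed by 2*tcdf(abs(t), 10^99, df)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


r, n = 0.801, 10
t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(f"t = {t:.3f}, df = {n - 2}, p = {p:.4f}")
print("significant at alpha = 0.05" if p < 0.05 else "not significant at alpha = 0.05")
```

Numerical integration is used here only to keep the sketch free of third-party packages; statistical software would normally call a library routine for the t distribution instead.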
THIRD-EXAM vs FINAL-EXAM EXAMPLE: p-value method
H0: ρ = 0 Ha: ρ ≠ 0 α = 0.05
Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of \(r\) is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.
Suppose you computed r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are –0.632 and +0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.
Try It
For a given line of best fit, you computed that r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?
If the scatter plot looks linear then, yes, the line can be used for prediction, because r > the positive critical value.
Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.
Try It
For a given line of best fit, you compute that r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?
No, the line cannot be used for prediction, because r < the positive critical value.
Suppose you computed r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.
Try It
For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?
Yes, the line can be used for prediction, because r < the negative critical value.
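The decision rule used in the examples above reduces to one comparison: r is significant when it lies outside the interval between the negative and positive critical values. The following sketch applies that rule to the six (r, n, critical value) combinations quoted in this section. The function name `is_significant` is hypothetical, and the critical values are copied from the text rather than computed.

```python
def is_significant(r, critical_value):
    """True when r is below -critical_value or above +critical_value."""
    return abs(r) > critical_value


# (r, n, positive critical value at alpha = 0.05), taken from the examples in the text.
examples = [
    (0.801, 10, 0.632),
    (0.6501, 12, 0.576),
    (-0.624, 14, 0.532),
    (0.5204, 9, 0.666),
    (0.776, 6, 0.811),
    (-0.7204, 8, 0.707),
]

for r, n, critical_value in examples:
    verdict = "significant" if is_significant(r, critical_value) else "not significant"
    print(f"r = {r:+.4f}, n = {n}, df = {n - 2}, critical value = {critical_value}: {verdict}")
```

A significant r is only half of the check described in the text: the scatter plot should also show a linear trend before the line is used for prediction.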
Consider the third exam/final exam example.
Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.
Try It
For a given line of best fit, you compute that r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?
No, the line cannot be used for prediction no matter what the sample size is.
Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population. The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.
The assumptions underlying the test of significance are:
Least Squares Line or Line of Best Fit: \(\hat{y} = a + bx\), where a = y-intercept and b = slope.
Standard deviation of the residuals: \(s = \sqrt{\frac{SSE}{n-2}}\), where SSE = sum of squared errors and n = the number of data points.
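As a rough illustration of these two formulas, the sketch below fits a least squares line and then computes the standard deviation of the residuals. It assumes the usual closed-form estimates b = Sxy/Sxx and a = ȳ − b·x̄, which this summary does not spell out. The function names and the four data points are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math


def least_squares_line(x, y):
    """Return (a, b) for the line y-hat = a + b*x (assumed closed-form estimates)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept
    return a, b


def residual_std(x, y, a, b):
    """s = sqrt(SSE / (n - 2)), where SSE is the sum of squared errors."""
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
a, b = least_squares_line(x, y)
s = residual_std(x, y, a, b)
print(f"y-hat = {a:.3f} + {b:.3f}x, s = {s:.4f}")
```

The divisor n − 2 matches the degrees of freedom used in the significance test, since two quantities (a and b) are estimated from the data.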
When testing the significance of the correlation coefficient, what is the null hypothesis?
When testing the significance of the correlation coefficient, what is the alternative hypothesis?
If the level of significance is 0.05 and the p-value is 0.04, what conclusion can you draw?
If the level of significance is 0.05 and the p-value is 0.06, what conclusion can you draw?
We do not reject the null hypothesis. There is not sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.
If there are 15 data points in a set of data, what is the number of degrees of freedom?