Performing linear regression hypothesis testing on lemon imports and crash fatality rates

4/23/2019

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform linear regression hypothesis testing on data of lemon imports and car crash fatality rates. Here's our problem statement: Listed below are annual data for various years. The data are weights in metric tons of imported lemons and car crash fatality rates per 100,000 population. Construct a scatterplot, find the value of the linear correlation coefficient r, and find the P-value using a significance level of α = 0.05. Is there sufficient evidence to conclude that there is a linear correlation between lemon imports and crash fatality rates? Do the results suggest that imported lemons cause car fatalities?

Part 1

OK, this first part of the problem is asking for the null and alternative hypotheses. A null hypothesis by definition is a statement of equality, so we're not going to select Answer option B here because this null hypothesis is not a statement of equality. Of the three answer options that remain, we're going to look at the alternative hypothesis because that's what's different among those three answer options.

What is our alternative hypothesis? Well, when you're dealing with linear correlation, there are three ways this can go. You're looking for linear correlation in general, or you're looking for a specific type of linear correlation, which could be positive or negative. Here in our problem statement, we're asked, "Is there sufficient evidence to conclude that there is a linear correlation?" So we're not looking for positive or negative linear correlation, just linear correlation in general. That means it could be positive, which would be the right tail of our distribution, or it could be negative, which would be the left tail of the distribution. Both tails are in play, so this is a two-tailed test, and a two-tailed test means our alternative hypothesis uses "not equal to." And that is Answer option A. Well done!
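Written out in symbols, the two-tailed hypotheses would look something like the block below. This assumes the usual textbook convention that ρ stands for the population linear correlation coefficient; the symbol itself isn't read out in the video.

```latex
\begin{aligned}
H_0 &: \rho = 0 && \text{(no linear correlation)} \\
H_1 &: \rho \neq 0 && \text{(a linear correlation, positive or negative)}
\end{aligned}
```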

Part 2

Now the next part asks for a scatter plot. We're going to have to put this data into StatCrunch to do that. Well, actually we could do it in Excel as well, but I just think this is a little easier in StatCrunch. So let me move this window so we can see everything better. Beautiful. Alright. Now our data is here in StatCrunch. I could go to Graph and then on down here to Scatter Plot, but I know I'm going to have to make a linear regression model for the next parts of the problem, and in part of doing that regression analysis I'm going to get a scatter plot. So let's just go ahead and do the regression analysis and get everything all in one shot.

I go to Stat --> Regression --> Simple Linear. Here in my options window, I'm asked to identify the x- and y-variables. Sometimes in the problem they'll specify what they want to be the x-variable. But typically you're not going to get anything like that. So the guiding assumption is that the variable that's mentioned first is your x-variable. Here that's going to be the lemon imports, which means the y-variable will be the crash fatality rates. I want a hypothesis test, so this radio button needs to be selected, and it is by default. I make sure these values match what I've got here from the earlier part of the problem, and they do. I hit Compute!, and here's my results window. Notice here it says "1 of 2". So the scatter plot is going to be on the second page. I come down here and click on this arrow. I get to the second page and wow! There's my scatter plot along with a line of best fit. So yeah, this one's clearly the one we want to select. Good job!
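If you're curious what goes into that line of best fit, here's a minimal sketch in plain Python of the standard least-squares formulas. The numbers are made-up illustrative values, not the data from this problem, and the function name `fit_line` is just a name chosen for the sketch.

```python
# Made-up illustrative values; NOT the data from the problem.
lemons = [200, 300, 400, 500, 600]          # x: imported lemons (metric tons)
fatality = [16.0, 15.8, 15.5, 15.4, 15.0]   # y: crash fatality rate per 100,000


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(lemons, fatality)
print(f"fatality = {intercept:.3f} + ({slope:.5f}) * lemons")
```

A negative slope here means the fitted line falls from left to right, which is the same downward pattern you'd look for in the scatter plot.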

Part 3

Now the third part of this problem asks for the linear correlation coefficient. I come back here to my results window and click back to the first page. My correlation coefficient is right up here at the top. So we're asked to round to three decimal places. I can stick that number in. Notice the negative sign out in front. You have negative correlation. Don't forget that negative sign. Fantastic!
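For anyone who wants to check a correlation coefficient by hand, here's a rough sketch of the usual Pearson formula in plain Python. Again, the data values are made up for illustration and are not the values from this problem, and `pearson_r` is a name chosen for the sketch.

```python
import math

# Made-up illustrative values; NOT the data from the problem.
lemons = [200, 300, 400, 500, 600]
fatality = [16.0, 15.8, 15.5, 15.4, 15.0]


def pearson_r(x, y):
    """Linear correlation coefficient r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(lemons, fatality)
print(f"r = {r:.3f}")  # rounded to three decimal places, sign included
```

The sign carries through the calculation on its own: when y tends to fall as x rises, the cross-deviation sum is negative, and so is r.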

Part 4

Now the next part asks for the test statistic. Notice our test statistic here is a t-score. Well, if I look here at my parameter estimates table, I have a column labeled t-stat. These are t-scores, but notice we have two of them. So which one are we going to be using? Well, the first one corresponds with the intercept of the model, and the second corresponds with the slope. We're going to select the one for the slope because testing for linear correlation is equivalent to testing whether the slope of the regression line is zero. If there's no linear correlation, the line of best fit is flat, and a flat line has a slope of zero. The intercept just tells us where the line crosses the y-axis, and that says nothing about whether the two variables are linearly related. So we're going to select that t-score for that second item, the slope there in our model. Again, don't forget that negative sign. We're asked to round to three decimal places. Excellent!
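Here's a small sketch showing two routes to the same t-score: the textbook formula t = r·√(n − 2) / √(1 − r²), and the slope divided by its standard error, which is the number reported in the slope row. The data are made-up illustrative values, not the values from this problem.

```python
import math

# Made-up illustrative values; NOT the data from the problem.
lemons = [200, 300, 400, 500, 600]
fatality = [16.0, 15.8, 15.5, 15.4, 15.0]

n = len(lemons)
mean_x = sum(lemons) / n
mean_y = sum(fatality) / n
sxx = sum((x - mean_x) ** 2 for x in lemons)
syy = sum((y - mean_y) ** 2 for y in fatality)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lemons, fatality))

# Route 1: t-score computed from the correlation coefficient r.
r = sxy / math.sqrt(sxx * syy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Route 2: t-score for the slope = slope / (standard error of the slope).
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(lemons, fatality))
se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_from_slope = slope / se_slope

print(f"t from r:     {t_from_r:.3f}")
print(f"t from slope: {t_from_slope:.3f}")
```

The two printed values agree, which is why the slope row is the one to read when the question is about linear correlation.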

Part 5

Next we're asked for the P-value. The P-value is right next door to the test statistic, just as we've seen previously, so we're going to take that same row there for the slope and take its P-value there, 0.016. Good job!
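If you want to see where a two-tailed P-value like that comes from, here's a rough sketch that integrates the Student t density numerically with Simpson's rule, using only the Python standard library. The t-score and degrees of freedom plugged in at the bottom are hypothetical (the number of data pairs isn't stated in the video), and the function names are chosen just for this sketch.

```python
import math


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=20000):
    """Two-tailed P-value: area in both tails beyond |t_stat|.

    Uses Simpson's rule (steps must be even) to integrate the density
    from 0 to |t_stat|, then subtracts twice that area from 1.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3          # area between 0 and |t_stat|
    return max(0.0, 1.0 - 2.0 * central_area)


# Hypothetical inputs: a t-score of -10.392 with n - 2 = 3 degrees of freedom.
p_value = two_tailed_p(-10.392, df=3)
print(f"P-value = {p_value:.4f}")
```

Because the test is two-tailed, the sign of the t-score doesn't matter here; only its distance from zero does.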

Part 6

Now the next part of this problem is asking us to make a conclusion about our hypothesis test. Our P-value of 1.6% is less than our alpha level of 5%. So I select that here in the drop down because our P-value is less than our significance level. That means we're inside the region of rejection. So we're going to reject the null hypothesis. And every time we reject the null hypothesis, there is sufficient evidence to support the alternative hypothesis, which here says there is a linear correlation between lemon imports and crash fatality rates. Good job!
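The decision rule itself is short enough to write down as code. This sketch uses the P-value and alpha from this problem; the function name `conclusion` and the wording of the returned strings are just choices made for the sketch, and it follows the common convention of rejecting when the P-value is less than or equal to alpha.

```python
def conclusion(p_value, alpha):
    """Return the decision for a hypothesis test at significance level alpha."""
    # Common textbook convention: reject the null when P-value <= alpha.
    if p_value <= alpha:
        return "Reject H0: sufficient evidence of a linear correlation."
    return "Fail to reject H0: insufficient evidence of a linear correlation."


# Values from this problem: P-value 0.016, alpha 0.05.
print(conclusion(p_value=0.016, alpha=0.05))
```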

Part 7

And now this last part of the problem asks, "Do the results suggest that imported lemons cause car fatalities?" Well, one of the things that we need to clearly understand about correlation is that correlation does not imply causation. All correlation does is say that there's some relationship at work here. It may be a cause and effect relationship, but it's not necessarily so. In fact, quite often when two variables are correlated, it's because both of them are related to a third variable that's not in your regression analysis. So you can't conclude from the correlation alone that there's a cause and effect relationship going on here. And that's exactly what we see. If we look at our answer options here, it's Answer option D. Good job!
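To see how a third variable can produce a strong correlation with no cause and effect between the two variables you measured, here's a small simulation sketch. Everything in it is hypothetical: a hidden variable (called `year` here) pushes one series up and the other down, and neither series appears in the formula for the other.

```python
import math
import random


def pearson_r(x, y):
    """Linear correlation coefficient r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical setup: a hidden variable drives both series independently.
year = list(range(30))
lemons = [200 + 15 * t + rng.gauss(0, 20) for t in year]         # rises over time
fatality = [16.0 - 0.05 * t + rng.gauss(0, 0.1) for t in year]   # falls over time

r = pearson_r(lemons, fatality)
print(f"r between the two series = {r:.3f}")
```

The correlation comes out strongly negative even though the simulated lemon numbers never touch the simulated fatality numbers, which is the kind of situation the "correlation does not imply causation" warning is about.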

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
