Performing linear correlation hypothesis testing for bill totals and tip amounts

10/23/2018

Intro

Howdy! I’m Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we’re going to learn how to perform linear correlation hypothesis testing for bill totals and tip amounts. Here's our problem statement: Listed below are amounts of bills for dinner and the amounts of the tips that were left. Construct a scatterplot, find the value of the linear correlation coefficient R, and find the P-value of R. Determine whether there is sufficient evidence to support a claim of linear correlation between the two variables. Use a significance level of α = 0.01. If everyone were to tip at the same percentage, what should be the value of R?

Part 1

OK, the first part asks me to construct a scatterplot. To do that, I’m going to take my data that we see right here, and I’m going to dump it into StatCrunch. Now my data is here in StatCrunch, and I’m going to resize this window so we can see a little bit better everything that's going to go on.

OK, to make a scatterplot, I could just go to Graph –> Scatter Plot. But I know I'm gonna have to do some linear correlation work to get the rest of the problem out. And I also know I get a scatterplot when I go and do that linear correlation in StatCrunch. So instead of going to Graph, I'm going to come up here to Stat –> Regression –> Simple Linear. Here in my options window, I have to select the data for my X and Y variables. The X variable is typically the one that’s listed first; in this case, it's going to be the bill, and the Y will then be the tip.

Then I want to conduct the hypothesis tests because there is a claim of linear correlation that we’re testing. Notice how we have two hypothesis tests that appear, one for the intercept and one for the slope. We don't really want this. What we want is a test on the linear correlation, and the population parameter for that is the Greek letter rho. That’s not what we see here, but we're going to leave these settings alone because the defaults here inside StatCrunch match the settings that we want to test for linear correlation. It could be positive correlation. It could be negative correlation. We don't really know. It's just going to be a test either way.

So the null hypothesis of course will be that the parameter equals zero. The alternative hypothesis will be not equal to zero. And that's what we want. So we come down here and hit Compute! And then I’m going to resize my results window. And notice how it says up here at the top “1 of 2.” That’s because we’re looking at the first of two pages for our results. The scatterplot is on the second page, and to get there I just press this little arrow button in the bottom right. And lo and behold, here's my scatterplot.

Now in order to make comparison with my answer options much easier, I'm going to change the axis values on my scatterplot. Notice here the Y goes from about 4 to about 16 where here in our answer options the Y goes from 0 to 25. There’s also a similar disparity between the upper and lower limits that we see here on our x-axis. So I’m going to go ahead and change that by clicking on this little icon with the three lines down in the lower left corner. And then I click on X-axis, and now I can change my x-axis to match the minimum and maximum bounds that we see in my answer options. I do the same thing for the Y. And see, now our axes are the same, we’re actually comparing apples with apples, and so it’s easier to match what we have here in StatCrunch with the correct answer.

So let’s go through and look at these answer options one at a time to see which is the correct answer. I start here with answer option A. And this looks like it might be a possibility here. Looking at this, this is about 35 to about 5, and 35 to about 5, and these points are coming in about the same way. So A looks like a possibility, but let's look at the others just to make sure.

Answer option B? Definitely not; this cluster of points right here is not represented here in our scatterplot. So that’s not going to be it. Answer option C? Again, look at this. If you imagine a line of best fit here, this line’s going to have a negative slope. If you go from left to right, the points are generally trending downward. Here in our scatterplot we have a positive slope, so this is not going to be right. Answer option D? Again, answer option D has a negative slope to it, so we’re not going to select that. We’re going to select answer option A. Excellent!

Part 2

Now the second part of the problem asks for the linear correlation coefficient. We already have that here in our results window; I just flip back to the first page, and the value for R is located right here at the top. We’re asked to round to three decimal places. Well done!

Part 3

The third part of the problem asks me to determine the null and alternative hypotheses, which, as I was saying before when we were looking at the options window in StatCrunch, is pretty much a set standard. The hypotheses we’re testing in StatCrunch aren’t what we see here with the rho, but it's pretty standard when you're testing for linear correlation. The null hypothesis is going to be that rho is equal to zero. The alternative hypothesis will be that rho is not equal to zero. Nice work!

Part 4

Now the next part of the problem asks for the test statistic. So that’s here in my results window. You can see that the second to last number in our table is always a test statistic. But here we've got two test statistics: We’ve got one for the intercept and one for the slope. So which one do we want? You’re going to want the one for the slope. With one X variable and one Y variable, testing whether the slope is zero is equivalent to testing whether rho is zero, so the slope's test statistic and P-value are the same ones you'd get for the linear correlation. The y-intercept only tells you where the line crosses the y-axis, not whether the two variables are correlated. So I’m going to take this value here as the test statistic for my slope. That's the test statistic that I want to put here in my answer field. I’m asked to round to two decimal places. Good job!
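If you'd like to check outside StatCrunch that the slope's test statistic really is the one for the linear correlation, here is a minimal Python sketch. The bill and tip numbers are made up for illustration (they are not the data from this problem), and the function name is just a placeholder. It computes R, then the test statistic two ways: from R directly using t = R·sqrt((n − 2)/(1 − R²)), and from the slope divided by its standard error.

```python
import math

# Hypothetical bill/tip pairs for illustration only; NOT the data from the problem.
bills = [30.0, 52.5, 45.0, 88.0, 63.0, 107.0]
tips = [5.0, 6.0, 9.0, 10.0, 7.5, 16.0]


def correlation_and_t_statistics(x, y):
    """Return (r, t computed from r, t computed from the regression slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Linear correlation coefficient and its t statistic (n - 2 degrees of freedom)
    r = sxy / math.sqrt(sxx * syy)
    t_from_r = r * math.sqrt((n - 2) / (1 - r ** 2))

    # Least-squares line, then the slope divided by its standard error
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err_slope = math.sqrt(sse / (n - 2) / sxx)
    t_from_slope = slope / std_err_slope

    return r, t_from_r, t_from_slope


r, t_from_r, t_from_slope = correlation_and_t_statistics(bills, tips)
print(f"r = {r:.3f}")
print(f"t from r = {t_from_r:.2f}, t from slope = {t_from_slope:.2f}")
```

The two t values should agree apart from floating-point rounding, which is why reading the slope row of the StatCrunch table works for a correlation test.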

Part 5

Next I'm asked for the P-value. Again, I have two P-values here in my table, but I’m going to want the one here for the slope just like I did for the test statistic. Here I’m asked to round to three decimal places. Fantastic!

Part 6

Now the next part asks me to compare the P-value with the significance level. Here we have a P-value of a little over 7%. My significance level alpha is 1%, so we’re definitely greater than the significance level. And that means we are outside the region of rejection, and when you are outside the region of rejection, you fail to reject the null hypothesis. And whenever you fail to reject the null hypothesis, there is not sufficient evidence to support the claim of linear correlation. Fantastic!
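The decision rule I just walked through can be written down in a few lines of Python. This is only a sketch: the function name is made up, and the 0.07 below is a stand-in for "a little over 7%," not the exact P-value from StatCrunch. The alpha of 0.01 is the one given in the problem.

```python
def correlation_test_conclusion(p_value, alpha):
    """Decision for H0: rho = 0 versus H1: rho != 0 using the P-value method."""
    if p_value <= alpha:
        # Inside the region of rejection
        return "Reject H0: sufficient evidence of linear correlation."
    # Outside the region of rejection
    return "Fail to reject H0: not sufficient evidence of linear correlation."


# 0.07 is an assumed stand-in for the P-value; alpha = 0.01 is from the problem.
print(correlation_test_conclusion(p_value=0.07, alpha=0.01))
```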

Part 7

And now the last part of the problem asks if everyone were to tip with the same percentage, then what is the value of R? Well, if we think about this for a moment, it should become apparent what number we need to put here. If everyone tipped with the same percentage, then that means that each of the Y values (the tips) for all of our ordered pairs will be the same proportion of the corresponding X value (the bill), and that means that the line of best fit is going to go through each of our points exactly. So we’re going to have perfect correlation.

Not only will we have perfect correlation, but the slope of our line is going to be positive, because the more the bill is, the higher the tip. So we’re going to have perfect positive correlation. What R value corresponds with perfect positive correlation? This is going to be the highest value that R can be. Since R by definition is bounded between -1 and 1 inclusive, the highest that R could be is 1. And an R value of one represents perfect positive correlation. So I'm going to submit that as my answer. Fantastic!
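You can convince yourself of this with a quick Python sketch. The bills and the 18% tip rate below are made-up values for illustration; any positive percentage would behave the same way. The helper function is just a hand-rolled version of the usual formula for R.

```python
import math


def pearson_r(x, y):
    """Linear correlation coefficient for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical bills; every tip is the same assumed percentage (18%) of its bill.
bills = [33.46, 50.68, 87.92, 98.84, 63.60, 107.34]
tips = [0.18 * bill for bill in bills]

r = pearson_r(bills, tips)
print(f"r = {r:.6f}")
```

Every point lies on the line tip = 0.18 × bill, so R should come out to 1 apart from floating-point rounding.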

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you’d like to see. Thanks for watching! We’ll see you in the next video.

3 Comments

Demetria Staley

7/10/2020 10:42:22 pm

Your videos are the best. You have saved my statistic grade!! Thank you!!!

Brandi Phillips

8/6/2020 12:25:29 am

These are great, can you help with more chapter 10 questions regarding Spearman's correlation in StatCrunch and finding critical values?

Jessica Jordan

11/12/2020 10:05:46 pm

This is so helpful. Saved my grade because my teacher doesn't teach. Had no idea how to use statcrunch before these videos.