Blog Archives

Performing the Wilcoxon signed ranks test for earnings by education

4/29/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform the Wilcoxon signed ranks test for earnings by education. Here's our problem statement: The table shows the earnings in thousands of dollars of a random sample of 11 people with bachelor's degrees and 10 people with associate's degrees, and alpha equals 0.05. Is there enough evidence to support the belief that there is a difference in the earnings of people with bachelor's degrees and those with associate's degrees? Complete Parts A through E below.

Part A

OK, first we're asked to write the claim mathematically and identify the null and alternative hypotheses. We know that the null hypothesis is going to be a statement of equality. And here it's pretty much going to be that there is no difference in the earnings. The alternative hypothesis will then be that there is a difference in the earnings. Let's check what we're testing here. I don't see anything about a claim but we're looking for evidence to support the belief that there is a difference in the earnings of people with bachelor's degrees and those with associate's degrees. So there is a difference in the earnings, and that's going to be the claim that we're testing. And that's going to be the alternative hypothesis. Nice work!

Part B

Now Part B asks us for the critical values. We can get the critical values from the distribution calculator in StatCrunch. Our test statistic is a Z score, so we can get our critical values from the standard Normal distribution, because that's where Z scores come from. So here in StatCrunch, I'm going to go to Stat --> Calculators --> Normal. Here my alternative hypothesis is that there is a difference. That difference could be less than or greater than; it could be negative or positive. So that means we're going to have to select the Between option up here, because we've got a two-tailed test. And let's see, we've got a significance level of 5%, so that means the area in between the tails is going to be 95%. And there are my critical values: plus or minus 1.96. I'm gonna use this plus or minus sign so that I don't have to type the number more than once. Excellent!
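If you don't have StatCrunch handy, the same critical values can be checked in a few lines of Python. This is just a sketch using the standard library's NormalDist class; the variable names are my own and aren't part of the original problem.

```python
from statistics import NormalDist

alpha = 0.05  # significance level from the problem statement

# Two-tailed test: half of alpha goes in each tail, so the upper
# critical value cuts off an area of 1 - alpha/2 to its left.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"critical values: +/- {z_crit:.2f}")
```

The area between the two values is the 95% we entered in the Between option of the calculator.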

Part C

Now Part C asks us for the test statistic. Well, I would love it if StatCrunch could do this, but for a Wilcoxon test on two independent samples (the rank sum test), that's not going to happen, at least not with the way StatCrunch is coded right now. So that means we're going to have to go the old school route and use the data in Excel.

Here's my data in Excel. Let's do a little bit of housekeeping so we can see a little better everything that's going on. First thing I'm going to do is center everything. And now we're going to make that whole first row bold typeface. Let me go ahead and replace some of these values here. We'll have a sample label here in our first column. So here we're going to list the numbers. The bachelor's salaries are going to be Sample 1, and then the associate's salaries are going to be Sample 2. No, we don't need you anymore. Now we're just going to relabel you as salary. OK, now we can see better everything that's going on.

So now in the next column I want to put my rankings, but I've got to sort this list first. So let's go up to Data --> Sort. I want to sort by salary from smallest to largest. Now we can put in our preliminary rankings. We'll start with just one, two, three, and so on and so forth. There we go. Now we've got 21 values in our data set.

And so now we want to look to see if there are any same salaries. And there are, so we've got to break ties, which means we've got to make adjustments to our rankings. So let's see, the first tie is right here, so that average is 4.5. So that changes the rankings of those to 4.5. And then these two are the same, so we have to replace those. And let's see, what else did we get? Those three are the same, so we're going to replace those. And the next two are the same, so we replaced those. And then these two are the same, so we're going to replace those. And that's it. So now the ones that have unique values to them, we're just going to have to bring those over, and there. We're done. So now we've got all of the proper rankings that we have for each of our values.
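The tie-breaking step above can be sketched in Python. The salaries below are made up for illustration (the actual table isn't reproduced in this post), and average_ranks is a helper name I've invented. Tied values share the average of the preliminary ranks they occupy, just like the 4.5 we got for the first tie.

```python
def average_ranks(values):
    """Rank values from smallest to largest (1-based); ties get the mean rank."""
    # Indices of the values in sorted order, like Data --> Sort in Excel.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' to cover every value tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Preliminary ranks are pos+1 .. end+1; tied values share their average.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


salaries = [30, 32, 35, 41, 41, 47]  # hypothetical values, not the problem's data
ranks = average_ranks(salaries)
print(ranks)
```

Here the two salaries of 41 would have had preliminary ranks 4 and 5, so each one ends up with 4.5.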

Now I'm going to re-sort my list, this time by sample number. So now I've got all the samples grouped here, and our R value is going to be the sum of the ranks for the smaller sample. So we see Sample 1 has a rank sum of 143, but there are 11 values here. The other sample has 10 values, and its rank sum is 88. So our R value is going to be 88. n1 will be 10, and then n2 will be 11. So to start off with our old school calculation here, here's our formula. And we've got 88 for an R value, n1 is 10, and n2 is 11. So we substitute these values into our expression, and then we start to simplify. So we get to a point where we can put stuff in a calculator and out comes our test statistic, which we're asked to round to two decimal places. Excellent!
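The formula itself isn't written out in this post, so here's a sketch that assumes it's the usual normal approximation for the rank sum: subtract the expected rank sum from R and divide by its standard deviation. The function name rank_sum_z is my own.

```python
import math


def rank_sum_z(r, n1, n2):
    """z score for rank sum r of the sample of size n1 (normal approximation)."""
    mu_r = n1 * (n1 + n2 + 1) / 2                       # expected rank sum
    sigma_r = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # its standard deviation
    return (r - mu_r) / sigma_r


# Values from the walkthrough: R = 88, n1 = 10, n2 = 11.
z = rank_sum_z(88, 10, 11)
print(round(z, 2))
```

With these inputs the expected rank sum is 110, and the result rounds to -1.55, which is the test statistic used in Part D.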

Part D

Part D says, "Decide whether to reject or fail to reject the null hypothesis." We can do that. Simple enough. We've got our test statistic here. We've got our critical values here. So if I come back here to StatCrunch, the critical values are bounding the tails. And we've got our test statistic, -1.55, so that's going to put us right about here. So our test statistic is here in between the tails, outside the region of rejection. So because it's between the critical values and outside the region of rejection, we're going to fail to reject the null hypothesis. Good job!

Part E

Now Part E wants us to interpret this conclusion of our hypothesis test in the context of the original claim. So we failed to reject the null hypothesis. That means it could potentially be true, and failing to reject the null hypothesis means that this statement "There is no difference in earnings" is potentially true. So there's not enough evidence to support the claim. Well done!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.


Performing contingency table hypothesis testing to evaluate baseball player birthdays

4/28/2020


Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform contingency table hypothesis testing to evaluate baseball player birthdays. Here's our problem statement: An author argues that more American-born baseball players have birth dates in the months immediately following July 31st because that was the age cutoff date for non-school baseball leagues. The table below lists months of births for a sample of American-born baseball players and foreign-born baseball players. Using a 5% significance level, is there sufficient evidence to warrant rejection of the claim that months of birthdays of baseball players are independent of whether they are born in America? Do the data appear to support the author's claim?

Part 1

OK, I can't be really sure, but I think the author that they're looking at here is Malcolm Gladwell, and they're making reference to his popular book Outliers. I've actually read Outliers. In the book he actually talks about hockey players, but you know, he extends what he's saying about hockey players to all sports. And it's the same idea: your chances of success are going to depend upon when you are actually born. And he extends that idea to a whole lot of other things, to success and finances and all that kind of stuff. And it's kind of an interesting theory to look at. So we have a problem here that's similar to the theory that he has in his book.

Anyhoo, the first part of this problem asks us to identify the null and alternative hypotheses for a contingency table test. The null hypothesis is just going to say that we're equal to the status quo, and the status quo is whatever we claim it to be. So here the claim is that months of births of baseball players are independent of whether they are born in America. So the months of births are independent of where they're born. And that means the alternative will be the complement of that, that the months of births are dependent on where they're born. So let's see, that's going to be this option here. Good job!

Part 2

Now the next part asks us to identify the test statistic, and the test statistic we're going to get out of StatCrunch. So let's take this data table here and dump it into StatCrunch. I'm dumping the data into StatCrunch. OK, here we go. So now inside StatCrunch, I'm going to go to Stat --> Tables --> Contingency --> With summary. Here underneath the Columns option in my options window, I want to select every column except for the first one; I want to select all the columns with numbers in them. So I'm going to select this first one and then scroll down to the bottom. And then while I'm holding the shift key on my keyboard, I left click on that last option there. And see, now I've selected every option but the first one. And that's what I want, because this first column doesn't have actual numbers in it, so we don't want to select it.

Then here under Row labels, I want to make sure I select a column with numbers. And it doesn't matter which one I select, as long as it has numbers in it. So let's just select January. I like January. Its birthstone is the garnet. (I think that's right.) And I like the kind of red color of the garnet, so I'm going to select January. But you can select a different month. Let's say you were born in April. OK, go ahead and select April. That's fine. It doesn't matter which one you select.

Now we're ready to go. And here we have our results. And right here we have what we want for our test statistic. And we're asked to round to three decimal places. Fantastic!
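If you're curious what StatCrunch is doing behind the scenes, here's a rough sketch of the chi-square statistic for a contingency table. The counts are hypothetical (a small 2-by-3 table, not the baseball data), and chi_square_statistic is a name I made up. Each expected count is row total times column total divided by the grand total.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: two rows (e.g. birthplace) by three columns (e.g. months).
table = [[10, 20, 30],
         [20, 20, 20]]
stat, df = chi_square_statistic(table)
print(round(stat, 3), df)
```

For the real problem the table has two rows and twelve month columns, so the same function would report 11 degrees of freedom.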

Part 3

Now the next part asks for the P-value. Easy peasy! We've already done that. The P-value here is right next door to the test statistic. We're asked to round that to three decimal places. Good job!

Part 4

Now the last part of this problem is asking us to state the final conclusion. And to do that we're going to compare our P-value with our significance level. So here we've got a P-value of just under 5%. 5% is our significance level. So we're just barely inside the region of rejection, and that means we're going to reject the null hypothesis. It doesn't matter whether you're close to the boundary or far away from the boundary. In is in, and out is out. We're in, so that means we're in. Reject the null hypothesis. And every time you reject the null hypothesis, there is always sufficient evidence. Nice work!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.


Performing a sign hypothesis test of median heights

4/24/2020


Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform a sign hypothesis test of median heights. Here's our problem statement: Use a 5% significance level to test the claim that there is a difference between the actual and reported heights in inches for 12 to 16 year old boys. The data is listed in the table below. Let mu1 denote the mean of the first variable and mu2 denote the mean of the second variable.

Part 1

OK, so here we've got the first part of our problem, and we're asked to find the null and alternative hypotheses. So the null hypothesis will always be a statement of equality. And the alternative hypothesis typically reflects the claim. Here our claim is that there's a difference between actual and reported heights, so that means we're going to have not equal to as our inequality sign. And that combination of equal-not-equal-to is found here. Good job!

Part 2

Now the second part wants us to find the test statistic. You may be thinking since we have actual data here that we can just dump this into StatCrunch and then, you know, have StatCrunch perform the hypothesis test for us. Unfortunately, the sign hypothesis test feature of StatCrunch works only if you've got less than 25 for a sample size. So we actually have more than that for sample size here because, if you look, we've got one, two, three, four, five, six, seven, eight, nine, ten columns, three rows. 10 times 3 is 30. So StatCrunch will not calculate our test statistic for us. And that's why we can't actually use StatCrunch for this dataset.

We can, however, use Excel to kind of accelerate the “old school” way of doing this. So let's go ahead and just dump this data in Excel, and I actually have that right here. So here we have the data in Excel. The first thing I need to do is get my differences so I can count the number of positive signs and negative signs. So I'm just going to take the first value and subtract the second value from it, so the first minus the second.

Then I'm going to take this formula, and I'm going to copy it all the way down. I could drag it, but dragging is really useful only if you've got, like, a few cells to drag it to. I've got more than just a few, so I'm going to copy this by pressing Ctrl+C on my keyboard. And then using my arrow keys, I'm going to go down to the bottom of the list by pressing Ctrl while I hit the down arrow. That takes me to the bottom of the list. So I want these copied cells to end here.

So now, with that cell selected there, I'm going to press Ctrl+Shift on my keyboard while I press the up arrow. It takes me to the top of the list. Now these cells that are shaded are going to be the ones where I want to paste my formula. So now I press Ctrl+V to paste and, look, it's all there for me. When I press Ctrl+Down it takes me to the bottom of my list. And so now here, right down here, I'm actually doing my count. So let's count positives first, and then let's do negatives.

So here I'm counting the number of positives, and I'm going to use the COUNTIF function to do that. COUNTIF says we're going to select where we want to do our counting, but we only want the computer to include a cell in the count if it meets certain criteria. And here the criteria are going to be that the number is positive. So I open my parentheses, and now I need to select my range. And I do that by going up to this next cell up here, and then pressing Ctrl+Shift+Up arrow on my keyboard takes me to the top of the list. There's my range. I put in a comma so I can put in the next element of the function. Now it's asking for the criteria. Typically we put the criteria in quotation marks. And we want these to be positive numbers; that's going to be greater than zero. And I close my parentheses, and now there's my formula. I hit Enter, and it automatically counted all the positive numbers for me.

I'm going to do the same thing with the negative numbers so we can get a quick count of them. And you know, if it were just a handful of numbers, I wouldn't mind just counting them myself, but I've got more than just a little handful here.

Alright, so we've got 18 and 10; that adds to 28. We've got a couple of zeros in the list. Here's one here. And if we scroll up a little bit, we can see the second one here. So zeros, of course, are not included in our counts because we don't really want those included. So now I've got 18 positive and 10 negative. Now I got my summary stats that I can use to actually calculate my test statistic with.
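The differencing and COUNTIF steps can also be sketched in Python. The heights below are hypothetical stand-ins (the real table has 30 pairs), and the names first and second are my own labels for the two columns. Zeros drop out of both counts, just like in the spreadsheet.

```python
# Hypothetical paired heights in inches; not the data from the problem.
first = [61.3, 59.0, 64.2, 60.5, 62.0]
second = [62.0, 59.0, 64.0, 61.0, 61.5]

# First minus second, one difference per pair.
diffs = [a - b for a, b in zip(first, second)]

# Same idea as COUNTIF(range, ">0") and COUNTIF(range, "<0").
positives = sum(1 for d in diffs if d > 0)
negatives = sum(1 for d in diffs if d < 0)
zeros = len(diffs) - positives - negatives  # ties are left out of the test

print(positives, negatives, zeros)
```

In the actual problem this kind of count gives 18 positives, 10 negatives, and 2 zeros.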

And to do that, we're going to go back to our handy dandy z-score formula. So here we're going to see that X is the lesser of those two numbers. So that's going to be 10. And N is going to be the sum of those two numbers, which is 28. So now we substitute those into our formula and, well, that should actually be a 28 there. That's a typo. And then, yeah, so we got that number fixed right here. And then we just simplify that expression. Punch it out on the calculator, and here comes our test statistic: -1.32. Nice work!
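The z-score formula isn't spelled out in this post, so the sketch below assumes the usual large-sample sign test formula with a continuity correction of 0.5. It does reproduce the -1.32 from the walkthrough. The function name sign_test_z is my own.

```python
import math


def sign_test_z(x, n):
    """Large-sample sign test z score; x is the less frequent sign count, n the nonzero total."""
    # Under the null hypothesis x behaves like a binomial with p = 0.5,
    # so its mean is n/2 and its standard deviation is sqrt(n)/2.
    return ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)


# Values from the walkthrough: x = 10 negatives, n = 18 + 10 = 28.
z = sign_test_z(10, 28)
print(round(z, 2))
```

Keeping the unrounded value (about -1.322875) matters for the P-value step, where all the decimal places get typed in.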

Part 3

Now we're asked to find the P-value. And here StatCrunch actually is rather helpful for finding the P-value. So we're going to go back to the --- I mean, because alternatively, we could use the z-score tables, but I'm lazy. I like the 21st century. I want to use technology. And we're going to have to work a little bit anyway to get our P-value because we've got a two-tailed test.

So here in StatCrunch, I want to go to Stat --> Calculators --> Normal. Here in my calculator I want to select the Between option because, as you see here, we have a two-tailed test. I'm going to put in my test statistic. So on the negative side it's -1.32, and I want to put in all those decimal places that I had before. So let me move this over so I can stick all that in and get 1.322875. And I put the positive version of my test statistic here. And now StatCrunch has calculated the area in between the tails. The P-value is the area in the tails. So I want to take this number, and in my calculator I'm going to subtract that from 1. And there's my P-value. I’m asked to round to four decimal places. So let's see, that brings me out to there. Excellent!
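The same "area between, then subtract from 1" calculation can be sketched with Python's standard library. The variable names here are my own; the z value is the one typed into the calculator above.

```python
from statistics import NormalDist

z = -1.322875  # test statistic with all its decimal places

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -|z| and +|z|, like the Between option in the calculator.
area_between = std_normal.cdf(abs(z)) - std_normal.cdf(-abs(z))

# The P-value for a two-tailed test is the area left over in the two tails.
p_value = 1 - area_between

print(round(p_value, 4))
```

The result comes out a little under 0.19, which lines up with the "almost 19%" used in the last part.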

Part 4

And now the last part asks, "Is there sufficient evidence to support H1?" Well, supporting H1, or the alternative hypothesis, is the same thing as rejecting the null hypothesis. Can we reject the null hypothesis? Well, we've got a P-value of almost 19%. It's well above our significance level of 5%, so we're outside the region of rejection. We fail to reject the null hypothesis, and therefore we fail to support the alternative hypothesis. Good job!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.


Identifying and interpreting a linear correlation P-value using its definition

4/3/2020


Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to identify and interpret a linear correlation P-value using its definition. Here's our problem statement: For a data set of weights in pounds and highway fuel consumption amounts in miles per gallon of 10 types of automobile, the linear correlation coefficient is found, and the P-value is 0.004. Write a statement that interprets the P-value and includes a conclusion about linear correlation.

Solution

OK, here, the statement they want us to write is already mostly written. We just have to fill in a few of the blanks. The first part of the statement says, "The P-value indicates that the probability of a linear correlation coefficient that is at least as extreme is something percent." Well, this first part is the definition of the P-value itself. So they give you the P-value, and they're asking you for, again, the P-value, but notice the percent sign. They want you to take this decimal and convert it to a percent.

That's easy enough to do. Just move the decimal point two places to the right. So 0.004 becomes 0.4%; this is a very low value. And of course when the P-value is low, that indicates that there's statistical evidence for a linear correlation. And that's really all there is to it. Fantastic!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

