Performing contingency table hypothesis testing to evaluate baseball player birthdays

4/28/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform contingency table hypothesis testing to evaluate baseball player birthdays. Here's our problem statement: An author argues that more American-born baseball players have birth dates in the months immediately following July 31st because that was the age cutoff date for non-school baseball leagues. The table below lists months of births for a sample of American-born baseball players and foreign-born baseball players. Using a 5% significance level, is there sufficient evidence to warrant rejection of the claim that months of birthdays of baseball players are independent of whether they are born in America? Do the data appear to support the author's claim?

Part 1

OK, I can't be completely sure, but I think the author they're referring to here is Malcolm Gladwell, and they're making reference to his popular book Outliers. I've actually read Outliers. In the book he actually talks about hockey players, but he extends what he says about hockey players to all sports. The idea is that your chances of success depend partly on when you were born. He extends that idea to a whole lot of other things too, like success in school and in finances. It's an interesting theory to look at, and the problem we have here is similar to the theory in his book.

Anyhoo, the first part of this problem asks us to identify the null and alternative hypotheses for a contingency table test. The null hypothesis describes the status quo, and in a contingency table test the status quo is always independence. Here the claim is that months of births of baseball players are independent of whether they are born in America. So the claim matches the null hypothesis: the months of births are independent of where the players are born. The alternative is the complement of that: the months of births are dependent on where the players are born. So let's see, that's going to be this option here. Good job!
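Here are the two hypotheses written out in the usual symbolic form. This is just the standard statement for a test of independence, restated for this problem. It doesn't add anything beyond the wording above.

```latex
\begin{aligned}
H_0 &: \text{month of birth is independent of whether the player was born in America} \\
H_1 &: \text{month of birth is dependent on whether the player was born in America}
\end{aligned}
```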

Part 2

Now the next part asks us to identify the test statistic, and we're going to get the test statistic from StatCrunch. So let's take this data table and dump it into StatCrunch. OK, here we go. Inside StatCrunch, I'm going to go to Stat --> Tables --> Contingency --> With summary. Under the Columns option in my options window, I want to select every column except the first one. In other words, I want all the columns with numbers in them. So I click the first column that has numbers and scroll down to the bottom. Then, while I'm holding the Shift key on my keyboard, I left-click the last option there. Now I've selected every column except the first one. That's what I want, because the first column doesn't contain actual counts, so we don't want to select it.

Then under Row labels, I want to make sure I select a column with numbers. It doesn't matter which one you select, as long as it has numbers in it. So let's just select January. I like January. Its birthstone is the garnet, and I like the red color of the garnet, so I'm going to select January. But you can select a different month. Let's say you were born in April: go ahead and select April. That's fine. It doesn't matter which one you select.

Now we're ready to go, and here are our results. Right here is what we want: the chi-square test statistic. We're asked to round it to three decimal places. Fantastic!
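If you're curious what StatCrunch is doing behind the scenes, here is a minimal Python sketch of the chi-square statistic for a test of independence. It assumes the table is stored as a list of rows of observed counts, with one row per birthplace group and one column per month. For each cell, the expected count under independence is (row total)(column total)/(grand total). The statistic adds up (observed − expected)²/expected over all cells. The function name `chi_square_statistic` and the counts in the example are hypothetical, made up for illustration. They are not the textbook's table.

```python
# Minimal sketch of the chi-square test-of-independence statistic.
# Assumption: `table` is a list of rows of observed counts, e.g. one row per
# birthplace group (American-born, foreign-born) and one column per month.

def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables are independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    # Degrees of freedom: (number of rows - 1) * (number of columns - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


if __name__ == "__main__":
    # Hypothetical, made-up counts (NOT the textbook's data), January..December
    american_born = [30, 28, 31, 29, 27, 26, 25, 40, 36, 35, 33, 30]
    foreign_born = [12, 10, 11, 9, 12, 10, 8, 11, 9, 12, 11, 10]
    stat, df = chi_square_statistic([american_born, foreign_born])
    print(f"chi-square = {stat:.3f}, df = {df}")  # df = (2 - 1) * (12 - 1) = 11
```

With 2 birthplace rows and 12 month columns, the test always has 11 degrees of freedom, whatever the counts are.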

Part 3

Now the next part asks for the P-value. Easy peasy! We've already found it: the P-value is right next to the test statistic in the output. We're asked to round it to three decimal places. Good job!
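StatCrunch reports the P-value for us, but here is a standard-library-only Python sketch of where it comes from. The P-value is the area to the right of the test statistic under a chi-square distribution with (r − 1)(c − 1) degrees of freedom. That area equals the regularized upper incomplete gamma function Q(df/2, x/2). The helper names are hypothetical. The numerical routines are the usual series and continued-fraction approximations, not StatCrunch's actual code. If SciPy is installed, `scipy.stats.chi2.sf(x, df)` computes the same tail area.

```python
import math

def _lower_gamma_series(a, x):
    # Series for the regularized lower incomplete gamma P(a, x); good for x < a + 1
    term = 1.0 / a
    total = term
    n = a
    for _ in range(1000):
        n += 1.0
        term *= x / n
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _upper_gamma_cont_frac(a, x):
    # Continued fraction (modified Lentz) for the regularized upper gamma Q(a, x); x >= a + 1
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_p_value(stat, df):
    """Right-tail area of a chi-square distribution with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cont_frac(a, x)

if __name__ == "__main__":
    # 19.675 is the 5% critical value for 11 df, so this should be about 0.05
    print(round(chi2_p_value(19.675, 11), 3))
```

The code switches between the two approximations because each converges quickly on its own side of x = a + 1.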

Part 4

Now the last part of this problem asks us to state the final conclusion. To do that, we compare our P-value with our significance level. Here we've got a P-value just under 5%, and 5% is our significance level. Since the P-value is less than the significance level, we're just barely inside the region of rejection, and that means we reject the null hypothesis. It doesn't matter whether you're close to the boundary or far away from it. In is in, and out is out. We're in, so we reject the null hypothesis. Whenever you reject the null hypothesis, there is sufficient evidence to warrant rejection of the null. Because the claim here is the null hypothesis, there is sufficient evidence to warrant rejection of the claim that months of births of baseball players are independent of whether they are born in America. Nice work!
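The "in is in, out is out" rule is simple enough to write as a tiny Python function. This sketch assumes the usual convention of rejecting H0 when the P-value is less than or equal to the significance level α. The function name `conclusion` and the sample P-values are hypothetical.

```python
def conclusion(p_value, alpha=0.05):
    """Compare the P-value with the significance level and state the decision."""
    if p_value <= alpha:
        # P-value at or below alpha: the test statistic falls in the rejection region
        return ("Reject H0. There is sufficient evidence to warrant rejection "
                "of the claim that month of birth is independent of birthplace.")
    # P-value above alpha: outside the rejection region
    return ("Fail to reject H0. There is not sufficient evidence to warrant "
            "rejection of the claim that month of birth is independent of birthplace.")

if __name__ == "__main__":
    print(conclusion(0.049))              # hypothetical P-value just under 5%
    print(conclusion(0.049, alpha=0.01))  # same P-value, stricter significance level
```

The same P-value can lead to opposite conclusions at different significance levels. That's why only the comparison with α matters, not how close the P-value is to the boundary.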

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
