Homework Help

Distinguishing between statistical and practical significance for a gender selection study

7/24/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to distinguish between statistical and practical significance for a gender selection study. Here's our problem statement: Determine whether the results appear to have statistical significance and also determine whether the results appear to have practical significance. In a study of a gender selection method used to increase the likelihood of a baby being born a girl, 1,918 users of the method gave birth to 940 boys and 978 girls. There is about a 20% chance of getting that many girls if the method had no effect.

Solution

OK, so what we need to do here is fill in the right selections for each of the different blanks in this statement that's been written for us. And the first thing we're going to do to evaluate what's going on is calculate what we actually have in our sample. So the proportion of girls in the sample is going to be the number of girls divided by the total number of births, which is 978 divided by 1,918. So here I'm getting almost 51%. So there's a 20% chance of seeing this extra 1% bump completely by chance, which for statistical significance means there's not a whole lot going on there. So we'd say that this does not have statistical significance because a 20% chance of getting that result randomly is actually fairly high. If this number were 5% or less, then we would probably say that it does have statistical significance, but here 20% is well above that. So we're going to say it doesn't have statistical significance.

And for an extra 1% bump, you're employing this method? You know, I would say that not many people would be using this procedure because it doesn't really give you that much of a bump over the 50%. I mean, we're asked to round to the nearest integer, so that would give us 51%. And so we would say that this method doesn't have very much practical significance because you're only getting an extra 1% bump from using the methodology. I mean, that's not a whole lot more guarantee of success for the outcome that you're looking for. Nice work!
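If you'd like to check the numbers behind this reasoning, here is a minimal Python sketch. It assumes "no effect" means each birth is a 50/50 coin flip, which is my reading of the problem statement rather than something it spells out, and it uses an exact binomial tail to approximate the "about 20%" figure.

```python
from math import comb

girls, boys = 978, 940
n = girls + boys                      # 1,918 births in total

# Sample proportion of girls.
p_hat = girls / n

# Chance of 978 or more girls in 1,918 births if each birth is a fair
# 50/50 coin flip (assumed meaning of "the method had no effect").
tail = sum(comb(n, k) for k in range(girls, n + 1)) / 2**n

print(f"proportion of girls: {p_hat:.4f}")
print(f"P(at least {girls} girls by chance): {tail:.3f}")
```

The proportion comes out just under 51%, and the tail probability lands close to the 20% quoted in the problem, far above the usual 5% cutoff.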

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching. We'll see you in the next video.

Using bootstrap methods to construct a mean confidence interval estimate

6/26/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use bootstrap methods to construct a mean confidence interval estimate. Here's our problem statement: Use the Geyser eruption duration times in seconds and the accompanying 200 bootstrap samples to complete parts A through C below.

Part A

OK, Part A wants us to use the bootstrap method with 200 samples to find a 95% confidence interval estimate of the population mean. So to do that, the first thing we need to do is take these bootstrap samples and we need to stick them in StatCrunch. And it takes a while for this to load up because, yeah, there literally are so many samples. You've got 200 samples, and each of those samples contains 15 values. So there's a lot of data here, but that's not unlike what we see in a real world situation. So I'm going to put those here in StatCrunch. And I'm going to resize this window a little bit to help us to see a little bit better everything that's going on in our problem statement here.

And now here in StatCrunch, I'm going to calculate the mean value for each of these 200 samples. The reason why I'm calculating the mean is because that's the parameter that we're looking for in the population; we're looking for the population mean. So I want to calculate the mean for each of these different samples. To do that, I go to Stat --> Summary Stats --> Columns. Here I'm going to select all the columns that I have. So I select one of them, and then I just --- on my keyboard, I'm going to press Ctrl+A; that selects all the different samples. I just need the mean. So I'm just going to select just that one to come out. I hit Compute!, and now here are all 200 mean values for each of the different samples.

I need to sort this list from smallest to greatest, from lowest to highest. And I do that by clicking on this little double arrow up here. So now it's sorted automatically from lowest to highest. Now I have to figure out where's the actual cutoff between the numbers I need and the numbers I don't, because I want a 95% confidence interval. That's going to be the 95% of the numbers that are in the middle of this list. So that means there's going to be some numbers that are lower, some numbers that are higher than what we actually include in our confidence interval.

So where are the cutoffs? Where's the upper and lower limits? Well, to do that, I'm going to take my 95% confidence interval, and I'm going to subtract that 95% from 100%. That gives me 5%. I divide that by 2 because half of this is on the left and half is on the right of my "distribution." So now I multiply that by the 200 --- whoops --- and that gives me 5. So I want to go down five values --- one, two, three, four, five. And I want to take these two numbers here. The cutoff value that I'm looking for is the average of these two values here. So in my calculator, I'm going to take the first number and I'm going to add it to the second number. And then I divide by 2. There's my lower limit. And so I'm going to put that number here in my answer field.

And I'm going to do the same thing for the other end of the list. So I scroll down to the bottom and I count up one, two, three, four, five. So now I'm going to want this number and the number that comes before it. So I'm going to average those two numbers out and that's going to give me the upper limit that I need for my confidence interval. Stick that number in here. Fantastic!
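The counting procedure for Part A can be sketched in a few lines of Python. This is only an illustration of the steps described above: the function name `percentile_interval` is made up, and `demo_means` is a stand-in list, not the geyser data, which isn't reproduced here.

```python
def percentile_interval(boot_means, confidence=0.95):
    """Percentile bootstrap interval, averaging the two values at each cutoff.

    Assumes the tail count works out to a whole number of at least 1,
    as it does here: (1 - 0.95) / 2 * 200 = 5.
    """
    values = sorted(boot_means)
    count = len(values)
    k = round((1 - confidence) / 2 * count)   # values cut off in each tail
    # Lower limit: average of the k-th and (k+1)-th values from the bottom.
    lower = (values[k - 1] + values[k]) / 2
    # Upper limit: average of the k-th and (k+1)-th values from the top.
    upper = (values[-k] + values[-k - 1]) / 2
    return lower, upper

# Hypothetical stand-in for the 200 sample means (NOT the geyser data).
demo_means = [float(i) for i in range(1, 201)]
lower, upper = percentile_interval(demo_means)
print(lower, upper)   # 5.5 195.5
```

With the stand-in list 1 through 200, the limits are the averages of the 5th and 6th values and of the 195th and 196th values, the same positions counted off in the sorted StatCrunch table.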

Part B

Now Part B wants us to make a similar 95% confidence interval estimate using the T distribution. To use the T distribution, I can't be using all 200 of these samples; I just need one. And that's why they give you this one sample up here in the problem statement. So the first thing I'm going to do is I'm going to swap out samples in StatCrunch. And again, I'm going to resize this window so I can see better what's going on.

Now for this one sample of 15 values, I'm going to calculate the mean and the standard deviation. To do that, I go to Stat --> Summary Stats --> Columns, select the column (it's the only column that's got data here), then I want the mean and the standard deviation. The reason why I want both of these numbers is the mean value is going to give me the center point for my confidence interval. The standard deviation is going to help me calculate the margin of error, which is the distance between the center point and the ends of the confidence interval, the lower and upper limits that I'm asked to provide here in my answer fields.

So to get the lower and upper limits, I'm going to take my standard deviation, put that in my calculator here, and I'm going to divide that by the square root of the number of values in my sample, which here you can see is 15. So I'm gonna divide that by the square root of 15. And now I've got to multiply this number by my T-score. To get the T-score, I have to go to my T calculator. So I go to Stat --> Calculators --> T. Here are my degrees of freedom; it's one less than the number of values I have in my sample, so that's 14. I want to go and hit the Between option up at top because we've got a portion on the left and a portion on the right that's not included in our confidence interval. Here for the percentage, that's going to be the level of my confidence, which is 95%. [Hit] Compute!, and here's my T score, 2.1448. So back in my calculator, I take this number that I calculated previously, and I'm going to multiply that by 2.1448. This then is my margin of error.

So if I come back here to StatCrunch, get my mean value here. So the center point, 238.2, I need to subtract from that the margin of error. So I take this --- if I make it negative and then add it to the mean, that's the same effect as subtracting it from the mean --- and there's my lower limit. I want one decimal place. I gotta re-type that. And then we'll do the same thing to get the upper limit. So first I'm going to go back and get my margin of error, and I add that to my center point, which is the mean value. And there's my upper limit. Excellent!
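The Part B arithmetic (mean plus or minus T-score times standard deviation over the square root of n) can be sketched in Python as follows. The `durations` list is hypothetical, since the actual 15 values aren't reproduced in this transcript, and the critical value 2.1448 is simply the one read off the T calculator above.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_crit):
    """Confidence interval for the mean: x-bar +/- t * s / sqrt(n)."""
    n = len(sample)
    center = mean(sample)
    margin = t_crit * stdev(sample) / sqrt(n)   # margin of error
    return center - margin, center + margin

# Hypothetical 15 durations in seconds (NOT the sample from the problem).
durations = [236, 250, 225, 244, 231, 259, 240, 228,
             247, 252, 233, 241, 238, 229, 245]
T_CRIT = 2.1448   # T calculator value: 14 degrees of freedom, 95% between
low, high = t_interval(durations, T_CRIT)
print(f"{low:.1f} < mu < {high:.1f}")
```

Swapping in the real sample should reproduce the limits found by hand, up to rounding.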

Part C

And now Part C asks us to compare the results. So if we look, our upper limits here in this case are exactly the same here --- excuse me --- the lower limits are exactly the same. The upper limits, they're not exactly the same, but they're not that far apart either; they're actually pretty close to each other. So we could say that the results are reasonably close to one another. Well done!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Recognizing a balanced design in two-way ANOVA hypothesis testing

5/9/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to recognize a balanced design in two-way ANOVA hypothesis testing. Here's our problem statement: Researchers randomly select and weigh men and women. Their weights are entered in the table below so that each cell includes five weights. Is the result a balanced design? Why or why not?

Solution

Well, here we have our samples here in the table. And we notice we have the same number of samples for each of the cells in the table. This is what it means when it's talking about balanced design. So here we've got five samples, and here we've got five samples. You don't have any more samples for any combination of row factor or column factor than you do for any other. And that's what it means by balanced design. So we see we have five samples for each of the cells in our table. So that means that we do have a balanced design. Excellent!
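The "same count in every cell" check is easy to express in code. Here's a small Python sketch; the weights and the row and column labels are invented for illustration, because the problem's actual table isn't shown in this transcript.

```python
def is_balanced(table):
    """True when every cell of the two-way table holds the same number of values."""
    cell_sizes = {len(cell) for row in table.values() for cell in row.values()}
    return len(cell_sizes) == 1

# Hypothetical weights in pounds; the labels are assumptions as well.
weights = {
    "female": {"under 30": [121, 135, 142, 118, 150],
               "30 and over": [133, 160, 128, 147, 139]},
    "male": {"under 30": [172, 165, 190, 158, 181],
             "30 and over": [185, 177, 201, 169, 193]},
}
print(is_balanced(weights))   # True: five weights in every cell
```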

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Performing the Kruskal-Wallis test for chest deceleration measurements

5/7/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform the Kruskal-Wallis test for chest deceleration measurements. Here's our problem statement: Use the following listed chest deceleration measurements (in g, where g is the force of gravity) from samples of small, midsize, and large cars. Use a 5% significance level to test the claim that the different size categories have the same median chest deceleration in the standard crash test. Do the data suggest that larger cars are safer?

Part 1

OK, the first part of this problem asks for the null and alternative hypotheses. With the Kruskal-Wallis test, it's pretty much standard. The null hypothesis is going to be that all of the median values are going to be the same. And the alternative hypothesis is going to be that at least one of them is different, so they're not all the same. And let's see what we got here. So we want equal medians, not all equal. This looks good. Excellent!

Part 2

Next we're asked to compute the test statistic. StatCrunch makes this super easy. So I'm going to dump my data here into StatCrunch. Let's resize this window so we can see better what's going on. OK, here in StatCrunch, I go to Stat --> Nonparametrics --> Kruskal-Wallis. Here in my options window, I'm going to select my columns, and that's all there is to it. Here's my test statistic. I'm asked to round to three decimal places. Excellent!

Part 3

The next part asks for the P-value. We've already got that calculated. It's right next door to the test statistic here in the results window. We're asked to round to four decimal places. Good job!

Part 4

And now we're asked to state our conclusion and answer this question: "Do the data suggest that larger cars are safer?" Well, are we going to reject or fail to reject the null hypothesis? Well, we've got a P-value of just under 5%. Our significance level is 5%, so we're just inside the region of rejection. But it doesn't matter whether you're in a little bit or way in; in is in. So we're going to reject the null hypothesis, and we're going to say that there is sufficient evidence. And what are we having sufficient evidence for? Rejecting the claim samples are from populations with the same median, because that's what this says right up here. We're rejecting the null hypothesis that says they all have the same median value. Nice work!
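For a look at what StatCrunch is doing behind the scenes, here is a rough Python sketch of the Kruskal-Wallis H statistic. Two caveats: the measurements below are hypothetical (the real table isn't reproduced here), and this version applies no correction for ties, so a software package that does correct for ties may report a slightly different value.

```python
from math import exp

def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def kruskal_wallis(*groups):
    """Return (H, P-value) for exactly three groups, without tie correction."""
    if len(groups) != 3:
        raise ValueError("the closed-form P-value below needs exactly 3 groups")
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    # With 3 groups, H is compared to chi-square with 2 degrees of freedom,
    # whose right-tail area is exp(-H / 2).
    return h, exp(-h / 2)

# Hypothetical chest deceleration values in g (NOT the problem's data).
small = [44, 39, 37, 54, 46]
midsize = [36, 53, 43, 42, 52]
large = [32, 45, 41, 38, 35]
h, p_value = kruskal_wallis(small, midsize, large)
print(f"H = {h:.3f}, P-value = {p_value:.4f}")
print("reject H0" if p_value <= 0.05 else "fail to reject H0")
```

The last line mirrors the decision rule used above: a P-value at or below the 5% significance level puts us in the region of rejection.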

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Recognizing the effect of data transformations on two-way ANOVA results

5/6/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to recognize the effect of data transformations on two-way ANOVA results. Here's our problem statement: The accompanying data shows a sample of pulse rates in beats per minute that were categorized with two factors: age bracket in years and gender. The data were used to illustrate the way to use a two-way ANOVA. How are the results affected in each of the following cases? A) the same constant is added to each sample value; B) each sample value was multiplied by the same non-zero constant; C) the format of the table is transposed so that the row and column factors are interchanged; D) the first sample value in the first cell is changed so that it becomes an outlier.

Part A

OK, so the first part of this problem, Part A, wants us to consider if we transform the data by adding the same constant to each data value, how does that affect the ANOVA results? Will the test statistic change? Will the P-value change? Well, if you consider the way that the test statistic is actually calculated, you come to understand that you know it's based on differences between individual data values and the mean value and then you square those differences. So you know adding the same constant to each sample value is going to shift everything on your number line. But the differences between the mean value and each individual value is still going to be the same. So you know when you square those differences you're going to get the same numbers coming out. So that's not going to change your results at all. Nice work!

Part B

Now Part B wants us to consider a data transformation in which we multiply each sample value by the same non-zero constant. Well, this again is going to not have an effect, because when you multiply each data value with the same non-zero constant, in essence what you're doing is you're keeping the proportions of the squares of the differences the same. So again, that's not going to change anything. Nice work!

Part C

Next, Part C wants us to consider if we transpose the row and column factor so that their data are interchanged. So the row data becomes the column data, the column data becomes the row data. Well again, you know the calculation for your values there in your ANOVA table. And the division that you make to get this, the test statistic, it's kind of similar to what you see with the linear correlation coefficient where it doesn't matter if you swap the X and Ys; you're still going to get the same value out for R. It's the same sort of thing here. You're still going to get the same values that you use to divide to get your test statistic out. So the test statistic is going to stay the same. And of course, if the test statistic stays the same, so is the P value; that's not going to change either. So yeah, you can swap data around all you want. That's not going to change anything. Nice work!

Part D

And now the last part, Part D says, "Choose the correct answer if we just change the first data value in the first cell so that it becomes an outlier." Well, OK, now we're going to see some changes here because, see, before in each of these three instances we were changing all of the data the same way. Now we're going to change just one data value. We'll leave the rest of it alone. Now that's good. Now we're going to see a difference with that. And just as we saw with wide swings in our linear correlation coefficient with the introduction of an outlier in our data set, we're going to see the same thing here. If you put an outlier in your data set, that's going to radically shift the value of the test statistic. And hence also the P-value is going to change too.

So let's see what we got here. Yeah, this one looks good. "Both the test statistic and P-value will most likely change because outliers can dramatically affect and change the results of an ANOVA." Yeah, that's going to be the one they want, but let's check the other options just to make sure. "The P-value and mean will only change by a very small amount because ANOVA is robust against outliers." That's definitely not true. "The P-value will be approximately 1 minus the previous P-value." Where in the world did that come from? I don't know. OK, so I'm pretty sure with the answer we got up here. Nice work!
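If you want to see all four cases play out with numbers, here is a Python sketch that computes the three F test statistics of a balanced two-way ANOVA by hand and then reruns them on transformed data. The pulse rates are made up, the function name `two_way_f` is my own, and the sketch assumes every cell has the same number of values.

```python
from statistics import mean

def two_way_f(cells):
    """F statistics (row, column, interaction) for a balanced two-way table.

    `cells` maps (row_label, column_label) to a list of values.
    """
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    n = len(next(iter(cells.values())))          # values per cell (balanced)
    grand = mean(x for v in cells.values() for x in v)
    row_mean = {r: mean(x for c in cols for x in cells[r, c]) for r in rows}
    col_mean = {c: mean(x for r in rows for x in cells[r, c]) for c in cols}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    ss_row = len(cols) * n * sum((m - grand) ** 2 for m in row_mean.values())
    ss_col = len(rows) * n * sum((m - grand) ** 2 for m in col_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_int = ss_cells - ss_row - ss_col
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_row, df_col = len(rows) - 1, len(cols) - 1
    ms_err = ss_err / (len(rows) * len(cols) * (n - 1))
    return (ss_row / df_row / ms_err,
            ss_col / df_col / ms_err,
            ss_int / (df_row * df_col) / ms_err)

# Hypothetical pulse rates (NOT the problem's data): gender by age bracket.
pulse = {
    ("female", "under 30"): [78, 84, 72, 90],
    ("female", "30 and over"): [66, 74, 80, 70],
    ("male", "under 30"): [60, 68, 76, 64],
    ("male", "30 and over"): [72, 62, 58, 70],
}
base = two_way_f(pulse)
shifted = two_way_f({k: [x + 10 for x in v] for k, v in pulse.items()})
scaled = two_way_f({k: [x * 2.5 for x in v] for k, v in pulse.items()})
swapped = two_way_f({(c, r): v for (r, c), v in pulse.items()})
with_outlier = {k: list(v) for k, v in pulse.items()}
with_outlier[("female", "under 30")][0] = 300      # first value becomes an outlier
outlier = two_way_f(with_outlier)

for name, result in [("original", base), ("add 10", shifted),
                     ("times 2.5", scaled), ("transposed", swapped),
                     ("outlier", outlier)]:
    print(name, [round(f, 4) for f in result])
```

Adding a constant or multiplying by a constant leaves all three F values unchanged, transposing just swaps which one is labeled "row" and which is labeled "column", and the single outlier changes everything.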

And that's the end of the problem. And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Performing the runs test for randomness in law enforcement fatalities using StatCrunch

5/5/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform the runs test for randomness with law enforcement fatalities. Here's our problem statement: Listed below from left to right and then top to bottom are numbers of law enforcement fatalities for 20 recent and consecutive years. First, find the mean, identify each value as being above the mean (A) or below the mean (B). Then test randomness above and below the mean using alpha equals 0.05. Is there a trend?

Part 1

OK, here we have our data. And the first part of the problem is asking for the mean of these data. So let's go ahead and dump that data into StatCrunch. So here we have the data now loaded into StatCrunch. So we're going to go to Stat --> Summary stats --> Columns. Here in my options window, I select the column where my data can be found. And then I want to select the mean. Now I'm ready to go get it. And here it is. We're not asked to round to any number of digits. In fact, the instructions specifically say do not round. So I won't. Nice work!

Part 2

Now the next part wants us to determine the null and alternative hypotheses. This is pretty much set for runs tests for randomness. So here we're going to have an alternative --- excuse me, a null hypothesis that says the data are going to be in a random order. And then of course the alternative hypothesis will be the alternative to that, which is that the data are in an order that's not random. Good job!

Part 3

Now the next part wants us to find the test statistic. And to do this we need to figure out what our sample sizes are. So let's go ahead and do the categorization that was mentioned here in the problem statement. So every one of these values that is above the mean, I'm going to categorize with an A. And every one that is below, I'm going to categorize with B. So now I just need to go through and categorize each one of these values in turn.

Well, 158 is just barely above the mean and 157 just a little bit below the mean. This is the kind of thing that a computer is really adept at doing. And that's why, you know, I wish that some of this functionality had been programmed into StatCrunch, because, I mean, it's really not that hard to do the --- I don't think it would be that hard to do, and you wouldn't have to do all this manual labor. I mean, come on, it's the 21st century.

OK, so here we've got all of our categorizations done. Now we just got to do the counting. So how many As do we have? One, two, three, four, five, six, seven, eight, nine, ten As. And we've got one, two, three, four, five, six, seven, eight, nine, ten Bs. 10 and 10. So 10 is below 20, so both of our sample sizes are less than 20. And we've got a 5% significance level here, so that means we can use the number of runs as our test statistic. So how many runs do we have? Well, let's find out here. We've got one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve; I see 12 runs. So that's going to be my test statistic. Excellent!
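Since the labeling and counting is exactly the kind of chore a computer handles well, here is a short Python sketch of it. The fatality counts are hypothetical (the problem's 20 values aren't reproduced in this transcript), and dropping any value exactly equal to the mean is an assumption on my part.

```python
from statistics import mean

def label_and_count(values):
    """Label each value A (above the mean) or B (below) and count the runs."""
    center = mean(values)
    # Assumption: a value exactly equal to the mean is left out.
    labels = ["A" if v > center else "B" for v in values if v != center]
    if not labels:
        return "", 0, 0, 0
    # A new run starts every time the label changes.
    runs = 1 + sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return "".join(labels), labels.count("A"), labels.count("B"), runs

# Hypothetical yearly counts (NOT the problem's data).
fatalities = [181, 140, 155, 162, 149, 171, 150, 165, 158, 144,
              169, 151, 163, 147, 160, 172, 139, 157, 166, 153]
sequence, n_above, n_below, n_runs = label_and_count(fatalities)
print(sequence, n_above, n_below, n_runs)
```

Letting code do the labeling also guards against the kind of slip that's easy to make when categorizing values right next to the mean by eye.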

Part 4

Now we're asked to determine the P-value and --- Oh, OK, here I just noticed something here. I've got an error in my --- do I have an error? It looks like, yeah, I do have an error. 163 is actually above the mean value, so I need to replace that with capital letter A. OK, so that's going to change my counts to 9 and 11. We're still --- both numbers are still less than 20, so we're good here with the test statistic, but now we've got to get the P-value. And to get the P-value from StatCrunch, we first have to calculate our Z score test statistic. And to do that we've got to do this manual hand calculation, the old school way looking at all of these. Oh my gosh! It's just a behemoth of an equation.

Anywho, let's get to it. So let's see. What are our values? We got g, which is the number of runs; that's 12. And then we've got our sample sizes, which are going to be 9 and 11. Then we just substitute those values into our equation, and we simplify, and we punch into a calculator and out comes our Z score test statistic of 0.510807. And the decimal just keeps going on and on and on and on and on and on.

So now I can take that number, come back into StatCrunch, and I'm going to select Stat --> Calculators --> Normal. Here in my normal calculator, I want to select the Between option because we need a two-tailed test. And then here I just put in that Z score test statistic that I just calculated. And I'm not going to take all those decimal places; four should suffice. Whoops, it helps to put in the right number. Then I just press Compute!, and out comes the area in between the tails. The P-value is the area inside the tails, so I need to take that number, and I'm going to subtract it from one. So one minus the area in between the tails gives us the area outside the tails. And this --- or excuse me, the area of the tails, and this is our P-value right here, which we're asked to round to six --- excuse me, three decimal places. Nice work!
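That "behemoth of an equation" is less scary in code. Here's a Python sketch using the standard large-sample formulas for the mean and standard deviation of the number of runs, plugged with the values from this problem (sample sizes 9 and 11, and 12 runs). The function name `runs_test_z` is just a label I chose.

```python
from math import sqrt
from statistics import NormalDist

def runs_test_z(n1, n2, runs):
    """Z score and two-tailed P-value for the runs test for randomness."""
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    sigma = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                 / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mu) / sigma
    # Two tails: one minus the area between -|z| and +|z|.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p_value = runs_test_z(9, 11, 12)
print(f"z = {z:.6f}, P-value = {p_value:.3f}")
```

The Z score agrees with the 0.510807 from the hand calculation, and the P-value is the same "one minus the area between the tails" that the Normal calculator gives.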

Part 5

Now the next part wants us to determine a conclusion for our hypothesis test. With such a high P-value, we're going to be well above our significance level. So that means we're outside the region of rejection. Therefore, we fail to reject the null hypothesis. Whenever we fail to reject the null hypothesis, there is not sufficient evidence.

But what is there not sufficient evidence of? What is there not sufficient evidence for? Well, we failed to reject the null hypothesis, so there's not sufficient evidence to side with the alternative hypothesis, which we see here is saying that the data are not in random order. So that's what we're going to put here. They're in an order that's not random. Fantastic!

Part 6

And now the last part of this problem asks, "What do the results suggest?" Well, if we can't conclude that the data are not in a random order, then we're supposing that the data are in a random order. That's what we --- that's why we failed to reject the null hypothesis, because it's potentially true. It's potentially true the data are in a random order. And so if there's a random order, that means there's no trending. And if there's no trending, that means the values are scattered above and below the mean value. Nice work!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Performing Spearman's rank correlation test for audience impressions

5/4/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform Spearman's rank correlation test for audience impressions. Here's our problem statement: The table below lists the numbers of audience impressions in hundreds of millions listening to songs and the corresponding numbers of albums sold in hundreds of thousands. Does it appear that album sales are affected very strongly by the number of audience impressions? Use the significance level of alpha equals 0.05.

Part 1

OK, the first part of this problem is asking for null and alternative hypotheses, and for a rank correlation test, that's pretty much going to be set. The null hypothesis is going to say there is no correlation, so it's going to be equal to zero. And the alternative will be that there is correlation, so the correlation coefficient will not be equal to zero. We want to get a population parameter because we always use population parameters with our null and alternative hypotheses. So we don't want to choose a sample statistic, and that means we're going to select this answer option here. Nice work!

Part 2

Now the next part asks us to find the value of the correlation coefficient. To do that, we're going to take our data and dump it into StatCrunch. OK, here we are in StatCrunch. And now I'm going to go to Stat --> Nonparametrics --> Spearman's correlation. Here in my options window, I'm going to select the columns where my data can be found, and I'm just going to hit Compute!, and here it is. There's my correlation coefficient. Notice the negative sign out in front. We want to be sure to include that. Fantastic!

Part 3

Now the next part asks for critical values, and critical values are obtained from a table where we've got less than 30 for sample size. And here we definitely have less than 30 for sample size. So n is equal to 9. So we've got nine sample pairs, and we're asked to use the significance level of 5%. So if I look on the table here for sample size of 9 and 5%, I'm going to get 0.7. So I've got two critical values, so I'm going to use my plus or minus sign so I only have to type the number in once. Well done!

Part 4

And now the last part asks us to resolve the hypothesis test. Well, here we've got a test value --- a test statistic, rather, of -0.181, and that's going to be between 0.7 and -0.7. So therefore we're outside the region of rejection, and we're going to fail to reject the null hypothesis. And every time we fail to reject the null hypothesis, there's insufficient evidence. But that's not how these are worded here. So let's see what we got. Yeah, this one says there appears to be a correlation between the number of audience impressions and the number of albums sold. And this one says there does not appear to be a correlation. So which is it?

Well, we failed to reject the null hypothesis, which means it's potentially true. And here the null hypothesis says there is no correlation. So it's potentially true that there is no correlation. So there does not appear to be a correlation is what we're going to answer. Well done!
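To see the whole test in one place, here is a small Python sketch of Spearman's rank correlation and the critical-value decision. The nine data pairs are invented, since the problem's table isn't reproduced here, and the shortcut formula used assumes there are no tied values. The 0.700 critical value is the one read from the table above for n = 9 at the 5% level.

```python
def ranks(values):
    """Ranks from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Rank correlation via r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical data (NOT the problem's table): impressions and albums sold.
impressions = [28, 13, 14, 24, 20, 18, 16, 25, 21]
albums = [19, 7, 12, 3, 25, 4, 22, 9, 15]
CRITICAL = 0.700          # table value for n = 9, alpha = 0.05

r_s = spearman_rs(impressions, albums)
print(f"r_s = {r_s:.3f}")
if abs(r_s) > CRITICAL:
    print("reject H0: there appears to be a correlation")
else:
    print("fail to reject H0: there does not appear to be a correlation")
```

Any test statistic that falls between -0.7 and 0.7, like the -0.181 in this problem, ends up in the "fail to reject" branch.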

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Performing proportion hypothesis testing on vehicles with front license plates

5/2/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform proportion hypothesis testing on vehicles with front license plates. Here's our problem statement: Test the given claim. Identify the null hypothesis, alternative hypothesis, test statistic, P-value, and then state the conclusion about the null hypothesis as well as the final conclusion that addresses the original claim. Among 2072 passenger cars in a particular region, 241 had only rear license plates. Among 307 commercial trucks, 45 had only rear license plates. A reasonable hypothesis is that commercial truck owners violate laws requiring front license plates at a higher rate than owners of passenger cars. Use a 10% significance level to test that hypothesis. A) Test the claim using a hypothesis test. B) Test the claim by constructing an appropriate confidence interval.

Part A1

OK, so Part A wants us to conduct a hypothesis test, and the first part of Part A asks us for the null and alternative hypotheses. So here we're going to look at not selecting answer option D because the null hypothesis here is not a statement of equality. Of the three answer options that remain, we need to look at the alternative hypothesis. To do that, we go back and look at the claim that's being made. And the claim is that commercial truck owners violate laws requiring front license plates at a higher rate than the owners of the passenger cars. So trucks are going to be greater than cars, but which is 1 and which is 2? Well, if you read the problem statement, the cars are mentioned first, so they're going to be the first proportion. And then the trucks are going to be the second proportion. So trucks are greater than cars. That means 2 is greater than 1, which means 1 is less than 2. And the alternative hypothesis that says 1 is less than 2 is going to be this one right here. Well done!

Part A2

Now the next part of Part A wants us to identify the test statistic. And to do that, we need to whip out StatCrunch. So I'm going to pop out StatCrunch here. And we'll resize this window to give us a better view of what's going on. OK, here in StatCrunch, I want to go to Stat --> Proportion stats (because we're dealing with proportions) --> Two sample (because we have two samples) --> With summary (because we don't have any actual data).

Here in the options window, we need to put in some statistics for our samples. The first sample is the one that was mentioned first, which is the passenger cars. So the number of successes --- we're going to consider a success having only a rear license plate, so I take that number right there from the problem statement --- 241. And I'm going to put in the total number of observations, which is the total number of cars --- 2072. I do the same thing with the trucks: 45 successes out of 307 observations. And now down here under Hypothesis test, I want to make sure that this matches what we had earlier for our null and alternative hypotheses. Notice the format is written differently, but that's OK; they're algebraic equivalents. If I just add p2 to each side, I get the same thing. It's listed right over here. So I want to make sure that symbol is the same as this symbol, and now it is. And so I'm just going to leave that zero alone because that makes these two algebraic equivalents.

Now I'm ready to go and get my test statistic. And here we see the test statistic right here, second to last value there in that results window table. That's good. I'm asked to round to two decimal places. Nice work!

Part A3

Next we're asked to identify the P-value. The P-value is right next door to the test statistic; it's that last value there in that results window table. And I'm asked to round to three decimal places. Good job!
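If you want to check the StatCrunch numbers by hand, here is a minimal sketch of the calculation. It assumes the usual textbook pooled two-proportion z test with a left-tailed alternative (p1 < p2); the function name is made up for illustration, and whether StatCrunch does exactly this internally is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Left-tailed pooled z test of H0: p1 = p2 against H1: p1 < p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 the two proportions are equal, so pool the successes.
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    # Left-tailed test: the P-value is the area to the left of z.
    p_value = NormalDist().cdf(z)
    return z, p_value

# Sample 1 = passenger cars, Sample 2 = commercial trucks (from the problem statement).
z, p_value = two_proportion_z_test(241, 2072, 45, 307)
print(round(z, 2), round(p_value, 3))
```

The P-value this produces should land a little above 6%, which lines up with the P-value used in Part A4.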

Part A4

Now the fourth part of Part A says, "State the conclusion about the null hypothesis as well as the final conclusion that addresses the original claim." Well, if I go back and compare my P-value with my significance level --- and let's see, where do we have our significance level? I'm looking, I'm looking, I'm looking. Wow, I don't see where it --- oh, it's right here. Duh! Right in front of you --- 10% significance level. So if I come down here, I look at 10% significance level. My P-value is 6%, so we're under the significance level, which means we're inside the region of rejection. Therefore, we're going to reject the null hypothesis. And whenever you reject the null hypothesis, there's always sufficient evidence. Excellent!
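The comparison we just made can be written as a tiny helper. This is only a sketch of the P-value method; the function name is hypothetical, and the 0.06 below is the rounded P-value mentioned above.

```python
def decide(p_value, alpha):
    """Apply the P-value method: reject H0 when the P-value is at or below alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

# P-value of about 6% against a 10% significance level.
print(decide(0.06, 0.10))  # reject H0
```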

Part B1

Now Part B asks for a confidence interval from the same data. So I could go back through all those motions again, but I'm lazy. So I'm going to come back up here, click on Options --> Edit, and then down here I'm gonna switch this radio button to Confidence interval. And they don't specify a confidence level, so we have to determine the appropriate one. If we've got a 10% significance level, that would mean alpha is 10%, but we've got a one-tailed test. So I've got to subtract two alpha from 100%, and that's going to give me an 80% confidence level.

Now I got my upper and lower limits for my confidence interval, and I can place those in here. We're asked to round to four decimal places. It's making me count today! There's my lower limit. Now I get to put in the upper limit. Fantastic!
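Here is a rough sketch of how those limits can be reproduced outside StatCrunch. It assumes the standard unpooled confidence interval for a difference of two proportions; the function name is invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Critical z value that leaves (1 - confidence) / 2 in each tail.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    difference = p1_hat - p2_hat
    return difference - z_crit * standard_error, difference + z_crit * standard_error

# Cars first, trucks second, at the 80% confidence level.
lower, upper = two_proportion_ci(241, 2072, 45, 307, confidence=0.80)
print(round(lower, 4), round(upper, 4))
```

Both limits come out negative, so the whole interval sits below zero.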

Part B2

And now the second part of Part B asks us to interpret the confidence interval, which we can see here. We look at their confidence interval. It does not contain zero, and so, because it does not contain zero, that means one of these proportions is always going to be bigger than the other. Since the entirety of the confidence interval is in the negative region of our number line, this difference is always going to be negative. So that means p2 is always going to be greater than p1. And that was the actual claim that we were making, because 2 corresponds with the trucks, 1 corresponds with the cars. And so, 2 being greater than 1 means that the trucks are going to be greater than the cars, which means their rate of noncompliance is higher than the rate of the owners of the passenger cars.

So because the confidence interval does not contain zero, there is a significant difference between the two proportions. Because there's a significant difference, that means we can reject the null hypothesis, because the null hypothesis is a statement of equality. And every time we reject the null hypothesis, there is sufficient evidence. Excellent!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Performing the Wilcoxon signed ranks test for earnings by education

4/29/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform the Wilcoxon signed ranks test for earnings by education. Here's our problem statement: The table shows the earnings (in thousands of dollars) of a random sample of 11 people with bachelor's degrees and 10 people with associate's degrees. At alpha equals 0.05, is there enough evidence to support the belief that there is a difference in the earnings of people with bachelor's degrees and those with associate's degrees? Complete Parts A through E below.

Part A

OK, first we're asked to write the claim mathematically and identify the null and alternative hypotheses. We know that the null hypothesis is going to be a statement of equality. And here it's pretty much going to be that there is no difference in the earnings. The alternative hypothesis will then be that there is a difference in the earnings. Let's check what we're testing here. I don't see anything about a claim but we're looking for evidence to support the belief that there is a difference in the earnings of people with bachelor's degrees and those with associate's degrees. So there is a difference in the earnings, and that's going to be the claim that we're testing. And that's going to be the alternative hypothesis. Nice work!

Part B

Now Part B asks us for the critical values. We can get the critical values from the distribution calculator in StatCrunch. Our test statistic is a Z score, and so therefore we can get our critical values from the standard Normal distribution because that's where Z scores come from --- the standard Normal distribution. So here in StatCrunch, I'm going to go to Stat --> Calculators --> Normal. Here my alternative hypothesis is there is a difference. So that difference could be less than or greater than; it could be negative or positive. So that means that we're going to have to select the Between option up here because we've got a two-tailed test. And let's see, we've got a significance level of 5%, so that means the area in between the tails is going to be 95%. And there's my critical values. I'm gonna use this plus or minus sign so that I don't have to type the number more than once. Excellent!
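If you don't have the StatCrunch calculator handy, the same critical values can be looked up with the inverse of the standard Normal distribution. This is a small sketch; the function name is just for illustration.

```python
from statistics import NormalDist

def two_tailed_critical_values(alpha):
    """Critical z values that put alpha / 2 of the area in each tail."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return -z, z

lower, upper = two_tailed_critical_values(0.05)
print(round(lower, 2), round(upper, 2))  # -1.96 1.96
```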

Part C

Now Part C asks us for the test statistic. The test statistic --- well, I would love it if StatCrunch could do this, but for a Wilcoxon test on independent samples (the rank sum version of the test), that's not going to happen, at least not with the way StatCrunch is coded right now. So that means we're going to have to go the old school route and use the data in Excel.

Here's my data in Excel. Let's do a little bit of housekeeping so we can see a little bit better everything that's going on. First thing I'm going to do is I'm going to center everything. And now we're going to come and make that whole first row bold typeface. Let me go ahead and replace some of these values here. We'll have a Sample column here for our first column. So here we're going to list the sample numbers. I'm going to say the bachelor's salaries are going to be Sample 1, and then the associate's salaries are going to be Sample 2. No, we don't need you anymore. Now we're just going to relabel you as Salary. OK, now we can see better everything that's going on.

So now in the next column I want to put my rankings, but I've got to sort this list first. So let's go up to Data --> Sort. I want to sort by salary from smallest to largest. Now we can put our rankings in the preliminary rankings. We'll start with just one, two, three, and so on and so forth. There we go. Now we've got 21 values in our data set.

And so now we want to look to see if there are any same salaries. And there are, so we've got to break ties, which means we've got to make adjustments to our rankings. So let's see, the first tie is right here, so that average is 4.5. So that changes the rankings of those to 4.5. And then these two are the same, so we have to replace those. And let's see, what else did we get? Those three are the same, so we're going to replace those. And the next two are the same, so we replaced those. And then these two are the same, so we're going to replace those. And that's it. So now the ones that have unique values to them, we're just going to have to bring those over, and there. We're done. So now we've got all of the proper rankings that we have for each of our values.
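The tie-breaking we just did by hand in Excel can be sketched in a few lines. The salaries below are hypothetical and are not the homework data; the function name is made up for this example.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next sorted value ties with the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Preliminary ranks in this group run from start + 1 to end + 1; share their mean.
        shared_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

# Hypothetical salaries: three values tie for ranks 1-3 and two tie for ranks 4-5.
print(average_ranks([30, 25, 30, 41, 25, 25]))  # [4.5, 2.0, 4.5, 6.0, 2.0, 2.0]
```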

Now I'm going to re-sort my list, this time by sample number. So now I've got all the samples here, and our R-value is going to be the sum of the ranks for the smaller sample. So we see Sample 1. It has a rank sum of 143, but there's 11 values here. The other sample has 10 values, and its rank sum is 88. So this is going to be our R-value, 88. n1 will be 10, and then n2 will be 11. So to start off with our old school calculation here, here's our formula. And we've got 88 for an R-value, n1 is 10 and n2 is 11. So we substitute these values into our expression, and then we start to simplify. So we get to a point where we can put stuff in a calculator and out comes our test statistic, which we're asked to round to two decimal places. Excellent!
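The formula itself isn't written out in this transcript, so here is a sketch that assumes the usual textbook Normal approximation for the rank sum: the mean of R is n1(n1 + n2 + 1)/2 and the standard deviation is the square root of n1 n2 (n1 + n2 + 1)/12, with no tie correction. The function name is hypothetical.

```python
from math import sqrt

def rank_sum_z(rank_sum, n1, n2):
    """z statistic for the rank sum R of the sample of size n1 (Normal approximation)."""
    mean_r = n1 * (n1 + n2 + 1) / 2
    std_r = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (rank_sum - mean_r) / std_r

# R = 88 for the smaller sample, with n1 = 10 and n2 = 11.
z = rank_sum_z(88, 10, 11)
print(round(z, 2))  # -1.55
```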

Part D

Part D says, "Decide whether to reject or fail to reject the null hypothesis." We can do that. Simple enough. We've got our test statistic here. We've got our critical values here. So if I come back here to StatCrunch, we've got the critical value here in the tails and they're bounded by --- excuse me, the critical values are bounding the tails. And we've got our test statistic, -1.55, so that's going to put us right about here. So our test statistic is here in between the tails outside the region of rejection. So because it's between the critical values and outside the region of rejection, we're going to fail to reject the null hypothesis. Good job!

Part E

Now Part E wants us to interpret this conclusion of our hypothesis test in the context of the original claim. So we failed to reject the null hypothesis. That means it could potentially be true, and failing to reject the null hypothesis means that this statement "There is no difference in earnings" is potentially true. So there's not enough evidence to support the claim. Well done!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Performing contingency table hypothesis testing to evaluate baseball player birthdays

4/28/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform contingency table hypothesis testing to evaluate baseball player birthdays. Here's our problem statement: An author argues that more American-born baseball players have birth dates in the months immediately following July 31st because that was the age cutoff date for non-school baseball leagues. The table below lists months of births for a sample of American-born baseball players and foreign-born baseball players. Using a 5% significance level, is there sufficient evidence to warrant rejection of the claim that months of birthdays of baseball players are independent of whether they are born in America? Do the data appear to support the author's claim?

Part 1

OK, I can't be really sure, but I think the author that they're looking at here is Malcolm Gladwell, and they're making reference to his popular book Outliers. I've actually read Outliers. In the book --- he actually talks about hockey players, but you know, he extends what he's saying about hockey players to all sports. And it's the same idea that whether or not you succeed, your chances of success are going to depend upon when you are actually born. And he extends that idea to a whole lot of other things, like success and finances and all that kind of stuff. And it's kind of an interesting theory to look at. So we have a problem here that's similar to the theory that he has in his book.

Anyhoo, the first part of this problem asks us to identify the null and alternative hypotheses for a contingency table test. The null hypothesis is just going to say that we're equal to the status quo, and the status quo is whatever we claim it to be. So here the claim is that months of birthday, months of births --- let me start over. The claim is that months of births of baseball players are independent of whether they are born in America. So the months of births are independent of where they're born. And that means the alternative will be the complement of that, that the months of births are dependent on where they're born. So let's see, that's going to be this option here. Good job!

Part 2

Now the next part asks us to identify the test statistic, and the test statistic we're going to get out of StatCrunch. So let's take this data table here and dump it into StatCrunch. I'm dumping the data into StatCrunch. OK, here we go. So now inside StatCrunch, I'm going to go to Stat --> Tables --> Contingency --> With summary. Here underneath the Columns option in my options window, I want to select every column except for the first one; I want to select all the columns with numbers in them. So I'm going to select this first one and then scroll down to the bottom. And then while I'm holding the shift key on my keyboard, I left click on that last option there. And see, now I've selected every option but the first one. And that's what I want, because this first column doesn't have actual numbers in it, so we don't want to select it.

Then here under Row labels, I want to make sure I select a column with numbers. And it doesn't matter which one I select, it's gonna --- you know, it doesn't matter which one you select as long as it has numbers in it. So let's just select January. I like January. Its birthstone is the garnet. (I think that's right.) And I like the kind of red color of the garnet, so I'm going to select January. But you can select a different month, and let's say you were born in April. OK, go ahead and select April. That's fine. It doesn't matter which one you select.

Now we're ready to go. And here we have our results. And right here we have what we want for our test statistic. And we're asked to round to three decimal places. Fantastic!
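For anyone curious about what goes into that test statistic, here is a minimal sketch of the chi-square calculation for a contingency table. The counts in the example are hypothetical placeholders, not the birthday data from the problem, and the function name is made up for illustration.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, degrees_of_freedom

# Hypothetical 2 x 3 table of counts (NOT the homework data).
chi2, df = chi_square_statistic([[10, 20, 30], [20, 20, 20]])
print(round(chi2, 3), df)  # 5.333 2
```

With two rows and twelve month columns, the same function would report 11 degrees of freedom. The P-value then comes from the chi-square distribution with that many degrees of freedom, and this sketch leaves that step to software.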

Part 3

Now the next part asks for the P-value. Easy peasy! We've already done that. The P-value here is right next door to the test statistic. We're asked to round that to three decimal places. Good job!

Part 4

Now the last part of this problem is asking us to state the final conclusion. And to do that we're going to compare our P-value with our significance level. So here we've got a P-value of just under 5%. 5% is our significance level. So we're just barely inside the region of rejection, and that means we're going to reject the null hypothesis. It doesn't matter whether you're close to the boundary or far away from the boundary. In is in, and out is out. We're in, so that means we're in. Reject the null hypothesis. And every time you reject the null hypothesis, there is always sufficient evidence. Nice work!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
