Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to construct and use a proportion sampling distribution table. Here's our problem statement: A genetics experiment involves a population of fruit flies consisting of three males named Alex, Bart, and Christopher and one female named Debbie. Assume that two fruit flies are randomly selected with replacement. Part A OK, Part A of this problem says, "After listing the possible samples and finding the proportion of females in the sample, use a table to describe a sampling distribution of the proportion of females." OK, so the first thing we need to do here is make a table of the possible outcomes in our sample space and then look at the proportions of females in each of those individual outcomes. We can then sort that information in order to produce our probability table. It would probably be easier to do this in Excel because Excel has much better sorting functionality than StatCrunch. So let's just go ahead and use Excel for that. And let's just list the different possible outcomes in our sample space. So we're selecting two fruit flies from among the four that are in the population, and we're selecting with replacement. So the first one we select goes back in to possibly be selected again. So let's just say the first one we pick out is Alex, we put him back in, and then we pick him again. Or we could pick Alex for the first one and then Bart for the second one, or Alex for the first one and then Christopher for the second one, or we could pick Alex for the first one and Debbie for the second one --- really the fourth one here in the series. So ... those are the possible samples that start with Alex. And I look at the pattern here. So I've got the first one repeated four times, and then I've got each one listed in sequence. So I could actually, if I wanted to, I could just repeat that pattern three more times. Whoops. 
One for each of the individual fruit flies. And then here I'm just going to put in B four times, and then C four times, and then D four times. This is the pattern that we've got established here, right? So now I look at each row. Each row is a sample, and I'm going to say, "OK, what's the proportion of females?" That's what we're looking for here, the proportion of females in each one of these samples. Well, Alex is male. In fact, the only female we've got here is Debbie. So the proportion of females here is going to be 0%. Zero. Zero. Here 50% is female --- Alex is male, Debbie's female, one half is 50%. And I just go through and mark the others in the same fashion. So whoops, that's 100%. So now I've got all my, oh, I've got all my proportions out. So now I'm going to select everything and I'm going to come up here to Data and select Sort. I'm going to sort on column C, smallest to largest. Boom, baby! So now it's really easy to get what I need because all I got to do is --- notice zero is the first number, zero here, the first number in our table. We've got nine zeros out of 16 total. So the probability is the part over the whole; the part is 9, the whole is 16. I do the same thing for each of the numbers in sequence. So I've got six for 50%. 6 over 16 can actually reduce to 3 over 8. And then of course there's only one value for that last option there. So I check my answer. Fantastic! Part B Now Part B says, "Find the mean of the sampling distribution." The mean is actually best found in StatCrunch. We could actually do it here in Excel, but it's --- it's just that I'm lazy. So let me go ahead and open up StatCrunch, pop the window out so we can actually move around and do something with it. Alright, so here in StatCrunch, I'm going to actually transfer these numbers over here. So it's 0, 0.5, 1, and then I'm going to actually label these. We could label these Proportion --- oh, I got my caps lock on --- Proportion and Probability. Right. 
The probability here is 9 over 16, so ... will this calculate it for me? Oh damn. Ugh! Stupid computer. Where's my calculator? Hey, there's my calculator. The actual number here --- 0.5625. And three eighths? I should probably know that one. But again, I'm lazy. And of course, one sixteenth. Gotcha. OK, now I've got my probability table here in StatCrunch, so now I can just go up to Stat --> Calculators --> Custom. The values are the proportions, and the weights are the probabilities. Et voilà! 25%. It's so easy. Now, nice work! Of course, you know, I can actually get the same thing in Excel. I mean, if you really wanted, I can come back here in Excel, I could actually put in that probability. This would actually calculate it for me. Ooh, yeah, I like that, 9 divided by 16. Oh yeah, baby! 3 divided by 8, and 1 divided by 16. Oh yeah, baby! So here in Excel, what we would do is I'm going to take, and I'm going to multiply each of those proportions by its corresponding probability, and then copy that down so I can get the same thing there. Then I'm going to sum that up. And there's my mean. Again, a little calculation intensive. I'm lazy. I like StatCrunch to kind of do everything for me. But if you wanted to stay in Excel, there's the way you would actually get the same answer. Part C Now Part C asks, "Is the mean of the sampling distribution from Part B equal to the proportion of --- population proportion of females? If so, does the mean of the sampling distribution of proportions always equal the population proportion?" Well, you should recall from lecture --- well, if you're watching, you know, Aspire Mountain Academy lecture videos, you should know this. If you're not, well, I hope that your instructor pointed this out to you. It is in the textbook.
There is, you know, a listing of sample statistics that are biased estimators and a listing that are unbiased. And the sample proportion is one of those statistics that is an unbiased estimator of its population parameter. So we would expect that the mean of the sampling distribution would be the same as the population parameter, in this case, proportion. And we see that that's so; the population has only four members to it. There's four fruit flies in the population. Three of them are male, one of them is female. 1 out of 4 is 25%. So yeah, the mean of the sampling distribution does equal the population proportion, and that's because the proportion is an unbiased estimator. So it looks like this answer option is the one we want. Excellent! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below. Let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
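The table from Parts A and B of the fruit fly problem can also be built by brute force. The following is a minimal Python sketch, not the Excel or StatCrunch workflow used in the video; the variable and function names are illustrative assumptions. It lists all 16 ordered samples drawn with replacement, tallies the proportion of females in each, and takes the probability-weighted mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Population from the problem statement: three males and one female.
population = {"Alex": "M", "Bart": "M", "Christopher": "M", "Debbie": "F"}

# Every ordered sample of size 2 drawn with replacement (4 * 4 = 16 rows).
samples = list(product(population, repeat=2))


def proportion_female(sample):
    """Fraction of the flies in one sample that are female."""
    females = sum(1 for name in sample if population[name] == "F")
    return Fraction(females, len(sample))


# Count how many samples give each proportion, then divide by 16.
counts = Counter(proportion_female(s) for s in samples)
distribution = {p: Fraction(c, len(samples)) for p, c in sorted(counts.items())}

# Mean of the sampling distribution: sum of proportion * probability.
mean = sum(p * prob for p, prob in distribution.items())

for p, prob in distribution.items():
    print(f"proportion {float(p):.1f}  probability {prob}")
print("mean of sampling distribution:", mean)
```

Exact fractions are used so that 6/16 reduces to 3/8 automatically and the mean can be compared directly with the population proportion of 1/4.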
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform proportion hypothesis testing on a politician's claim of survey results. Here's our problem statement: Assume that adults were randomly selected for a poll. They were asked if they favor or oppose using federal tax dollars to fund medical research using stem cells obtained from human embryos. Of those polled, 483 were in favor, 395 were opposed, and 120 were not sure. A politician claims that people don't really understand the stem cell issue and their responses to such questions are random responses equivalent to a coin toss. Exclude the 120 subjects who said that they were unsure, and use a 1% significance level to test the claim that the proportion of subjects who respond in favor is equal to 50%. What does the result suggest about the politician's claim? Part 1 OK, that was a mouthful. Let's get into this. So the first part of our problem asks us to identify the null and alternative hypotheses. The null hypothesis is always a statement of equality, so we're not going to select Answer option C. And then among the three answer options that remain, we can select the correct one by choosing the correct alternative hypothesis. The alternative hypothesis typically reflects the claim unless the claim has some sort of semblance of equality to it, in which case we'll take the complement. Here the claim --- we look at the problem statement --- it says we're testing the claim that the proportion of subjects who respond in favor is equal to 50%. So the proportion is equal to 50% is the claim. But that is the null hypothesis, because equality by definition belongs to the null hypothesis. So we have to take the complement of that. The complement of being equal to is being not equal to. So we want Answer option A. Nice work! Part 2 Alright, the next part of this problem asks us to identify the test statistic. 
To do that, we're going to have to run a hypothesis test. And the easiest way to do that, for me anyway, is to go into StatCrunch. So I open up StatCrunch, and I'm going to pop this pop out button here so that the window pops out of the window with the problem statement. And now I can move this around. I can resize the window, and I can do all sorts of wonderful little things with this. OK, so to do our hypothesis test, we're going to go to Stat --> Proportion Stats (because we're dealing with proportions) --> One Sample (because we have only one sample) --> With Summary (because we don't have actual data, just summary stats). Number of successes --- well, what is the success? We're testing the claim that the proportion of subjects who respond in favor is equal to 50%, so responding in favor is going to be our definition of success. How many were in favor? Well, here it says 483 were in favor. So I put that up here. Number of observations --- this is the total. So I'm going to pull out my --- hey, where did my calculator go? Guess I'll have to --- let me look for my calculator. Oh, there it is. It magically appeared. OK, so here's the calculator. We're going to take the total 483 we're going to add it to the 395 that were opposed, and we don't have to include the 120 because it said to exclude the 120 who said they were unsure. So just those two together. That gives me my total, which doesn't look right. 483 plus 395 is 1273? Uh, I don't think so. Let's try this again. 483 plus 395 is 878. That looks better. I don't know what happened with that. I'll have to check this out. OK! So that's all we need there. And I'm running a hypothesis test, and these fields match. We selected over here for our null and alternative hypothesis. So I'm all set to go. And here in the table, second to last number in that table as always, is my test statistic. And I'm asked to round to two decimal places. So that is what I'm going to do. Excellent! 
Part 3 Now the next part asks for the P-value. The P-value as always is next to the test statistic, so the last value there in my results window. Fantastic! Part 4 "Identify the correct conclusion." Well, our significance level is 1%. The P-value we have is three tenths of a percent. So the P-value is less than the significance level. That means we're inside the region of rejection, and that means we're going to reject the null hypothesis. Every time you reject the null hypothesis, there is always sufficient evidence. So this is the answer option we want. Well done! Part 5 And now the last part of this problem asks, "What does the result suggest about the politician's claim?" Well, look here. We rejected the null hypothesis. What's the null hypothesis? The null hypothesis says that the proportion is equal to 50%. Well, we're rejecting that, meaning we're saying this is not true. It's something other than 50%. But the politician --- what did the politician claim? The politician claimed that responses to such questions are random responses equivalent to a coin toss. A coin toss is 50/50 heads or tails. So the politician is claiming that the proportion is 50%, but our hypothesis test resulted in a rejection of that claim. And so therefore we're saying the politician doesn't know diddly squat, which for most politicians is actually pretty spot on.
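For anyone who wants to see the arithmetic behind the StatCrunch output in this stem cell poll problem, here is a minimal Python sketch of a two-tailed one-proportion z test. It assumes the usual normal approximation; the function name is illustrative and this is not StatCrunch's own code. The counts (483 in favor, 395 opposed) and the 1% significance level come from the problem statement.

```python
from math import erfc, sqrt


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-tailed area.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


in_favor, opposed = 483, 395  # the 120 unsure responses are excluded
z, p_value = one_proportion_z_test(in_favor, in_favor + opposed, 0.5)

alpha = 0.01
reject_null = p_value < alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject_null}")
```

The P-value comes out at roughly three tenths of a percent, which is below the 1% significance level, so the null hypothesis of a 50% proportion is rejected.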
So let's see here. What are our answer options? "The results suggest the politician is doing his best?" Eh, I don't think so! "The results suggest the politician is correct." I don't think so. "The results suggest the politician is wrong." Oh, yes, I love it! It's ... it's ... it's so right to be so wrong. Yes. And finally "the results are inconclusive." No, they're very conclusive. So here we're going to check our answer. Well done! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the regression equation and best predicted value for bear chest size. Here's our problem statement: The data show the chest size and weight of several bears. Find the regression equation, letting chest size be the independent (or x) variable. Then find the best predicted weight of a bear with a chest size of 48 inches. Is the result close to the actual weight of 430 pounds? Use a significance level of 5%. Part 1 OK, the first part of the problem asks us for the regression equation. To do this, I'm going to go ahead and take this data, and I'm going to dump it into StatCrunch. StatCrunch! Yes! We love StatCrunch. Yes. Alright, here we go. I'm going to resize this window so we can see everything just a little bit better. Actually, let's do it this way. Right. I'll get you first because I don't really see you as much as I need to see you. Alright, here we go. Regression equation --- so here's my data. I'm going to click on Stat --> Regression --> Simple linear. 
The x variable is usually the one that's mentioned first. But just to make sure, let's check out the problem statement. It says to make chest size the independent (or x) variable, and so that is the one that's mentioned first. So we're going to select that. The y variable, or the dependent variable, is of course the other variable that we have to select from, and that's all I need to do. Hit Compute!, and StatCrunch will do everything for me as far as the heavy lifting goes. Here is my regression equation, right up here at the top. It's kind of jumbled among lots of other stuff, so it's a little harder for me to see the numbers. So I go through and look at the parameter estimates table down here. Notice these numbers here are the same numbers that you see up here, so I just go ahead and just use the numbers here from the parameter estimates table. Don't forget your negative signs. If you have a negative sign there, don't forget to put that in. And we want to round to one decimal place. Oh, it did not like that! What did I do wrong? Oh, I typed in the wrong number. Looking at the results, I'm looking at the wrong number. It would help to look at the right number! Waking up, making sure that you people out there in YouTube Land are awake! Alright, that gives it to us. Nice work! Part 2 Alright, now the next part of this problem asks, "What is the best predicted weight of a bear with a chest size of 48 inches?" Well, the first question we need to ask when looking for a prediction is "Can we use the regression equation? Will the regression equation give us a reliable estimate or prediction?" So to do that, we need to compare our correlation with the critical R value. So if I click on this link right here, I have a table of critical R values. Our sample size here is 6, so that's going to be looking at this row here. And I believe --- yes, right here, a significance level of 5%. 
So I want to look at the value in this first column for the row where we've got 6, and I get 0.811. That's the bar that we have to clear in order to use the regression equation. Over here, with the regression equation, my R squared value is 0.995. That's outstanding! That's practically 1. It's hard to get much better than that. And look at when we compare with the critical R value, 0.811: we're actually greater than that. (Strictly speaking, the number to compare is R itself, the square root of R squared, which is about 0.997, and that clears the bar too.) So we're clearing the bar. It's like a hurdle or a pole vault jump, and we're trying to clear the bar. And we've cleared the bar because 0.99 is greater than 0.81. So that means the regression equation is good to be using for predictions. To use this for a prediction, I can either plug it in myself and do it old school style with my calculator, or I can come up here to Options, click on Edit, scroll down here to Prediction of y, put in my x value for the prediction (which here it says it wanted a chest size of 48), and a significance level of 5% so that matches here at the level of 95% confidence, hit Compute!, and I'm going to expand this out. And if you scroll down to the bottom, look at this! It actually calculated it out for me. So there's my predicted value right there, which is a whopping 468 pounds. Wow. Round to one decimal place. And that's a --- that's a big bear! That's a big anybody! Jeez. Look at that! Fantastic! Part 3 Alright, is the result close to the actual weight of 438 pounds? Well, I don't know. Anything over 400 is pretty much all in the same category, I would think. But the difference is a good 30 pounds. 30 pounds! So they're probably not in the same neighborhood. So would I say "the result is close"? No. "The result is not very close" --- I like that one. "The result is very close"? No. "The result is exactly the same"? Let's go with --- let's go with Answer option B. Excellent!
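The "clearing the bar" check in the bear problem can be written out in a few lines. This is a minimal Python sketch that uses only the numbers quoted in the walkthrough (R squared of 0.995 and a critical value of 0.811 for a sample size of 6 at the 5% level); the bear data themselves are not reproduced here, so nothing is refit. The variable names are illustrative assumptions.

```python
from math import sqrt

r_squared = 0.995   # reported by StatCrunch in the walkthrough
critical_r = 0.811  # critical value for n = 6 at the 5% significance level

# The critical-value table is stated in terms of r, not r squared,
# so take the square root before comparing magnitudes.
r_magnitude = sqrt(r_squared)

# The regression equation is treated as usable for prediction only
# when the correlation clears the critical value.
usable_for_prediction = r_magnitude > critical_r

print(f"|r| = {r_magnitude:.3f}, critical r = {critical_r}")
print("use regression equation for prediction:", usable_for_prediction)
```

Because the square root of a number between 0 and 1 is larger than the number itself, an R squared value that already exceeds the critical value guarantees that r does as well.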
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the best nonlinear regression model for bacteria growth. Here's our problem statement: In a carefully controlled experiment, bacteria are allowed to grow for a week. The number of bacteria are recorded at the end of each day as shown below. Construct a scatterplot and identify the mathematical model that best fits the given data. Part 1 OK, the first part of this problem is asking us to choose the correct scatterplot. So you've got four options here, and in order to select the right one, we need to make the scatterplot. But we're going to have to make a regression model anyway to finish out the problem, and we get the scatterplot on the way to making the regression model. So let's just go ahead and make the regression model and then we've got everything we need to go through the different answer options that we're going to be asked to provide. I find it easier to work this type of problem in Excel than StatCrunch. So I'm going to dump my data here into Excel, and why it keeps pulling that up I don't really know. Oh! Really? I don't want to do that. OK, let's do this. Oh, and then I'll bring you down here so you can see a little bit better what's going on. OK, so here I've got my data. I don't need to do any data transformations to it. I can just take the data as is. I'm going to select the data here, and then I'm going to come up to Insert, and then I want us to click on that scatterplot icon here. 
And this first scatterplot with no lines is perfect for our purposes, so we're just going to choose that. Now I'm going to come down here into the plot, and I'm actually going to select --- left click on one of these data points. And then while my mouse is still there --- and notice I haven't moved it at all --- I'm going to right click so I get this menu. And I'm going to select Add trend line. The first thing I want to do is scroll down here to the bottom and click on the boxes for Display equation and Display R squared value on the chart. So notice what that gives us here. We're looking at the different chart here. So now I've got different size --- back on my regression line. OK. So now we've got different models that we can just thumb through, and this is why it's really easy to do this in Excel rather than StatCrunch. The moving average --- of course, we're not going to be using that. That's never going to be one of the options that we're looking at. So we just basically go through the different model options and look for the greatest R squared value here. The Linear is 0.755. I click on Exponential, I get a much better R squared value, 0.99908. That's almost 1; that's near perfect. I don't think we're going to get anything better than that, but just to make sure, we can flip through and make sure that none of those R squared values are higher. They're not, so I'm going to go back to Exponential. And now that the model is actually made, I can click inside the text here for the equation, and I'm going to select everything inside there by clicking on Ctrl+A on my keyboard. And then I'm going to come back up here to Home and increase my font size so I can see that better. If you try increasing the font size before you choose between the different models, you'll find that this actually stays stagnant. It's not going to change when you change the model itself. So you'll defeat the purpose of finding the greatest R squared value. 
So you have to find the right model first and then increase the font size. It's just the way it works. Now we've got what we need. And I come back here, and I'm going to resize my window. It's a little smaller because now I've got everything that I need right here. So the correct scatterplot that we're going to be selecting is Answer option D. Excellent! Part 2 And now the next part asks us for the regression equation, which we already have because we've made the model. So notice here, the form of the model. Here our x is actually part of the exponent. So we want the option here that puts the x in the exponent. This x is not in the exponent; it's on the same level as the coefficient. Here this x is in the exponent. This x is not in the exponent; it's actually the base. Notice it's on the same level with the equal sign whereas here the x is a small superscript that appears on a level above where the equal sign is. So this is the model format that we want.
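The two ways of writing an exponential model, with base e as Excel's trendline displays it and with x as the exponent of a plain number as in the answer format, describe the same curve. Here is a minimal Python sketch of that relationship. The coefficient values are hypothetical placeholders, not the bacteria results from the video, and the function name is an illustrative assumption.

```python
from math import exp, isclose


def to_plain_base(a, b):
    """Rewrite y = a * e**(b * x) as y = a * base**x and return (a, base)."""
    return a, exp(b)


# HYPOTHETICAL coefficients, chosen only to illustrate the conversion.
a_hypothetical, b_hypothetical = 2.0, 0.5

a, base = to_plain_base(a_hypothetical, b_hypothetical)

# Both forms give the same predictions at every x.
for x in range(8):
    y_excel_form = a_hypothetical * exp(b_hypothetical * x)
    y_plain_form = a * base ** x
    assert isclose(y_excel_form, y_plain_form)

print(f"y = {a:.3f} * {base:.3f}^x")
```

The leading coefficient carries over unchanged; only the number in the exponent has to be run through the e^x key to get the base.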
The first number you can just slip in here for that first coefficient. We want to round to three decimal places, it says. So we need to put that in. And then here, it's really tempting to just stick in this number that you have here in the exponent. But that's not the correct number, because notice the base here is e. And e actually has a value, about 2.718, and the decimal goes on for pretty much forever. So we need to actually calculate what this is. And to do that, I'm going to bring up my handy dandy calculator. And I'll put in that exponent and hit my e^x function. And this is the number that we need to put in. Again, it says round to three decimal places. So that's what I'm going to do. Nice work! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the sample size needed to estimate a confidence interval on a population mean. Here's our problem statement: An IQ test is designed so that the mean is 100 and the standard deviation is 20 for a population of normal adults. Find the sample size necessary to estimate the mean IQ scores of statistics students such that it can be said with 99% confidence that the sample mean is within 6 IQ points of the true mean. Assume that σ = 20, and determine the required sample size using technology. Then determine if this is a reasonable sample size for a real world calculation. Part 1 OK, to get the sample size for a confidence interval, the easy way to do that is to go into StatCrunch. 
Notice there's no icon here for me to click on to dump data into StatCrunch. All the data, so to speak, that I need is actually here in the problem statement. So I'm going to go up here and click on Question help and then click StatCrunch. And if I click on this arrow button here, it'll actually pull the window out of the window where the homework problem is listed so I can actually move this around and I can resize it as needed, which is actually really helpful when you're making a video. It's also helpful when you're actually working the problem out because you're going to get an answer here, and then you can just transfer it over without having to cover up anything. Anywho, here we go to get the sample size. I'm going to go to Stat --- do I go to Z Stats or T Stats next? Well, I'm going to go to Z Stats because remember the key question is "Do we know what the population standard deviation is?" In this case, we do know what the population standard deviation is. They tell us right here in the problem σ = 20. That is the problem --- the population standard deviation. And so we're going to use Z Stats. We have only one sample, and then I want to click down here on Width sample size because we're using the width of a confidence interval to calculate sample size. Down here in my input fields, I'm going to select the confidence interval we want is 99%. Our standard deviation we said here was 20. The width is going to be the width of the confidence interval that we're looking for, and that is twice the margin of error. So the margin of error we have here is 6 IQ points. So that means my width is going to be 12, because 12 is twice 6. I hit Compute!, and it gives me my required sample size. So no need to do that asinine little hand calculation, although you could do it that, you know, old school way if you want to. But hey, I prefer joining the 21st century. I check my answer and I'm told, "Well done!" 
Part 2 And there's one more part to the problem, and it asks us, "Would it be reasonable to sample this number of students?" Well, this is the minimum required, and it's not a very large number. So yeah, I mean, you could reasonably --- well, I mean, one person could reasonably sample 74 people. So yeah, it's a fairly small number. It should be pretty simple to get out. Fantastic!
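The hand calculation that StatCrunch replaces in this IQ problem is short enough to sketch. This minimal Python example assumes the standard formula for a known population standard deviation, n = (z·σ/E)², rounded up to the next whole number; the function name is an illustrative assumption, and the inputs (99% confidence, σ = 20, margin of error 6) come from the problem statement.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(confidence, sigma, margin_of_error):
    """Smallest n so a z interval for the mean has the given margin of error."""
    # Two-tailed critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: rounding down would miss the required precision.
    return ceil((z * sigma / margin_of_error) ** 2)


n = sample_size_for_mean(confidence=0.99, sigma=20, margin_of_error=6)
print("required sample size:", n)
```

Note that the formula takes the margin of error (6), whereas the StatCrunch dialog in the walkthrough asks for the full interval width (12), which is twice the margin of error.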
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to apply one-way ANOVA hypothesis testing to bicycle lap times. Here's our problem statement: A certain statistics instructor participates in triathlons. The accompanying table lists times in minutes and seconds he recorded while riding a bicycle for five laps through each mile of a three mile loop. Use a 5% significance level to test the claim that it takes the same time to ride each of the miles. Does one of the miles appear to have a hill? Part 1 OK, the first part of this problem asks us to determine our null and alternative hypotheses. With one-way ANOVA testing, the null hypothesis is always going to be that all of our population parameters are the same; they're equal to each other. The alternative hypothesis is always going to be that at least one of those population parameters is going to be different. This is pretty much set in stone, the way that it is. So when you've got one-way ANOVA testing, you know that this is the way it's going to turn out. Well done! Part 2 Now the next part of this problem asks us to find the F test statistic. We can do this by taking the data and dumping it into StatCrunch. So here's my data. I have it here in StatCrunch. I'm going to resize this window so we can see a little bit better what's going on. Notice here how they took the times for the different laps, and they basically broke it out, dividing minutes from seconds into different columns. This is very useful for us. 
There's a note here that says we need to convert everything over to either minutes or seconds. And if you had the minutes and the seconds in the same column, that is what you would need to do because you've got two different units for time, and the test doesn't work unless all the units are consistent. So you'd have to convert everything to one or the other. Here where the minutes and seconds are actually separated out, we don't need to do that because notice all of the minutes are 3 minutes. And because all the minutes are the same, that means, up to 3 minutes, all the times are the same. They're equal, so we don't even need to consider that. We just need to consider the portion that's not equal, which are the seconds. So because all the minutes are 3 minutes, we can get away with just using the seconds columns to conduct the test. We're going to go to Stat --> ANOVA --> One way. Here I'm going to select the columns that have all the seconds in them. And to select more than one column, I hold down the Ctrl button on my keyboard while I select additional columns. And that's all I need to do. I hit Compute!, and here's my results window. And down here in my ANOVA table is my F statistic, which is what I'm looking for. I'm asked to round to four decimal places. Nice work! Part 3 Next we're asked to find the P-value. The P-value is here in the ANOVA table right next to our test statistic. Well done! Part 4 What is the conclusion for this hypothesis test? Well, we were asked to use a 5% significance level, and compared with 5%, the P-value of --- what is that? Like 1% of 1%? That's definitely less than 5%, so we're going to be inside the region of rejection. And therefore we reject the null hypothesis. Every time we reject the null hypothesis, there is always sufficient evidence. Good job! Part 5 And now the last part of the problem asks us, does one of the miles appear to have a hill? Well, we rejected the null hypothesis. 
What does that mean? Well, go back to the null hypothesis. The null hypothesis says that everything is the same, that they're all equal. We're rejecting that, which means this is not true. At least one of the three is going to be different from the others.
We come over here to our results window and look at the mean values that are listed for each of the different miles. Notice Mile 3 looks significantly longer than Mile 1 and Mile 2. Mile 1 and Mile 2 are not exactly the same but in the same neighborhood. This is in a neighborhood entirely different. So it looks like Mile 3 may, indeed, have a hill. So let's look at our options here. "Yes, data suggest the third mile appears to take longer and a reasonable expectation is that it has a hill." Good job! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to apply goodness of fit hypothesis testing to horse race post positions. Here's our problem statement: The table below lists the frequency of wins for different post positions in a horse race. A post position of 1 is closest to the inside rail, so that horse has the shortest distance to run. Because the number of horses varies from year to year, only the first 10 post positions are included. Use a 5% significance level to test the claim that the likelihood of winning is the same for the different post positions. Based on the result, should bettors consider the post position of a horse race? Part 1 OK, the first part of this problem asks us to determine the null and alternative hypotheses. 
What we're conducting here is a goodness of fit test, because we've got these positions here for the different horses, and we're testing the claim that the likelihood of winning is the same for each of the different post positions. So because the claim is that everything is the same, and we've got more than just two elements that we're looking at here, that's going to indicate goodness of fit hypothesis testing. When you have goodness of fit hypothesis testing with a claim like this, the null hypothesis is always the same. And it is that everything is equal; everything is the same. So here we're going to look at the different options and select the one that has what we want in it, which is that everything is the same. "Wins occur with equal frequency" --- that's the one we want. The alternative hypothesis will then be one of two things: if you're testing against some given distribution, it's that the data don't conform to that distribution; or, in this case since we're not asked to conform to a distribution, the alternative hypothesis will be that at least one of your elements is different from the others. So at least one post position has a different frequency. Fantastic!

Part 2

Now the second part asks for the chi square test statistic. So to do this, I'm going to take the data and dump it into StatCrunch. OK, here's my data. It's in StatCrunch. I'm going to resize this window so we can see everything a bit better. Now to conduct the actual test, I'm going to go into Stat, and then come down here to Goodness of fit --> Chi square test (because we're looking for a chi square test statistic). You can see down here that I need to put in values for the observed and expected values. Here in the observed, I want to select the frequencies that we've seen with the different post positions. Expected --- I can either put in a column that has values from a given distribution or, since that's not the case here, I'm going to select All cells in equal proportion.
And I'm ready to hit Compute!, and there is my chi square test statistic, which I'm asked to round to three decimal places. Well done!

Part 3

The next part asks us to calculate the P-value, which we've already done. It's right next to the test statistic there. It's that last value there in the results window, rounded to four decimal places. Conveniently, that's what I'm given here in my answer window. Fantastic!

Part 4

What is the conclusion for this hypothesis test? Well, we were asked to use a 5% significance level to test our claim. Our P-value over here, 10%, is greater than 5%, so we're outside the region of rejection, and therefore we fail to reject the null hypothesis. Every time we fail to reject the null hypothesis, there is insufficient evidence to reject it. So we're going to want this answer option here. Nice work!

Part 5

And the last part of this question asks, "Based on the results should bettors consider the post position of a horse race?" Well, what do we conclude from our hypothesis test? We failed to reject H naught, which means that H naught could be true. What was H naught? Well, scroll back up here, and we see that H naught, our null hypothesis, is that wins occur with equal frequency. Well, if they're occurring with equal frequency, and this statement is potentially true, then post position shouldn't be a consideration when you're placing your bet. So here I'm going to select No. Well done!
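If you'd like to check by hand what StatCrunch is doing when you pick "All cells in equal proportion," here is a minimal Python sketch. It is not part of the StatCrunch workflow in the video. The function names are made up for illustration, and the win counts are placeholder numbers rather than the table from the problem, so swap in the real frequencies before comparing against your results window. The 0.10 passed to the decision helper is only the rough "10%" P-value mentioned in Part 4.

```python
def chi_square_equal_proportions(observed):
    """Chi-square goodness-of-fit statistic when every category
    is expected to occur equally often (the null hypothesis here)."""
    total = sum(observed)
    categories = len(observed)
    expected = total / categories          # same expected count for every category
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    degrees_of_freedom = categories - 1
    return statistic, degrees_of_freedom


def decide(p_value, alpha=0.05):
    """Compare the P-value to the significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Placeholder counts for illustration only (NOT the data from the problem).
example_counts = [12, 8, 10, 10]
stat, df = chi_square_equal_proportions(example_counts)
print(round(stat, 3), df)

# Approximate P-value quoted in the transcript, tested at the 5% level.
print(decide(0.10))
```

The statistic is just the sum of (observed minus expected) squared over expected, and with equal proportions the expected count is the total number of wins divided by the number of post positions.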
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the best nonlinear regression model for a subway fare. Here's our problem statement: Construct a scatterplot and identify the mathematical model that best fits the data shown below. Let x represent the years since 1959 with 1960 coded as x = 1, 1973 coded as x = 14, 1986 coded as x = 27, and so on. Let y represent the subway fare.

Part 1

OK, the first part is asking for a scatterplot. And we know --- by looking at the problem statement --- that the second part is going to ask us to make the actual model that best fits the data. And I find that that's actually easier to do in Excel. So I usually click here on Open in Excel. Now I already have the data from the problem loaded in Excel. Notice here that the problem statement asks us for coded years for our model. So even though they're giving us the actual calendar years that everybody thinks in, they're asking us to make a model in coded years. Now realistically, you would never make a model in coded years, because anyone that would then use that model to make a prediction would need to think in coded years, because that's how you would need to input the data into the model. Nobody thinks in coded years. And so a more user friendly model would be one in which you actually use the calendar years that everybody thinks in.
So my conclusion for the problem here is that the only reason they're making you use coded years is because they're trying to add some complexity to it so they can provide more rigor to the assignment, which to me is just absolutely ridiculous. I mean, let's just keep it simple, stupid. But since they're forcing us here to use coded years, we need to make our model with coded years. And the easiest way to do that is to come over here, and I select this column so I can then right click on my mouse and then insert a column. The reason why I want to insert a column is because Excel normally considers the values here in the left column that we select to be the variables for the horizontal axis and then the values here in the right column for the vertical axis. So we don't want to add it here right after the information we're given because then everything's in reverse from the way Excel normally thinks. So I'm gonna put my coded years in here. Then I'm going to expand this out so I can see the whole title.

And now to make the coded years, notice what we're doing is for each one of these years, we're subtracting out our zero year, which is 1959. So 1960 minus 1959 gives me the 1 that I'm supposed to be using as a coded year. So here in the cell, if I just type an equal sign and then select that cell next to it, then minus my zero year of 1959, I press Enter, and voila! There is a coded year. Now notice if I take the cursor here on my mouse and put it over the bottom right corner of that cell, it changes to a smaller plus sign. If I leave it on that smaller plus sign and I push down the left button of my mouse and hold that button down while I select the cells below, I extend that same formula down to these cells so I don't have to type the formula into each of those individual cells; it's automatically copied and calculated for me. So now I have got --- if I go here and select these cells, here are the ones I want to select for my model.
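The coded-year formula from the spreadsheet (calendar year minus the zero year of 1959) can be sketched in Python as well. The function names here are just illustrative, and the three years are the ones spelled out in the problem statement.

```python
ZERO_YEAR = 1959  # the "zero year" given in the problem statement


def code_year(calendar_year, zero_year=ZERO_YEAR):
    """Convert a calendar year to a coded year (1960 -> 1)."""
    return calendar_year - zero_year


def decode_year(coded_year, zero_year=ZERO_YEAR):
    """Convert a coded year back to a calendar year (27 -> 1986)."""
    return coded_year + zero_year


# The three examples spelled out in the problem statement.
years = [1960, 1973, 1986]
coded = [code_year(y) for y in years]
print(coded)  # [1, 14, 27]
```

The decode helper is there because anyone using the finished model to make a prediction has to translate a calendar year into a coded year first, and then think about the answer in calendar years again.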
So once I have those selected, I come up here to Insert, and then I'm going to click on this button here for the scatter plot. And I want this first scatter plot here that has just the points and nothing else. So there's my scatter plot that I made. And if I look at my answer options here, the one that seems to be the best fit is Answer option D. Nice work!

Part 2

Now the second part asks us for the actual regression model. This is where Excel comes in really handy. You can do the same thing in StatCrunch, but StatCrunch is really clunky for this sort of thing. And unless you have a little cheat sheet that lets you know which type of model to make, which equation type to use for your model, you're going to have to make each model one at a time. Excel is really good for this because we don't have to do any of that.
So what I want to do here in Excel is I'm going to click inside the plot area, and we'll click on one of these data points here. And then I'm going to right click, and I'm going to select Format data series. Actually, I want to select Add trendline. So if I move my cursor over here --- so now what I want to do is scroll down here to the bottom and click on Display equation on chart and Display R squared value on chart. So notice how those values came up here. OK, now what I want to do is go back up to the top, and Linear is selected first by default. So we're going to compare R squared values here with a different model type. Exponential gives me a much better R squared value. You see that's 0.9709, so this is the number to beat. Now I click through the other model types. No, that's a lower number. That's a lower number. That's a lower number. That's a lower number. And you're never going to need to use Moving average. So it looks like Exponential is the one we want to use. It has the best R squared value. I click inside that label there and I select everything inside the label and then I go back to Home. I look at my font size, and I can blow that up so I can see it a whole lot better. OK, so now over here in my answer fields, I need to select the right model type for the Exponential function. If you look at the model type here, see, the x is here in the actual exponent and there's a coefficient out front. So that looks like this model type here. And so I put the numbers in. So the coefficient is going to go here, and I'm asked to round to three decimal places. And then for the number here, it's tempting to just put this number in here. But notice this is an e raised to this power. So I've got to whip out my calculator and calculate that value out --- 0.067, and I have to use that as an exponent, e to that power. So here's the button on the calculator for that. And this is the number I need to put into my answer field. Good job! And that's how we do it at Aspire Mountain Academy.
Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
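A footnote on the last step of the subway fare problem: Excel writes the exponential trendline with e raised to a power, while the answer field wants a plain base raised to x. Here is a small Python sketch of that calculator step. The 0.067 is the exponent read off the trendline label in the video; the coefficient is not stated in the transcript, so the value of 1.0 below is only a placeholder, and the function name is made up for illustration.

```python
import math


def to_base_form(coefficient, rate):
    """Rewrite y = a * e^(rate * x) as y = a * b^x and return (a, b)."""
    return coefficient, math.exp(rate)


# Placeholder coefficient: the transcript does not state the real one.
a_placeholder = 1.0
a, b = to_base_form(a_placeholder, 0.067)
print(round(b, 3))  # 1.069

# Both forms give the same prediction for any coded year x.
x = 27
same = math.isclose(a * math.exp(0.067 * x), a * b ** x)
print(same)  # True
```

This works because e^(0.067x) is the same thing as (e^0.067)^x, so the base that goes in the answer field is e^0.067 and not 0.067 itself.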
Author: Frustrated with a particular MyStatLab/MyMathLab homework problem? No worries! I'm Professor Curtis, and I'm here to help.