Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to construct a time series plot. Here's our problem statement: The accompanying data represent the percentage of recent high school graduates (those who graduated within 12 months before the given year end) who enrolled in college in the fall. Construct a time series plot and comment on any trends. Part 1 OK, to construct the plot, we need actual data. So we're going to click on this link here, and here's our data. Now we're going to put that data into StatCrunch. OK, we've got the data loaded here in StatCrunch. I'm going to resize this window so we can see everything a lot better. And now making the graph in StatCrunch is pretty easy. I just go up to Graph --> Index Time Plot. In the first area, I'm asked to select the columns that I want to graph, and this is what's represented on the y axis. So you can see here with our answer choices, the percent enrolled is on the y axis. So I'm going to select that there. Then down below, see how there's a format for the x and y axis, and you've got different options that you can go with? We actually want Time because our data is separated into years, so we're going to select Years under Type. The starting year, which is the first year of our data, is 1989, so I'll put that in. And then each data point advances in increments of one year, so we're just going to leave that default value there. And that's all we need to do. I press Compute!, and here is my time series graph. Now I just look for the answer option that best represents what I have here in StatCrunch, and clearly that's going to be answer option A. Excellent! Part 2 Now the second part of the problem asks us to comment on any trends. There are three answers that we have to choose from, so let's see what we're looking at here.
Generally, if you look at the graph here, the trend starts out low, comes up high, and there's a down spot here and then it comes back up high again. So if you look at all of the data points and try to imagine a line of best fit going through all of the data, that line would be going upward as we go from left to right. So the general trend, even though there are some lows, is increasing from left to right, which means increasing with time, because time increases as we move from left to right. So let's see which answer option best matches that. It looks like answer option A. Well done!
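The "imagine a line of best fit" idea can be checked numerically. Here is a minimal Python sketch, assuming made-up enrollment percentages (the transcript never lists the actual data) and a hypothetical helper named `least_squares_slope`. A positive slope corresponds to a trend that increases with time.

```python
# Hypothetical percentages for 1989 onward; NOT the data from the problem.
years = list(range(1989, 1999))
percent_enrolled = [58, 60, 62, 61, 59, 60, 63, 65, 64, 66]


def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = least_squares_slope(years, percent_enrolled)
trend = "increasing" if slope > 0 else "decreasing" if slope < 0 else "flat"
print(f"slope = {slope:.3f} percentage points per year, so the trend is {trend}")
```

Even with the dip in the middle of this made-up series, the fitted slope comes out positive, which is the same reasoning used on the graph.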
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to interpret a frequency table. Here's our problem statement: Refer to the table summarizing service times (seconds) of dinners at a fast food restaurant. How many individuals are included in the summary? Is it possible to identify the exact values of all the original service times? Part 1 Okay, so the first part of this problem is asking us for the total number of individuals included in the summary. We have our frequency table here, so to get the total number, we just add up the counts in each of the different categories or classes. So to do that, I'm going to whip out my calculator and add the frequency counts for each of the categories together, and that gives me the total that's in the summary. I put my answer here in the answer field. Excellent! Part 2 And now the second part of the problem asks, "Is it possible to identify the exact values of all the original service times?" Well, the only information we have is the frequency counts that are in each category or class. So for this first category where we've got 60 to 119 seconds, we have a frequency count of 7. That tells us seven of the times in the total dataset are somewhere between 60 and 119. We don't know exactly where they are. We don't know exactly what they are. All that we know is that there are seven of them within this range. So all seven of those data points could be 60. They could all be 65. They could all be 70. They could all be a hundred. Or hey, maybe there's three of them that are 70 and four of them are 100. That's another possibility.
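Both ideas can be sketched in a few lines of Python. Only the 60 to 119 class with a frequency of 7 comes from the walkthrough; the other classes and counts below are hypothetical, as is the helper `count_in_class`. The sketch shows that the total is just the sum of the counts, and that two different sets of raw times collapse into the same class count.

```python
# (lower limit, upper limit) -> frequency.
# Only the 60-119 class with frequency 7 is from the transcript; the rest is made up.
frequency_table = {(60, 119): 7, (120, 179): 22, (180, 239): 14, (240, 299): 2}

# Part 1: the number of individuals is the sum of the frequency counts.
total = sum(frequency_table.values())
print("individuals in summary:", total)


def count_in_class(values, low, high):
    """How many values fall within the class limits, inclusive."""
    return sum(low <= v <= high for v in values)


# Part 2: two different raw datasets that give the same count for 60-119.
candidate_a = [60] * 7
candidate_b = [70, 70, 70, 100, 100, 100, 100]
print(count_in_class(candidate_a, 60, 119), count_in_class(candidate_b, 60, 119))
```

Because both candidates produce a count of 7, the table alone cannot tell them apart.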
I mean, if you start thinking about it, you see there are endless numbers of possibilities for where these seven data points could lie within this range. So without knowing more information about the specific data point, we'd have no idea where those seven points are. So no, the frequency distribution doesn't tell us the exact value; it could be any value within those class limits. And that's going to be this answer here. Nice work! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching. We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today, we're going to learn how to find and use expected frequency for goodness of fit hypothesis testing. Here's our problem statement: Refer to the data in the accompanying table for the heights of females. Complete parts (a) through (d) below. Part A OK, Part A is asking us to “enter the observed frequencies in the table below.” So here they’ve got different categories or classes for height, and then we’re asked to fill in the different frequencies. If we look at our data in our table here, notice how we’ve got females mixed in with males. So the gender variable is a dummy variable where you’ve basically got two options: One option is a zero and the other is a one. Here the ones are males, so the zeros must be females. We’ve got to sort through all this data to get just the heights for the females; we weren’t asked for the heights of the males, just the females. Notice also here the height data is actually unsorted itself too. 
So we’ve got two different sortings to do, and so to do that, I want to open the data in Excel. So the thing is I want to actually use Excel because we could do it in StatCrunch but StatCrunch is really clunky, and especially when it comes to sorting data. The sort feature in StatCrunch only lets you sort one level at a time whereas with Excel, it will let you sort multiple levels at the same time. And that’s what we want because it makes our job a little easier. So I’ve already pre-loaded the data here into Excel. And what I want to do now is actually sort this data out. So to do that, I’m going to come up to menu here — I’m coming off screen a little bit so you can’t see, but I’m selecting Data. And then I want to select Sort. And then here in the sort dialog box, I first want to sort by gender so we can get all the males out of the way. And then I’m going to add a level so that within the females I’m going to actually sort by height. This will actually help us to count to get the frequencies that we need to fill in our table here for our answer fields. So I hit OK, and now everything is automatically sorted. The other thing that is nice about Excel is that it makes counting really easy. And that’s all we’re doing with frequency is we’re getting counts of measurements that fall within each of these different classes or categories. So we want to count the number of data points that are less than 155.15; that’s our first category or class here. So I’m going to select that first cell with my data point here in my data, and then I’m going to scroll down to where I get the — let’s see, 155.15. So 155.15 is going to be every data point up to this one. So I’m going to hold down the Shift key on my keyboard and press the left key on my mouse. So now I’ve selected all those data points that are less than 155.15. And if I look down here at the bottom of my Excel window, I see that the count here is 20. So that made the counting super easy for me. I just put in a 20 there. 
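The sort-and-count routine in Excel can also be sketched in Python. This is only an illustration: the rows below are hypothetical (gender, height) pairs, with 0 for female and 1 for male as in the walkthrough, and only the class boundaries 155.15, 161.75, and 168.35 come from the problem.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical (gender, height) rows; 0 = female, 1 = male.
rows = [(0, 152.3), (1, 178.0), (0, 158.9), (0, 163.4),
        (1, 171.2), (0, 170.1), (0, 160.0), (0, 149.8)]

BOUNDARIES = [155.15, 161.75, 168.35]  # class boundaries from the problem
LABELS = ["< 155.15", "155.15-161.75", "161.75-168.35", "> 168.35"]

# Keep only the females, then sort by height (the two sort levels in Excel).
female_heights = sorted(h for gender, h in rows if gender == 0)

# bisect_right returns the index of the class each height falls into.
counts = Counter(LABELS[bisect_right(BOUNDARIES, h)] for h in female_heights)
observed = [counts[label] for label in LABELS]

for label, count in zip(LABELS, observed):
    print(f"{label:>15}: {count}")
```

With the real data set, `observed` would hold the four frequency counts that go into the answer fields.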
And now I’m going to do the same thing for each of the different other classes, so I’m going to select the next cell here, and go down to 161.75. So I scroll down to . . . 161.75 would be there, and 41 is the count. Now, let’s see, 168.35. That would be up to there. The count is 34. “Greater than 168.35" — so this is the last category, so I’m going to go up to the last female data point, which is this one right here. Beyond that you get all the male data points. So this is my last category count — 19. Excellent! Part B Now, Part B says, “Assuming a normal distribution with mean and standard deviation given by the sample mean and standard deviation, find the probability of a randomly selected height belonging to each class.” So the probabilities are going to come out of our distribution calculator in StatCrunch; that’s the easiest way to get this. We want a normal distribution, and the mean and standard deviation are coming from our sample data itself. So first I’m going to get the sample mean and standard deviation and put those values into StatCrunch. To do that, the first thing I’m going to do is get rid of all the male data points here because we don’t need them. So I’m just going to scroll down here, select the row, delete all that. Now down here under the height column, I’m going to put in an AVERAGE function, I’m going to select all those data points, then close my parenthesis — there’s my average. I’ll put the standard deviation just below it, and there’s my standard deviation. So this is what I need to put into StatCrunch. So the easiest way for me to get into StatCrunch is just to put my data in, although once I get my data here into StatCrunch, see, the first thing I’m going to do is get rid of my data because I don’t need the data; I just need StatCrunch. So let’s move this down so we can see a little bit more what we’re doing. Alright, so here we’ve got the data in StatCrunch, and I’m just going to clear that out, and I’m going to clear you out. 
So now I just want my distribution calculator. I want the Normal distribution, and the mean and standard deviation are going to come from Excel. So if I move this over here, I can stick in my mean value that I calculated in Excel and the standard deviation value, also from Excel. Notice I’m typing in all the numbers I have. OK, that’s great. So now I’ve got this, and I can get rid of that. And to get the probability, I want less than 155.15, so less than is here. I just need to put in the boundary value, 155.15, and there’s my probability. I’m asked to round to four decimal places. And I’m going to do the same thing for each of the other three categories. The next one is between two values, so I’m going to select the Between option in StatCrunch. And then here we’re going to enter 155.15. Here we’re going to enter 161.75. There’s my probability. And then I just do the same thing for the columns that remain, again rounding to four decimal places. And for the last one I want greater than, so I go back to my Standard option, change this to greater than, and this one’s 168.35. And there’s my final probability. Good job! Part C Now, Part C says, “Using the probabilities found in Part B, find the expected frequency for each category.” This is pretty simple. All we have to do is get the total of the observed frequencies and then multiply it by the probability for each class. So I can add these numbers up, or if I want to be lazy (yeah, I want to be lazy!), I can go back to Excel and recognize that this first row is taken up with the column headings, so I’ve got 115 rows minus the one for the column headings, which is 114. So my sum is 114. If you want, you can add these four numbers up, and you’ll get the same thing. So 114 is what I want to multiply by each one of these probabilities to get the expected frequency counts. So I’ll pull out my handy dandy calculator here, and let’s move you down a little bit.
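For readers who want to see what the distribution calculator is doing, here is a rough Python sketch of Parts B and C. The mean and standard deviation below are assumptions, since the transcript never states the sample values, so the printed numbers will not match the homework answers. The class boundaries and the total of 114 are from the walkthrough.

```python
from statistics import NormalDist

# Assumed sample mean and standard deviation; the real values come from the data.
heights = NormalDist(mu=161.0, sigma=7.0)

cuts = [155.15, 161.75, 168.35]  # class boundaries from the problem
cdf = [heights.cdf(c) for c in cuts]

# Less than, two "between" classes, and greater than.
probabilities = [cdf[0], cdf[1] - cdf[0], cdf[2] - cdf[1], 1 - cdf[2]]

n = 114  # number of female heights
expected = [n * p for p in probabilities]

for p, e in zip(probabilities, expected):
    print(f"probability = {p:.4f}, expected frequency = {e:.4f}")
```

The walkthrough rounds each probability to four decimal places before multiplying; replacing `p` with `round(p, 4)` in the `expected` line would mirror that.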
OK, so, we have 114 times the first probability for the first category here is 0.1971. So there’s my expected frequency for the first class or category. And I just do the same thing with the numbers that remain. So 114 times the next probability gives me the next expected frequency. And I’m just going to finish this out here. Oops! That’s the wrong number. There’s the number I want. Excellent! Part D Now Part D asks for a hypothesis test, and there’s different parts to this, so let’s take a look. The first section in Part D says, “Identify the null and alternative hypothesis for this test.” For goodness of fit, it’s always going to be the same thing. Your null hypothesis will be everything’s equal. The alternative is going to be at least one of them is different.
But you’re not just looking at one part; you’re looking at both parts together. There’s a part in Part A, and then there’s a corresponding part in Part C, because you’re looking at observed frequencies and expected frequencies. And so what we’re saying is that the observed and the expected should be the same. That’s the null hypothesis. And the alternative is that for at least one of those categories, it’s not going to be the same; it’s going to be different. So if I look back here at my answer options, I’m seeing . . . this one, answer option D. So for each class, the answer from Part A equals the one from Part C. And then the alternative is for at least one of them the answer is not equal. So that’s what I want to select. Good job! Now the next part asks for the test statistic. This part of the problem typically drives students bonkers until you understand that, the way they give this out, it’s easy to get it into StatCrunch, but you can’t just stick the data in. The data is in the wrong format because the data is just the raw data. What StatCrunch needs to do the goodness of fit test is frequency counts. So you’ve got to put in the observed and the expected frequency counts. And you’ve already tabulated those up. Here’s your observed frequency counts, and here’s your expected frequency counts. So all I need to do is go back into StatCrunch and then put those numbers in, and then I can run my goodness of fit test. So here’s my StatCrunch window back, and I’m going to clear this distribution calculator out. So here in the first column, I’m just going to label this Observed. And then the next one I’m going to label Expected, so I can tell them apart. Now I’m going to come up here, and I’m going to copy these numbers from Part A up here in my Observed column. So I’ve got (whoops!) 20, 41, 34, and 19. We’re going to do the same thing for my Expected column. Make sure they’re in the same order so your answer can come out right. Now I’ve got the data here in StatCrunch.
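As a cross-check on what StatCrunch does with those two columns, here is a small Python sketch of the chi-square goodness of fit calculation. The observed counts are the ones from Part A. The expected counts are hypothetical stand-ins, because the transcript does not list them, so the result here will differ from the homework answer. The P-value helper assumes 3 degrees of freedom (four classes minus one) and uses a closed form that only holds for that case.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_df3(x):
    """Right-tail area of the chi-square distribution with 3 degrees of freedom."""
    z = x / 2
    return math.erfc(math.sqrt(z)) + 2 * math.sqrt(z / math.pi) * math.exp(-z)


observed = [20, 41, 34, 19]          # from Part A
expected = [22.5, 38.0, 35.5, 18.0]  # hypothetical; must be in the same class order

statistic = chi_square_statistic(observed, expected)
p_value = chi_square_p_value_df3(statistic)
print(f"test statistic = {statistic:.3f}, P-value = {p_value:.3f}")
```

Keeping the two lists in the same class order matters here for the same reason it matters in the StatCrunch columns: each observed count is paired with the expected count beside it.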
These are the frequency counts. This is what we need for goodness of fit testing. And it’s really easy. Come up to Stat --> Goodness-of-fit --> Chi-Square Test. The observed is the Observed; the expected, Expected; come down here and hit Compute!, et voilà! There’s our test statistic right there in the results window. So it’s the next to last number there in that first table. How many decimal places do I want? Three? That’s going to be 0.764. Nice work! “Identify the P-value.” Well, the P-value is right there next to the test statistic. It again wants three decimal places. Excellent! And now “state the final conclusion that addresses the original claim.” Well, our P-value is going to be well above any significance level that we’re going to want to test for. Here our significance level is 1%. Here we’ve got 85.8%, so we’re definitely outside the region of rejection, therefore we fail to reject. Whenever we fail to reject, there is not sufficient evidence. Well done! And that's how we do it at Aspire Mountain Academy. Feel free to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is just boring or doesn't want to help you learn stats, then go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you’d like to see. Thanks for watching! We’ll see you in the next video! Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today, we're going to learn how to use an ANOVA table for hypothesis testing. Here's our problem statement: A sample of colored candies was obtained to determine the weights of different colors. The ANOVA table is shown below. It is known that the population distributions are approximately normal and the variances do not differ greatly. Use a 0.025 significance level to test the claim that the mean weight of different colored candies is the same.
If the candy maker wants the different color populations to have the same mean weight, do these results suggest that the company has a problem requiring corrective action?
Part 1 OK, the first part of this problem asks, “Should the null hypothesis that all the colors have the same mean weight be rejected?” Well, we have the ANOVA table here, and notice how here at the end we have our P-value, so we can compare this with our significance level and determine the result of the test. So a P-value of 0.6260 is definitely greater than our significance level of 0.025. The area of the P-value is too big to fit inside the area of the significance level, so we are outside the region of rejection. Therefore we are going to fail to reject the null hypothesis. So we should not reject the null hypothesis because the P-value is greater than our significance level. Excellent! Part 2 Now, the second part of this problem asks, “Does the company have a problem requiring corrective action?” Well, here in the problem statement it says that “the candy maker wants the different color populations to have the same mean weight.” That is the null hypothesis, that all of the colors have the same mean weight. We failed to reject the null hypothesis, which means it could be true. And if it’s true, then the candy maker is getting what the candy maker wants. And so therefore there is no problem requiring corrective action.
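The P-value comparison used here can be written as a tiny Python sketch. The function name `decide` is just an illustrative choice; the P-value 0.6260 and the significance level 0.025 are the ones from the problem.

```python
def decide(p_value, alpha):
    """P-value method: reject H0 only when the P-value is at or below alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


decision = decide(p_value=0.6260, alpha=0.025)
print(decision)
```

Failing to reject does not prove the mean weights are equal; it only says the sample gives no strong evidence against that claim.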
So the answer is going to be No, no corrective action is required because — let’s see here. It is likely that the candies do not have equal mean weights. No, it is likely that they do. So we’re going to select answer A, even though it’s got this awkward double negative — “not likely that the candies do not have equal mean weights”! That’s like saying, yeah, because they do have the same weight. Excellent! And that's how we do it at Aspire Mountain Academy. Feel free to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is just boring or doesn't want to help you learn stats, then go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you’d like to see. Thanks for watching! We’ll see you in the next video! Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today, we're going to learn how to use goodness of fit for hypothesis testing of the best day for quality family time. Here's our problem statement: A random sample of 773 subjects was asked to identify the day of the week that is best for quality family time. Consider the claim that the days of the week are selected with a uniform distribution so that all days have the same chance of being selected. The table below shows goodness-of-fit test results from the claim and data from the study. Test that claim. Part 1 OK, the first part of this problem is asking us to determine the null and alternative hypotheses. For goodness of fit testing, that’s pretty much going to be the same thing every time. The null hypothesis is going to be that everything is the same. So in this case, all days of the week have an equal chance of being selected. The alternative hypothesis will always be that at least one of those will be different. So at least one day of the week has a different chance of being selected. Good job! Part 2 Identify the test statistic. 
Well, we work so many problems that by the time we get to Chapter 11, we’re pretty much in the habit of OK let’s get some data or some numbers, put them in StatCrunch, let StatCrunch chew some numbers, and spit out an answer. But the answer that we’re looking for is already given to us here just below the problem statement. It asks for the test statistic, and so here is our test statistic. There’s a number: 3021.822. So we just put that number here in the blank. Excellent! Part 3 This next part of the problem is exactly the same thing. It asks us to identify the critical value. The critical value is, again, listed up here in the results from some technology display that was already done by somebody. So all we have to do is copy the number over. Fantastic! Part 4 And now the last part of the problem asks us to state the conclusion. In this case, we’re going to compare the test statistic and the critical value. Well, here’s the critical value, which marks the boundary region in the tail of our distribution that is the critical region or region of rejection. Here’s our test statistic. It’s well within the right tail of our distribution, and so therefore we are going to be inside the region of rejection. Therefore we reject the null hypothesis. Whenever we reject the null hypothesis, there is sufficient evidence. And so, because the null hypothesis says everything is the same and we’re rejecting it, we are by default “accepting” the alternative, which says that at least one of the days is different, so it does not appear that all days have the same chance of being selected. Fantastic!
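Here is a short Python sketch of the two ideas in this problem: the expected count per day under a uniform distribution, and the critical value comparison. The sample size of 773 and the test statistic of 3021.822 come from the walkthrough. The critical value is an assumption: the transcript does not read it out, so the sketch uses 12.592, the chi-square critical value for 6 degrees of freedom at a 0.05 significance level.

```python
n_subjects = 773
n_days = 7

# Under the null hypothesis every day is equally likely.
expected_per_day = n_subjects / n_days

test_statistic = 3021.822  # given in the technology display
critical_value = 12.592    # ASSUMED: df = 6, alpha = 0.05 (not stated in the transcript)

# Right-tailed test: reject when the statistic lands beyond the critical value.
reject_null = test_statistic > critical_value

print(f"expected count per day = {expected_per_day:.2f}")
print("reject H0" if reject_null else "fail to reject H0")
```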
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today, we're going to learn how to find the best nonlinear regression model for stock market index values. Here's our problem statement: Listed below are the annual high values, y, of a stock market index for each year beginning with 1990. Let x represent the year, with 1990 coded as x = 1, 1991 coded as x = 2, and so on. Construct a scatterplot and identify the mathematical model that best fits the given data. Use the best model to predict the annual high value of the stock market index for the year 2007. Is the predicted value close to the actual value of 11,655? Part 1 OK, so first we’re asked to construct a scatterplot. To do that, we need to make the actual model itself. Notice here in the problem statement how we’re using coded years. So we have to use coded years to make our model. And the data set that they give us, as you can see here, doesn’t have coded years. So we have to actually make that transformation. So let’s go ahead and do that first. I click this icon here so that I dump my data into StatCrunch. So here we are in StatCrunch. I’m going to resize this window so we can see everything a little bit better. Excellent! Now we’re done with you. So here in StatCrunch, we can transform these values into coded years. And to do that, I’m going to go up here to Data --> Compute --> Expression. I’m going to build my expression. And then notice here in the problem statement it says that 1990 is coded as x = 1 and 1991 is coded as x = 2, so we’re basically saying that 1989, which is the year before 1990, is going to be x = 0. So 1989 is our zero year. So that’s what we’re going to need to subtract from each of our year values in order to make coded years. So I select the column for the years and add it to my expression, and then I’m going to subtract out the zero year (1989), press Okay. You can label the column whatever you want.
I typically leave this blank because the default is to go and label the column with the actual expression that was used to transform the data. And I like that; I like knowing what the data is and what the data came from, so I just go ahead and leave that blank. I press Compute! So now I got a new column here with the coded years. Now I’m ready to make my model. And the way they’re intending for you to work this is you’re going to use this data to make each of the general types of nonlinear models that you’re talking about in this section. That’s like five different models that you have to make! And then you have to compare P-values and adjusted R-squared values. And, you know, if you have to do it that way, then I guess you could do it that way to figure out what the best model is. But I find that it’s much easier if I just use a reference sheet. So I’m going to show you here a little tool that I developed. This is a reference sheet that you can use for answering these nonlinear regression equation questions that you get on your assignment, and it’s basically two tables. So the first table up here tells us what model we need to make, and the second table tells us, you know, how to manipulate the options in StatCrunch so we can get the numbers we need to put in our answer fields. So up here at the top, we look to see what the general model is going to be. And you can actually get this reference sheet if you go to the website and you can look to where the blog post is. If you’re watching this on YouTube, you know, just click the link on the description, and it’ll take you to the blog post there on the website, and then down below the viewing window there for the video you can see a link to download for free this actual reference sheet. Before you use it, if you . . . 
if you’re not in my class, then you’re probably not going to be using this in a testing situation, in which case, you’re just going to have to work the problem so many times that you understand that, when you see this type of application, it means you make this type of model. And you’re going to have to work the problem so many times that you remember the steps. That’s all I can give you. Now if you’re in my class, yeah, I’ll let you use this on a test because, I mean, the class isn’t about trying to make you expert model makers. It’s just giving you kind of a brief look at, understanding, cursory look at what’s the process for model making just to give you that general sense of appreciation for how it’s done. You know, I don’t mind you using a reference sheet like this on a test if you’re one of my students. If you’re somebody else’s student, well, you’re probably not going to get it. But at least this will help you work your homework problems, am I right? So the first thing we do is we look at this first table, and we’re looking for the application here in this area that matches what we’re looking at in the problem statement. So if we go back to our problem statement here, we can see that we’re talking about a stock market index. So I’m going to go back to this reference sheet, and I’m going to look for where it says “stock market index.” And I can look through all the different applications here, and I see it right here — stock market index. So that tells me I need to make a quadratic model. So I don’t have to make all five of these models to know that the quadratic one is the best. That’s really handy. And then of course the general form as you can see is listed here. Now I can go down to the second table, which is the data transformation table. This tells me how to use StatCrunch to get the answer I need to put in my answer fields. So, again, we’re making the quadratic model. Here’s the general form that we want to use. 
To get there in StatCrunch, this is the regression option that we want to select. So we want Polynomial, so in StatCrunch, I’m going to go up to Stat –> Regression –> Polynomial (because that’s what the table told me to select). And here I’m going to select my x- and y-variables. Remember to use the coded years for your X. I take the Y. Poly order here is 2; that’s what the table here is telling me to say. It says in the option window, I want to make sure — there’s nothing I need to do, no change I need to make in the options window, but it says to make sure that Poly order equals 2. And we see that it does. So we got everything we need, so we hit Compute! And out comes our results window. We’re looking for the scatterplot, so if I hit this little arrow over here in the corner, there’s my scatterplot with my line of best fit. Wow, that looks really great. So now I just look here at my points, and it’s pretty obvious that answer option A is going to be the one that matches. If I want, I can use these options here to blow up the graph, and make sure it looks similar. We’re looking OK. So answer option A is going to be what I select. Nice work! Part 2 Now the next part asks for the equation for the best model. We know it’s the quadratic equation. But if you come back here and look, see the general form here? So now I want to pick the answer option in StatCrunch that matches this general form: a-x-squared plus b-x plus c. So as I look at my answer options, that’s going to be answer option A that matches the general form. Now I selected the right one. To get the answers that I put here in my coefficients, again I go back to my table, and it says in the results window, it says, “a equals x-squared, b equals x, c equals Intercept.” So this a-b-c matches what you see over here in the general form: a, b, and c. And notice that matches the order of the answer fields that I need to put in here in my answer. So it’s going to be a, b, and c. 
And those numbers, it says, comes out of the results window. This is from the parameters table: x-squared, x, and intercept. So I come back here to StatCrunch, and notice I got here in my parameters table x-squared, x, and intercept. So these numbers here are what I need to put in my answer fields here. I’m asked to round to three decimal places. So here the first value is going to be this x-squared value here; that’s going to be 1-2-5-point — rounded to three decimal places, that’s going to be 3-5-2. Next, notice we have a negative sign here, so I’m going to have to carry that one through — 44-4-point-9-6-6. And then the last number coming up here — 3-4-2 — excuse me, 3-2-point-9-5. Good job! Part 3 And now the last part of the question asks to use the best model to predict the high value for the stock market index in the year 2007. I can make predictions with the model. I can actually, you know, look at this equation, and actually write it out, and punch it out on my calculator, or I can have StatCrunch do it for me. So go back to your options window, scroll down here and see where it says “Prediction of Y.” You put in a value for X, and it will calculate that for you in the regression equation.
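If you would rather punch the prediction out yourself, the calculation is just the general form evaluated at one x value. Here is a minimal Python sketch. The coefficients are made-up placeholders, not the ones from the StatCrunch parameters table, and both function names are hypothetical. The zero year of 1989 follows the coding given in the problem statement, where 1990 is x = 1.

```python
def code_year(year, zero_year=1989):
    """Convert a calendar year to a coded year (1990 -> 1, 1991 -> 2, ...)."""
    return year - zero_year


def predict_quadratic(a, b, c, x):
    """Evaluate y = a*x^2 + b*x + c."""
    return (a * x + b) * x + c


# Placeholder coefficients for illustration only.
a, b, c = 2.0, -3.0, 10.0

x = code_year(2007)
print("coded year:", x)
print("predicted y:", predict_quadratic(a, b, c, x))
```

The x value has to be on the same scale that was used to fit the model, which is why the year is coded before it goes into the equation.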
But remember — you used coded years for your model. This is why I hate coded years, because in order to use the model, you have to put in a coded year. So we can’t just put in 2007. We have to change that to a coded year. And we do that by subtracting out the zero year. So here in my calculator, I take 2007, subtract out my zero year (which was 1989), and I get 18. So 18 is the number I want to stick in here. I come down here and hit Compute! And then scroll down here. And down here at the very bottom, I see my predicted value 36000, which is a long ways away from 11,655. So that’s much higher than that. So, no, it’s not close at all to the actual value. We want either A or B. A says “dramatically greater.” B says “dramatically lower.” A is going to be what we want. And I stick the value that we get in here rounded to the nearest whole number. Nice work! And that's how we do it at Aspire Mountain Academy. Feel free to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is just boring or doesn't want to help you learn stats, then go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you’d like to see. Thanks for watching! We’ll see you in the next video! Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today, we're going to learn how to find the P-value given the test statistic. Here's our problem statement: Use technology to find the P-value for the hypothesis test described below. The claim is that, for a smartphone carrier’s data speeds at airports, the mean is mu = 18.00 Mbps. The sample size is n = 29, and the test statistic is t = 2.074. Solution OK, finding the P-value here is really easy if you understand one simple concept. And that is that the P-value is the area in the tail of the distribution bounded by the test statistic. 
And that’s why the test statistic is given here, because it provides the boundary for that area in our distribution.
But which distribution are we going to be using? Well, look at your test statistic. Your test statistic is a t-score. That means we’re going to be using the Student-t distribution. So I’m going to call up StatCrunch here, and inside StatCrunch, I’m going to go to Stat --> Calculators --> T because I want the Student-t distribution. Now here’s my Student-t calculator. The degrees of freedom is one less than the sample size, and that’s why they gave us the sample size here. In this case, it’s 29, so our degrees of freedom will be one less than that, which is 28. And then we need to get this inequality sign here right, and that’s got to match our alternative hypothesis. Well, to get the alternative hypothesis, we have to look at the claim. The claim here is that the mean value equals 18. Well, equality by definition belongs with the null hypothesis, so we can’t adopt the claim as the alternative hypothesis, which means we have to take the complement of this. The complement of being equal to is being not equal to, and not equal to means we’re going to have a two-tailed test. So I’m going to come up here in my distribution calculator and select the Between option, because the P-value is actually split between the left and right tails of my distribution. And now I’ve got two test statistics: one is going to be positive (2.074), and one is going to be negative (-2.074). So I’m going to put those values in here. And now that I’ve got everything I need, I go ahead and hit Compute! and out comes the area in between the tails. Remember that in StatCrunch, this Between option is calculating the area in between the tails. But the P-value is the area in the tails, so I have to take the complement of this area that’s between the tails to get the area in the tails. So I call up my little calculator here, take 1 minus this value here, and there is my P-value. I’m asked to round to three decimal places. Good job! And that's how we do it at Aspire Mountain Academy.
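If you want to double-check the StatCrunch steps without the calculator, here's a minimal Python sketch of the same idea: find the area between -t and +t under the Student-t curve, then take 1 minus that area. It uses only the standard library and approximates the area with Simpson's rule, which is my own choice for the sketch and not how StatCrunch computes it. The helper names `t_pdf` and `area_between` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def area_between(lo, hi, df, steps=2000):
    """Approximate the area under the t density from lo to hi.

    Uses composite Simpson's rule; steps must be an even number.
    """
    h = (hi - lo) / steps
    total = t_pdf(lo, df) + t_pdf(hi, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # odd interior points get 4, even get 2
        total += weight * t_pdf(lo + i * h, df)
    return total * h / 3


n = 29           # sample size from the problem statement
t_stat = 2.074   # test statistic from the problem statement
df = n - 1       # degrees of freedom is one less than the sample size

between = area_between(-t_stat, t_stat, df)  # what the Between option reports
p_value = 1 - between                        # area in the two tails

print(f"area between the tails: {between:.4f}")
print(f"two-tailed P-value:     {p_value:.4f}")
```

The last step mirrors the "1 minus this value" step from the video; round the printed P-value to three decimal places for the answer field.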
Feel free to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is just boring or doesn't want to help you learn stats, then go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you’d like to see. Thanks for watching! We’ll see you in the next video! Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today, we're going to learn how to evaluate a multiple linear regression equation based on a given technology output. Here's our problem statement: Consider the correlation between heights of fathers and mothers and the heights of their sons. Refer to the accompanying technology output. Should the multiple regression equation be used for predicting the height of a son based on the height of his father and mother? Why or why not? Solution OK, the first thing we’re going to do is take a look at this technology output. So I’m going to click on this icon here, and out comes the technology output. It looks very similar to what you would see if you were actually making the model in StatCrunch. The advantage of this is that the model has already been made, so all we have to do is evaluate the output to see if it’s something we want to use or not.
There are two main things you want to look at. The first is the P-value. And when you’re looking at the P-value, don’t look over here at the parameter estimates table. You want to look at the P-value not of an individual parameter but of the model as a whole. And the P-value for the model as a whole is found here in the ANOVA table. So we’ve got a P-value that is practically zero. It’s hard to get a P-value better than that, so the P-value looks absolutely excellent. The other thing you want to check is the R-squared, or more appropriately, the adjusted R-squared value. And here we look down at the bottom of our output and see that our adjusted R-squared value (0.3552) is not something that I would consider to be all that grand. However, the adjusted R-squared value is most useful for comparing models with each other. We’ve only got one model that we’re looking at here, so the main thing that we want to focus on is the P-value. The P-value itself looked pretty good, so let’s go through our answer options to see which one best matches what we saw in the technology output. We’re definitely going to use this model, so the only answer option that says “Yes” is going to be answer option B. But before we select that, let’s go through the other answer options to make sure that we don’t want to select them. Answer option A says “No, because the P-value for the Intercept is not very low.” Well, we’re not looking at P-values for a particular model parameter. We want the P-value for the model as a whole, so that’s not going to work. Answer option C says, “No, because the R-squared and adjusted R-squared values are not very high.” And while that’s very true, this answer option doesn’t say anything about the P-value. And the P-value is what you want to look at, especially when you’ve got just one model. So this isn’t going to work for us.
Answer option D says, “No, because the P-value for Father is smaller than the P-value for Mother.” Again, these are P-values for individual parameters of the model, and the only one we want to look at is the one for the model as a whole. So this isn’t going to work for us. The answer option we do want is answer option B. Excellent! And that's how we do it at Aspire Mountain Academy. Feel free to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is just boring or doesn't want to help you learn stats, then go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you’d like to see. Thanks for watching! We’ll see you in the next video! Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today, we're going to learn how to find the best regression equation given multiple variables. Here's our problem statement: The accompanying table provides data for tar, nicotine, and carbon monoxide (CO) contents in a certain brand of cigarette. Find the best regression equation for predicting the amount of nicotine in a cigarette. Why is it best? Is the best regression equation a good regression equation for predicting the nicotine content? Why or why not? Part 1 OK, the first part of our problem asks us to find the best regression equation, and notice we’ve got three different answer options to select from. And that’s because we’re looking at three different models. So the first model (answer option A) has just the carbon monoxide content. The second answer option (answer option B) — that model has only the tar content. And the last model (answer option C) has both the tar and the carbon monoxide for variables. So we need to make regression equations for each of these models. And we need to compare values from each of those models. 
So to make the models, first we need to get the data and dump it into StatCrunch. So to do that, I’m going to click on this icon; this brings up a table with the data. And now I’m going to stick that data into StatCrunch. I’ll resize this window so we can see a little bit better everything that’s going on. And now, to make these first two models, we could go into Stat –> Regression –> Simple Linear. But we know we’re going to have a model with two variables there at the end. So let’s just use the one menu option of going to Stat –> Regression –> Multiple Linear. Here in my options window, I can select my Y-variable. This is what comes out of my regression model, which in this case is the nicotine. And then the first model I want to make has just the carbon monoxide, so I’m going to select that here for my X-variable. There are no interactions with this model. Interactions are where you have more than one variable being multiplied together to make another term in your regression model equation. Here we only have single variables in each individual term, so there’s no variables being multiplied together. And so we don’t have any interactions, so there’s nothing to select here. And these default options here where these boxes are not selected is just fine for us. So we press Compute! And out comes this results window that has the results that we need for evaluating this particular regression model. To help us evaluate the models, what I’ve done is gone to Excel and made a little chart here. So what we can do is copy that information over into Excel for each of the models, and then we can compare in one spot which model is the best and then take the values from that model and stick it into the answer fields that are appropriate for our assignment. So the first thing we’re looking for is the adjusted R-squared value. That’s going to be down here towards the bottom of my results window. The P-value — notice there’s different P-values here in my parameter estimates table. 
The one that I want, though, is here in the ANOVA table. I want this P-value here in the ANOVA table; that’s the one that I want for the model. And then, just in case we end up selecting this model, so we don’t have to go back and redo all of this, let’s just take the values for the intercept and the slope and stick them here in our table in Excel. Our assignment is asking us to round to three decimal places, so I’m going to take these values out to five decimal places. I don’t want to transcribe the entire number, but I also don’t want to incur rounding errors when I put my actual answer into my assignment fields. So I’m going to keep two extra decimal places, which means I want five. There. And there’s the first model. Now to get the second model. OK, notice what we need for the second model. For the second model, we’re just looking at the tar. So I’m going to replace the carbon monoxide with the tar. Instead of going through the menu options again in StatCrunch, I’m just going to come up to the Options button here, click on Edit, and then I’m just going to switch from CO to Tar. Hit Compute! And now I’ve got a new model. And I can take those values out. So my adjusted R-squared value is down there at the bottom of the screen. And my P-value I get from my ANOVA table. Notice it says we have less than 0.001; that’s for all practical purposes zero. And then I take my intercept value and my slope. And that’s the second model. Now I’m ready to make the third model. I’m going to go back into my options window to do that. Notice the order in which the variables appear. Tar is first, and then comes CO. So I’m going to put those variables in the same order in my regression model so that when I’m transcribing numbers out, it’ll be easier not to get them confused.
And here’s my last model. Adjusted R-squared value goes here. P-value goes there. I’ve got an intercept, slope 0.09596, and the last slope value — notice the negative sign there. OK, so now I’ve got my values here. Now I can bring that over here, and we can compare and see what we’re looking at here. We want a high value for adjusted R-squared and a low value for the P-value. Well, looking at the P-values, answer option A has a significantly higher P-value than the other two options, so we’re going to take that, and we’re just going to cross that off our list. So we’re not going to look at that any more. And now we’re choosing between answer options B and C. They have the same P-value, so we look at the adjusted R-squared value. And answer option C has a significantly higher value for adjusted R-squared. So we’re going to select answer option C. If the adjusted R-squared values were reasonably close together, then we would say that adding in this extra variable doesn’t give you that much more benefit from a higher adjusted R-squared value, so it’s not going to make that better of a model. But this is a ten percentage point difference here; that’s pretty significant. So we’re going to say that answer option C is the one that we’re going to want to select. And if I wanted to highlight that, I could do something like that so I can make sure I get the right numbers out. And then I just transcribe my numbers here. So I want three decimal places. I’ve got them rounded to five so I can avoid rounding errors when I’m putting them here in my answer field. So the first value is my intercept, and then I want the first slope, and then I want the second slope. Again, note the negative sign. Good job! Part 2 Now the second part of our problem asks us, “Why is this equation best?” Well, as we just got done saying, we’ve got a high adjusted R-squared value and a low P-value; those are the main two determinants that we’re looking for. 
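The comparison we just walked through can be sketched in a few lines of Python. This is only an illustration of the decision rule, not anything StatCrunch does for you. The function name `pick_best`, the 0.05 cutoff for the P-value, and the `min_gain` threshold for a worthwhile jump in adjusted R-squared are all assumptions of mine. The adjusted R-squared and P-value numbers in the example are made up to mimic the pattern in this problem; they are not the actual values from the results windows.

```python
def pick_best(models, alpha=0.05, min_gain=0.05):
    """Pick a regression model from a list of summary dicts.

    Each dict has: "name", "predictors" (tuple), "adj_r2", "p_value".
    alpha and min_gain are assumed thresholds for this sketch.
    """
    # Step 1: cross off any model whose overall (ANOVA) P-value is not low.
    candidates = [m for m in models if m["p_value"] < alpha]
    if not candidates:
        return None

    # Step 2: start with the simplest remaining model, and move to a model
    # with more variables only if adjusted R-squared goes up enough.
    candidates.sort(key=lambda m: (len(m["predictors"]), -m["adj_r2"]))
    best = candidates[0]
    for m in candidates[1:]:
        if m["adj_r2"] - best["adj_r2"] >= min_gain:
            best = m
    return best


# Hypothetical summary values, for illustration only.
models = [
    {"name": "A", "predictors": ("CO",), "adj_r2": 0.10, "p_value": 0.08},
    {"name": "B", "predictors": ("Tar",), "adj_r2": 0.80, "p_value": 0.0005},
    {"name": "C", "predictors": ("Tar", "CO"), "adj_r2": 0.90, "p_value": 0.0005},
]

best = pick_best(models)
print(best["name"], best["predictors"])
```

With these made-up numbers, model A is dropped for its higher P-value, and the two-variable model wins because its adjusted R-squared is ten percentage points higher than the tar-only model's. If the gap were tiny, the sketch would stay with the simpler model.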
The other thing we look for is the number of variables, and although we’ve got more variables in this equation than we do in the other two, we’ve got a significantly higher adjusted R-squared value that makes adding that extra variable worthwhile. So we want the highest adjusted R-squared value, so looking at my answer options, it could be B, or it could be D. I want a low P-value, so we’ve got low P-value here, and low P-value here, so that’s good. For answer option B, this says, “removing either predictor noticeably decreases the quality of the model.” And that’s true. If you take that second variable out, notice you get a ten percentage point drop in adjusted R-squared. So that’s a possibility, but let’s check D just to be sure. It says only a single predictor variable is in our equation, and we’ve clearly got two. So it can’t be D; it has to be B. Fantastic! Part 3 And now the last part of our problem asks, “Is the best regression equation a good regression equation for predicting the nicotine content? Why or why not?” Well, here you want to be looking at your P-value. Here you’ve got about the lowest P-value you could possibly have, which is practically zero (less than 0.001). And so that’s going to tell us that, yes, this model will fit our data pretty well.
So we want the answer option that says, “Yes.” So that’s going to be A or D. And A says, “Small P-value indicates good fit.” Answer option D says, “Large P-value indicates good fit.” Obviously that’s not true, so we want answer option A. Good job! And that's how we do it at Aspire Mountain Academy. Feel free to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is just boring or doesn't want to help you learn stats, then go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you’d like to see. Thanks for watching! We’ll see you in the next video!