Blog Archives

Using StatCrunch to find a regression line equation

5/29/2018

Intro

Howdy! I am Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use StatCrunch to find a regression line equation. Here's our problem statement: Use the given data to find the equation of the regression line. Examine the scatter plot and identify a characteristic of the data that is ignored by the regression line.

Part 1

OK, the first thing I want to do is bring up my data set in StatCrunch. Now I'm going to resize this window so we can see everything that's gonna go on here. OK, we're set with that now.

The first part of our problem asks us to create a scatterplot. Inside StatCrunch, I could go up here to Graph and then select Scatter Plot. However, I know that I'm gonna have to make a regression line equation eventually anyway, and the regression analysis gives me a scatter plot too. So I'm just going to do that, because it requires me to push fewer buttons.

So to start the regression analysis, I go to Stat –> Regression –> Simple Linear. Here in the options window, I'm going to select my x- and y-variables. I don't need to mess with any of these other settings; they're all gonna be fine for me. So I press Compute!, and here in my results window, notice how it says up here 1 of 2. That means this is page 1 of 2 pages total. To get to the second page, I click this arrow button here in the bottom right corner, and lo and behold, there's my scatterplot already made for me!

Now, in order to select the right answer option from the four I'm given, notice how the axes on my scatter plot here in StatCrunch are different from the ones in the answer options in my problem. I can change the axes here to match, and that'll make it much easier to see which answer option is the right one. So I click on this little three-line icon in the bottom left corner. Now I can select each axis independently and change the minimum and maximum values so that they match what I see in the problem statement. I just did it for the x-axis, and I do the same thing here for the y-axis. And there we go. Now it's obvious which answer option we want to pick. It's going to be this one here. Nice work!

Part 2

OK, the second part of our problem asks us for the regression line equation. That's really easy to do, because we've already done the analysis. I just flip back to the first page, and my regression equation is right here at the top. I find there's a lot of information crammed together at the top, so in order to get the numbers right, I'm gonna look down here at the parameter estimates table. Notice these are the same numbers we find up here in the regression line equation, but everything's laid out a little more clearly. So I'm just going to take the numbers from the parameter estimates table. You can take them from wherever you want; they're the same numbers either way.

My instructions say to round the constant to two decimal places as needed. The constant is the intercept, so I'm going to round that to two decimal places. The instructions also say to round the coefficient to three decimal places. The coefficient of my x-variable is the slope, so that's three decimal places. Good job!
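For readers curious where the intercept and slope in the parameter estimates table come from, here's a minimal least-squares sketch in Python. The data points are hypothetical (the problem's data set isn't reproduced here); the rounding follows the instructions: two decimal places for the constant, three for the coefficient.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (x-bar, y-bar)
    return intercept, slope

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
# Constant to two decimal places, coefficient to three
print(f"y-hat = {b0:.2f} + {b1:.3f}x")
```

The slope is computed first because the intercept follows from it: the least-squares line always passes through the point (x-bar, y-bar).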

Part 3

The last part of the problem says, “Identify a characteristic of the data that is ignored by the regression line.” If we look at the different answer options here, let's examine them one by one. The first one here says there's no trend in the data. Well, if I go back to my scatter plot, there's definitely a trend in the data. Most of the data here fits this regression line pretty well.

“There are no characteristics of the data that are ignored by the regression line.” Well, maybe, maybe not. Let's check out the other answer options. If none of the others pan out, then this one's going to be the right one.

The next answer option says the data has a pattern that is not a straight line. Well, most of the data here conforms to the straight regression line, so that's not true. The last option says there's an influential point that strongly affects the graph of the regression line. That's definitely true. Look at this outlier point right here. If we didn't have this outlier point, the regression line would dip down a little bit and better fit the rest of the data in our data set. So this is the answer we're going to want to select. Well done!
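To see how much a single influential point can move a regression line, here's a small Python sketch that fits the same hypothetical data with and without one far-off point. The numbers are made up for illustration; they are not the problem's data.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: five points lying exactly on y = 2x ...
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
# ... plus one hypothetical point far from that pattern
outlier_x, outlier_y = 10.0, 30.0

b0_without, b1_without = least_squares(xs, ys)
b0_with, b1_with = least_squares(xs + [outlier_x], ys + [outlier_y])

print(f"without the point: y-hat = {b0_without:.2f} + {b1_without:.3f}x")
print(f"with the point:    y-hat = {b0_with:.2f} + {b1_with:.3f}x")
```

Here the one extra point pulls the slope from exactly 2 up to about 3.15, even though the other five points fit y = 2x perfectly. That's what "influential" means.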

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Find the value of the test statistic for hypothesis testing on proportions

5/25/2018

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the value of the test statistic for hypothesis testing on proportions. Here's our problem statement: Claim: Most adults would erase all of their personal information online if they could. A software firm survey of 591 randomly selected adults showed that 60% of them would erase all their personal information online if they could. Find the value of the test statistic.

Solution

OK, we could go old school and drag out the formula for the test statistic. It's not really complicated, but it does require a little bit of work. I'm lazy; I don't want to do that kind of thing, so I'm gonna use technology to get my test statistic. Specifically, I'm gonna use StatCrunch, which I have pulled up here in a separate window. Notice there's no icon in this problem that I can click to bring up StatCrunch. So oftentimes when I work these homework problems, I find it helpful to just keep StatCrunch open in a separate window in case I need it. This is a good case in point: I can use StatCrunch to get what I'm looking for and avoid that nasty old-school calculation.

So here we are in StatCrunch. First, I'm going to click on Stat; go down to Proportion Stats, because we're dealing with proportions; One Sample because we only have one sample; and then With Summary because we don't have actual data, just some summary stats.

Here in the options window, the first thing I'm asked to enter is the number of successes. I can get this from the problem statement. It said 60% of the people would erase all their personal information online if they could. I know the total (591), so all I need to do is a little calculation. In my calculator, I'm gonna put 60% times the total 591. That gives me the number of successes: 354.6. We want a whole number. Well, there could be such a thing as six tenths of a person, but it wouldn't do anybody any good, so let's only count whole people. We're gonna round this to the nearest whole number, which is 355.

Then I enter the total number of observations, the total from the survey: 591. We don't need to mess with any of the other settings, because the default values serve us fine here. The default null proportion is 0.5, which fits a claim about "most" (more than half) of adults, and the direction of the alternative hypothesis affects the P-value, not the test statistic. We don't want the P-value, just the test statistic. I click Compute!, and here in my results window, toward the end of the table, we see our test statistic. I'm asked to round to two decimal places. Well done!
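The "old-school" calculation StatCrunch is doing for us is the one-proportion z statistic, z = (p̂ − p0) / √(p0(1 − p0)/n). Here's a short Python sketch of it. One assumption to flag: I'm taking the null proportion to be p0 = 0.5, because the claim "most adults" means more than half; I'm assuming that's the default value used in the video.

```python
import math

def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """One-proportion z statistic: (p-hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

n = 591
successes = round(0.60 * n)  # 354.6 people -> round to 355 whole people
# Assumption: "most adults" means p > 0.5, so the null value is p0 = 0.5
z = one_proportion_z(successes, n, p0=0.5)
print(successes, f"{z:.2f}")
```

Note that p̂ here is 355/591 (the rounded count over the total), not exactly 0.60, which is why the count gets rounded before it goes into StatCrunch.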

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Using one-way ANOVA for hypothesis testing and the Bonferroni test

5/22/2018

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use one-way ANOVA for hypothesis testing and the Bonferroni test. Here's our problem statement: The accompanying data are the weights (in kilograms) of poplar trees that were obtained from trees planted in a rich and moist region. The trees were given different treatments identified in the accompanying table. Also shown are partial results from using the Bonferroni test with the sample data. Complete Parts A through C.

Part A

Part A says, “Use a 0.10 significance level to test the claim that the different treatments result in the same mean weight.” The first thing we're asked to do is determine the null and alternative hypotheses for one-way ANOVA. These hypotheses are fixed; they're pretty much the same regardless of what it is you're actually testing.

The null hypothesis, because it's a statement of equality by definition, simply states that all of the population means you're looking at are equal to one another. That being the case, we're going to select the option where all of our means are equal to each other. The alternative hypothesis is then that at least one of these means is different from the others. I check my answer. Excellent!

Now we're asked to find the test statistic. To do this, I'm going to load the data into StatCrunch so that StatCrunch can do the heavy lifting for me. So here I'm opening my data in a separate window in StatCrunch. I'm going to resize this window so we can see everything a little bit better. And now inside StatCrunch, I'm going to go to Stat –> ANOVA –> One Way.

Here in the options window I'm going to go ahead and select all of the different columns so we get everything in. And once I have my columns selected I don't need to make any other adjustments here in the options window. So I'm going to go ahead and press Compute!, and here's my results window. And right here at the bottom of my results window, we see the ANOVA table that we need. Our test statistic we're going to take right from the ANOVA table. We're asked to round to two decimal places. Excellent!

Our P-value we will also obtain from the ANOVA table. So here I’m asked to round to three decimal places. Fantastic!
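Under the hood, the F statistic in the ANOVA table is the ratio of the between-groups mean square to the error mean square. Here's a from-scratch Python sketch of that calculation. The weights are hypothetical, not the problem's data, and the sketch stops at F because the P-value needs the F distribution (in SciPy, for example, `scipy.stats.f.sf(F, df_between, df_error)` gives that right-tail area).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_error, mse) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each value around its own group mean
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_error = n_total - k
    mse = ss_error / df_error
    f_stat = (ss_between / df_between) / mse
    return f_stat, df_between, df_error, mse

# Hypothetical weights (kg) for four treatment groups of five trees each
groups = [
    [1.1, 0.9, 1.3, 1.0, 1.2],
    [0.8, 1.0, 0.7, 1.1, 0.9],
    [1.2, 1.4, 1.0, 1.3, 1.1],
    [1.6, 1.9, 1.7, 1.5, 1.8],
]
f_stat, df_b, df_e, mse = one_way_anova(groups)
print(f"F = {f_stat:.2f} with df = ({df_b}, {df_e}), MSE = {mse:.5f}")
```

The error mean square (MSE) and error degrees of freedom returned here are the same two ANOVA-table entries that the Bonferroni calculation in Part C relies on.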

Now we're asked to conclude our hypothesis test. Remember earlier in the problem statement we were asked to use a 10% significance level to test our claim. Our P-value is definitely less than 10%. Therefore, we’re inside the rejection region, so we're going to reject the null hypothesis. And whenever we reject a null hypothesis, there's always going to be sufficient evidence. Well done!

Part B

Now Part B asks us, “What do the displayed Bonferroni results tell us?” Well, we have to go back and look at the actual results. So I click on the icon, and here are the Bonferroni results in this table down here. Notice we're comparing three different pairs: the first treatment with the second, the first treatment with the third, and the first treatment with the fourth.

To make the actual comparisons, we're going to use these significance numbers at the end of the table. These are actually P-values, so we treat them the same way as we would in any other hypothesis test. Remember that if the P-value is less than or equal to the significance level, then we reject the null hypothesis. Rejecting the null hypothesis means rejecting the statement that the two means are equal, so there actually is some difference. In other words, for there to be a significant difference, we have to reject the null hypothesis, which means we have to be within the rejection region, and that means having a significance (P-value) here that's less than or equal to our significance level. If the P-value is greater than our significance level, there's not a significant difference between the two treatment groups.

Well, what significance level are we using to test our claim? It's 10%. This first pairing has a P-value of 1, which is definitely greater than 10%, so there's not a significant difference between these two treatment groups. Same thing for the second pairing: 0.901 is greater than 10%, so there's nothing there. But this last one, 0.033, is less than 10%. Therefore, there is a significant difference between the first and fourth treatments. So I'm going to update my answer fields and the drop-down menus with those conclusions. I check my answer. Good job!
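The decision rule for the Bonferroni table is the same one used in any hypothesis test. This Python sketch applies it to the three P-values read off the results table in this problem (1, 0.901, and 0.033) at the 10% significance level.

```python
alpha = 0.10

# Bonferroni P-values from the results table described in Part B
bonferroni_p = {
    ("treatment 1", "treatment 2"): 1.0,
    ("treatment 1", "treatment 3"): 0.901,
    ("treatment 1", "treatment 4"): 0.033,
}

significant = []
for pair, p in bonferroni_p.items():
    # Reject H0 (equal means for this pair) only when P-value <= significance level
    if p <= alpha:
        significant.append(pair)
        verdict = "significant difference"
    else:
        verdict = "no significant difference"
    print(f"{pair[0]} vs {pair[1]}: P = {p} -> {verdict}")
```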

Part C

Part C asks us, “Use the Bonferroni test procedure with a 0.10 significance level to test the significant difference between the mean amount of the irrigation treatment group and the group treated with both fertilizer and irrigation. Identify the test statistic and the P-value. What do the results indicate?”

OK, the first thing we're asked to find is the test statistic. This is different from the F test statistic in our ANOVA table, so do not put that in. That's not the test statistic they're looking for with Bonferroni. You have to make some adjustments, and the adjustment is a formula based on the t distribution.

Here on the screen is a slide I made in PowerPoint with the formula we need to use now: t = (x̄1 − x̄2) / √(MSE · (1/n1 + 1/n2)). The numbers for this formula come from the StatCrunch output we've already seen. First we have x-bar1 and x-bar2. We're looking at the mean amount of the irrigation treatment group and the group treated with both fertilizer and irrigation, so that's these two columns here. Notice we have mean values computed for those: 0.418 for the irrigation group and 1.666 for the fertilizer and irrigation group.

Next, we need the mean square of the errors (MSE) in the denominator, which comes from our ANOVA table. Here's the mean square column, and here's the error row, so the MSE is this number here: 0.17508.

We can get our sample sizes n1 and n2 from the StatCrunch output too. Notice there's a sample size column here in the column statistics, and we can just grab those numbers. When we do that, out come the numbers we plug in. Notice how I'm putting the second group first. The reason is that they're actually looking for a positive number here. They don't state it, but the Bonferroni test, remember, is a two-tailed test, so there's a positive test statistic and a negative test statistic. Since they're only asking for one number, the default convention is to give them the positive one. I wish they'd be more explicit and say that, but seeing as they haven't, I'm here to help guide you through that.

So that's why I'm putting the greater of the two mean values first: so that a positive number comes out. We punch this into the calculator, and here's what we get. We're asked to round to two decimal places, so in this case we're looking for 4.72. Fantastic!
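Here's the same calculator step as a short Python sketch, plugging in the values quoted in this walkthrough: the two group means, the error mean square from the ANOVA table, and five values per group. The larger mean goes first so the statistic comes out positive.

```python
import math

def bonferroni_t(mean1, mean2, mse, n1, n2):
    """Bonferroni t statistic: (x-bar1 - x-bar2) / sqrt(MSE * (1/n1 + 1/n2))."""
    return (mean1 - mean2) / math.sqrt(mse * (1 / n1 + 1 / n2))

t_stat = bonferroni_t(mean1=1.666,  # fertilizer and irrigation group
                      mean2=0.418,  # irrigation group
                      mse=0.17508,  # error mean square from the ANOVA table
                      n1=5, n2=5)
print(round(t_stat, 2))  # 4.72
```

Swapping the two means would give the same number with a negative sign, which is why the order only matters for matching the positive-value convention.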

Now to find the P-value, I have to use technology, and the technology I'm going to use, of course, is StatCrunch. Since this test statistic follows a t distribution, we have to go back to the t distribution to calculate the P-value. I'm going to go to Stat –> Calculators –> T. Now, what are my degrees of freedom? Technically, the degrees of freedom are the total number of sample values in the whole data set minus the number of groups (columns). But I find it simpler just to use the ANOVA table. I clear this out of the way so we can see our ANOVA table here. It's clear that we've got five values in each of four columns, so there are 20 values total and 4 columns; 20 minus 4 is 16. If you look down here on the error row for degrees of freedom in your ANOVA table, you see that same number: 16. That's why I like to just use the ANOVA table; it's a little less math I have to do, and it's quicker to just grab this number and go with it.

So we want 16 degrees of freedom for our t distribution, and I put that in here. Then I put in the test statistic I calculated previously, with the calculator set to find the tail area beyond it. (Either tail works, because the t distribution is symmetric.) That gives me the area in one tail, but since the test is two-tailed, the same area sits in the other tail of the distribution. So I need to multiply this number by 2 to get the P-value for my answer field. If I take 0.0001 and multiply by 2, I get 0.0002. Rounded to three decimal places, that's just zero. Well done!

Now I compare my P-value with the significance level. Technically, I should first adjust the P-value by multiplying it by the total number of possible pairings. There are four different groups, which means there are six different ways to pair them up: one with two, one with three, one with four, two with three, two with four, and three with four. So I should multiply the unrounded P-value by six before comparing it with my significance level. That's about 0.0002 times 6, or roughly 0.001, which is still far below 10%. With a P-value that small, we're inside the rejection region, so I'm going to reject H0. And whenever we reject H0, there's always sufficient evidence. Nice work!
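StatCrunch's T calculator handles this step. As a cross-check, here's a pure-Python sketch that gets the two-tailed P-value by integrating the t density numerically with Simpson's rule (my substitution; it is not necessarily how StatCrunch computes it), then applies the Bonferroni adjustment by multiplying by the number of pairings, capped at 1. It uses t = 4.72 and 16 degrees of freedom from this problem.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed P-value = 1 - 2 * (area from 0 to |t|), via Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area

t_stat = 4.72    # Bonferroni test statistic from Part C
df_error = 16    # error df from the ANOVA table: 20 values - 4 groups
p_two = two_tailed_p(t_stat, df_error)

num_pairs = math.comb(4, 2)  # 6 possible pairings of 4 groups
p_adjusted = min(1.0, p_two * num_pairs)

alpha = 0.10
print(f"two-tailed P = {p_two:.5f}, Bonferroni-adjusted P = {p_adjusted:.4f}")
print("reject H0" if p_adjusted <= alpha else "fail to reject H0")
```

Working with the unrounded P-value matters here: the adjusted value is small but not literally zero, and the conclusion to reject H0 at the 10% level holds either way.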

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Finding the sample size needed to estimate a population standard deviation

5/18/2018

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the sample size needed to estimate a population standard deviation. Here's our problem statement: Assume that the sample is a simple random sample obtained from a normally distributed population of flight delays at an airport. Use the table below to find the minimum sample size needed to be 95% confident that the sample standard deviation is within 50% of the population standard deviation. A histogram of a sample of those arrival delays suggests that the distribution is skewed (not normal). How does the distribution affect the sample size?

Part 1

OK, the first part asks us for the minimum sample size needed. To do this, we're going to use the table provided. There are calculations you can perform to get this estimated sample size, but they're very complicated, so in the interest of not overburdening the student, the textbook author has elected to provide a table. You just read the number from the table.

We're looking for a 95% confidence level, so here we want the top half of this table, which corresponds with 95% confidence. The other alternative listed here is 99%, but we're not going to look at that because we want a 95% confidence level. Then you just look for the range that you want to be in. Here we want the sample standard deviation to be within 50% of the population standard deviation, so we just look for 50%. Here's 50%. Then we read the number right off the table, and that's what we put in our answer field. Good job!

Part 2

Now, the second part says a histogram of a sample of those arrival delays suggests that the distribution is skewed (not normal). How does the distribution affect the sample size? Well, this estimate that we have here that we put in the previous answer field comes from this table. And the numbers in this table were produced using those complex calculations I mentioned earlier. Those calculations assume that the data are normally distributed.

Here we're seeing a case where our data is not normally distributed; it's skewed. Therefore, the numbers in this table may be in error for our data, because they rely on an assumption that's not true. So the answer option we want to select is the one saying the estimate is not likely correct, because it's based on an assumption that's not true. Nice work!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Using one-way ANOVA for hypothesis testing of cigarette filters

5/15/2018

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use one-way ANOVA for hypothesis testing of cigarette filters. Here's our problem statement: Refer to the accompanying data table which shows the amounts of nicotine (in milligrams per cigarette) of king-sized cigarettes, 100-mm menthol cigarettes and 100-mm non-menthol cigarettes. The king-sized cigarettes are non-filtered, while the 100-mm menthol cigarettes and the 100-mm non-menthol cigarettes are filtered. Use a 5% significance level to test the claim that the three categories of cigarettes yield the same mean amount of nicotine. Given that only the king-sized cigarettes are not filtered, do the filters appear to make a difference?

Part 1

OK, the first part of this problem is asking us to determine the null and alternative hypotheses. With one-way ANOVA, this is pretty much set in stone. The null hypothesis will always be that all of your population parameters are equal, and the alternative hypothesis will be that at least one of those parameters is different from the others. So I'm going to select those here in the drop-down menus for each of these hypotheses. Nice work!

Part 2

Now the next part asks us to find the F statistic. To do that, I'm going to use StatCrunch. To use StatCrunch, I need to get the data in. First I'm going to click on this icon and open my data in StatCrunch, and I'm going to move this window so we can see everything that we're doing.

OK, now inside StatCrunch I have my data. So I go to Stat –> ANOVA –> One-Way. Here in the options window, I'm going to select all of my columns, and I press Compute!. And here we have in the results window the ANOVA table which lists my F statistic. I'm asked to round that to four decimal places. Nice work!

Part 3

The next part asks me to find the P-value, which is right next door to my F statistic in my ANOVA table. Again I'm asked to round to four decimal places. Nice work!

Part 4

The next part asks, “What is the conclusion for this hypothesis test?” Well, my P-value here in this example is about 1%. I'm asked to use a 5% significance level. 1% is less than 5%, so I'm within the region of rejection, and therefore I'm going to reject the null hypothesis. And of course when we reject the null hypothesis, there's always sufficient evidence. Fantastic!

Part 5

And now the final part of the problem asks, “Do the filters appear to make a difference?” Well, let's look at the mean values in our column statistics. We see that the king-sized and the 100-millimeter menthol cigarettes are more or less in the same ballpark, with a little bit of separation. The filtered non-menthol cigarettes, however, are really different from the other two. So let's look at our answer options and see what we get.

Do the filters appear to make a difference? Well, I would claim by looking at these mean values that they do make a difference, so answer options A and C we're not going to select. Answer option B (the results are inconclusive) — I don't think the results are completely inconclusive.

So that leaves us with answer option D: “Given that the king-sized cigarettes have the largest mean” — and here we see that they do — “it appears that the filters do make a difference, although this conclusion is not justified by the results from analysis of variance.” That's very true. You're going to need to do some other statistical analysis to justify it. But just based on what we see here, the two filtered varieties do have a slightly lower mean value than the non-filtered king-sized cigarettes. And so we're going to select answer option D. Nice work!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Using goodness of fit and Benford's Law for hypothesis testing to detect check fraud

5/12/2018

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use goodness of fit and Benford's law for hypothesis testing to detect check fraud. Here's our problem statement: An investigator analyzed the leading digits from 774 checks issued by seven suspect companies. The frequencies were found to be 222, 150, 180, 77, 66, 57, 48, 35, and 38, and those digits correspond to the leading digits of 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively. If the observed frequencies are substantially different from their frequencies expected with Benford's law shown below, the check amounts appear to result from fraud. Use a 10% significance level to test for goodness of fit with Benford's law. Does it appear that the checks are the result of fraud?

Part 1

OK, the first part asks us to determine the null and alternative hypotheses. This is the usual first step when we're doing hypothesis testing. Here the hypothesis testing is for goodness of fit. That means we have fixed hypotheses, so no matter what it is that we're looking at, the hypotheses for both the null and the alternative will be fixed.

The null hypothesis is by definition a statement of equality, and so it's always going to be (in terms of goodness of fit) that the data will correspond to some claimed distribution (in this case, Benford's law). So we look at our answer options, and we want the one that says the leading digits are from a population that conforms to Benford's law. It says that it's equal; the null hypothesis is by definition a statement of equality. So we want this statement of equality here.

The alternative hypothesis is then going to be that at least one of those digits does not conform to Benford's law. At least one of the observed frequencies does not conform to the claimed distribution, so we can ignore “at least two.” “At least one has a frequency that does not conform to Benford's law” — that's probably the one we want. “At most three” is not what we want. And “the leading digits are from a population that conforms to Benford's law” — that's also not what we want. So we're going to select this answer here. Excellent!

Part 2

Now the second part asks us to calculate the test statistic. To do that, we're going to put the data into StatCrunch and let StatCrunch do the heavy lifting for us. Here's my data in StatCrunch. I'm going to resize this window so we can get a better view of what's going on. In the first column we have the leading digits, in the second column the actual frequencies (these are what we actually observed), and in the last column the corresponding percentages for the Benford's law distribution.

To run the goodness-of-fit test, I go to Stat –> Goodness-of-fit –> Chi-square. (We're not comparing with the normal distribution, so the chi-square test is the one we want.) We need to select our observed and expected frequencies. The observed frequencies are what we actually observe in the real world; that's the actual frequency column. The expected frequencies are what we would theoretically expect or anticipate. In this case, that's the distribution we're claiming the digits conform to: Benford's law. We have a column for that, so I'm going to select it here.

Everything else is just fine for our purposes, so I'm going to hit Compute!, and here is my results window with the test statistic and the P-value right here in this table at the top. It wants me to round the test statistic to three decimal places, so that's what I'll do. Fantastic!

Part 3

And now the P-value as always is in our results window right next door to the test statistic. Here we are asked to provide four decimal places, so again that's what I will do. Excellent!
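For anyone curious what StatCrunch computes here, this Python sketch runs a chi-square goodness-of-fit test against Benford's law, where the proportion for leading digit d is log10(1 + 1/d). The observed counts below are hypothetical, not the problem's data. Expected counts are the total times each proportion. With 9 digits there are 8 degrees of freedom, and for an even df the right-tail area has a simple closed form, which the sketch uses instead of a statistics library.

```python
import math

def benford_proportions():
    """Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def chi_square_gof(observed, proportions):
    """Chi-square statistic: sum of (O - E)^2 / E with E = total * proportion."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_sf_even_df(x, df):
    """Right-tail area of the chi-square distribution, valid only for EVEN df."""
    if df % 2:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):  # sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hypothetical leading-digit counts for digits 1..9 (1,000 checks)
observed = [300, 170, 130, 95, 80, 70, 55, 52, 48]
stat = chi_square_gof(observed, benford_proportions())
df = len(observed) - 1  # 9 categories - 1 = 8
p_value = chi_square_sf_even_df(stat, df)

alpha = 0.10
print(f"chi-square = {stat:.3f}, P-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

A P-value above the 10% significance level means we fail to reject the claim that the digits conform to Benford's law, which is the same logic used in Part 4 below.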

Part 4

Now, the last part asks us to state the conclusion. Our P-value (55%) is well above our significance level of 10%. Therefore, we are outside the rejection region, and we fail to reject the null hypothesis. Failing to reject the null hypothesis means we don't have enough evidence to reject the claim that the leading digits are from a population that conforms to Benford's law, because that claim is our null hypothesis.

Since we fail to reject it, the null hypothesis could actually be true. We don't see anything that leads us to believe the digits stray from the Benford's law distribution. So is there sufficient evidence to warrant rejection of the claim that the leading digits are from a population with a distribution that conforms to Benford's law? No: it's plausible that the digits do conform to Benford's law, so we're not going to reject that claim. Therefore, we're going to say there is not sufficient evidence. And since the digits seem to conform to Benford's law, it doesn't appear that the checks are the result of fraud. I check my answer. Fantastic!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below to let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Finding the best regression model for shock waves from explosives

5/8/2018

Download your free copy of the Stat 101 Nonlinear Regression Reference Sheet (which is used in the video) by clicking on the icon at right.

Stat 101 Nonlinear Regression.pdf

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the best nonlinear regression model for shock waves from explosives. Here's our problem statement: The table below lists different amounts (in metric tons) of explosives and the corresponding value of the Richter scale for the explosions. Construct a scatterplot and identify the mathematical model that best fits the given data.

Part 1

OK, the first part asks us to construct a scatterplot. We can use StatCrunch to do this, so I'm going to dump my data into StatCrunch. I'll resize this window so we can see everything better. OK, now my data is in StatCrunch.

I could just use the scatterplot graphing option inside StatCrunch. But I know I'm gonna have to make a regression model, and the regression model output includes a scatterplot. Since I have to make the model anyway, that's the road I'm going to take: make the regression model and have the scatterplot come out of the regression results. The question is “Which model type do I need to be making?” Is it linear? Is it exponential? Is it quadratic? Is it logarithmic? Is it a power function? What type of model should we be making?

Without knowing in advance which model type goes with which kind of problem, you'd end up making every model until you hit on the right one, comparing each of the different model types along the way. That would be useful in a class designed to teach you to be a good model maker.

This is a class in elementary statistics, and so I don't believe that approach is entirely appropriate here. So what I offer my students is a reference sheet that tells them which model type to make, what the general form of the model is, and how to take the numbers out of StatCrunch and transform them appropriately so that they put the right numbers into the answer fields for their assignments.

You can download a copy of this reference sheet if you want. Go to aspiremountainacademy.com, find the blog post for this particular homework problem video, and you'll find a link there where you can download your own copy of this reference sheet. You may not be able to use it in a testing situation (unless of course you're in my class; I allow my students to use it during tests). But it will give you guidance on how to formulate the best model so that you can get to the end much quicker. I go over many more details about how to actually use that reference sheet in the lecture video for this section, so I'd highly recommend you check that out.

For purposes of working this problem, we'll just proceed as if we already know how to use the reference sheet, which I do. The model we need to make here is a logarithmic model. To do that, I go to Stat –> Regression –> Simple Linear, select my x- and y-variables, and then scroll down to the area that says Transformation. I want to transform X with the natural log, and I don't want any transformation for Y. I also want to check the box here for Use original units in graphs. You'll see in a moment why we need to check this box.
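As a sketch of what the natural-log transformation on X amounts to, assuming StatCrunch fits the logarithmic model by ordinary least squares of y on ln(x), the Python below transforms x and then runs a simple linear regression. The data are hypothetical, not the problem's table, and the function names are illustrative.

```python
import math

def simple_linear_fit(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def fit_log_model(xs, ys):
    """Fit y = a + b ln(x) by regressing y on ln(x); every x must be positive."""
    return simple_linear_fit([math.log(x) for x in xs], ys)

# Hypothetical (explosive amount, Richter value) pairs.
amounts = [1, 2, 5, 10, 20, 50]
richter = [3.3, 3.7, 4.3, 4.7, 5.1, 5.7]
a, b = fit_log_model(amounts, richter)
print(f"y = {a:.3f} + {b:.3f} ln x")
```

Transforming x and reusing an ordinary straight-line fit is why the intercept and slope can be read directly off the regression output for this model type.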

I hit Compute!, and here is my results window. If I click over to the second page, here's my scatterplot. We check that box because, without it, the scatterplot would be drawn in the transformed units, and the line of best fit would simply be a straight line that doesn't look like the type of model we're trying to make. Checking the box gives a line of best fit that conforms to the model type you're making. Looking at the scatterplot, it's easy to see which of these answer options is the correct one. Good job!

Part 2

Now the second part asks us to make the actual model, which we can do here. I go back to the first page of my results. For the logarithmic model, I simply take the intercept and slope right off the parameter estimates table and put them into the model. The general form for a logarithmic model is this one right here, so I select that and then put in my numbers to three decimal places. Fantastic!
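In symbols, the logarithmic model usually taught in introductory statistics has the general form below; this is assumed to match the form on the reference sheet, which isn't reproduced in this post. Here a is the Intercept and b is the slope reported for the ln-transformed x:

$$\hat{y} = a + b \ln x$$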

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or if we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.


Finding and using a specified multiple regression equation

5/4/2018


Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find and use a specified multiple regression equation. Here's our problem statement: The accompanying table provides data for the sex, age, and weight of bears. For sex, let 0 represent female and let 1 represent male. Letting the response (y-variable) represent weight, use the dummy variable of sex and the variable of age to find the multiple regression equation. Use the equation to find the predicted weight of a bear with the characteristics given below. Does sex appear to have much of an effect on the weight of a bear?

Part 1

OK, the first thing we're asked to do is provide the multiple regression equation with these variables in the equation. To do that, I'm going to take my data and dump it into StatCrunch. So here's my data. I click on this icon here so I can open it in StatCrunch.

Now that my data is in StatCrunch, I'm ready to make my model. Since I know specifically what model I want to make, all I have to do is go to Stat –> Regression –> Multiple Linear (because we have multiple x-variables in an equation of linear form). The y-variable is the response, or what comes out of the model. Here that's the weight, which you can see listed here, so for my y-variable I select the column of weight values. The x-variables, sex and age, are what we input into the model, so I select each of those columns. We don't have any interactions, and we can tell because, if we did, we would see variables like sex and age multiplied together in the same term. Here, sex and age appear only in separate terms, so there are no interactions.

At this point we're ready to make our model, so I press Compute!, and here in my results window is the model they're looking for. To input those coefficients into my answer fields, I often find it easier to look in the parameter estimates table. Notice how the numbers here match the numbers in the model listed at the top. So I'm just going to use the numbers in my parameter estimates table to fill in my answer fields. I'm asked to round to one decimal place, so that is what I'll do. Nice work!
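To show roughly what Multiple Linear computes, here is a pure-Python sketch that fits weight = b0 + b_sex·sex + b_age·age by ordinary least squares, solving the normal equations (XᵀX)b = Xᵀy with Gaussian elimination. This is a textbook method and an assumption about what the software does internally, not StatCrunch's actual code; the bear data and function names are hypothetical. For real work, a library routine such as numpy.linalg.lstsq is numerically sturdier.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multiple_regression(sex, age, weight):
    """Return [b0, b_sex, b_age] for weight = b0 + b_sex*sex + b_age*age (sex: 0=female, 1=male)."""
    rows = [[1.0, s, a] for s, a in zip(sex, age)]  # design matrix with intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, weight)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Hypothetical bear data: sex dummy, age in months, weight.
sex = [0, 1, 0, 1, 0, 1, 0, 1]
age = [12, 14, 22, 30, 40, 45, 58, 60]
weight = [60, 150, 90, 210, 140, 260, 180, 330]
print([round(c, 1) for c in fit_multiple_regression(sex, age, weight)])
```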

Part 2

The next part asks us to predict the weight of a female bear that is 22 months of age. There is no prediction option when you're using multiple linear regression in StatCrunch. If it were simple linear regression and we only had one x-variable, then, yes, you would have that option. But whoever decided to code StatCrunch did not give that functionality to the multiple linear regression option. So we're gonna have to go old school and actually calculate this out by hand.

Here we have a calculator. I'm going to use it to calculate the predicted weight for the female bear, which, as we see up at the top, is 22 months of age. First I put in the intercept. To that I add the coefficient for sex multiplied by the value of the sex variable. This is a female bear, and up at the top of the problem statement we see that 0 represents female, so the sex value is 0. Finally, I add the coefficient for age multiplied by the age, which is 22 months. I'm asked to round to the nearest integer.
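The hand calculation is just the regression equation evaluated at sex = 0 and age = 22. In the sketch below, the function name and the intercept and age coefficient are hypothetical placeholders (the actual fitted values aren't quoted in this post); only the sex coefficient of 82.2 comes from this problem.

```python
def predict_weight(b0, b_sex, b_age, sex, age):
    """Predicted weight from weight = b0 + b_sex * sex + b_age * age.

    sex is the dummy variable: 0 = female, 1 = male.
    """
    return b0 + b_sex * sex + b_age * age

# Hypothetical intercept and age coefficient; 82.2 is the sex coefficient from this problem.
B0, B_SEX, B_AGE = 10.0, 82.2, 2.0
female = predict_weight(B0, B_SEX, B_AGE, sex=0, age=22)
print(round(female))  # rounded to the nearest integer, as the problem asks
```

For the male bear, the same function is called with sex=1 and that bear's age.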

Part 3

Now the next part asks me to repeat the same calculation for the male bear. That's easy enough to do. Excellent!

Part 4

Now the last portion of our problem asks, “Does sex appear to have much of an effect on the weight of a bear? Select the correct choice and fill in the answer box to complete your choice.” If we look back at the weights we predicted with our model, notice how the weight of the male is more than twice the weight of the female. So there's definitely a difference between them.

The temptation, when they ask which bear weighs more and by how much, is to simply subtract one predicted value from the other. But what they're really looking for is the coefficient for the sex variable, because that coefficient determines the difference between the weight for the female and the weight for the male. Remember that if the sex is female, the 82.2 doesn't even enter into our calculation; it's multiplied by zero. But if the bear is male, the 82.2 gets added in, and that's essentially the difference between the weights of the female and the male bear.
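Writing the model as ŷ = b0 + b1·sex + b2·age (the symbols b0, b1, b2 are labels chosen here for the intercept, sex coefficient, and age coefficient), the predicted difference between a male and a female bear of the same age x reduces to the sex coefficient:

$$\hat{y}_{\text{male}} - \hat{y}_{\text{female}} = \left(b_0 + b_1 \cdot 1 + b_2 x\right) - \left(b_0 + b_1 \cdot 0 + b_2 x\right) = b_1$$

In this problem, b1 is the 82.2 discussed above. If the two bears differ in age, their predicted weights also differ by b2 times the age gap.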

So I need an answer option that says yes, there is an effect, and here it looks like that's answer options A and D. The regression equation indicates that the predicted weight of a male bear is more than that of a female, so I want this answer option. And I put in the coefficient for the sex variable, because that is essentially the difference in weight. Excellent!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below to let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.


Finding the explained variation, the unexplained variation, and a prediction interval estimate

5/1/2018


Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the explained variation, the unexplained variation, and a prediction interval estimate. Here's our problem statement: Listed below are altitudes in thousands of feet and outside air temperatures in degrees Fahrenheit recorded during a flight. Find the explained variation, unexplained variation, and indicated prediction interval. There is sufficient evidence to support a claim of a linear correlation, so it is reasonable to use the regression equation when making predictions for the prediction interval. Use a 95% confidence level with an altitude of 6327 feet. And here we see our data set.

Part A

So Part A asks us first to find the explained variation. To do this, I'm going to place my data into StatCrunch. So here I have StatCrunch. I'm going to resize this window a bit. OK, now we're ready to go.

The explained variation as well as the unexplained variation will come from an ANOVA table. So I need to create my ANOVA table, and to do that I'm just going to use the ANOVA table that comes out of the regression results. So I go up here to Stat –> Regression –> Simple Linear (because the problem statement says there is sufficient evidence to support a claim of linear correlation). I'm going to select my x- and y-variables and press Compute!

Here we have our results window. If I scroll down here, you can see my ANOVA table for the results. The explained variation is the sum of the squares for the model, so that will be this number right here. So I round that to two decimal places for my answer field. Nice work!

Part B

The unexplained variation is the sum of squares for the error, so that's this number here. Excellent!
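Both numbers in the ANOVA table can be reproduced from the regression line. The Python sketch below (hypothetical data and function name) computes the explained variation Σ(ŷ − ȳ)², the unexplained variation Σ(y − ŷ)², and the total variation Σ(y − ȳ)², which always satisfy explained + unexplained = total for a least-squares line.

```python
def variation_breakdown(xs, ys):
    """Return (explained, unexplained, total) variation for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # model sum of squares
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # error sum of squares
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained, unexplained, total

# Hypothetical altitudes (thousands of feet) and temperatures (degrees Fahrenheit).
altitudes = [2, 5, 9, 14, 20, 26]
temps = [60, 50, 35, 18, -3, -25]
explained, unexplained, total = variation_breakdown(altitudes, temps)
print(round(explained, 2), round(unexplained, 2))
```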

Part C

And finally, for the indicated prediction interval, I go to the results window, click on the Options button, and choose Edit from the drop-down menu. Then I scroll back down in my options window to this area, Prediction of y. Here I put in the value of x for which I want to make a prediction. We're making a prediction for 6327 feet, but since all the altitudes are expressed in thousands of feet, I need to enter that as 6.327.

We're asked for a 95% confidence level. That's the default selection here, so I'm just going to leave that alone. I press Compute!, and out comes my results window. When I go down to the bottom of that window, here's a table. Here's the value that I put in for the prediction. Here's the prediction that comes out of it. And at the very end of that table at the bottom we see the limits for a prediction interval. So I'm going to put those numbers in here. Just round to four decimal places. Nice work!
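The limits StatCrunch reports follow the standard prediction-interval formula for an individual y value: ŷ ± t(α/2) · s_e · √(1 + 1/n + (x0 − x̄)²/Sxx), with s_e = √(SSE/(n − 2)) and n − 2 degrees of freedom. The sketch below assumes the critical value t(α/2) is looked up separately (the standard library has no t quantile function; scipy.stats.t.ppf is one option), and the data and function name are hypothetical.

```python
import math

def prediction_interval(xs, ys, x0, t_crit):
    """Prediction interval for an individual y at x0 in a simple linear regression.

    t_crit is the critical value t_{alpha/2} with n - 2 degrees of freedom.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # standard error of estimate
    y_hat = b0 + b1 * x0
    margin = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin

# Altitudes are in thousands of feet, so 6327 feet is entered as 6.327.
x0 = 6327 / 1000
# Hypothetical data: 6 points gives 4 degrees of freedom; t_{0.025} with df = 4 is 2.776.
altitudes = [2, 5, 9, 14, 20, 26]
temps = [60, 50, 35, 18, -3, -25]
low, high = prediction_interval(altitudes, temps, x0, 2.776)
print(round(low, 4), round(high, 4))
```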

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below to let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
