Blog Archives

Finding the best predicted value using the best multiple regression model

5/31/2019

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the best predicted value using the best multiple regression model. Here's our problem statement: The accompanying table shows results from regressions performed on data from a random sample of 21 cars. The response variable is CITY (fuel consumption in miles per gallon); the predictor or x variables are WT (weight in pounds), DISP (engine displacement in liters), and HWY (highway fuel consumption in miles per gallon). The equation CITY = -3.16 + 0.822HWY was previously determined to be the best for predicting city fuel consumption. A car weighs 2780 pounds, it has an engine displacement of 1.5 liters, and its highway fuel consumption is 42 miles per gallon. What is the best predicted value of the city fuel consumption? Is that predicted value likely to be a good estimate? Is that predicted value likely to be accurate?

Part 1

OK, so there are two parts to this problem. In the first part, we're asked for the best predicted value for the city fuel consumption. We're already given the regression equation that is the best, but if we want, we can click on this icon here and take a look at the different regression equations from which that selection was made. In picking the best regression equation, remember that we have to balance three different criteria. First, we want the best P-value, which is the lowest P-value. Second, because the equations we're comparing have different numbers of variables in them, we compare adjusted R squared values rather than R squared values; adjusted R squared accounts for differences in the number of variables between equations. The best adjusted R squared value is typically the highest. Third, we look at the number of variables in the regression equation. Typically we want the simplest equation, and the simplest equation has the fewest variables. So we're balancing three different criteria to pick out the right equation.

Now the equation that was selected as the best is this last one here. It has only one variable, and it has the best P-value. Actually, all of these equations have the best P-value, so P-value isn't really helping us decide here. But look at the adjusted R squared value. Notice it's not the highest; the highest is listed here, and it's 0.936. Personally, if I were selecting based on my experience working in industry, I would take the 0.936. Look at that regression equation: it has two variables in it. For a little more complexity, you get a little more adjusted R squared, so it's a slightly better predictor. So based on my experience in industry, that's what I would select.

That's not how the author of your textbook is thinking, though, and the author of your textbook is the one who wrote these homework problems. Let me use my calculator to show you what I think he's looking at: what's the relative difference between these two adjusted R squared values? If I take the first one (0.936) and subtract the 0.920, the difference between them is 0.016. If I divide that by the 0.936, I get about 0.017, so there's only about a 2% difference between those adjusted R squared values. From the way the author of your textbook is thinking (and this is just my guess; I haven't actually talked to the guy), you make your equation twice as complex by adding an extra variable, but you only get about a 2% improvement in adjusted R squared out of it. From his line of thinking, that's not worthwhile, and that's why I think he selected this last equation as the best regression equation.
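
If you'd like to check that arithmetic yourself, here is a minimal Python sketch of the relative-difference calculation. The two adjusted R squared values are the ones read from the table; the variable names are my own.

```python
# Adjusted R squared values read from the regression table
adj_r2_two_vars = 0.936  # equation with two predictors
adj_r2_hwy_only = 0.920  # CITY = -3.16 + 0.822 HWY (one predictor)

difference = adj_r2_two_vars - adj_r2_hwy_only
relative_difference = difference / adj_r2_two_vars

print(f"difference = {difference:.3f}")                    # 0.016
print(f"relative difference = {relative_difference:.1%}")  # 1.7%, i.e. about 2%
```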

That said, we can use that regression equation. It's listed here in the problem statement, and we'll use it to make our prediction. We start with -3.16 and add the second term, which is 0.822 times the highway fuel consumption, listed here as 42 miles per gallon: -3.16 + 0.822(42) = -3.16 + 34.524 = 31.364. The answer field says "Type in an integer or a decimal. Do not round," so I'll enter 31.364 as is. The units on the city fuel consumption are miles per gallon, the same as the highway fuel consumption, so that's what I'll select here. Nice work!
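
Here is the same prediction as a short Python sketch. The function name `predict_city_mpg` is just an illustrative choice; the coefficients come straight from the equation in the problem statement.

```python
# Best regression equation from the problem: CITY = -3.16 + 0.822 * HWY
INTERCEPT = -3.16
SLOPE_HWY = 0.822

def predict_city_mpg(hwy_mpg: float) -> float:
    """Predicted city fuel consumption (mpg) from highway fuel consumption (mpg)."""
    return INTERCEPT + SLOPE_HWY * hwy_mpg

# The car's weight (2780 lb) and displacement (1.5 L) are not used:
# the selected equation has HWY as its only predictor.
prediction = predict_city_mpg(42)
print(round(prediction, 3))  # 31.364
```

Notice that the weight and displacement given in the problem are extra information; only the highway mileage enters the chosen equation.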

Part 2

Now the second part of the problem has some drop down fields. So let's take a look at these. The predicted value is likely to be a good estimate. It's likely to be a good estimate because we used the best regression equation to bring us that estimate, and that regression equation has the best possible P value it could have. It has a really good adjusted R squared value. And it's got the lowest number of variables that you can have in the equation. So it's got the best balance of those three sets of criteria. And therefore it's likely to be a good estimate.

However, that estimate is not likely to be very accurate. The reason is that the sample size is really small: you only have 21 cars, so there are only 21 data points in the data set. That's not a whole lot. Because the sample is so small, the prediction from that regression equation is not likely to be very accurate. Nice work!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Determining the sample space size for computer program variable names

5/28/2019

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to determine the sample space size for computer program variable names. Here's our problem statement: A common computer programming rule is that names of variables must be between one and eight characters long. The first character can be any of the 26 letters while successive characters can be any of the 26 letters or any of the 10 digits. For example, allowable variable names include A, BB, and M3477K. How many different variable names are possible? Ignore the differences between upper and lower case letters.

Solution

OK, to solve this problem, I'm going to whip out my calculator and show you how to calculate this, because we have to count each of the different possible name lengths and add them together. Let's start with the possibility that our variable name is one character long. Most students, when they're solving this problem, start at the other end. They say, "It's between one and eight characters, so I'll just start with eight characters." But as I'll show you later in the video, that leads to what we call a distractor answer option, and you don't want to pick the distractor, because it distracts you from picking the right answer.

So let's start by assuming you have a one-character name. How many possibilities do we have? The first character can be any of the 26 letters of the alphabet, so I've got 26 possibilities. Now what if the name is two characters long? I have to add in those possibilities. If the name is two characters long, the first character can be any of the 26 letters of the alphabet, and the second character can be any of the 26 letters or any of the 10 digits. So there are 36 possibilities for that second character, which gives 26 × 36 two-character names.

Now what if the name is three characters long? We have to add that in too. Again, the first character has 26 possibilities, and each of the remaining two characters has 36 possibilities, so I'll square the 36 to cover both of them: 26 × 36^2. Next, what if the name is four characters long? I've got 26 for the first character and 36 for each of the three that remain, so I raise 36 to the third power: 26 × 36^3.

And now I just go through the same process for each of the remaining possible name lengths. For five characters, I've got 26 for the first character times 36 raised to the fourth power: 26 × 36^4. For six characters, I've got 26 for the first and 36 for each of the remaining five: 26 × 36^5. You can see the numbers getting bigger as we work our way up. For seven characters, it's 26 × 36^6.

And for eight characters, it's 26 for the first character and 36 for each of the remaining seven: 26 × 36^7. Now look at this: this is the distractor I was telling you about before. This last term by itself is 2,037,468,266,496 (about 2.04 trillion), and notice that's one of your answer options right here. It's a distractor because it counts only the variable names that are exactly eight characters long; it leaves out all the other terms that account for names shorter than eight characters. When we add all of the terms together, we get the proper answer, 2,095,681,645,538, which you can see right here. Good job!
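
The same sum of cases can be sketched in a few lines of Python. The helper name `count_names` is hypothetical; it just encodes the rule "26 choices for the first character, 36 for each later one."

```python
FIRST_CHAR_CHOICES = 26   # letters only
LATER_CHAR_CHOICES = 36   # 26 letters + 10 digits

def count_names(length: int) -> int:
    """Number of variable names with exactly `length` characters."""
    return FIRST_CHAR_CHOICES * LATER_CHAR_CHOICES ** (length - 1)

eight_only = count_names(8)                                    # the distractor
total = sum(count_names(length) for length in range(1, 9))    # lengths 1 through 8

print(f"{eight_only:,}")  # 2,037,468,266,496
print(f"{total:,}")       # 2,095,681,645,538
```

Python integers don't overflow, so the totals come out exact rather than in scientific notation.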

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Finding degrees of freedom, critical values, and a standard deviation confidence interval estimate

5/24/2019

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find degrees of freedom, critical values, and a confidence interval estimate for standard deviation. Here's our problem statement: Use the given information to find the number of degrees of freedom, the critical values, and the confidence interval estimate of sigma (the population standard deviation). It is reasonable to assume that a simple random sample has been selected from a population with a normal distribution.

Part 1

OK, here it says we need a 99% confidence level for our interval. Our sample size is 25, and our sample standard deviation is 0.26. Using these values, we can answer every question in the problem. The first part asks us for degrees of freedom. The number of degrees of freedom is simply one less than the sample size. Here our sample size is 25, so our degrees of freedom will be 24. Well done!

Part 2

Now the next part asks us for the first of two critical values, the one on the left. To get the critical values, I'm going to load up StatCrunch and use its distribution calculator. Let's pop that window out and resize it so we can get a better look at what's going on here.

To access the distribution calculator, I go to Stat --> Calculators --> Chi-squared. I know I need the chi-squared calculator because the statistic I'm being asked to calculate here is chi-squared. There are two critical values we're looking for, so I'm going to use the Between option. The degrees of freedom are what we calculated in the first part: 24. These spaces are where our critical values will show up. What we need to enter is the area between the critical values, which is the confidence level, in this case 99%. So I put in 99%, press Compute, and here are our two critical values. The one on the left (with the subscript L) is the one we want first, and I'm asked to round to three decimal places. Fantastic!

Part 3

And now I'm asked for the other critical value, the one on the right, which is marked with the subscript R. We can take it straight from the calculator. Fantastic!
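
For anyone without StatCrunch, here is a hedged, standard-library-only Python sketch of where those two critical values come from. It assumes an even number of degrees of freedom (24 is even), which allows a closed-form chi-squared CDF; the quantiles are then found by bisection. The helper names are my own. If you have SciPy, `scipy.stats.chi2.ppf` is the usual one-line alternative.

```python
import math

def chi2_cdf_even_df(x: float, df: int) -> float:
    """Chi-squared CDF; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this shortcut only works for even degrees of freedom")
    if x <= 0:
        return 0.0
    half_x = x / 2
    # P(X <= x) = 1 - exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    term, series = 1.0, 0.0
    for k in range(df // 2):
        if k > 0:
            term *= half_x / k
        series += term
    return 1.0 - math.exp(-half_x) * series

def chi2_quantile_even_df(p: float, df: int) -> float:
    """Invert the CDF by bisection: find x with P(X <= x) = p."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_df(hi, df) < p:  # widen until the bracket contains p
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

confidence = 0.99
df = 25 - 1
alpha = 1 - confidence
chi2_left = chi2_quantile_even_df(alpha / 2, df)       # area alpha/2 to its left
chi2_right = chi2_quantile_even_df(1 - alpha / 2, df)  # area alpha/2 to its right
print(round(chi2_left, 3), round(chi2_right, 3))
```

The printed values should agree with a chi-squared table for 24 degrees of freedom, roughly 9.886 and 45.559.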

Part 4

And now the last part asks for a confidence interval estimate on the population standard deviation. To do that, I'm going to come back here to StatCrunch, and I'm going to go to Stat --> Variance Stats (because that's how we calculate anything with standard deviation inside the StatCrunch application) --> One Sample (because I have only one sample) --> With Summary (because I don't have actual data).

Here in the options window, I'm asked for the sample variance. We're not given the variance, but we are given the sample standard deviation. Since the variance is simply the standard deviation squared, I can get the sample variance by squaring the sample standard deviation. So I take out my calculator, enter the sample standard deviation, and square it: 0.26^2 = 0.0676. That's the sample variance, so I put that number in this field of the options window. The sample size is given in the problem. I select the button for a confidence interval and make sure that the level matches what's asked for in the problem, which is 99%. I hit Compute!, and voilà! Here are the lower and upper limits of my confidence interval estimate.

But this is the confidence interval for the variance. Notice it says variance. We need the confidence interval estimate for the standard deviation, so I have to take the square root of each of these limits. I select the lower limit and copy it, paste it into my calculator, and take the square root. That's my lower limit, and I want two decimal places, so that would be 0.19. Then I do the same thing with the upper limit: copy it, paste it in, and take the square root. That gives me an upper limit of 0.41. Excellent!
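
The whole interval can also be sketched in Python with the textbook formula, square roots of (n - 1)s² divided by each critical value. This is my own illustration rather than what StatCrunch does internally; the critical values are hard-coded from the chi-squared table for 24 degrees of freedom.

```python
import math

n = 25
s = 0.26
df = n - 1
sample_variance = s ** 2   # 0.0676, the value entered in StatCrunch

# Chi-squared critical values for df = 24 and 99% confidence (table values)
chi2_left = 9.886
chi2_right = 45.559

# The larger critical value gives the lower limit, and vice versa.
lower_var = df * sample_variance / chi2_right
upper_var = df * sample_variance / chi2_left

# Square roots turn the variance interval into a standard deviation interval.
lower_sd = math.sqrt(lower_var)
upper_sd = math.sqrt(upper_var)
print(f"{lower_sd:.2f} < sigma < {upper_sd:.2f}")  # 0.19 < sigma < 0.41
```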

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Finding the positive predictive value of a polygraph test

5/21/2019

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the positive predictive value of a polygraph test. Here's our problem statement: The table below displays results from experiments with polygraph instruments. Find the positive predictive value for the test. That is, find the probability that the subject lied given that the test yielded a positive result.

Solution

OK, so we've got our table of data here, and we're asked to find that probability. It's a conditional probability, the probability that something happens given that something else is happening. To find the conditional probability that they want here, we're going to have to do a little calculation. And typically what we do is we take our data here and we dump it into StatCrunch. But I find that this type of problem is easier to do in Excel.

So I'm going to put my data in Excel. In fact, I've already got the data loaded up here in Excel, so we can go to town on it. Here's the same table we see in the problem statement. What I like about Excel is this: remember that a probability is just the part over the whole, and I can get the numbers I need for the part and the whole really quickly in Excel. The conditional probability we need is one probability divided by another. The first is the probability that both events occur, and we divide that by the probability that the given event occurs.

The first probability we want is the probability that the subject lied and the test result was positive. Lying is this column here, and a positive result is this row, so the cell where they meet is the one with the 40. For some reason that cell isn't highlighting differently, but 40 is the cell we're looking for. That's the part.

The whole is everything all together, and this is where Excel comes in really handy. If I select everything, Excel automatically shows the sum right down here: 107. So I take 40 and divide it by 107. You could do this in your calculator, but hey, Excel is a spreadsheet, and spreadsheets are made for calculation. That's the first probability.

Now the second probability we want is the probability of just the given event: that the test shows a positive result. The positive results are the 20 plus the 40, which gives us 60, so I take the 60 and divide it by the whole (the 107).

Now we've got the first probability here and the second probability here, so it's just one divided by the other. If I divide the first probability by the second, that's the probability we're looking for. It comes out to exactly two thirds, which rounds to 0.667 at three decimal places. Well done!
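
Here is a minimal Python version of the same part-over-whole calculation, using the three counts read off the table in the video (40, 20, and the grand total 107). The variable names are my own.

```python
# Counts from the polygraph table as read in the video
lied_and_positive = 40
did_not_lie_and_positive = 20
grand_total = 107

p_lied_and_positive = lied_and_positive / grand_total
p_positive = (lied_and_positive + did_not_lie_and_positive) / grand_total

# Positive predictive value: P(lied | positive) = P(lied and positive) / P(positive)
ppv = p_lied_and_positive / p_positive
print(round(ppv, 3))  # 0.667
```

Because both probabilities share the same denominator (107), it cancels, and the answer is simply 40/60.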

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Performing hypothesis testing on soda can fill variation

5/17/2019

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform hypothesis testing on soda can fill variations. Here's our problem statement: Workers at a certain soft drink factory collected data on the volumes in ounces of a simple random sample of 23 cans of the soda drink. Those volumes have a mean of 12.19 ounces and a standard deviation of 0.09 ounces, and they appear to be from a normally distributed population. If the workers want the filling process to work so that almost all cans have volumes between 11.93 ounces and 12.53 ounces, the range rule of thumb can be used to estimate that the standard deviation should be less than 0.15 ounces. Use the sample data to test the claim that the population of volumes has a standard deviation less than 0.15 ounces. Use a 2.5% significance level. Complete Parts A through D below.
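
Where does the 0.15 come from? The range rule of thumb estimates the standard deviation as the range divided by 4. A tiny Python check:

```python
low, high = 11.93, 12.53   # ounces: the acceptable fill range
estimated_sd = (high - low) / 4   # range rule of thumb: s is about range / 4
print(round(estimated_sd, 2))  # 0.15
```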

Part A

Alright, Part A says, "Identify the null and alternative hypotheses." So we know that the null hypothesis is by definition a statement of equality, so we're not going to choose Answer option A or Answer option D. To select between Answer options B and C, we look at the alternative hypothesis that typically comes from the claim. And here the claim is that the population of volumes has a standard deviation less than 0.15 ounces. So our sigma, which is the standard deviation for the population, is going to be less than 0.15 ounces. We want Answer option B. Nice work!

Part B

Now Part B says, "Compute the test statistic." We can get the test statistic by performing our hypothesis test in StatCrunch. So I'm going to pull up StatCrunch here. And I'll pop that window out, and then I'm going to resize this window so we can see a little bit better what's going on here. OK, inside StatCrunch, I want to go to Stat --> Variance Stats (because this is how we test standard deviation; it's through the variance) --> One Sample (because I've got only one sample) --> With Summary (because I don't have actual data).

Here in the options window, I'm first asked for the sample variance. Looking at the problem statement, we're not given the variance, but we are given the standard deviation, and the variance is the square of the standard deviation. So I pull out my calculator, take that standard deviation of 0.09, and square it, which gives me the sample variance: 0.0081. My sample size is the 23 cans. Next, I have to make sure my hypothesis test matches what we selected in Part A. Notice that the hypotheses we selected are about the standard deviation, but here in StatCrunch we're testing the variance. For these to match up, I have to take the claimed value and square it.

So let's do that. I pull my calculator back up, enter 0.15, and square it to get 0.0225. That's the number you want to enter as the claimed value for your hypothesis test. I also need to make sure the inequality sign here matches my alternative hypothesis (less than). Now I'm ready to go. I hit Compute!, and here's my chi-square test statistic, 7.92, which I can put in my answer field. Fantastic!
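
Behind the scenes, the chi-square test statistic for a standard deviation is (n - 1)s² / σ₀², where σ₀ is the claimed value. Here is a short Python sketch of that formula with the numbers from this problem (my illustration, not StatCrunch's code):

```python
n = 23
s = 0.09          # sample standard deviation (ounces)
sigma_0 = 0.15    # claimed standard deviation (ounces)

sample_variance = s ** 2        # 0.0081
claimed_variance = sigma_0 ** 2  # 0.0225

chi2_stat = (n - 1) * sample_variance / claimed_variance
print(round(chi2_stat, 3))  # 7.92
```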

Part C

Now, Part C asks for the P-value, which of course is the last value in that results window table right next door to the test statistic. Good job!

Part D

And finally, Part D says, "State the conclusion." Here we have a P-value of less than 1%, and we're comparing it with a significance level of 2.5%. Our P-value is less than our significance level, which means we're inside the region of rejection, so we reject the null hypothesis. Because the claim here is the alternative hypothesis, rejecting the null means there is sufficient evidence to support the claim that the population standard deviation is less than 0.15 ounces. Good job!
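
To see why the P-value is so small, here is a hedged Python sketch of the left-tailed P-value and the resulting decision. It relies on the same closed-form chi-squared CDF trick as before, which works only because the degrees of freedom (22) are even; the helper name is my own. SciPy users could call `scipy.stats.chi2.cdf` instead.

```python
import math

def chi2_cdf_even_df(x: float, df: int) -> float:
    """Left-tail chi-squared probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("this shortcut only works for even degrees of freedom")
    if x <= 0:
        return 0.0
    half_x = x / 2
    term, series = 1.0, 0.0
    for k in range(df // 2):
        if k > 0:
            term *= half_x / k
        series += term
    return 1.0 - math.exp(-half_x) * series

n = 23
df = n - 1                                     # 22, which is even
chi2_stat = (n - 1) * 0.09 ** 2 / 0.15 ** 2    # test statistic from Part B
alpha = 0.025

# H1 is sigma < 0.15, so the test is left-tailed: P-value = P(chi2 <= statistic)
p_value = chi2_cdf_even_df(chi2_stat, df)
decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(f"P-value = {p_value:.4f}")
print(decision)
```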

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Using the Poisson distribution to evaluate malignant tumor data

5/14/2019

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use the Poisson distribution to evaluate malignant tumor data. Here's our problem statement: A rare form of malignant tumor occurs in 11 children in a million, so its probability is 0.000011. Four cases of this tumor occurred in a certain town, which had 10,787 children. (a) Assuming that this tumor occurs as usual, find the mean number of cases in groups of 10,787 children. (b) Using the unrounded mean from Part A, find the probability that the number of tumor cases in a group of 10,787 children is zero or one. (c) What is the probability of more than one case? (d) Does the cluster of four cases appear to be attributable to random chance? Why or why not?

Part A

OK, Part A asks for the mean number of cases. To calculate the mean, I'm going to whip out my little calculator. The mean is the total number of children we're evaluating, 10,787, multiplied by the probability that a child has one of these malignant tumors, 0.000011. That gives 0.118657, the mean number of cases; you can see it's pretty small. I'm asked to round to three decimal places, so that's 0.119. Well done!
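
The Poisson mean here is just n times p. A one-line Python check, keeping the unrounded value for Part B:

```python
children = 10_787
p_tumor = 0.000011   # 11 in a million

mean_cases = children * p_tumor   # keep this unrounded for Part B
print(round(mean_cases, 3))       # 0.119
```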

Part B

Now Part B asks for the probability that the number of cases is zero or one. Notice that the problem statement asks us to use the unrounded mean from Part A. I'm going to get this probability from the Poisson distribution using StatCrunch, and I'll take the unrounded mean directly from the calculator. So the first step is to load up StatCrunch. I'll pop that out here and resize the window so it's easier to see what's going on. Excellent!

Now I pull up the Poisson calculator by going to Stat --> Calculators --> Poisson. Here's my Poisson calculator. Notice I have to put in the mean, and that's the value we calculated in Part A. I could just type it in, but I'm prone to transcription errors, so I'll make it easy on myself: I right-click on the value and copy it, come over here, right-click again, and paste. There's my mean value. Now we want the probability that the number of cases is zero or one, which is the same as one or fewer. So "less than or equal to 1" gives me about 99%. I round to three decimal places (0.993) and put that in here. Well done!

Part C

Part C asks for the probability of more than one case, so I need to switch this around. More than one is the same as two or more, so I select greater than or equal to 2. And there's my value (0.007 when rounded to three decimal places). Notice that this probability is the complement of the probability from Part B. That makes sense: Part B covers zero or one case, and this part covers anything greater than one, so together they cover every possibility. So instead of using the calculator in StatCrunch, I could have just subtracted the Part B probability from 1. Excellent!
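
Here is the same pair of Poisson probabilities as a standard-library Python sketch. `poisson_pmf` is a hypothetical helper implementing P(X = k) = λ^k e^(-λ) / k!; Part C uses the complement rule exactly as described above.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson random variable with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

mean_cases = 10_787 * 0.000011   # unrounded mean from Part A

p_zero_or_one = poisson_pmf(0, mean_cases) + poisson_pmf(1, mean_cases)   # Part B
p_more_than_one = 1 - p_zero_or_one                                       # Part C

print(round(p_zero_or_one, 3))    # 0.993
print(round(p_more_than_one, 3))  # 0.007
```

Since 0.007 is well under the 5% cutoff used in Part D, more than one case counts as "very small" probability.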

Part D

And now Part D says, "Let a probability of 5% or less be very small and a probability of 95% or more be very large. Does the cluster of four cases appear to be attributable to random chance? Why or why not?" Well, for four cases out of the 10,787 children, the probability is going to be less than seven tenths of 1%, because that's the probability of more than one case, and four cases is just one of the ways to get more than one. The problem says anything 5% or less is considered very small, so the probability of getting these four cases is very, very small. And yet it actually happened. When something that is highly unlikely happens anyway, you have to ask yourself, "What's going on here?" An event that unlikely is not reasonably explained by random chance; instead, it suggests that somebody or something is causing it to happen. So we would say no, it doesn't appear to be attributable to random chance, because the probability of it happening by chance is really, really small. And that's going to be this answer option here. Fantastic!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

Finding the area under a Normal distribution curve using z-scores in StatCrunch

5/10/2019

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the area under a normal distribution curve using z-scores in StatCrunch. Here's our problem statement: Find the indicated area under the curve of the standard normal distribution, then convert it to a percentage and fill in the blank: About <blank> percent of the area is between z = -1 and z = 1 or within one standard deviation of the mean.

Solution

OK, there are three ways to work this problem. The first is to use the normal distribution calculator in StatCrunch. To do that, I need to call up StatCrunch. I'm going to pop this window out and resize it so that we can see everything a bit better. Now in StatCrunch, I go to Stat --> Calculators --> Normal. Here's my Normal calculator in StatCrunch. Notice that I'm looking for the standard normal distribution. The standard normal distribution has a mean of 0 and a standard deviation of 1, and those are the defaults we see here in StatCrunch, so we'll leave them alone.

We're looking for the area between two z-scores. The standard normal distribution has z-scores built into its horizontal axis: this is a z-score of 1, and this is a z-score of -1. We want the area in between, so we select the Between button at the top. Then I enter the boundaries of that area in the middle of the distribution, -1 and 1, which are the very numbers we want. The area under the curve is 0.68268949. To convert from a decimal to a percent, we move the decimal point two places to the right, which gives 68.268949, and rounding to two decimal places gives 68.27. I check my answer. Well done!
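
If you prefer code to a calculator, Python's standard library has a normal distribution object, `statistics.NormalDist`, whose `cdf` method gives the area to the left of a value. The area between -1 and 1 is the difference of two CDF values:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # mean 0, standard deviation 1
area = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(round(area, 8))        # 0.68268949
print(f"{area * 100:.2f}%")  # 68.27%
```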

And that's the first way to do it. It's actually pretty easy to do it this way, and a lot of students go this route. The second way to solve this problem is with the z-score tables, which is a little bit more involved. I show you how to do that in the lecture videos, so if you want to go old school like that, feel free. Notice that in StatCrunch I just pulled up the calculator and there was the answer; it's so much easier to do this in StatCrunch.

The third way to solve this problem is to recognize that what we're looking for is part of the Empirical Rule. The Empirical Rule says that for a bell-shaped distribution, about 68% of the data (more precisely, 68.27% for a normal distribution) lies within 1 standard deviation of the mean. So if you recognize this as the Empirical Rule (and you do have that memorized, right?), you can just pop in the number without even using StatCrunch. Even easier! For those of you who haven't memorized the Empirical Rule yet, I highly recommend doing it, because it comes in handy in a lot of problems.

And so again, that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.


Using the Levene-Brown-Forsythe test for standard deviation hypothesis testing

5/7/2019


Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use the Levene-Brown-Forsythe test for standard deviation hypothesis testing. Here's our problem statement: The accompanying data table includes weights in grams of a simple random sample of 40 quarters made before 1964 and weights of 40 quarters made after 1964. When designing coin vending machines, the standard deviations of pre-1964 quarters and post-1964 quarters must be considered. Use the Levene-Brown-Forsythe test and a 5% significance level to test the claim that the weights of pre-1964 quarters and the weights of post-1964 quarters are from populations with the same standard deviation.

Part 1

OK, the first part of this problem asks us for the null and alternative hypotheses. We're asked to let the weights of pre-1964 quarters be Population 1 and the weights of post-1964 quarters be Population 2. Setting up the hypotheses is pretty simple. The null is always a statement of equality, and among our three options here, each answer option has that same null hypothesis. So what distinguishes them is the alternative hypothesis. We're testing the claim that the two populations have the same standard deviation. If the standard deviations are equal, the variances are equal too, and equality belongs to the null hypothesis by definition. So the alternative hypothesis has to come from the complement of our claim, which is that the two are not equal to each other. So we want to select Answer option A. Excellent!

Part 2

Now the second part of this problem asks for the test statistic. And this is where many, many students start to struggle and want to pull their hair out (assuming they have any hair on their head), because this just frustrates them to no end. They see that we're testing standard deviations, so of course they want to run a variance test through StatCrunch. But notice why that doesn't work. The test statistic we're asked for is a t-score. See the T down here? It's not an F or a chi-squared statistic like a variance test would give you; it's a T. This is why students consistently get this part of the problem wrong: they're not using the right procedure.

Here in the problem statement it says, "Use the Levene-Brown-Forsythe test." So what is the Levene-Brown-Forsythe test? It's a test where you transform each data value into its absolute deviation from the median of its own sample, |x - median|. Then you perform an independent two-sample t-test on the transformed data. So that's what we're going to do to get our test statistic for this second part.

Of course, the first step is to dump the data into StatCrunch. I'm going to resize this window so we can see everything a bit better. OK, now here in StatCrunch, the first thing we need to do is transform our data. When you're transforming data in StatCrunch, you use the menu option Data --> Compute --> Expression. Here in the expression window, the syntax gets really picky. Computers are very detail oriented, and I'm prone to error, so I always like to press Build here. I could just type the expression in, but again, I'm too prone to error, so I'm going to press Build.

The transformation for the data is to take the absolute value of the difference between each individual data value and the median of its sample. To do that, the first thing I want to do is put everything inside the absolute value function, which is the first function listed here in the functions list. So I select that function and then press Add Function. We're going to transform the first data set first, so I select the first column of data, press Add Column, and subtract the median from it. The median is another function here in the functions list, so I scroll down, select median, add it to my expression, and then select that same column of data again. So the expression takes the median of that column, subtracts it from each value in the column, and then takes the absolute value. I click OK, I click Compute!, and here's a new column with my transformed data.

I need to perform the same series of steps to transform the second sample of data. So again, I'm going to go to Data --> Compute --> Expression. I press Build, and I'm going to go through the same steps that I went through before to build the function that's going to transform that second column of data. Now I've got the transformation on my second column.
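The expression we built computes |x - median(column)| for every value in a column. Here's a minimal Python sketch of that same transformation applied to one sample; the function name `abs_dev_from_median` and the numbers are hypothetical, just for illustration. You'd run it once on each sample, each with its own median, just like we did with the two columns in StatCrunch.

```python
from statistics import median

def abs_dev_from_median(sample):
    """Replace each value with its absolute deviation from its own sample's median."""
    m = median(sample)
    return [abs(x - m) for x in sample]

# Made-up numbers for illustration only (not the quarter weights from the problem)
sample = [5.0, 7.0, 6.0, 9.0, 4.0]
print(abs_dev_from_median(sample))  # median is 6.0, so this prints [1.0, 1.0, 0.0, 3.0, 2.0]
```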

Now I take these two columns of transformed data and perform an independent t-test. To do that, I go to Stat --> T Stats --> Two Sample --> With Data (because I have actual data here in StatCrunch). The first column of transformed data is my first sample, and the second column is my second sample. We want to make sure that the inequality sign here matches the one from our alternative hypothesis, and we see that it does. So now we're all set. I hit Compute!, and here's my test statistic: the T-stat, the second to last value in the results table. When I put that value in my answer field, I'm asked to round to three decimal places. I check my answer. Well done!
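For reference, here's a minimal sketch of the t statistic itself, computed from two columns of transformed data. It assumes the unpooled (Welch) two-sample t statistic, i.e. variances not pooled; if your StatCrunch run pools the variances, the formula is different. The data values are made up. The P-value needs the t distribution's CDF, which Python's standard library doesn't provide, so this sketch stops at the test statistic (SciPy's `scipy.stats.ttest_ind` with `equal_var=False` is one option for the full test).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic with unpooled variances:
    (mean_a - mean_b) / sqrt(s_a^2 / n_a + s_b^2 / n_b)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical |x - median| values for two samples (made up, not from the problem)
transformed_1 = [1.0, 1.0, 0.0, 3.0, 2.0]
transformed_2 = [0.0, 1.0, 0.0, 2.0, 0.0]

t_stat = welch_t(transformed_1, transformed_2)
print(round(t_stat, 3))  # 1.234
```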

Part 3

And now the next part asks for the P-value, which of course is always right next door to the test statistic, the last value there in that results window table. And I'm asked to round to four decimal places. Conveniently, that's the number I have there in the results window. Excellent!

Part 4

And now this last part of the problem asks, "What is the conclusion for this test?" Well, my P-value is a little more than 4%, and we're comparing that with a 5% significance level. The P-value is less than the significance level, which means we're inside the region of rejection, so I'm going to reject the null hypothesis. Because the claim here is the null hypothesis, rejecting it means there is sufficient evidence to warrant rejection of the claim that the weights of pre-1964 and post-1964 quarters come from populations with the same standard deviation. Fantastic!
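The decision step is the same every time with the P-value method, so here's a tiny Python sketch of it. The function name `conclusion` and the P-value 0.042 are hypothetical; 0.042 just stands in for "a little more than 4%". The wording of the returned conclusions assumes, as in this problem, that the claim is the null hypothesis.

```python
def conclusion(p_value, alpha=0.05):
    """P-value method: reject H0 when the P-value is at or below the significance level."""
    if p_value <= alpha:
        return "Reject H0: sufficient evidence to warrant rejection of the claim"
    return "Fail to reject H0: not sufficient evidence to warrant rejection of the claim"

print(conclusion(0.042))  # hypothetical P-value a little over 4%
```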

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.


Evaluating the effect of data transformations on normally distributed data

5/3/2019


Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to evaluate the effects of data transformations on normally distributed data. Here's our problem statement: The heights in inches of men listed in the accompanying table have a distribution that is approximately normal, so it appears that those heights are from a normally distributed population. Complete Parts A through C.

Part A

Part A says, "If three inches is added to each height, are the new heights also normally distributed?" To verify that, we're going to have to take the data and transform it. And we can do that in one of two ways. The first way is we could actually get our data set, which we have here, and copy these numbers into Excel, perform the data transformation in Excel (which is very simple), and then copy the data back into StatCrunch so we can verify normality. That's one way to do it. I'm going to show you how to do this entirely in StatCrunch.

So the first thing I need to do is dump the data into StatCrunch, and I'm going to resize this window so we can see everything a bit better. OK, the problem says three inches are to be added to each height, so to transform the data I'm going to come up to Data --> Compute --> Expression. In this options window, I could type the expression for the transformed data directly into this field. But computers are really picky, and my fingers are kind of fat, so I'm going to go ahead and select Build.

Here in the build window, we enter the expression the computer needs in order to transform the data. We're going to add three inches to each of our heights, so the first thing we do is put in the column where our data is located. Notice I have to select the column and then press Add Column down below; just selecting the column doesn't add anything to my expression. When I press Add Column, notice how it appears here, so now we're starting to build it. We take the value in that first column and add three inches by pressing "+ 3". If I want, I could type that in on my keyboard; it works just as well. This is the expression we need to add 3 to each of those values. So I'll go ahead and click OK, and then press Compute!. And here's a new column with my transformed data.
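Why should we expect a Yes here? Adding the same constant to every value slides the whole distribution over without stretching or reshaping it. A quick Python sketch with hypothetical heights (not the data from the problem) shows the mean moving up by exactly 3 while the standard deviation stays put.

```python
from statistics import mean, stdev

# Hypothetical heights in inches (not the data from the problem)
heights = [66.0, 68.5, 70.0, 71.5, 69.0, 67.0]
shifted = [h + 3 for h in heights]

# Adding a constant moves the center by that constant; the spread doesn't change
print(mean(shifted) - mean(heights))    # 3, up to floating-point rounding
print(stdev(shifted) - stdev(heights))  # 0, up to floating-point rounding
```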

Now we can check that for normality, and we should know how to do that by now. First, we go up to Graph and construct a histogram to look at its shape. The general trend is more or less what we want: low at the beginning, high in the middle, low at the end. Everything's connected, and there are no outliers. So this is looking more or less OK.

But of course the definitive check for normality is a normal quantile plot, or QQ plot. So I go to Graph --> QQ plot, select my transformed data, choose Normal quantiles on the y axis, hit Compute!, and yep, that's not looking too shabby. The points follow a roughly straight-line pattern, so we're going to say Yes, the new heights are still normally distributed. Fantastic!
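A normal quantile plot pairs the sorted data with the values you'd expect from a normal distribution; a roughly straight-line pattern suggests normality. One way to put a number on "how straight" is the correlation between the two, sketched below in Python. This is a supplementary illustration, not something StatCrunch reports in this dialog; the plotting-position formula (i - 0.5)/n is an assumption (there are several conventions), and the heights are hypothetical.

```python
from statistics import NormalDist, mean

def normal_quantiles(n):
    """Expected standard normal quantiles for n sorted values.
    Uses plotting positions (i - 0.5) / n, one common convention; StatCrunch's may differ."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def qq_correlation(data):
    """Correlation between the sorted data and the normal quantiles.
    Values close to 1 mean the QQ plot is close to a straight line."""
    xs = sorted(data)
    qs = normal_quantiles(len(xs))
    mx, mq = mean(xs), mean(qs)
    sxq = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxq / (sxx * sqq) ** 0.5

# Hypothetical heights in inches (not the data from the problem)
heights = [66.0, 68.5, 70.0, 71.5, 69.0, 67.0, 72.0, 65.5]
print(qq_correlation(heights))
print(qq_correlation([h + 3 for h in heights]))  # same value: shifting doesn't bend the plot
```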

Part B

Now Part B asks, "If each height is converted from inches to centimeters, are the heights in centimeters also normally distributed?" To answer that, we have to convert from inches to centimeters, and there are 2.54 centimeters in one inch. If you want to double-check that, you can search for an inches-to-centimeters conversion in a search engine, and you'll find that every inch is indeed 2.54 centimeters. I could use a conversion calculator like that to convert each individual data value, but it's much easier to let the computer do everything at once. To do that, we need another data transformation in StatCrunch.

So back here in StatCrunch, we go to Data --> Compute --> Expression and open the build window by selecting the Build button. Then we want to multiply each value in that first column by 2.54. So I select the first column, press Add Column, and then enter "times 2.54" because there are 2.54 centimeters in every inch. I click OK, then Compute!, and here's my transformed data set. I can come back to my histogram and make a new one with the newly transformed data. Again, it's looking, uh, kinda iffy, but we might be OK. Let's check out the normal quantile plot with the new transformed data. Yeah, I'd say it's looking OK, so we're going to select Yes. Fantastic!
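Here's the reasoning behind that Yes: multiplying every value by the same positive constant (2.54 here) stretches the scale evenly. The mean and standard deviation both get multiplied by 2.54, so each value's z-score, its position relative to the rest of the data, stays exactly the same, and so does the shape. A short Python sketch with hypothetical heights (not the data from the problem):

```python
from statistics import mean, stdev

def z_scores(data):
    """Standardize each value: (x - mean) / standard deviation."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

# Hypothetical heights in inches (not the data from the problem)
heights_in = [66.0, 68.5, 70.0, 71.5, 69.0, 67.0]
heights_cm = [h * 2.54 for h in heights_in]

# Mean and standard deviation both get multiplied by 2.54, so every z-score is unchanged
for z_in, z_cm in zip(z_scores(heights_in), z_scores(heights_cm)):
    print(f"{z_in:+.4f}  {z_cm:+.4f}")
```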

Part C

And now Part C asks, "Are the logarithms of normally distributed heights also normally distributed?" OK, let's check that out. Back here in StatCrunch, I go to Data --> Compute --> Expression and open the build window. I'm going to select the first column, but wait! Before I select the first column, I need to put it inside a logarithmic function, so I need to select the function first. You can do that by typing it in, or you can select it from the functions list here.

If I scroll down here, I get to the logarithms, and you'll see there are multiple logarithms to select from. I don't know why they have separate functions for log and log10, because when you don't specify the base, by default the base is 10. So log and log10 are going to give you the same thing. I'll just select log and press Add Function. Notice how it wasn't added just by selecting it; I have to press the Add Function button to get it to show up here. Now I can put my column in, and this is going to take the logarithm of each value in my original data set. I select OK.

Notice that we haven't been using this column label field. When you leave this field blank, StatCrunch titles the new column based on the expression you provided. That's really helpful, because notice what we've got here: several columns of transformed data, and the way we tell which is which is by the column titles. Because we left the label blank, StatCrunch used the transformation itself as the title, which is very convenient when you're transforming data in multiple columns. So again, we'll leave this blank. And now we've got our transformed data.
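One caution if you ever repeat this outside StatCrunch: in Python's `math` module, `log(x)` with no base is the natural logarithm (base e), not base 10, so don't assume every tool uses the same default. For this problem the base doesn't change the answer, because logs in different bases are constant multiples of each other, and multiplying by a constant doesn't change a distribution's shape. A quick sketch with a hypothetical value:

```python
import math

x = 70.0  # hypothetical height in inches

print(math.log(x))      # natural log (base e), Python's default
print(math.log10(x))    # base-10 log
print(math.log(x, 10))  # base 10 via the optional base argument

# log10(x) = ln(x) / ln(10): the two differ only by a constant factor
print(math.log10(x) / math.log(x))  # equals 1 / ln(10) for any x > 0 other than 1
```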

So I can go back and check out my histogram with the new data in it, and ... well, yeah, that's looking OK. I just feel kind of fuzzy about it; this bar is quite a bit higher than this one. I know it's only 3 versus 1, but it still looks awkward to me. Let's check the normal quantile plot, since that's the definitive view here. And the normal quantile plot ... I don't know. That's looking kind of iffy: the points go above the line, then below, then above, then below, over and over. You've got that S-shaped, sinusoidal pattern, and that's a pretty strong indication that you don't have a normally distributed data set. So I'm going to come over here and select No. Fantastic!
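Why can the log behave differently from Parts A and B? Adding 3 and multiplying by 2.54 are linear transformations, which slide or stretch the scale evenly. The logarithm is nonlinear: it squeezes larger values together more than smaller ones, which can change a distribution's shape. The Python sketch below uses hypothetical, evenly spaced heights to show the effect; it illustrates the general idea and isn't a substitute for checking the actual data with a histogram and normal quantile plot.

```python
import math
from statistics import mean

# Hypothetical, evenly spaced heights in inches (symmetric about 70)
heights = [60.0, 65.0, 70.0, 75.0, 80.0]
logs = [math.log10(h) for h in heights]

# Equal 5-inch steps become shrinking steps on the log scale
gaps = [b - a for a, b in zip(logs, logs[1:])]
print([round(g, 4) for g in gaps])

# The original values sit symmetrically around their mean; the logs do not
print(heights[0] + heights[-1] - 2 * mean(heights))  # 0.0
print(logs[0] + logs[-1] - 2 * mean(logs))           # negative: the low end is stretched out
```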

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
