Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform a sign hypothesis test of median heights. Here's our problem statement: Use a 5% significance level to test the claim that there is a difference between the actual and reported heights in inches for 12 to 16 year old boys. The data is listed in the table below. Let mu1 denote the mean of the first variable and mu2 denote the mean of the second variable. Part 1 OK, so here we've got the first part of our problem, and we're asked to find the null and alternative hypotheses. The null hypothesis will always be a statement of equality, and the alternative hypothesis typically reflects the claim. Here our claim is that there's a difference between actual and reported heights, so that means we're going to have not-equal-to as our inequality sign. And that combination of equal and not-equal-to is found here. Good job! Part 2 Now the second part wants us to find the test statistic. You may be thinking that since we have actual data here, we can just dump this into StatCrunch and have StatCrunch perform the hypothesis test for us. Unfortunately, the sign hypothesis test feature of StatCrunch works only if you've got less than 25 for a sample size. We actually have more than that for a sample size here because, if you look, we've got one, two, three, four, five, six, seven, eight, nine, ten columns and three rows. 10 times 3 is 30. So StatCrunch will not calculate our test statistic for us, and that's why we can't actually use StatCrunch for this dataset. We can, however, use Excel to accelerate the "old school" way of doing this. So let's go ahead and dump this data into Excel, and I actually have that right here. The first thing I need to do is get my differences so I can count the number of positive signs and negative signs. So I'm just going to take the first value and subtract the second value from it, so the first minus the second. Then I'm going to take this formula and copy it all the way down. I could drag it, but dragging is really useful only if you've got, like, a few cells to drag it to. I've got more than just a few, so I'm going to copy this by pressing Ctrl+C on my keyboard. And then using my arrow keys, I'm going to go down to the bottom of the list by pressing Ctrl while I hit the down arrow. That takes me to the bottom of the list. I want these copied cells to end here --- see, I've got that cell selected there --- and now I'm going to press Ctrl+Shift on my keyboard while I press the up arrow. That takes me to the top of the list. Now these shaded cells are the ones where I want to copy my formula. So now I press Ctrl+V to paste and, look, it's all there for me. When I press Ctrl+Down, it takes me to the bottom of my list. And now here, right down here, I'm actually doing my count. So let's count positives first, and then let's do negatives. Here I've got the count of the number of positives, and I'm going to use the COUNTIF function to do that. COUNTIF says we're going to select where we want to do our counting, but we only want the computer to include a cell in the count if it meets certain criteria. And here we're going to have the criteria be that the number is positive. So I open my parentheses, and now I need to select my range.
And I do that by going up to this next cell up here, and then Ctrl+Shift+Up arrow on my keyboard takes me to the top of the list. There's my range. I put in a comma so I can put in the next element of the function. Now it's asking for the criteria. Typically we put the criteria in quotation marks. And we want these to be positive numbers; that's going to be greater than zero. I close my parentheses, and there's my formula. I hit Enter, and it automatically counted all the positive numbers for me. I'm going to do the same thing with the negative numbers so we can get a quick count of them. And you know, if it were just a handful of numbers, I wouldn't mind just counting them myself, but I've got more than just a little handful here. Alright, so we've got 18 and 10; that adds to 28. We've got a couple of zeros in the list. Here's one here. And if we scroll up a little bit, we can see the second one here. Zeros, of course, are not included in our counts because we don't want those included. So now I've got 18 positive and 10 negative. Now I've got the summary stats that I can use to actually calculate my test statistic. And to do that, we're going to go back to our handy dandy z-score formula. Here we're going to see that x is the lesser of those two numbers, so that's going to be 10. And n is going to be the sum of those two numbers, which is 28. So now we substitute those into our formula and --- well, that should actually be a 28 there; that's a typo. So we've got that number fixed right here. And then we just simplify that expression, punch it out on the calculator, and here comes our test statistic: -1.32. Nice work! Part 3 Now we're asked to find the P-value. And here StatCrunch actually is rather helpful for finding the P-value. Alternatively, we could use the z-score tables, but I'm lazy. I like the 21st century; I want to use technology. And we're going to have to work a little bit anyway to get our P-value because we've got a two-tailed test. So here in StatCrunch, I want to go to Stat --> Calculators --> Normal. Here in my calculator I want to select the Between option because, as you see here, we have a two-tailed test. I'm going to put in my test statistic. So on the negative side, -1.32. And I want to put in all those decimal places that I had before, so let me move this over so I can stick all that in and get -1.322875. And I put the positive version of my test statistic here. Now StatCrunch has calculated the area in between the tails. The P-value is the area in the tails. So I want to take this number, and in my calculator I'm going to subtract it from 1. And there's my P-value. I'm asked to round to four decimal places. So let's see, that brings me out to there. Excellent! Part 4 And now the last part asks, "Is there sufficient evidence to support H1?" Well, supporting H1, or the alternative hypothesis, is the same thing as rejecting the null hypothesis. Can we reject the null hypothesis? Well, we've got a P-value of almost 19%. It's well above our significance level of 5%, so we're outside the region of rejection. We fail to reject the null hypothesis, and therefore we fail to support the alternative hypothesis. Good job!
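For anyone who'd rather script this step than use Excel and the StatCrunch calculator, here's a minimal sketch of the same computation in Python. The counts (x = 10 negative signs, n = 28 nonzero differences) come from the walkthrough, and the continuity-corrected normal-approximation formula is the standard one for the sign test (it reproduces the -1.32 above); the diffs list is only a placeholder, since the 30 actual-minus-reported height differences aren't reproduced in this transcript.

```python
# Minimal sketch of the sign test steps above (placeholder data, counts from the video).
from math import sqrt
from scipy.stats import norm

diffs = [1.2, -0.5, 0.0, 2.3]               # placeholder for actual - reported heights
positives = sum(1 for d in diffs if d > 0)  # like Excel's COUNTIF(range, ">0")
negatives = sum(1 for d in diffs if d < 0)  # like Excel's COUNTIF(range, "<0"); zeros are dropped

x, n = 10, 28                               # smaller sign count, total nonzero differences
z = ((x + 0.5) - n / 2) / (sqrt(n) / 2)     # continuity-corrected z, about -1.32
p_value = 2 * norm.cdf(z)                   # two-tailed P-value, about 0.186
print(round(z, 2), round(p_value, 4))
```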
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to identify and interpret a linear correlation P-value using its definition. Here's our problem statement: For a data set of weights in pounds and highway fuel consumption amounts in miles per gallon of 10 types of automobile, the linear correlation coefficient is found, and the P-value is 0.004. Write a statement that interprets the P-value and includes a conclusion about linear correlation. Solution OK, here the statement they want us to write is already mostly written. We just have to fill in a few of the blanks. The first part of the statement says, "The P-value indicates that the probability of a linear correlation coefficient that is at least as extreme as" something percent. Well, this first part is the definition of the P-value itself. So they give you the P-value, and then they ask you for the P-value again; but notice the percent sign here. They want you to take this decimal and convert it to a percent.
That's easy enough to do. Just move that decimal point over two places to the right. So 0.004 becomes 0.4%; this is a very low value. And of course when the P-value is low, that indicates that there's statistical evidence for a linear correlation. And that's really all there is to it. Fantastic! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the probability of a positive test result from a combined water sample. Here's our problem statement: To reduce laboratory costs, water samples from four public swimming pools are combined for one test for the presence of bacteria. Further testing is done only if the combined sample tests positive. Based on past results, there is a 0.008 probability of finding bacteria in a public swimming area. Find the probability that a combined sample from four public swimming areas will reveal the presence of bacteria. Is the probability low enough so that further testing of the individual samples is rarely necessary? Part 1 OK, the first part of this problem asks us to find the probability of a positive test result in the combined sample. The combined sample will test positive if any one of the four samples taken from each of the four public swimming pools tests positive. So we've got to look at the probability of at least one of these swimming pools testing positive. Well, the probability that any one of those swimming pools is going to test positive is 0.008, as it says here in the problem statement. But we want the probability that at least one of the four swimming pools considered together tests positive for bacteria. So to calculate that, I'm going to bring up my calculator here. And the first thing I'm going to do is calculate the probability that we're not going to find the bacteria in any one of the swimming pools, which is just the complement of the probability they give us here. So I subtract that out from one, and that gives me 99.2%. This is the probability that we're not going to find any bacteria in an individual swimming pool. So now I take that and raise it to the fourth power, because I've got four swimming pools. This is the probability that I'm not going to find bacteria in any of the swimming pools. So to find the probability that we do find it, which is what we're actually looking for (the probability of having a positive test result), I'm going to have to take this and subtract it from one, because the probability of at least one is one minus the complement. And this is the complement. So if I take that and make it negative and then add it to one, that's the same thing as subtracting it from one. And lo and behold, here's the probability of a positive test result.
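If you want to check that arithmetic outside the calculator, here's a quick sketch; the only input is the 0.008 probability from the problem statement.

```python
# 1 - P(no bacteria in any of the four pools)
p_single = 0.008                    # probability an individual pool sample tests positive
p_none = (1 - p_single) ** 4        # 0.992 raised to the 4th power, about 0.968
p_at_least_one = 1 - p_none         # combined sample tests positive, about 0.032
print(round(p_at_least_one, 3))
```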
I'm asked to round to three decimal places. Well done! Part 2 And now the second part of this problem asks, "Is the probability low enough so that further testing of the individual samples is rarely necessary?" Well, here we've got about 3%, which is a pretty low value. So yeah, I would say that 3% is low enough, so we rarely need to do any further testing. So I'm going to go and click the answer option that tells me that. It should be this one right here.
Whoops, what did I do wrong? Oh, that option says "will not be a rarely necessary event," and it will be a rarely necessary event. You've got to watch the little details in the words here; they'll trip you up. Excellent! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to conduct hypothesis testing on multiple linear regression coefficients. Here's our problem statement: The coefficient beta-1 has a non-zero value that is helpful in predicting the value of the response variable. If beta-1 is equal to zero, it is not helpful in predicting the value of the response variable and can be eliminated from the regression equation. So to test the claim that beta-1 equals zero, use the test statistic t equals beta-1 minus zero divided by s-sub-b-1. Critical values or P-values can be found using the t distribution with n minus (k plus 1) degrees of freedom, where k is the number of predictor variables and n is the number of observations in the sample. The standard error s-sub-b-1 is often provided by software. For example, see the accompanying technology display, which shows that s-sub-b-1 equals 0.075033268, found in the column with the heading of standard error and the row corresponding to the first predictor variable of height. Use the technology display to test the claim that beta-1 equals zero. Also test the claim that beta-2 equals zero. What do the results imply about the regression equation? Part 1 OK, I can imagine a lot of students looking at this problem and thinking, What in the world have I gotten myself into here? Because this sounds like a bunch of gobbledygook! But really the problem is much simpler than the problem statement would lead you to believe. Because if we look here at our technology display, notice how we've got the intercept, which has no predictor variable attached to it, and then we've got two predictor variables here, height and weight --- excuse me, height and waist. And we've got coefficient values for those variables here. Standard error has been calculated here as was said in the problem statement. So if we look at this first one, 0.075, we see that one is listed right here, surrounded by the red box. But notice how all of this stuff about calculating a test statistic and P-values and everything else we need for hypothesis testing is already calculated for us in these columns over at the end. Notice this says "T stat" and this says "P-value." So the values that we actually need are found here in this table. And so we don't need to calculate anything; we just need to put the right numbers in the right places in our answer fields. So to test the claim that beta-1 equals zero, first we need to find our null hypothesis, which of course is a statement of equality. Here our claim is a statement of equality. Well, we can't have that be our alternative hypothesis, because equality by definition belongs with the null hypothesis. So this is going to be our null hypothesis. So I select that option from the dropdown.
The test statistic is located here in the table, just as we got done talking about. Beta-1 corresponds with the first predictor variable, which is height. The intercept is not a predictor variable; it's just a constant number in your equation, so there's no variable associated with the intercept. The first predictor variable is height. If I come over here and look at my t-statistic, my t-statistic is right here. So all I have to do is put that number here in my answer field. And I'm asked to round to three decimal places, so I'm going to do that here. The P-value is right next door. And we can see that with a P-value of zero, we're going to reject the null hypothesis, because with a P-value of zero, no matter what your critical values are, no matter what your significance level is going to be, you're going to be inside that region of rejection. And when you're inside the region of rejection, you reject the null hypothesis. Well, if we're rejecting the null hypothesis, then that says that beta-1 is not equal to zero. And so that means we're going to keep the value that we see listed here in the table in our regression equation. So I'm just going to put that value in here and say that it should be kept. Well done! Part 2 Now the next part wants me to test the claim that beta-2 equals zero. Well, we're just going to go through the same process that we did before with the first hypothesis test. So we're going to select a statement of equality for our null hypothesis. The test statistic is located here on the last row, because the last row corresponds with the second predictor variable, and that's the one beta-2 goes with. So here's our test statistic. And we slip that in here, again rounding to three decimal places. The P-value again is zero. So that means we're inside the region of rejection, we reject the null hypothesis, and that means we're going to keep the coefficient that we got here from the regression equation. Fantastic! Part 3 And now the last part of this problem asks, "What do the results imply about the regression equation?" Well, we've kept both of the coefficients in our regression equation, so that tells us that we should include both of them in our regression equation. So let's look through our answer options, and I'm going to select the answer option that says we should include both of them. And that's this one here. Fantastic!
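In case you ever need to reproduce those display values yourself rather than read them off the table, here's a hedged sketch of the coefficient t-test. Only the standard error 0.075033268 comes from the problem statement; the coefficient estimate b1 and the sample size n below are made-up placeholders, since the full technology display isn't reproduced in this transcript.

```python
# Sketch of the t-test for a single regression coefficient (H0: beta1 = 0).
from scipy.stats import t

b1 = 0.75            # placeholder coefficient estimate for the height predictor
s_b1 = 0.075033268   # standard error from the technology display
n, k = 153, 2        # placeholder sample size; k = 2 predictor variables (height, waist)

t_stat = (b1 - 0) / s_b1                # test statistic
df = n - (k + 1)                        # degrees of freedom
p_value = 2 * t.sf(abs(t_stat), df)     # two-tailed P-value
print(round(t_stat, 3), round(p_value, 4))
```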
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to conduct mean hypothesis testing on IQ and lead level data. Here's our problem statement: Listed in the data table are IQ scores for a random sample of subjects with medium lead levels in their blood. Also listed are statistics from a study done of IQ scores for a random sample of subjects with high lead levels. Assume that the two samples are independent simple random samples selected from normally distributed populations. Do not assume that the population standard deviations are equal. Complete Parts A and B below. Use a 10% significance level for both parts. Part A OK, Part A asks us to test the claim that the mean IQ score for subjects with medium lead levels is higher than the mean for subjects with high lead levels. And the first thing we're asked to do is to provide the null and alternative hypotheses. We're also instructed to assume that Population 1 consists of subjects with medium lead levels and Population 2 consists of subjects with high lead levels. So the null hypothesis is going to be a statement of equality, which it always is by definition. So we're going to be looking at Answer options B and D. And then to select the one with the proper alternative hypothesis, we look at this assumption that we were given here at the very end of our problem statement. Population 1 has the subjects with the medium lead levels; Population 2 has the subjects with the high lead levels. And we're testing the claim that the medium lead levels have a higher mean score than the high lead levels. So the medium lead levels, which are Population 1, will have a higher mean score than the high lead levels, which are Population 2. So mean 1 will be greater than mean 2, and that's what we see here with Answer option B. The mean of Population 1 is greater than the mean of Population 2, so that's what we'll select for our answer. Good job! Now the next part of Part A asks us to provide the test statistic, and we can do this inside StatCrunch. Notice when we look at our data here, they're going to make us work a little bit. We've got summary statistics for the high lead level sample, but here for the medium lead level sample, we've got actual data. We can get around this pretty easily. All we have to do is calculate summary statistics for the medium lead level group, and then we've got summary stats for both groups. And we can use those summary stats to conduct our hypothesis test. So the first step is to put this data for that first sample here in StatCrunch. I'm going to resize this window so we can see better everything that's going on. Now here in StatCrunch, I'm going to get my summary stats by going to Stat --> Summary Stats --> Columns. I select that column of data. I want to calculate the sample size, sample mean, and the sample standard deviation. And here are my numbers right here. So I'm going to move this results window down to the bottom.
Now we have everything we need to calculate the test statistic by performing our hypothesis test. So to do that, I'm going to go to Stat, then T Stats (because we don't know what the population standard deviations are), Two Samples (because we have two independent samples), and With Summary (because we don't have actual data for both samples, but we do have summary statistics for both of the samples). Here, Sample 1 --- remember that was defined as the medium lead level group, and those are the summary stats that we just calculated a moment ago. So I'm going to put those numbers here. The sample mean is 90.81. And notice I'm rounding to three decimal places, because the equivalent values that were given in these summaries are rounded to three decimal places. I do the same thing with the standard deviation. And now I put in the sample statistics for the second sample. Now I scroll down here and make sure this button for Hypothesis test is selected --- that is the default, and that is what we want. This area here needs to match what we've got over here with our alternative hypothesis, so I need to change this inequality sign. And now I'm ready. I hit Compute!, and here's my test statistic, the second to last number there in my results window. I'm asked to round to two decimal places. Good job! The next part is asking for the P-value. That's the last value there in the data table, right next door to the test statistic. I'm asked to round to three decimal places. Excellent! OK, now I'm asked to state the conclusion from my test. We've got a P-value of over 30%, and we're using a significance level of 10%. 30% is over 10%, so we're outside the region of rejection, which means we fail to reject the null hypothesis. And every time we fail to reject the null hypothesis, there is not sufficient evidence. So we don't want Answer option D, because that says there is sufficient evidence. We want Answer option A, because we failed to reject the null hypothesis, and whenever we do that, there's not sufficient evidence. Good job! Part B Now Part B of this problem asks us to construct a confidence interval, which we can do reasonably well enough. Go back to your options window here, and I'm going to scroll down and switch this radio button down to Confidence interval. And I need to put in a confidence level. Normally we would take our significance level and subtract it from 100%, but because this is a one-tailed test, we have to subtract two alpha, so we're looking at 20% that we subtract from 100%. That gives us a confidence level of 80%. And there are my lower and upper limits right there in the results window. So all I have to do is transfer those numbers over. Well done! Part C And now for the final question: Does the confidence interval support the conclusion of the test? Well, let's take a look at where zero is in our confidence interval. Is it inside or outside? Zero is inside our confidence interval, and so it's possible that these two values could be the same. It's possible that the difference here could be zero. And if the difference is zero, that means these two could be the same. Well, if they're the same, then that's exactly what we set up here in our null hypothesis. So if they're the same, then that means the null hypothesis could be true. And if the null hypothesis is true, we don't want to reject it. So we're going to fail to reject the null hypothesis.
And that's exactly the conclusion that we made right here. So yes, the confidence interval does support the conclusion of the test because zero is found inside the confidence interval. Fantastic!
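For reference, here's a rough sketch of what StatCrunch is doing in Parts A and B: a Welch two-sample t test from summary statistics and the matching 1 - 2*alpha confidence interval. The medium-lead sample mean of 90.81 is from the walkthrough; every other summary number is a placeholder, so while the conclusion happens to match (fail to reject, interval containing zero), these values won't reproduce the exact display numbers.

```python
# Welch two-sample t test from summary stats, plus an 80% confidence interval.
from math import sqrt
from scipy.stats import t

n1, x1, s1 = 22, 90.81, 14.0    # medium lead levels (n1 and s1 are placeholders)
n2, x2, s2 = 11, 89.0, 10.0     # high lead levels (all placeholders)

se = sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = (x1 - x2) / se
df = (s1**2 / n1 + s2**2 / n2) ** 2 / (
    (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1))  # Welch-Satterthwaite df
p_value = t.sf(t_stat, df)            # one-tailed test: mu1 > mu2

alpha = 0.10
crit = t.ppf(1 - alpha, df)           # one-tailed test at alpha pairs with a 1 - 2*alpha CI
ci = ((x1 - x2) - crit * se, (x1 - x2) + crit * se)
print(round(t_stat, 2), round(p_value, 3), ci)
```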
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to choose the best multiple regression model. Here's our problem statement. The accompanying table shows results from regressions performed on data from a random sample of 21 cars. The response variable is CITY (fuel consumption in miles per gallon). The predictor variables are WT (weight in pounds), DISP (engine displacement in liters) and HWY (highway fuel consumption in miles per gallon). If only one predictor variable is used to predict the CITY fuel consumption, which single variable is best and why? Solution OK, to solve this problem, we first need to take a look here at the table of regression equations that we have to select from. And if you notice we've got all sorts of different options here, but the ones that are going to be used are the ones here at the bottom that have only one predictor variable in them, because here in the problem statement we're only looking at the ones that have one predictor variable.
So to pick the best model, we're going to have to balance optimum values for three different items here. The first is the P-value, the second is the adjusted R-squared value, and the third is the number of variables. The number of variables has already been taken care of for us by the restriction here in the problem statement. So all we have to do then is balance the P-value and adjusted R-squared value for the three possibilities here that have only one predictor variable. Well, all of the equations have the same P-value, and it's the best P-value you could possibly have — zero. So we can't use the P-value to make a determination of which model is best. So we have to look to the adjusted R-squared value. And the reason why you want to use the adjusted R-squared value and not the R-squared value is that the adjusted R-squared value is adjusted for the differing numbers of variables in the different models. That tends not to be a big deal with what we're looking at here, because we're restricted to just one predictor variable. But normally you don't have that restriction, and so looking at the adjusted R-squared value is always preferred over the R-squared value. So here we've got 0.696 and 0.64 --- these are kind of in the same ballpark. And then right here, 0.92 --- a much better adjusted R-squared value for this last model here. So this is the one that we're going to select. It has the best combination of a small P-value, which is zero, and a large adjusted R-squared value, which you can see there in the table is 0.92. Good job! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use a nonstandard normal distribution to evaluate pregnancy and premature birth data. Here's our problem statement: The lengths of pregnancies are normally distributed with a mean of 266 days and a standard deviation of 15 days. Part A: Find the probability of a pregnancy lasting 309 days or longer. Part B: If the length of pregnancy is in the lowest 2%, then the baby is premature. Find the length that separates premature babies from those who are not premature. Part A OK, Part A is asking us to find the probability that a pregnancy will last 309 days or longer. To do this, I'm going to use the normal distribution calculator in StatCrunch, and I know I need the normal distribution calculator because here in the problem statement it says my data is normally distributed. So the first thing I need to do is open up StatCrunch. And let's resize this window so we can see everything a little bit better. Now in StatCrunch, I go to Stat --> Calculators --> Normal. Here in my Normal calculator, I need to establish the mean and the standard deviation, because the defaults here are for the standard Normal distribution and we have a nonstandard Normal distribution. Here in the problem statement, it says the mean is 266 days, so I need to put that in here. And the standard deviation is 15. And then I'm asked for the probability that a pregnancy will last 309 days or longer.
Well, look at how this is ordered here. Probability is P, x is my random variable --- that's going to be the 309 days --- but it needs to be 309 days or longer, which means that this is greater than or equal to 309. So I got to flip that around. And notice here on the other side of the equals sign is my probability. This is a probability in decimal form. I'm asked to round to four decimal places, so I just type that in here. Fantastic! Part B Now, Part B asks for the number of days in which a birth would be considered premature. So babies who are born on or before how many days are considered premature? And to do that, I go back to my distribution calculator. And this is the number we're looking for now. So we get rid of that premature birth from the problem statement. We're looking for the lowest 2% of the distribution. That's the left tail of the distribution. So I need to change this to 2%. And now we just switch this around so I can get the left side and not the right side of my distribution. Now we're at the left tail of the distribution. This is the lowest 2%, and this boundary here for the tail is the value we're looking for. Rounding to the nearest integer gives me 235. Nice work!
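Both parts of this problem can also be checked with a couple of lines of SciPy, mirroring the Normal calculator steps above; the mean of 266 and standard deviation of 15 are from the problem statement.

```python
# Nonstandard normal distribution: P(X >= 309) and the lowest-2% cutoff.
from scipy.stats import norm

mu, sigma = 266, 15
p_long = norm.sf(309, loc=mu, scale=sigma)     # P(pregnancy lasts 309 days or longer), about 0.0021
cutoff = norm.ppf(0.02, loc=mu, scale=sigma)   # length separating premature babies, about 235 days
print(round(p_long, 4), round(cutoff))
```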
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Constructing & interpreting a standard deviation probability distribution with a 3-number population (7/16/2019) Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to construct and interpret a standard deviation probability distribution with a three-number population. Here's our problem statement: Assume a population of 2, 6, and 7. Assume that samples of size n = 2 are randomly selected with replacement from the population. Listed below are the nine different samples. Complete Parts A through D below. Part A OK, Part A asks us to find the value of the population standard deviation, sigma. Well, that's easy enough to do. First, I'm going to put my samples here into StatCrunch. I don't actually need the samples to calculate the population standard deviation, but they're going to come in handy for the rest of the problem, so let's just go ahead and do that now. And I'm going to resize this window so we can see better what's going on. To calculate the population standard deviation, I need to have the population here in the data table in StatCrunch. The population is these three numbers here listed in the problem statement --- 2, 6, and 7. So I'm going to go to this next free column here in StatCrunch, and I'm going to label this column "Population." And then I'm going to put in the actual numbers for my population: 2, 6, and 7. Now I can take the standard deviation of this column to find the population standard deviation. I do that by going to Stat --> Summary Stats --> Columns. Here in the options window, I'm going to select the column where that data is located. And then many students make the mistake of selecting this first standard deviation option here under the Statistics window. The reason why that's a mistake is because this standard deviation is for samples. We're asked to calculate the population standard deviation, which means you need the unadjusted standard deviation. And to get that, you have to scroll down towards the bottom of the list here and select Unadjusted standard deviation. This is what you use when calculating the standard deviation for a population. So I hit Compute!, and out comes my population standard deviation. I'm asked to round to three decimal places. Good job! Part B Now Part B asks us to develop the probability distribution for the standard deviation of each of the nine samples. To do that, we're going to go back here to StatCrunch. And notice where the samples are actually located. They're located in rows. So the first sample is a 2 and a 2 located here in the first row. The second sample is located here in the second row. The third sample is located here in the third row, and so on and so forth. So to calculate my standard deviation for these samples, I'm going to go to Stat --> Summary Stats --> Rows. Normally we select Columns, but here I'm going to select Rows because my data, the samples, are in rows. Here in the options window, I'm going to select the columns where my sample data is located.
And then I'm going to go down here under Statistics, and the statistic we're calculating is the standard deviation. This is for samples, so I can select that first standard deviation there in the statistics list. And I press Compute!, and out comes a window with all of my standard deviations for each of the nine samples there in my dataset. What we're looking for is a probability distribution. So we have the numbers we need here; we just need to assemble them into a distribution. The easiest way to do that is to sort these numbers, and I can do that very easily. If I come up here and click on the little arrow here in the title of the column in my results window, notice how everything's now sorted. The default is to sort from lowest to highest number. If I want to sort the other way, I just click again on that arrow, and now it's sorted from highest to lowest. But here in the problem statement, we see it says, "Use ascending order," which means from lowest to highest. So I'll click this again. Each time I click this, notice it's just toggling back and forth between those two settings. This is the setting I want, from lowest to highest. So to create my distribution, I'm going to look first at the number that is listed here first. The first number here is 0. So in this dropdown here, I'm going to select 0. And then the probability is the part over the whole. So the part is how many zeros I have, which is three, and the whole is how many numbers I have total, which is nine. I've got nine numbers total. Don't look at this last number and think you've got nine, because see here, everything's been sorted so it's in a different order. So I've got nine numbers total. Three is the part that are zero. So I've got three out of nine, or three over nine, which reduces to one over three. The next number in the dataset is the 0.7, so I'm going to select that here. How many do I have? I have two of them, so I've got two out of nine. The next number is 2.82, so I'm going to select that. I've got two of them, so that's two over nine. And then the last number is 3.53, so I'm going to select that. And I have two of those --- two divided by nine. And that's really all there is to it. Well done! Part C Now Part C asks us to find the mean of the sampling distribution of the sample standard deviation. What that means is to take these numbers that we used to create our sampling distribution and then find their mean. Well, I could do that easily in StatCrunch if these numbers were listed in the data table, but they're not. They're here in a results window. So to put them in the data table, I'm going to go back to my options window, and I'm going to check this box next to Store in data table. This tells StatCrunch to put the results in the data table, where we can then perform further calculations on them, instead of in a separate window. So now I've got my numbers here in the data table, and now I can calculate the mean. I go up to Stat --> Summary Stats --> Columns, select that newly created column of standard deviations for the samples, and --- excuse me, I want the mean, not the standard deviation; I'm off in another world here --- I select the mean of the sampling distribution. So there is my mean value for the sampling distribution, 1.571. We were asked to round to three decimal places. Fantastic! Part D And now the last part, Part D, asks, "Do the sample standard deviations target the value of the population standard deviation?"
Sometimes these problems ask you to compare numbers. In this case you might have been asked to compare the population value, which here is 2.16, with the mean of the sample values, which is 1.571. They don't equal each other, and therefore the sample statistic doesn't target the population parameter. Now, because the samples we're using here are so small, sometimes that comparison doesn't play out quite so neatly; for an unbiased estimator, though, the mean of the sample statistics comes out pretty much the same as the population parameter.
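A short script makes that comparison concrete: it enumerates the nine size-2 samples drawn with replacement from {2, 6, 7}, takes each sample's standard deviation, and compares their mean to the population standard deviation.

```python
# Sampling distribution of the sample standard deviation for the population {2, 6, 7}.
from itertools import product
from statistics import stdev, pstdev, mean

population = [2, 6, 7]
samples = list(product(population, repeat=2))    # nine samples of size n = 2, with replacement
sample_sds = [stdev(s) for s in samples]         # sample standard deviations (n - 1 formula)

print(round(pstdev(population), 3))              # population sigma, about 2.160
print(round(mean(sample_sds), 3))                # mean of the sample SDs, about 1.571 (biased low)
```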
So when answering these types of questions about whether your statistics target your parameters (do the samples target the population?), I always advise students just to memorize a list of biased and unbiased estimators. And we talk about those in the lecture video. The standard deviation is a biased estimator, so it doesn't target the population parameter, and the answer options that say it's unbiased are going to be incorrect. Here we've got Answer option A and Answer option C. A biased estimator means that the sample does not target the population, so we're going to want to select Answer option C. Fantastic! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to construct and interpret a relative frequency distribution from categorical data. Here's our problem statement: Among fatal plane crashes that occurred during the past 70 years, 620 were due to pilot error, 85 were due to other human error, 308 were due to weather, 267 were due to mechanical problems, and 386 were due to sabotage. Construct the relative frequency distribution. What is the most serious threat to aviation safety, and can anything be done about it? Part 1 OK, I actually think this problem is a little bit easier to solve in Excel, but I'm going to solve it in StatCrunch. In StatCrunch, we're actually going to have it do everything for us. Now we could calculate this stuff out, and I could take my calculator and do all the old school calculations, and that's a legitimate way to approach this problem. But I'm lazy. I'm going to have StatCrunch do everything for me. So I know it's tempting to say, "OK, we're going to make a bar plot, and we've got data here, so why don't we just press With Data?" Well, this isn't actual data. This is actually a summary of the data. We don't have the actual individual counts; we just have a summary of the counts. So because this is a summary, I'm going to select With Summary after selecting Bar Plot. The categories are the cause, and the counts are in the frequency column. I'm going to select Relative frequency. Or, since these are in percent form here, let's just go ahead and select Percent. And then all we have to do is take the numbers straight over. I'm going to tick Value above bar. This is what's going to give me the numbers that I need on my graph to put here into the answer fields in my assignment. And that's all I need to do. Hit Compute!, and out comes this wonderful little bar graph. Now notice, first of all, everything's just really small as far as the typeface goes; you can barely see it. There's actually a zoom feature that we can use. But first I want to make sure that we get this in the right order, which I didn't do previously.
So back in my options menu, I need to make sure I order by worksheet. That way it'll be in the same order that we have here in our assignment. Now I've got this ordered correctly. Now to get the zoom feature, there are these three bars that you see here in the lower left. Go ahead and click on that, and then hit Zoom. And now, when I click on the zoom tool, I can zoom in on an area, I can move this around, and now there's the number I need to put in for pilot error, 37.2. I hit the X to go back. Whoops, that's not what I wanted to do. I want to go back this way. And now I just go ahead and do the same thing for each of the following categories. And it really is that simple. Now, to do this the old school way, I'd have to take the sum of the numbers that are listed here, and I could do that easily enough in StatCrunch. And then once I have that sum, I divide each of these counts by the total sum, and that gives me the same percentages that you see here. And I can show you that in a moment as soon as I get done with all of this business. So we've got two more numbers to put in. I'm going to put that in here. And we've got one more. Excellent! Now to illustrate what I was showing you before, just very quickly: if you go up to Stat --> Summary Stats --> Columns and select the frequency column, we want to get the sum. The sum is 1666. So if I take that 1666 and divide it into each one of these numbers, I'm going to get the same numbers out. So if I took the first number, 620 for pilot error, and I divided by 1666, notice we get the same 37.2% that is the correct answer. And we can do the same thing for each of the others in succession. But like I said, I'm lazy. I just let StatCrunch do all that calculating for me. Part 2 The next part of the problem asks, "What is the most serious threat to aviation safety, and can anything be done about it?" Well, the most serious threat is going to be the category with the largest percentage. And that's going to be the 37.2% related to pilot error. Can you do something about pilot error? Yeah, you could probably train your pilots better. So let's see, we've got --- yeah, right here, Answer option D: "Pilot errors are the most serious threat. Pilots could be better trained." So I go ahead and select that one. Excellent!
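And just to show the "old school" arithmetic in one place, here's a tiny sketch that divides each count by the total of 1666; the counts are the ones from the problem statement.

```python
# Relative frequency distribution for the fatal-crash causes.
counts = {"Pilot error": 620, "Other human error": 85, "Weather": 308,
          "Mechanical problems": 267, "Sabotage": 386}
total = sum(counts.values())                        # 1666
for cause, count in counts.items():
    print(cause, round(100 * count / total, 1))     # e.g. Pilot error -> 37.2
```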
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring, or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to determine the appropriate level of measurement for Olympic years. Here's our problem statement: Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below: Years in which an Olympics was held. Solution - Example 1 OK, we're given four different answer options here, each one corresponding to the different levels of measurement. And what's interesting here is that we've got definitions of each of the different levels to help us select the right answer. So we look at the definitions here --- the ordinal level of measurement --- and this says that "the data can be ordered but differences cannot be found or are meaningless." Well, we can find differences between different years, so that's obviously not going to be the right answer. "The nominal level of measurement is most appropriate because the data cannot be ordered." Well, yeah, the data actually can be ordered. That's the whole point of having years. We can order them from low to high or high to low. That's the interval level of measurement. It "is most appropriate because the data can be ordered" --- that's true. "Differences can be found and are meaningful" --- that's true. And "there's no natural starting zero point" --- that's true. The zero point for years is just an arbitrarily chosen value, so it's just something that's accepted by convention. There's no natural point for zero, and so the interval level of measurement is what we have. And when you see years, you need to think interval level of measurement because the two pretty much go together. The final answer --- ratio level of measurement --- would not be correct because it says here "the data can be ordered," which is true. "Differences can be found or are meaningful" --- that's true. "There is a natural starting point," and that's what we don't have with the years. So the correct answer here is the interval level of measurement. Fantastic! Solution - Example 2 Let's go through one more example just to illustrate what we've got here. So now we've got the number of houses that people own. Well, the number of houses people own, would that be the ratio level of measurement? Yeah, probably, because look, the data can be ordered, the differences can be found and are meaningful. I mean, you've got one person who's got two houses, and one person's got one. That extra house --- that's a meaningful difference. There is a natural starting zero point. It's like you got zero houses. That's a natural place to start counting something. So ratio level of measurement is what we would select here. Fantastic!
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.