Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use StatCrunch to perform hypothesis testing on the proportion of polygraph results. Here's our problem statement: Trials in an experiment with a polygraph include 98 results that include 23 cases of wrong results and 75 cases of correct results. Use a 1% significance level to test the claim that such polygraph results are correct less than 80 percent of the time. Identify the null hypothesis, alternative hypothesis, test statistic, P-value, conclusion about the null hypothesis, and final conclusion that addresses the original claim. Use the P-value method. Use the normal distribution as approximation of the binomial distribution. Part 1 OK, the first part of our problem says, “Let p be the population proportion of correct polygraph results. Identify the null and alternative hypotheses. Choose the correct answer below.” To form our null and alternative hypotheses, we need to go back to our problem statement and look for the claim. The first step is to identify the claim. So here we find in our problem statement, it says the claim is that “polygraph results are correct less than 80% of the time.” So that means our proportion is going to be less than 80% of the time; that's the claim. Now the claim has no semblance of equality in it. Therefore, it will become our alternative hypothesis. The null hypothesis will then be a statement of equality because that's what it is by definition. So we look at our answer options here, and answer options A, E, and F are obviously wrong because they use 20% for their values and the value that we're looking for for our claim is 80%. So we want to choose between answer options B, C, and D. Look at the null hypotheses. In each case, is a statement of equality to be found? In each instance, the answer is yes, so we can't use that to differentiate our answer. Look at the alternative hypotheses. 
Which one matches the claim? Well, the one that matches the claim is here in answer option D. So I select that and check my answer. Excellent! Part 2 Now the next part of our question asks us to identify the test statistic. To do this, we're going to go into StatCrunch. So I'm gonna pull up StatCrunch here, and inside StatCrunch, I'm going to go to Stat –> Proportion Stats (because I'm looking for proportions) –> One Sample (because I'm only given one sample), and With Summary (because I don't have actual data but I do have some summary statistics in the problem statement). In the options window that appears, the first field is the number of successes. Remember that the population proportion is identified as being those with the correct polygraph results. So looking at the problem statement here, how many of the sample results are correct results? That's going to be the 75. So that's what I put here in the first field. The number of observations is the total number in the sample; that's 98. I want hypothesis testing. Here we need to match these fields with the correct null and alternative hypotheses that we identified earlier. So I change these fields to match. And once I've done that, I'm ready to hit Compute! and get my results. Here in the results window towards the end, we see the second to last value in that table is my Z-statistic. So that's the number I'm going to slip in right here rounded to two places. I check my answer. Well done! Part 3 The P-value I get from the same results window here in StatCrunch. Well done! Part 4 And now the last part of our problem asks us to “identify the conclusion about the null hypothesis and the final conclusion that addresses the original claim.” To do this we're going to use the P-value method as we were instructed earlier. This means we compare our P-value with our significance level, alpha, from our problem statement.
And we see that alpha equals 1%. Our P-value is almost 20%, and since that is greater than 1%, our P-value is greater than alpha. And that means we are outside the rejection region. Therefore we're going to fail to reject the null hypothesis. And because we fail to reject there is not sufficient evidence to support the claim. I check my answer. Nice work! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
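For anyone who wants to cross-check the StatCrunch output from the polygraph problem by hand, here is a minimal Python sketch of the same left-tailed one-proportion z test. It assumes the usual normal approximation with the standard error built from the hypothesized proportion; the function name `one_proportion_z_test` is made up for illustration and is not part of StatCrunch.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Left-tailed z test for H1: p < p0 (normal approximation to the binomial)."""
    p_hat = successes / n                      # sample proportion of correct results
    standard_error = sqrt(p0 * (1 - p0) / n)   # uses the hypothesized proportion p0
    z = (p_hat - p0) / standard_error
    p_value = NormalDist().cdf(z)              # area to the left of z
    return z, p_value


# Values from the problem statement: 75 correct results out of 98, claim p < 0.80.
z, p_value = one_proportion_z_test(75, 98, 0.80)
alpha = 0.01
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, P-value = {p_value:.4f}, decision: {decision}")
```

The P-value comes out a little under 20%, which is well above the 1% significance level, so the sketch lands on the same "fail to reject" conclusion as the walkthrough.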
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to identify Type I and Type II errors. Here's our problem statement: Identify the Type I error and the Type II error that correspond to the given hypothesis: The percentage of households with Internet access is less than 60%. OK, in order to identify these errors, we need to know what they are, and to that end I prepared a small presentation to help us understand what they are. With every null hypothesis that we have, the null hypothesis could be true or could be false. We also have two actions that we can take with a null hypothesis; we can either reject the null hypothesis or fail to reject the null hypothesis. In the event that the null hypothesis is actually false, we want to reject it. We do this because we want to accept the null hypothesis only if it is true. So when the null hypothesis is actually false, we want to reject the null hypothesis. And when we do, we make a good conclusion. However, if the null hypothesis is actually true and we reject it, well now we've done something we don't want to do. We want to accept the null hypothesis if it's actually true, but if we end up rejecting it, that leads to what we call an error. And specifically when we reject the null hypothesis when it's actually true, this is what we call a Type I error. On the flip side of the coin, if we're going to fail to reject the null hypothesis, we want to do that when the null hypothesis is actually true. So when we do, that leads to a good conclusion. However, if we fail to reject the null hypothesis when it's false, that is another error, and that's what we call a Type II error — failing to reject a false null hypothesis. Part 1 Let's bring this back to the problems that we have at hand. We're first asked to identify the Type I error. What I like to do when I look at these types of problems is identify what is the null hypothesis. 
The hypothesis that we're given in the problem statement is not the null hypothesis; it is the alternative hypothesis. We know this is true because by definition the null hypothesis is a statement of equality, and this statement that we have in the problem statement says “the percentage of households with Internet access is less than 60%.” This is not a statement of equality. Look at the words here — “is less than.” This must then be the alternative hypothesis. To create the null hypothesis, we simply make this statement one of equality. So the percentage of households with Internet access is 60%; this is our null hypothesis. The next thing I do is I remind myself of what the definition is of the error that I'm looking for. So Type I error means we're rejecting the null hypothesis when the null hypothesis is actually true. I write this definition out because what I'm going to do next is substitute in just like it were a mathematical equation this null hypothesis that we've identified into the definition. So everywhere we see “null hypothesis,” I'm going to replace that with “percentage of households with Internet access equals 60%.” This gives me what you see here. “The Type I error is rejecting that the percentage of households with Internet access is 60% when the percentage of households with Internet access is 60%.” Now I look at my answer options to determine which of these most closely matches what I've produced here. And doing that, the Type I error is going to be rejecting the null hypothesis, so it's going to be answer option A or C. “When it's actually equal to 60%” — that means we're looking at answer option A. I select that option and check my answer. Good job! Part 2 And now the second part asks us to identify the Type II error. To do that, I'm going to go back and repeat the same process that I used with the Type I error. So I write out the definition for a Type II error. 
The Type II error is failing to reject the null hypothesis when the null hypothesis is actually false. I then make the substitution as I did before, and I get “the Type II error means failing to reject that the percentage of households with Internet access is 60% when the percentage of households with Internet access is less than 60%.”
Now I look at my answer options and see which ones most closely match what I've ended up writing. Here again it looks like it's options A and C because we want to fail to reject the null hypothesis when that null hypothesis is actually false, meaning we're going to look at the alternative hypothesis of less than 60%. Again, that's answer option A. I check my answer. Nice work! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is just boring or doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or give us feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use a z-score to complete hypothesis testing for a claim of equality. Here's our problem statement: The test statistic of z = 2.45 is obtained when testing the claim that the population proportion is greater than 30%. Part A: Identify the hypothesis test as being two-tailed, left-tailed, or right-tailed. Part B: Find the P-value. Part C: Using a significance level of alpha (α) = 5%, should we reject H0 or should we fail to reject H0? Part A OK, Part A wants us to identify the hypothesis test as being two-tailed, left-tailed, or right-tailed. To do that, we simply look at the claim that we're making. Remember that this often comes from the alternative hypothesis that we create, and very often the alternative hypothesis reflects the claim that's being made. Here the claim that's being made is that the population proportion is greater than 30%. If you look at the inequality sign (the greater-than sign), it's like an arrow pointing to the right, and that indicates that this is a right-tailed test.
So I'm gonna put that here in my answer field. Well done! Part B Now Part B asks us to find the P-value. To do this, I'm going to pull up StatCrunch and get into my Normal calculator. The only thing we have to go on is the z-score, and so that's why I'm pulling up the Normal calculator. So to do that, I go to Stat –> Calculator –> Normal. And here in my Normal calculator, I'm going to reflect the alternative hypothesis, or the claim here in my problem statement. But in order to do that, I have to do that through the z-score. Notice the default values that come up in my Normal calculator are for the standard normal distribution. This is what we want, so I'm going to leave those values alone. Then here are my probability fields down below. I'm going to make sure that this inequality sign matches the test that I'm running. In Part A we concluded that we're doing a right-tailed test, so that means this inequality sign needs to be greater-than-or-equal-to here. In the next field, I'm going to put the z-score that they give me. And now I press Compute! and this area underneath the curve that's shaded in red on your Normal calculator is this number here: 0.007. And that is the P-value. So I'm going to put that here in my answer field. Fantastic! Part C And last but not least, Part C asks us to choose the correct conclusion below. We're either going to fail to reject H0 or we're going to reject H0. And the way we do that is by comparing the P-value to our α level. The α level that we're given here is 5%. The P-value that we found is seven tenths of 1%, so the P-value here is less than our α value. Therefore, we are in the region of rejection, and we're going to go ahead and reject H0 and say that there is sufficient evidence to support the claim.
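The Normal calculator step can be reproduced outside StatCrunch. The following is just a sketch using Python's standard library: for a right-tailed test the P-value is the area under the standard normal curve to the right of the test statistic. The variable names are illustrative only.

```python
from statistics import NormalDist

z = 2.45          # test statistic from the problem statement
alpha = 0.05      # significance level from Part C

# Right-tailed test: P-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"P-value = {p_value:.3f}, decision: {decision}")
```

Rounded to three decimal places this gives the same 0.007 shown in the Normal calculator, and comparing it with α = 0.05 gives the same decision.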
Rejecting H0 and saying there's not sufficient evidence doesn't add up; it's not consistent. So if we reject H0, there's going to be sufficient evidence to support the claim. If we fail to reject H0, there's going to be insufficient evidence to support the claim. Here we're actually going to reject H0 because our P-value is less than our α value. Well done! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below. Let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Finding and interpreting a confidence interval for a population standard deviation given sample data
3/21/2018
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find and interpret a confidence interval for a population standard deviation given sample data. Here's our problem statement: The values listed below are waiting times (in minutes) of customers at two different banks. At Bank A, customers enter a single waiting line that feeds three teller windows. At Bank B, customers may enter any one of three different lines that have formed at three teller windows. Answer the following questions. Part 1 OK, the first part says, “Construct a 99% confidence interval for the population standard deviation σ at Bank A.” Notice we have our data listed here just below the problem statement. I'm gonna click on this little icon to the right so I can open my data in StatCrunch. Now that my data is here in StatCrunch, I'm ready to go ahead and find the confidence interval estimate for the population standard deviation.
To do that, I'm going to go up here to Stat –> Variance Stats (because we don't have standard deviation, only variance) –> One Sample (because I'm only dealing with one sample) and With Data (because I'm given actual data here in my StatCrunch data table). In the options window that appears, I'm gonna select the column where my data appears, in this case Bank A, select the radio button for confidence interval, and then make sure my confidence level matches that of the problem statement. Then I come down here and click Compute! to produce this lovely results window. Now, when you're making confidence interval estimates in StatCrunch for proportions and for means, you can take the lower and upper limit from the ends of the table here and put them in the answer field. However, when you're asked to find a confidence interval for standard deviation and you're using StatCrunch to solve that, the only way you can do that is by looking at the variance and not the standard deviation. How then do we get the standard deviation? Well, notice the variance is simply the standard deviation squared. So if we take the square root of these limits here in the table, we can get the values that we seek. To start, I'm going to copy the value here for the lower limit, and then I'm going to pull up a calculator so I can paste the number in with Ctrl+V on my keyboard. Oh, it didn't like that. Let's try this again. I'm going to copy that number from StatCrunch, then in my calculator, put the number in, take the square root, and here we want to round to two decimal places, so now we have 0.33 for the lower limit. I repeat the procedure again for the upper limit. Now that I have limits in, I check my answer. Well done! Part 2 Now the second part says, “Construct a 99% confidence interval for the population standard deviation σ at Bank B. 
Well, I could go through the same process again with the menu options and what not, or I could just come to my results window here and click on the Options button located in the upper left corner. In the drop down menu that follows, I select Edit, and I'm right back in the options window that I had previously. And I just switch my data from Bank A to Bank B, hit Compute! and now I've got new numbers. I have to do the same thing again, take the square root of my upper and lower limits to find the values that I need. Good job! Part 3 And now the last part of the problem says, “Interpret the results found in the previous parts. Do the confidence intervals suggest a difference in the variation among waiting times? Does a single line system or the multiple line system seem to be a better arrangement?” Well, let's scroll back so we can see the confidence intervals for the two different banks. And if we go back to our problem statement, we see that the Bank A customers enter a single waiting line, but at Bank B customers enter a multi-line system.
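The square-root step used in Parts 1 and 2 is simple enough to sketch in Python. Note that the variance limits below are made-up placeholders, not the StatCrunch output from the video (the transcript does not list those numbers), and the helper name `sd_interval_from_variance_interval` is hypothetical.

```python
from math import sqrt


def sd_interval_from_variance_interval(lower_var, upper_var, places=2):
    """Convert confidence limits for a variance into limits for a standard deviation.

    Because the variance is the standard deviation squared, taking the square
    root of each variance limit gives the matching standard deviation limit.
    """
    return round(sqrt(lower_var), places), round(sqrt(upper_var), places)


# Hypothetical variance limits, for illustration only.
sd_limits = sd_interval_from_variance_interval(0.1089, 1.44)
print(sd_limits)
```

With the real StatCrunch output, the two numbers copied from the L. Limit and U. Limit columns of the variance results table would go in place of the placeholder values.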
So Bank A is the single-line system, and we see here that the range of the confidence interval is a little less than one minute. For Bank B with the multi-line system, we see that the range for our confidence interval is just above three minutes, so a significantly longer range here for Bank B than for Bank A. That means there's much more variation here for Bank B, which has the multi-line system, than for Bank A, which has a single-line system. So the single-line system will have lower variation, and given that this confidence interval, the bulk of it, is far less than the bulk of the confidence interval for Bank B, we can also surmise that there's typically going to be less wait time at Bank A than at Bank B. There's a slight overlap between the upper limit for Bank A and the lower limit for Bank B, but the bulk of the confidence interval for Bank A is less than that for Bank B. So we can say that there's going to be on average less variation for the waiting times in Bank A than for Bank B. So we would actually prefer to be in line in Bank A rather than Bank B. Now let's scroll back down and look at our answer options. If we compare the different answer options, we can see that Answer Options A and B will not be correct because they say the multi-line system has a lower variation, and we just showed that it's the single-line system that has the lower variation. So this leaves us with Answer Options C and D. The difference between Answer Options C and D is which of the two systems appears to be better. And as we saw by comparing the confidence intervals, the single-line system appears to be better. This leads us to select Answer Option C. Good job! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve.
And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find and interpret a confidence interval for a population mean when the population standard deviation is unknown. Here's our problem statement: A clinical trial was conducted to test the effectiveness of a drug for treating insomnia in older subjects. Before treatment, 24 subjects had a mean wake time of 102 minutes. After treatment, the 24 subjects had a mean wake time of 96.4 minutes and a standard deviation of 43.7 minutes. Assume that the 24 sample values appear to be from a normally distributed population, and construct a 90% confidence interval estimate of the mean wake time for a population with drug treatments. What does the result suggest about the mean wake time of 102 minutes before the treatment? Does the drug appear to be effective? Part 1 OK, the first part of this problem asks us to “construct the 90% confidence interval estimate of the mean wake time for a population with the treatment.” This is most easily done inside StatCrunch, so I'm gonna pull up StatCrunch. Notice I have StatCrunch already set up in a separate window. I could come over here in Question Help and then select StatCrunch from the list, but then the window that pops up for StatCrunch would be confined inside this window where I need to put my answers. So in order to clear the field so I can see my work and then put answers in the answer field, I like to have StatCrunch set up in a separate window. Here in this separate window, we don't actually need any actual data.
We have summary statistics listed here in the problem statement, and so we're going to use those to construct our confidence interval. However, before we get into StatCrunch, because we're asked to find a confidence interval estimate for the mean, we need to ask ourselves the key question: Do we know what the population standard deviation is? The answer to that question is no. We have listed here in the problem statement a standard deviation value, but that's for the sample of 24 subjects. We don't have a standard deviation value given for the population. Therefore, we don't know what it is. And therefore we want to use the Student-t distribution to calculate our confidence interval estimate. If we knew what the population standard deviation was, then we would use the standard normal distribution and calculate z-scores to get our confidence interval estimate. But here we don't know what it is (and that's typically the case), so we're going to use the Student-t distribution. Therefore, up here in StatCrunch, I want to select Stat –> T Stats (because we're using the Student-t distribution) –> One Sample (because we only have one sample for our summary statistics that we're looking at), and then I don't have any actual data to put into StatCrunch, so I'm gonna select With Summary. Here in the options window, I need to put in the summary statistics that I found in the problem statement. So the mean wake time that we want is for the group of the treatment because that's what we're asked to make the confidence interval estimate for — a population with the treatment. So here after treatment, the 24 subjects had a mean wake time of 96.4 minutes. So here in this first field, I'm gonna put 96.4, and then in the next field the standard deviation value, and then the sample size in this last field. Then I'm going to come down here and select the radio button for Confidence interval and make sure the confidence interval levels match what I'm looking for. 
Then I just press Compute!, and out pops this lovely results window with the upper and lower limits for my confidence interval. So I just put those here into my answer field. I check my answer. Nice work! Part 2 Now, the second part of the problem asks me, “What does the result suggest about the mean wake time of 102 minutes before the treatment? Does the drug appear to be effective?” To evaluate this, we need to take this value we're asked to evaluate — 102 minutes — and see where it lies with respect to our confidence interval estimate.
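Here is a rough Python sketch of the interval StatCrunch produces in Part 1, using the summary statistics from the problem statement. Python's standard library has no Student-t distribution, so the critical value 1.714 (23 degrees of freedom, 0.05 in each tail) is typed in from a standard t table; that value and the function name `t_interval` are assumptions of this sketch, not something shown in the video.

```python
from math import sqrt


def t_interval(sample_mean, sample_sd, n, t_crit):
    """Confidence interval for a population mean when sigma is unknown."""
    margin = t_crit * sample_sd / sqrt(n)   # margin of error E
    return sample_mean - margin, sample_mean + margin


# Critical value from a t table: df = 24 - 1 = 23, 90% confidence (assumed input).
T_CRIT_90_DF23 = 1.714

lower, upper = t_interval(96.4, 43.7, 24, T_CRIT_90_DF23)
print(f"{lower:.1f} < mu < {upper:.1f}")

# Where does the pre-treatment mean of 102 minutes fall?
print("102 inside interval:", lower <= 102 <= upper)
```

The last line is the same check the second part of the problem asks for: whether the pre-treatment mean of 102 minutes sits inside or outside the interval.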
Is 102 inside the confidence interval or outside the confidence interval? Here 102 is inside our confidence interval. Therefore, it could potentially be the mean value of those who actually have the treatment for the population. Remember this mean time that we have up here, 96.4, is for our 24 subjects in the sample. We want to know about the population. So the sample mean is 96.4, but the population mean could be anywhere between 81.1 and 111.7. 102 is inside our confidence interval. Therefore, it could potentially be the mean, and therefore if it's the same mean, then there's no change from what we had previously before applying the drug treatment. Therefore, the drug doesn't seem to have any significant effect. If, on the other hand, the value we're asked to evaluate is outside the confidence interval, well, then the drug seems to have some sort of significant effect. But that's not the case here. So the confidence interval here actually includes the mean wake time from before the treatment, so that means the mean wake times before and after the treatment could potentially be the same. This means that the drug treatment does not have a significant effect. Check my answer. Good job! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below. Let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find a sample proportion and the corresponding confidence interval. Here's our problem statement: In a survey of 3363 adults aged 57 through 85 years, it was found that 89.4% of them used at least one prescription medication. Complete Parts A through C below.
Part A OK, Part A asks how many of the 3363 subjects used at least one prescription medication. This is easily answered with the help of a calculator, so I'm going to pull out my calculator here. And all I need to do is take the total number that I'm given and multiply it by the proportion that I'm looking for. In order to use this percent in a calculation, I need to convert it to decimal. So the first thing I'm going to do is put in the total number 3363 and multiply it by the proportion, convert it to a decimal form so I have to move that decimal point over two places to the left. And here I have my answer. Now what I need to provide is an actual whole number, because you don't count partial people; you only count whole people. At least, I hope you're not counting partial people! Here our instructions say, “Round to the nearest integer as needed.” So I'm just going to follow the regular rules of rounding here. I got 0.522; that means I'm going to round this number up to 3007. Excellent! Part B And now, Part B asks us to “construct a 90% confidence interval estimate of the percentage of adults aged 57 through 85 years who use at least one prescription medication.” This is most easily accomplished in StatCrunch, so I'm going to pull up my StatCrunch window. Notice I have StatCrunch opened in a separate window. I often do this when I'm working these problems because you never know when you're going to need it, and they don't actually give you any sort of icon here inside the problem for you to open up StatCrunch. I could go up here to Question Help, and then here StatCrunch is an option here. But I always like to just keep it separate in a window so I'm sure that I absolutely have it. To construct a confidence interval estimate, we don't need any actual data; we have summary stats here in our problem statement. 
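Before moving into StatCrunch, the Part A arithmetic and the interval that Part B asks for can be sketched in a few lines of Python. This is only a cross-check under the usual normal-approximation formula for a one-sample proportion interval; the function name `proportion_interval` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

n = 3363                        # survey size from the problem statement
successes = round(n * 0.894)    # 89.4% of 3363 is 3006.522, which rounds to 3007


def proportion_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed critical value
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = proportion_interval(successes, n, 0.90)
print(successes)
print(f"{lower * 100:.1f}% < p < {upper * 100:.1f}%")
```

Multiplying the limits by 100 is the same "move the decimal point two places" step needed when the answer fields ask for percentages.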
So I'm going to go up to Stat –> Proportion Stats (because I'm constructing a confidence interval on proportions) –> One Sample (because I'm only given one sample), and then I want to select With Summary because I don't have actual data to use; I only have summary statistics. Now here in my options window, the first thing I need to do is determine the number of successes. This is simply the proportion that they're going to give you in the actual problem statement. But you need to enter it as a whole number. We calculated that in the previous part of the problem — 3007. The number of observations is the total number that's in the sample. That's this number here. Then I click on the radio button for confidence interval and make sure that I have the right confidence level in the confidence level field. Once I have that put in, all the other defaults are fine, so I press Compute! and out comes my results window with these two numbers on the end — the upper and lower limits for my confidence intervals. Notice, however, that the answer fields have a percent after them. That means I have to convert these numbers from decimal to percent, so I have to move the decimal point two places over. Then I put my answer in. So this first one is going to be 88.5, and the second one is going to be 90.3. I check my answer. Excellent! Part C Now Part C asks, “What do the results tell us about the proportion of college students who use at least one prescription medication?” Well, go back and look at the actual data that we have collected. We have adults aged 57 through 85 years. But the question is asking us about college students, so how representative of that population is our sample? In other words, how many college students are we gonna find in a sample of people aged 57 through 85 years? The answer is you might find a few; there are some older individuals who go back to college just for their own personal enrichment. 
They're not looking for any sort of career, but you're not gonna find a whole lot.
So no more than a small handful of people in that sample are actually going to be college students. That means our sample is not representative of the population that we're interested in. And therefore it can't really tell us very much. Remember, in order to make conclusions about a population from a sample, that sample needs to be representative and characteristic of the population. Here we don't have that connection. Therefore, the conclusions we make about this sample should not be applied to that population. The answer option that best matches that conclusion here is Answer Option A, “The results tell us nothing.” I check the answer. Good job! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to create a normal quantile plot to find z-scores. Here's our problem statement: Use the given data values (a sample of female arm circumferences in centimeters) to identify the corresponding z-scores that are used for a normal quantile plot, then identify the coordinates of each point in the normal quantile plot. Construct the normal quantile plot, then determine whether the data appear to be from a population with a normal distribution. Part 1 OK, the first part of our problem asks us to list the z-scores for the normal quantile plot. This is most easily done by constructing a normal quantile plot in StatCrunch, so I'm going to take my data here, click on this icon to the right, and select Open in StatCrunch.
Now that my data is open here in StatCrunch, I'm going to construct my normal quantile plot. And to do that, I'm going to come up here to the top and select Graph –> QQ Plot. In the options window, I select the column where my data is located, and then I'm going to check this box down here next to Normal quantile on y axis. For whatever reason, whoever coded StatCrunch decided the convention to use would be to put the normal quantile on the x axis. I don't know why they decided that, because everywhere I've seen a normal quantile plot in industry and in my days since, I've always seen the normal quantile on the y axis. This is the convention that Pearson uses for your assignments, so when you're asked to compare normal quantile plots, construct one, select the proper one, so on and so forth, the normal quantile always appears on the y axis. So you want to check this box to make sure you're comparing apples with apples. When I press Compute!, I get this lovely normal quantile plot. Now I can get the z-scores just by taking the y-coordinate from each of the ordered pairs that appear here in my plot. To do that, I'm gonna put my cursor over the first one, and notice how I have the Y and the X values for that point listed in a little window. So all I need to do is just copy those numbers over. So I'm gonna come over here, put my cursor in the first answer field, and then put my cursor back over here over this first plot and type in that value for the Y — the normal quantile. And then I'm gonna use my Tab button to tab between different answer field options as I move my cursor to each succeeding point. And I'm just going to type these in one at a time until I get every single one of them in. (I think that first one needs adjusting — it does need adjusting.) Now I check my answer. Fantastic! 
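If you'd like to see where z-scores like these come from, here is a small Python sketch of one common textbook method: sort the data, assign the i-th value the cumulative area (i - 0.5)/n, and look up the z-score for that area. Both the sample values and the choice of plotting position are assumptions for illustration; the transcript doesn't list the data set, and StatCrunch may use a slightly different formula, so its numbers can differ a little.

```python
from statistics import NormalDist


def normal_quantile_points(values):
    """Return (x, z) pairs for a normal quantile plot.

    Uses the plotting position (i - 0.5) / n for the i-th smallest value,
    which is one textbook convention (an assumption of this sketch).
    """
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        cumulative_area = (i - 0.5) / n
        z = NormalDist().inv_cdf(cumulative_area)   # z-score with that area to its left
        points.append((x, round(z, 2)))
    return points


# Hypothetical arm circumferences in centimeters, not the values from the video.
sample = [40.1, 33.2, 44.3, 35.6, 38.2]
points = normal_quantile_points(sample)
for point in points:
    print(point)
```

Each printed pair lists the data value first and the z-score second, which is the order the ordered pairs need to be entered in.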
Part 2 Now the second part of our problem asks us to “identify the coordinates of each point in the normal quantile plot.” Obviously I'm just going to go through the same process that I went through before, only this time in my answer field I'm going to type in ordered pairs. That means I have to put my x and y values in parentheses separated by a comma. So this first one will be (31.9, -0.96). And I just continue on for each one in succession, and I'm using the Tab button on my keyboard to jump my cursor between answer fields in my assignment. Notice also that the box from which you're taking your ordered pair coordinates lists the Y first and then the X, but the convention with ordered pairs is to list the X first and then the Y. So make sure you get everything in the right order. And I need to correct this first point because we're rounding to two decimal places. Fantastic! Part 3 The third part of our problem asks us to “construct a normal quantile plot and select the right option from four options given.” We've already constructed a normal quantile plot, but I find it easier to make the comparison for the right answer if I match the values on my axes for the normal quantile plot that I constructed in StatCrunch with those for my answer options. Notice here in the answer options, the x axis has a minimum value of 30 and a maximum value of 50, and the y axis has a minimum value of -2 and a maximum value of +2. So I'm going to change the axes on my normal quantile plot to match. To do that, I'm going to select this little three-line icon in the lower left-hand corner of my graph. And when I left click on that icon, I get a menu where I can change the x-axis and the y-axis as well as change some of the graphical display on the graph. I'm going to select X-axis to change the x-axis first. I want a minimum value of 30 and a maximum value of 50. And I'm going to change the y-axis so that it matches also.
And I'm going to reduce the width a little bit so that I get something that more or less matches what I'm seeing in my answer options. Most students, when they get to this point, they're looking for an exact match. But the way this is constructed, you just want to look for a general trend. So look at the general pattern that the points in your graph are making, and find the answer option that has points with a similar pattern. Here that answer is going to be Answer Option B. Notice how the points don't exactly match up. For example, here this first point in the graph we constructed is above the red line, but in the answer option we selected it's below the red line. And you can see differences if you look at the last two points in the graph as well. However, we're looking for a general pattern of the points with respect to each other, not with respect to the red line. So I'm going to select this as my answer option. Excellent! Part 4 And now, the last part of the problem asks, “Do the data come from a normally distributed population?” Well, in our normal quantile plot, what we see are points that more or less conform to that red line. Yes, it's not exact. Yes, there's some deviation of the points from the red line. But for the most part, what we're looking for is a general trend. So don't get caught up in little details when you're answering these types of questions. Just look for a general trend.
The general trend here is that the data points are fairly close to the line. And they're not making any sort of pattern like an S- or sinusoidal pattern that would indicate something other than a normal distribution. So I'm going to conclude that, yes, these points are representing what is reasonably close to a straight line. That is Answer Option C. Nice work! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below. Let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find probabilities for different sample sizes using a nonstandard normal distribution. Here's our problem statement: The overhead reach distances of adult females are normally distributed with a mean of 200 centimeters (cm) and a standard deviation of 8 cm. Part A Part A says, “Find the probability that an individual distance is greater than 212.5 cm.” This problem is very easily tackled inside StatCrunch, so I’m gonna pull up StatCrunch here. And inside StatCrunch, I'm going to go to Stat –> Calculators –> Normal. Why am I pulling up the Normal calculator? Well, I'm doing that because the problem statement says that our data are normally distributed. Note that in the Normal calculator, the mean and standard deviation values by default come up for the standard normal distribution. The problem statement gives us a nonstandard normal distribution. But this is very easily corrected. I just adjust the mean and the standard deviation for the values that are listed in the problem statement. Then in the line below we see a probability calculation. 
We know this is so because the P on the left here signals probability. So this is the probability that x is greater than or less than a given random variable value. That's going to be equal to a probability value or an area under the curve. Here the problem statement asks us to find the probability for an individual distance greater than 212.5 cm. So down here, I'm going to put “greater than” from the drop-down menu, 212.5 for the random variable, and click on Compute! StatCrunch automatically computes my probability for me. I'm asked to round to four decimal places, so I do that. Good job! Part B Now Part B asks, “Find the probability that the mean for 20 randomly selected distances is greater than 198.2 cm.” Now before you rush off and just change this random variable value here in your Normal calculator in StatCrunch, you need to make an adjustment to your standard deviation. Why? Because you're taking a sample that's greater than one in size. The adjustment that we need to make is shown here. We take the population standard deviation (that's given to us in the problem) and we divide it by the square root of the sample size. This is an adjustment that we need to make to the standard deviation. So I just plug in the values that are given to me in the problem statement. I have a standard deviation of 8 and a sample size of 20, so 8 divided by the square root of 20 gives me 1.7889 (rounded to 4 decimal places). I could just put this new value for standard deviation into my calculator in StatCrunch. However, because Pearson is often so exacting with the precision of answers that they want you to provide in the answer fields for your assignments, I'm a little wary of putting any sort of rounded number into the calculator and then getting another number that I have to round out. It might be off just enough to where Pearson marks me wrong. So what I'm going to do is make this same calculation with a calculator that's inside my computer.
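If you want to cross-check these numbers outside StatCrunch, here's a rough Python sketch using the standard library's `NormalDist`. The variable names are my own, and this is just one way to do the calculation, not a description of what StatCrunch does internally. The point to notice is that the adjusted standard deviation is kept at full precision instead of being rounded before it's used.

```python
from math import sqrt
from statistics import NormalDist

mu = 200.0    # population mean in cm, from the problem statement
sigma = 8.0   # population standard deviation in cm, from the problem statement
n = 20        # sample size used in Part B

# Part A: probability that one individual distance is greater than 212.5 cm.
individual = NormalDist(mu, sigma)
p_a = 1 - individual.cdf(212.5)

# Part B: adjust the standard deviation for a sample mean (sigma / sqrt(n)),
# keeping every decimal place rather than rounding first.
standard_error = sigma / sqrt(n)
sample_mean = NormalDist(mu, standard_error)
p_b = 1 - sample_mean.cdf(198.2)

print(f"Part A probability: {p_a:.4f}")
print(f"Adjusted standard deviation: {standard_error}")
print(f"Part B probability: {p_b:.4f}")
```

Subtracting the cumulative area from 1 plays the same role as choosing “greater than” in the Normal calculator.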
Then I can copy the value from the computer and paste it into the calculator in StatCrunch. So take 8 divided by the square root of 20. That gives me the same value, but look at all these decimal places that I get to put into my value. I right-click on the mouse, select Copy, and then back into StatCrunch, come here to Standard Deviation, and press Ctrl+V on my keyboard, and this puts that entire number into the field in StatCrunch. I press the Home button on my keyboard so I can go to the front of that field so I can see that the same number is actually in there. Now all I need to do is make the appropriate adjustment to my random variable value. Here in the problem statement, we want the probability that the mean of 20 randomly selected distances is greater than 198.2, so I make that adjustment here. Press Compute! and out comes my probability. I want to round to four decimal places, so I do that, put my answer in, press Enter or Check Answer. Excellent! Part C And now the last part, Part C, asks, “Why can the normal distribution be used in Part B even though the sample size does not exceed 30?” Well, notice that this last part of the question, “even though the sample size does not exceed 30,” is a reference to the Central Limit Theorem, which says that if our sample size is greater than 30, we can assume that our sample means conform to a normal distribution.
However, the Central Limit Theorem also says that there's no threshold value to meet for sample size if we're already normally distributed. That makes sense, right? I mean, why would you assume that you're normally distributed? We already have a normal distribution; there's no need for that extra assumption. You're already normally distributed. And that's the answer to the question here in Part C. Why can the normal distribution be used? Because we're already normally distributed. It says so right here in our problem statement: we're normally distributed. Therefore that's the answer option that I want to select. And looking at the different answer options available to me, I can see that Answer Option B is the one that I want to select. Well done! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Creating in StatCrunch a probability distribution table for a sampling distribution of the medians 3/2/2018 Intro Howdy! I am Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to create in StatCrunch a probability distribution table for a sampling distribution of the medians. Here's our problem statement: Three randomly selected households are surveyed. The numbers of people in the households are 2, 3, and 10. Assume that samples of size n = 2 are randomly selected with replacement from the population of 2, 3, and 10. Listed below are the nine different samples. Complete Parts A through C.
Part A Part A says, “Find the median of each of the nine samples, then summarize the sampling distribution of the medians in the format of a table representing the probability distribution of the distinct median values.” When most students see this type of problem, they freak out because they don't know what in the world they're looking at, and they have no idea how to approach this. Once you understand what's going on with this, it's pretty easy to solve these types of problems. Let's run through that. I mentioned earlier that we were going to solve this problem in StatCrunch. So here in the little icon next to the data set that they give me in the problem statement, I'm going to click on that, and I'm going to open this in StatCrunch. Now I actually find it a little bit easier to solve these types of problems in Excel, but that's because I'm very much more versed in Excel than I am in StatCrunch. I have more experience in Excel than StatCrunch, so I'm more comfortable with it there. But I said we're going to solve this problem in StatCrunch, so that's what we're going to do. I’ll have to make another video at another time to show you how to solve this problem in Excel if you're interested. So here we have our data in StatCrunch, and I'm going to resize this window to make things a little easier for us. Now, here we have our data in StatCrunch. To start with, we need to calculate the medians for each of the different samples. So here we have our samples listed in rows where the first sample we get was 2 and 2. And the second sample we get is 2 and 3. We're picking from these population values that are stated here in our problem statement — 2, 3, and 10. We have a total of 9 samples. Now what we need to do is calculate the median, because this is the statistic we were asked to calculate — a sampling distribution of the medians — so we're going to calculate the median value for each of these different samples. 
And to do that, I'm going to put my cursor over here in the next available column, and then I'm going to go up here to Stat –> Summary Stats —> Rows, because my data for the sample is listed in rows. Here I'm going to select the actual columns where my sample data is located. Notice that's in the x1 and x2 column. I can select more than one column by holding the Shift key or the Control (Ctrl) key on my keyboard while I click the mouse. The difference between the two is if you hold down the Shift key it will select everything in between the first selection you made and the second selection you made. If you hold down the Ctrl key, then that's selecting just the individual selections by themselves regardless of what's in between. Here that makes no difference because the two columns we want are right next to each other. The statistic we're asked to calculate is the median, so I'm going to scroll down here and select Median and then come down and hit Compute! Now I've got all the median values that I need to construct my probability table. When we're constructing the probability table, we're going to ignore this first column for Row because it's just listing the different rows here, and we're actually going to sort the column for the medians because it makes it easier to construct our probability table. To sort the rows, all I'm going to do here is click on this little arrow at the top right next to Median, and notice how the values sorted themselves automatically. If I click this again, it sorts it again in reverse order. We actually want this from smallest to largest, so I'm going to sort it in that order. Now I have everything I need to construct my probability table. Here in my answer fields, the way I'm going to do that is by recognizing what a probability is. A probability is nothing more than the part divided by the whole. So we look at our first value here, which is a 2. 
That means I'm going to go over here and from the drop-down menu on this first selection, I'm going to select 2. What's the probability of getting a 2 for my median value? Well, how many 2s are there here in my Median column? There's only one. This is why I sorted it, because it makes it easier to count. There's only one, so the part that I have is going to be 1 divided by the whole, which is nine rows total, so that's going to be 1 over 9. The next value I see in my Median column is 2.5, so down here under the drop-down menu selection, I'm going to select 2.5. I have two of those, so that means my probability is 2 divided by 9, because there's two for the part and then there's nine for the whole. I continue on in that way to complete the probability table. The next value was 3, and there's only one of those, so its probability is 1 over 9. The next value is 6, and there's two of those, so that probability is 2 divided by 9. The next value is 6.5, and there's two of those, so that probability is 2 divided by 9. And the last value is 10, and its probability is 1/9. Notice that all the probabilities in my table are either 1/9 or 2/9. If, for example, I had 3/9, I would need to simplify that to 1/3 before I check my answer because, the way this is constructed, the answers are expected to be in simplest terms. I don't even need to worry about reducing anything, because everything's already reduced as far as it can go with 1/9 and 2/9. So I check my answer. Well done! Part B Now Part B asks, “Compare the population median to the mean of the sample medians. Choose the correct answer below.” OK, so the first thing we need to do is find the population median and the mean of the sample medians so that we can compare them. To get the population median, we're going to have to put those numbers here into StatCrunch.
So back here in StatCrunch, I'm going to label this next column Population so that I know what I'm looking at when I'm trying to select between different columns. And here I'm going to put in the values from the population, which from our problem statement listed here is 2, 3, and 10. Now I'm looking for the median of the population, so I do that by coming up to Stat –> Summary Stats –> Columns. Select Population, and then down here for the Statistics, select Median. Here we have a population median of 3. I need to compare that with the mean of the sample medians. Well, the sample median values are over here in this table. But in order to calculate a mean value, I have to get these numbers into the data table in StatCrunch. If I wanted, I could just type these in, but there's a much easier way to get them in. Here in my results window, if I push the Options button here at the top, and then in the drop down menu I select Edit, I go right back to my options window. Then I can come down here to the bottom and click on this box for Store in data table. This will actually put the output not in a results window but in the data table, and that's where we want it. I press Compute!, and now my median values are there in the data table itself. Now it's easy to run the calculation I need. Go to Stat –> Summary Stats –> Columns. I want the Row Median because that's where I'm going to calculate my data, and then I'm looking for the mean value for those numbers. Hit Compute!, and the mean of the median values is 5. So we see that 3 is the population and 5 is for the sample. So now we can look at our answer options and see which one is the correct one. Answer Option A says, “The population median is equal to half of the mean of the sample medians.” Well, 3 is not half of 5, so that's not right. Answer Option B says, “The population median is equal to the mean of the sample medians.” 3 is not equal to 5, so that's not right. 
Answer Option C says, “The population median is not equal to the mean of the sample medians. It is also not half or double the mean of the sample medians.” That sounds right, but let's check Answer Option D just to make sure. Answer Option D says, “The population median is equal to double the mean of the sample medians.” Well, 3 is not twice 5, so Answer Option C is the correct answer. Fantastic! Part C And now the last part of our problem, Part C, says, “Do the sample medians target the value of the population median? In general, do sample medians make good estimators of population medians? Why or why not?” By target, what it means is that the number we get from the sample should approach or equal the number from the population with the more samples that we take. Here we see that the numbers are not equal; therefore, we're going to say that they're not going to target that.
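If you'd rather check the probability table and the comparison from Parts A and B outside StatCrunch, here's a small Python sketch of the same counting idea (the part divided by the whole). The variable names are my own, and this is only one way to do it. It uses `Fraction`, which reduces to simplest terms automatically, so something like 3/9 would come out as 1/3.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import mean, median

population = [2, 3, 10]  # household sizes from the problem statement

# All samples of size n = 2 selected with replacement: 3 * 3 = 9 samples.
samples = list(product(population, repeat=2))

# Median of each sample, then count how often each distinct median occurs.
sample_medians = [median(s) for s in samples]
counts = Counter(sample_medians)

# Probability of each distinct median = count (the part) / 9 (the whole).
distribution = {
    m: Fraction(count, len(samples)) for m, count in sorted(counts.items())
}
for m, p in distribution.items():
    print(f"median {m}: probability {p}")

# Part B: compare the population median with the mean of the sample medians.
population_median = median(population)
mean_of_medians = mean(sample_medians)
print(f"population median: {population_median}")
print(f"mean of sample medians: {mean_of_medians}")
```

Sorting the distinct medians before building the table mirrors sorting the Median column in StatCrunch: it just makes the counts easier to read.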
I actually recommend that students answer these types of questions not by looking at the numbers so to speak, because I actually have seen instances where the numbers are the same but the correct answer is the sample doesn't target the population. What you need to look at is what's the statistic that you're calculating and is that a biased or unbiased estimator. Here we're calculating the median. Medians are biased estimators; they tend not to target the population from the sample. So here right off the bat, I know Answer Option A and Answer Option D are incorrect, because they say the sample statistic targets the population parameter. Here that's not the case. Between Answer Options B and C, the difference is that in Answer Option B it says, “Sample medians do not make good estimators.” Answer Option C says, “Sample medians make good estimators.” Because the median is a biased estimator, it doesn't target the population median. And so therefore, it's not going to make a good estimator. So the correct answer we want is Answer Option B. Well done! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below. Let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.