Using StatCrunch to construct and evaluate a histogram from a frequency distribution table
4/26/2019
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use StatCrunch to construct and evaluate a histogram from a frequency distribution table. Here's our problem statement: The table below shows the frequency distribution of the rainfall on 52 consecutive Fridays in a certain city. Use the frequency distribution to construct a histogram. Do the data appear to have a distribution that is approximately normal? Part 1 OK, the first part of this problem asks us to construct a histogram. So we need to get this information here in the table into StatCrunch. Notice that we don't have any icon to click on to dump this information directly into StatCrunch. So what we're going to have to do is input this information manually. To do this, we're going to open up our own separate copy here of StatCrunch. And I'm going to pop that out here so we can take a look at our problem and see everything that's going on just a little bit better. Alright, so here in StatCrunch, I'm going to enter in this first column here for the classes. If you look, you'll see that there's actually a pattern going on here, and that helps you to input the information just a little bit more quickly. And it doesn't take too long once you get going with this; you just gotta keep punching those buttons on your keyboard, and away you'll go with this. OK. There's the first column, and now we put in the second column for the frequency counts. Alright, so now we've got the information here into StatCrunch. Most people, when they want to make a histogram, they're going to go up to Graph --> Histogram. The problem with that is that this function in StatCrunch assumes that the numbers you have here in the data table are actual data values, and they're not. What you have here is a frequency distribution table. 
This is a summary of the data that you actually have, not the data itself. And that's why the histogram feature doesn't work, because the histogram feature assumes that this information is actual data and not summary information. So what we want to do is go to Graph --> Bar plot --> With Summary so that we can make our graphical representation of the summary Information. Categories are going to be the class; counts are going to be the frequency. And then I always like checking this box for Value above bar. I wish this was selected by default, but it's not. It's very useful. So I just go ahead and just check it. Hit compute!, and look! Now I've got something that looks more like what I'm asked to choose in my answer options. So looking at my three answer options here, Answer option C is not going to be right because the high part of the graph is here on the right. But the one that comes from our summary information is here on the left. So we're not going to choose Answer options C. Distinguishing between Answer options A and B --- so if I look here at Answer option A --- I'll blow that up just a little bit --- notice here how the graph goes. It starts at a high point and comes down and then comes back up a little bit. That's what we have here. And notice the values here that we have. So this bar that comes up here near the end is a 4, which is just under 5, and here we've got 5 and you bring that over and, yeah, you could say that that that might be the case. If I look at Answer option B, notice how that same point actually goes up closer to 10, so it's well over 5. So Answer B is not going to be correct. It's going to be Answer option A. Excellent! Part 2 And now the last part of this problem asks, "Do the data appear to have a distribution that is approximately normal?" Well, a normal distribution starts out small, comes up to a high point in the middle, and then comes back down again. That's not what we have here. 
So looking at our answer options, Answer option A is not going to be correct because we don't conform to that characteristic bell shape that a normal distribution should have.
Answer option B is not going to be correct. It says that the distribution is approximately uniform. Uniform means that all of these bars would be coming up to about the same height or level, and that's not what we have here. We've got quite a bit of variation among the different classes. So Answer option B is not going to be correct. Answer option C is not correct because it says the distribution has no obvious maximum. That's obviously wrong. There's an obvious maximum right here on the left side of our distribution. So yeah, we're not going to select Answer option C. It must be Answer option D: "No, it is not symmetric." And that is true that the distribution that we have here is not symmetric. If we were to take a line down the middle, then each half should be a mirror image of the other half, and that's not what we see here. So Answer option D is the one we want. Nice work! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
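For anyone who wants to see the idea behind Graph --> Bar plot --> With Summary outside of StatCrunch, here is a minimal sketch in Python. It draws one bar per class straight from the summary counts instead of treating the counts as raw data. The class labels and frequencies are made up for illustration (they are not the values from the assignment), and the "bell shape" check is only a crude rule of thumb, not anything StatCrunch does.

```python
# Hypothetical frequency distribution: labels and counts are invented for
# illustration; they sum to 52 to mirror the 52 Fridays in the problem.
classes = ["0.00-0.19", "0.20-0.39", "0.40-0.59", "0.60-0.79",
           "0.80-0.99", "1.00-1.19", "1.20-1.39"]
frequencies = [42, 3, 2, 0, 1, 4, 0]


def text_bar_plot(labels, counts):
    """Return one text bar per class, with the count printed after the bar."""
    width = max(len(label) for label in labels)
    return [f"{label:<{width}} | {'#' * count} {count}"
            for label, count in zip(labels, counts)]


def looks_roughly_bell_shaped(counts):
    """Crude check: the tallest bar should sit in the middle third of the classes."""
    peak = counts.index(max(counts))
    n = len(counts)
    return n / 3 <= peak < 2 * n / 3


for line in text_bar_plot(classes, frequencies):
    print(line)
print("Tallest bar in the middle?", looks_roughly_bell_shaped(frequencies))
```

With counts like these, the tallest bar sits at the far left, which is the same reason the walkthrough rules out a normal distribution.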
Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to perform linear regression hypothesis testing on data of lemon imports and car crash fatality rates. Here's our problem statement: Listed below are annual data for various years. The data are weights in metric tons of imported lemons and car crash fatality rates per 100,000 population. Construct a scatterplot, find the value of the linear correlation coefficient R, and find the P-value using alpha equals 5%. Is there sufficient evidence to conclude that there is a linear correlation between lemon imports and crash fatality rates? Do the results suggest that imported lemons cause car fatalities? Part 1 OK, this first part of the problem is asking for the null and alternative hypotheses. A null hypothesis by definition is a statement of equality, so we're not going to select Answer option B here because this null hypothesis is not a statement of equality. Of the three answer options that remain, we're going to look at the alternative hypothesis because that's what's different among those three answer options. What is our alternative hypothesis? Well, when you're dealing with linear correlation, there's three ways this can go. You're looking for linear correlation in general, or you're looking for a specific type of linear correlation, which could be positive or negative. Here in our problem statement, we're asked, "Is there sufficient evidence to conclude that there is a linear correlation?" So we're not looking for positive or negative linear correlation, just linear correlation in general. That means it could be positive, which would be the right side, the right tail of our distribution; or it could be on the left, the negative, the left tail of the distribution. So a two-tailed test means we're going to have not equal to be our alternative hypothesis. And that is Answer option A. Well done! 
Part 2 Now the next part asks for a scatter plot. We're going to have to put this data into StatCrunch to do that. Well, actually we could do it in Excel as well, but I just think this is a little easier in StatCrunch. So let me move this window so we can see everything better. Beautiful. Alright. Now our data is here in StatCrunch. I could go to Graph and then on down here to Scatter Plot, but I know I'm going to have to make a linear regression model for the next parts of the problem, and in part of doing that regression analysis I'm going to get a scatter plot. So let's just go ahead and do the regression analysis and get everything all in one shot. I go to Stat --> Regression --> Simple Linear. Here in my options window, I'm asked to identify the x- and y-variables. Sometimes in the problem they'll specify what they want to be the x-variable. But typically you're not going to get anything like that. So the guiding assumption is that the variable that's mentioned first is your x-variable. Here that's going to be the lemon imports, which means the y-variable will be the crash fatality rates. I want a hypothesis test, so this radio button, it needs to be selected, and it is by default. I make sure these values match what I've got here from the earlier part of the problem, and they do. I hit Compute!, and here's my results window. The scatterplot --- of course, notice here it says "1 of 2". So the scatter plot is going to be on the second page. Well, I come down here and click on this arrow. I get to the second page and wow! There's my scatter plot along with a line of best fit. So yeah, this one's clearly the one we want to select. Good job! Part 3 Now the third part of this problem asks for the linear correlation coefficient. I come back here to my results window and click back to the first page. My correlation coefficient is right up here at the top. So we're asked to round to three decimal places. I can stick that number in. 
Notice the negative sign out in front. You have negative correlation. Don't forget that negative sign. Fantastic! Part 4 Now the next part asks for the test statistic. Notice our test statistic here is a t-score. Well, if I look here at my parameter estimates table, I have a column labeled t-stat. These are t-scores, but notice we have two of them. So which one are we going to be using? Well, the first one corresponds with the intercept of the model, and the second corresponds with the slope. We're going to select the one for the slope because testing for a linear correlation is equivalent to testing whether the slope of the regression line is zero. The intercept just shows us where the line is located on the graph, so its t-score tests something different: whether the intercept is zero. The slope is how steep that line is going to be, and a slope of zero means no linear relationship. So we're going to select that t-score for that second item, the slope there in our model. Again, don't forget that negative sign. We're asked to round to three decimal places. Excellent! Part 5 Next we're asked for the P-value. The P-value is right next door to the test statistic, just as we've seen previously, so we're going to take that same row there for the slope and take its P-value there, 0.016. Good job! Part 6 Now the next part of this problem is asking us to make a conclusion about our hypothesis test. Our P-value of 1.6% is less than our alpha level of 5%. So I select that here in the drop down because our P-value is less than our significance level. That means we're inside the region of rejection. So we're going to reject the null hypothesis. And every time we reject the null hypothesis, there is sufficient evidence. Good job! Part 7 And now this last part of the problem asks, "Do the results suggest that imported lemons cause car fatalities?" Well, one of the things that we need to clearly understand about correlation is that correlation does not imply causation. 
All correlation does is say that there's some relationship at work here. It may be a cause and effect relationship, but it's not necessarily so. And in fact, more often than not, when you've correlated two variables, what it means is that those two variables are related to a third variable that's not in your regression analysis. So you can't say that there's necessarily a cause and effect relationship going on here. And that's exactly what we see. If we look at our answer options here, it's Answer option D. Good job!
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find and interpret the probability of flu side effects from a cholesterol treatment. Here's our problem statement: The probability of flu symptoms for a person not receiving any treatment is 2%. In a clinical trial of a common drug used to lower cholesterol, 19 of 840 people treated experienced flu symptoms. Assuming the drug has no effect on the likelihood of flu symptoms, estimate the probability that at least 19 people experienced flu symptoms. What do these results suggest about flu symptoms as an adverse reaction to the drug? Part A OK, Part A of this problem asks us for the probability that at least 19 people from our sample experienced flu symptoms. Now, normally --- pun intended --- we would use a normal distribution to solve a problem like this. But notice in the problem statement, we don't get any information about a mean or a standard deviation for our distribution. So what information are we given? Well, we're given this information here: a probability of "success" and then we're given here a proportion of our sample. So what we're going to do with this information then is use the binomial distribution directly rather than a normal approximation. To start out with, we're going to have to load up StatCrunch. So let's open that up here, and I'll pop this window out. And let's move this around so we can see everything quite a bit better. 
OK, so now here in StatCrunch, we're going to go to Stat --> Calculators --> Binomial. Here in my binomial calculator, I need to insert the sample size. That's n, so my sample size here is the total number of people in the sample, which is the 840. p is the probability of success, which we saw earlier in our problem statement is 2%. And then I want the probability that x is going to be greater than or equal to 19 --- greater than or equal to 19. And there's my probability. I'm asked to round to four decimal places. Fantastic! Part B Now Part B of this problem asks, "What does the result from Part A suggest?" OK, this one is a little tricky for a lot of students, and I admit it's a little tricky for me too because sometimes the logic that the statisticians use makes you feel like you're wrapping your brain around a tree. But let's go through each one of these answer options one at a time and see what we can make of this.
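The binomial calculator step in Part A can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming n = 840 and p = 0.02 from the problem statement; the function name `binomial_at_least` is made up here and has nothing to do with StatCrunch.

```python
from math import comb


def binomial_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X <= k - 1)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1 - below


# Values from the problem statement: 840 people treated, 2% baseline rate.
prob = binomial_at_least(19, 840, 0.02)
print(round(prob, 4))
```

The result lands near the "one in three" figure discussed in Part B, so seeing 19 or more cases is not an unusual outcome under the no-effect assumption.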
Answer option A says, "The drug has no effect of flu symptoms because x greater than or equal to 19 is highly unlikely." Well, the drug doesn't have an effect on flu symptoms. That part is true, because remember we did this probability distribution on the assumption that the drug has no effect on the likelihood of flu symptoms. So that's our assumption. So to be true with that assumption, we're going to have to suggest that that's what our results going to suggest, that the drug has no effect on flu symptoms. But looking at the probability here, I mean you've got almost a one in three chance. And I would feel more comfortable if the probability were greater than 50% to say that it's, that it's likely or not likely. But you know, one in three chance. Hey, we can consider that to be significant. So highly unlikely? No, we're going to say ... we're going to say that it's likely, not unlikely. Answer option B says, "The drug increases the likelihood of flu symptoms because x greater than or equal to 19 is not highly unlikely." Well, the drug's not increasing the likelihood of flu symptoms because we're assuming that it doesn't do that. So we're not going to choose Answer option B. Answer options C starts out the same way. "The drug increases the likelihood of flu symptoms." So we're not going to choose that answer. Answer option D says, "The drug has no effect on flu symptoms because our probability is not highly unlikely." Wow. Yeah. We're really wrapping our brain around a tree here. "Not highly unlikely." Let's get rid of that double negative so we can make more sense of this. We're saying our probability is highly likely. Well, a one in three chance? Yeah, I don't know if I call it highly likely, but given the way some of these statisticians think, yeah, they might be actually considering that highly likely. So let's go ahead and select Answer option D. Excellent! And that's how we do it at Aspire Mountain Academy. 
Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the probability of a false positive drug test from tabulated data. Here's our problem statement: Refer to the sample data for pre-employment drug screening shown below. If one of the subjects is randomly selected, what is the probability that the test result is a false positive? Who would suffer from a false positive result and why? Part 1 OK, the first part of this problem is asking us for the probability of a false positive. We have the option to put this tabulated data into StatCrunch or Excel. I actually prefer Excel, and I'll show you why here in a moment because --- well, I actually have the data here in Excel already loaded. So the reason why I love this in Excel is because I can just select the data here, and notice how here at the bottom of my Excel window, I've got all of these summary stats. The sum is what we're looking for. This is the whole that we're going to use to calculate our probability. Remember the probability is the part over the whole. Well, this makes calculating the whole very easy. The part is just the false positive test result. A false positive is someone who tests positive but doesn't use. And that's this space right here. So 15 is my part, and 107 is my whole. So if I just take my calculator and take 15 divided by 107, I get my probability. We're asked to round to three decimal places. And notice there's no percent sign here next to the answer field. So they want us to submit our answer in decimal form. Excellent! 
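The "part over the whole" calculation is easy to mirror in Python. In this sketch, only the 15 false positives and the total of 107 come from the walkthrough; the other three cell counts are hypothetical placeholders chosen so the table sums to 107.

```python
# Keys are (test result, actual status). Only the false-positive count (15)
# and the grand total (107) come from the walkthrough; the other three
# counts are hypothetical placeholders.
table = {
    ("positive", "uses drugs"): 39,
    ("positive", "does not use"): 15,   # false positives
    ("negative", "uses drugs"): 4,
    ("negative", "does not use"): 49,
}

whole = sum(table.values())                 # the "whole": all subjects
part = table[("positive", "does not use")]  # the "part": false positives

p_false_positive = part / whole
print(round(p_false_positive, 3))
```

Summing the dictionary values plays the same role as the sum shown at the bottom of the Excel window.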
Part 2 Now the second part of this problem asks, "Who would suffer from a false positive result and why?" Well, let's look through our answer options here. Answer option A says, "The employer would suffer because the person tested would not be suspected of using drugs when in reality he or she does use drugs." Well, no, a false positive means that you're testing positive but you don't use. So that can't be right. Answer option B says, "The person tested will suffer because he or she would not be suspected of using drugs when in reality he or she does use drugs." Again, that's the opposite of a false positive. That's actually a false negative. A false positive is where they're not using but they test positive. So that's not what we want here.
Answer option C: "The employer will suffer because the person tested would be suspected of using drugs when in reality he or she does not use drugs." Well, the second part of this statement is true, but how does the employer suffer from that? I mean more of the suffering is going to be from the individual because someone who is subjected to a false positive drug test, they're going to be treated like a user even though they aren't. And so at the very least, their reputation in the workplace is going to suffer. You know, in most cases it's probably going to mean you're losing your job. So it's really the person who's going to suffer more than the employer. Answer option D says, "The person tested will suffer because he or she would be suspected of using drugs when in reality he or she does not use drugs." That's what we're looking for. Fantastic! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use goodness of fit hypothesis testing and Benford's law to detect online hacking. Here's our problem statement: The accompanying data table lists leading digits of 317 inter-arrival Internet traffic times for a computer along with the frequencies of leading digits expected with Benford's law. The accompanying technology results are obtained when testing for goodness of fit with the distribution described by Benford's law, and H0 is rejected. What does this result suggest about the possibility that the computer has been hacked? 
Is there any corrective action that should be taken? Solution OK, we have a series of drop down menus for a fill in the blank statement here. So in order to understand what the right selections are in this drop down, we need to come to a conclusion with the hypothesis test. We look at our data here. Notice there's no icon here to slip the data into StatCrunch so we can actually perform a hypothesis test. That's because as was stated earlier in the problem statement, the hypothesis test has already been conducted for us. So if we click on this link here for the technology output, we see here are the results for our hypothesis test. Now we know in the problem statement it says H0 is rejected. We know our null hypothesis is rejected because --- look at the P-value. The P-value here is less than 1%. So no matter what significance level we're using, we're going to be inside that critical region, inside the region of rejection. And therefore we reject the null hypothesis.
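The technology output already gives the test results, but the calculation behind a goodness-of-fit test against Benford's law can be sketched in Python. The observed counts below are hypothetical (chosen to total 317 like the problem's sample), the expected counts use Benford's formula log10(1 + 1/d), and the critical value 15.507 assumes a 0.05 significance level with 8 degrees of freedom, which the problem statement does not spell out.

```python
import math

# Hypothetical observed counts of leading digits 1 through 9 (total 317);
# these are not taken from the assignment's data table.
observed = [76, 62, 29, 33, 19, 27, 28, 21, 22]
n = sum(observed)

# Benford's law: P(leading digit = d) = log10(1 + 1/d).
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
expected = [n * p for p in benford]

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Right-tailed critical value for 8 degrees of freedom, assuming alpha = 0.05.
CRITICAL_VALUE = 15.507
reject_h0 = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.3f}, reject H0: {reject_h0}")
```

A statistic to the right of the critical value falls in the right tail, so the null hypothesis of a Benford fit is rejected for counts like these.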
We can get the same conclusion comparing the test statistic with the critical value. Goodness-of-fit tests are by definition right-tailed tests. So the critical region is the right tail of the distribution. The left boundary of that right tail is going to be our critical value, which here is listed as 15.5. Our test statistic, 20.9, is to the right of that left boundary. So we're in the tail that is the critical region, that right tail of our distribution. So we're in the region of rejection. So we reject the null hypothesis. Rejecting the null hypothesis means that the distribution that we're seeing doesn't fit the distribution we're comparing it to, which in this case is the one for Benford's Law. So I go back to my data table. What we're saying is that this distribution here with these frequencies does not fit the distribution here with Benford's Law because we rejected the null hypothesis. The null hypothesis says that everything's the same, everything matches up. So that's not what we see here because we rejected the null hypothesis. So because the leading digits of inter-arrival traffic times do not fit the distribution described by Benford's law, it appears that those times are not typical. Benford's Law is a description of typical times. So if we don't fit that distribution, the times we do have are therefore not typical. And therefore there's probably some good evidence that the computer has been hacked, because if it hadn't been hacked, then we would be conforming to Benford's law. Because we don't, then there's some evidence here that says, yeah, we were probably being hacked. And if you're being hacked, then of course you're going to need corrective action. I check my answer. Good job! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. 
And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching. We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to construct box plots to compare male and female pulse rates. Here's our problem statement: Use the same scales to construct box plots for the pulse rates of males and females from the accompanying data sets. Use the box plots to compare the two data sets. Part 1 OK, the first part of this problem is asking us for a box plot for the men's pulse rate. So to do that, we need to access the data. So I'm going to click on this icon here, and here's all of the data that we need to look at for this problem. We'll click on this icon so I can dump the data into StatCrunch. OK, here we've got the data here in StatCrunch, so now we need to make our box plot. And to do that I'm going to come up here to Graph --> Box Plot. In the options window, I'm going to select the column for the men's pulse rate. But I know that the problem is also going to ask me to get a box plot for the women's pulse rates. So let's just make both of them now while we're here. Then down here amongst the other options, I'm going to check the box for Draw boxes horizontally. That's because the box plots that are here in my assignment are drawn in the horizontal orientation. I don't know why this isn't a default selection because all of the box plots that I ever saw working in the real world were drawn with the horizontal orientation. So yeah, I don't know why the default is the vertical, but it is what it is. We'll check the box, and we'll get what we need. And that's all we need. There's lots of other options you could select, but all we need for our purposes is what we've selected already. 
So I'm going to hit Compute!, and then here in my results window, I got these wonderful box plots. Now to help us compare the box plot that we've drawn here in StatCrunch with the options that were listed here in our assignment, what I'm going to do is change the scale of my x-axis so we're comparing apples with apples. So to do that, I'm going to click on this little three bar symbol in the left lower corner of my results window. And in the menu that appears, I'm going to select the x-axis, and I'm going to change the minimum to what we see here with the graph in the assignment. All the minimum values here for the graphs are 40, and all the maximums you see are 110. So I'm going to change those settings. So now I'm comparing apples with apples and it's easier for me to see which is the actual correct answer. And after doing that, it seems pretty obvious for the men it's going to be Answer option D. Good job! Part 2 Now the second part asks for the women's box plot, which we just made during the previous part. Notice the scales here are the same. So we don't need to make any adjustment down here. And so now all that remains is to select the correct answer, which looking at our different answer options and comparing them, it looks like this is going to be the correct one. Excellent! Part 3 And now the last part asks us to compare the two box plots. Well, this is the advantage we have of making both of them at the same time, because now that one is overlaying the other, it makes it easier to compare the two box plots. So looking at the general location, it seems like the box plot for the men, it's shifted slightly to the left of that of the women. So we could conclude that the median pulse rate --- and the median that you see here is this middle bar here in the box --- is definitely lower for the men than for the women. The width of the box plot --- so from whisker to whisker, it's about the same. The box width is about the same. 
So we could say that the variation amongst the pulse rates, it's about the same between the men and the women.
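The comparison we just made by eye can also be sketched with numbers. A box plot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum), so comparing medians and box widths is enough. The pulse rates below are hypothetical, not the assignment's data, and the quartile method used here ("inclusive") is an assumption that may differ slightly from the one StatCrunch uses.

```python
from statistics import quantiles

# Hypothetical pulse rates (bpm) for illustration; not the assignment's data.
male = [56, 60, 62, 64, 66, 68, 70, 72, 76, 84]
female = [60, 64, 68, 70, 74, 76, 78, 80, 84, 96]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot is drawn from."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


for label, data in (("male", male), ("female", female)):
    lo, q1, med, q3, hi = five_number_summary(data)
    print(f"{label:>6}: min={lo} Q1={q1} median={med} Q3={q3} max={hi} "
          f"box width (IQR)={q3 - q1}")
```

A lower median shifts the box to the left, and similar box widths point to similar variation, which is the pattern described above.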
Now let's go look at our answer options and select the one that matches the observations we just made. Answer option A says, "In general it appears the males have higher pulse rates." No, that's not what we want. Answer option B --- "In general it appears that males have lower pulse rates and the variation is similar." So yeah, that's the one we want. But before we check our answer to finalize it, let's double check the remaining answer options to make sure we've got what we want. Answer option C says, "In general it appears males have higher pulse rates." No, that's not right. Answer option D says, "It appears that males have lower pulse rates than females." That's true. "Variation among the male pulse rates is much greater." No, we wouldn't say that. So we're going to stick with answer option B. Fantastic! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching. We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use a nonstandard normal distribution to evaluate aircraft load capacity. Here's our problem statement: Before every flight, the pilot must verify that the total weight of the load is less than the maximum allowable load for the aircraft. The aircraft can carry 38 passengers, and a flight has fuel and baggage that allows for a total passenger load of 6,384 pounds. The pilot sees that the plane is full and all passengers are men. The aircraft will be overloaded if the mean weight of the passengers is greater than 6,384 pounds divided by 38 (or 168 pounds). What is the probability that the aircraft is overloaded? 
Should the pilot take any action to correct for an overloaded aircraft? Assume that weights of men are normally distributed with a mean of 173.1 pounds and a standard deviation of 37.7. Part 1 OK, the first part of this problem is asking us for the probability that the aircraft is overloaded. To get the probability, we're going to be using the normal distribution calculator in StatCrunch, because here it says that all the passengers are men and we're assuming that the weights of the men are normally distributed. So I need to open up StatCrunch. There's no icon to click here because there's no data to load into StatCrunch. So I'm going to come up here to Question Help, and then in the dropdown menu select StatCrunch. Then I'm going to select this arrow here in the upper right corner so I can move that window out. And let's resize this so we can see everything a little bit better. OK, now here in StatCrunch, I go to Stat --> Calculators --> Normal. Here I'm told that the mean of my distribution is 173.1, so I'm going to put that in here --- 173.1. And we know the standard deviation is 37.7, but keep in mind that when you're calculating a probability for the mean of a sample rather than for a single individual, you want to adjust the standard deviation, because sample means vary less than individual values do. So whenever you want the probability for a sample mean, you've got to use the standard deviation of the sample mean, which is also called the standard error. The way we get that is to divide the standard deviation by the square root of the sample size that we want to take, which in this case here is 38. So I want to take that standard deviation of 37.7 and I'm going to divide it by the square root of 38. And this is the number that I want to use for my standard deviation because this number describes how much the mean weight of 38 passengers varies. So in order to avoid any transcription errors, I'm just going to copy and paste that bad boy right in here. Excellent. 
OK, now for this part down here --- let's get the calculator out of the way --- it says we want the probability that the aircraft is overloaded. It'll be overloaded if the weight --- the mean weight --- is greater than 168. So this needs to be greater than 168. I hit Compute!, and there's my probability that we're overloaded. I'm supposed to round to four decimal places, so I'll do that here. Excellent! Part 2 And now the second part of this problem asks, "Should the pilot take any action to correct for an overloaded aircraft?" Well, look at our probability --- almost 80%. That's a pretty high probability. Not an absolute certainty --- I mean, there's a 20% chance that you're going to be OK, but only a 20% chance. I mean, there's more than 50% chance that you're going to be overloaded. So I'd say the probability is pretty high, and that means you need to be taking some corrective action. So we're going to mark yes here. Good job!
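For anyone who wants to double check the calculator result outside of StatCrunch, here is a minimal sketch of the same calculation in Python. It assumes Python 3.8 or later (for `statistics.NormalDist`), and the variable names are my own, not anything from StatCrunch. Dividing the standard deviation by the square root of 38 gives the standard deviation of the mean weight of 38 passengers, and that is the value the normal distribution needs here.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
mean_weight = 173.1   # mean weight of men, in pounds
sd_weight = 37.7      # standard deviation of individual weights
n = 38                # passengers on the full flight
limit = 6384 / n      # overload threshold for the mean weight (168 pounds)

# Standard deviation of the mean of n weights (the standard error)
sd_of_mean = sd_weight / sqrt(n)

# P(mean weight > limit) under a normal distribution centered at mean_weight
p_overloaded = 1 - NormalDist(mean_weight, sd_of_mean).cdf(limit)

print(f"standard deviation of the mean: {sd_of_mean:.4f}")
print(f"P(overloaded): {p_overloaded:.4f}")
```

The printed probability should land just under 80%, in line with what the calculator shows.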
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Identifying the symbolic claim and hypotheses for a claim about a population standard deviation 4/2/2019 Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to identify the symbolic claim and hypotheses for a claim about a population standard deviation. Here's our problem statement: Claim: The standard deviation of pulse rates of adult males is more than 12 bpm. For a random sample of 161 adult males, the pulse rates have a standard deviation of 12.8 bpm. Complete Parts A and B below. Part A OK, Part A asks us to express the symbolic form of the original claim. So to do that, we first look in our problem statement to see where the claim is. The claim is this first statement here in the problem statement: "The standard deviation of pulse rates of adult males is more than 12 beats per minute." So [in] this first dropdown, we want to select the population parameter that matches standard deviation. It's not going to be the p because p is the representation for population proportion. It's not going to be μ because μ is the representation for population mean. What we want is σ; σ is the population standard deviation. So this is what we'll select. And then in the next dropdown, we want to look and see what the claim says about which inequality symbol we should be using. Here it says standard deviation is more than 12, so we want to select greater than. And then our claimed value is that value from the claim, which is 12. I check my answer. Good job! 
Part B Now Part B wants us to identify the null and alternative hypotheses. These hypotheses will always use the same population parameter from what we see here earlier in Part A. So we're looking at standard deviation, so I'm going to select that here. The null hypothesis is by definition a statement of equality. So I want to select the equal sign. And then this value is the claimed value, which we saw earlier was 12. There's my null hypothesis.
The alternative hypothesis --- typically the alternative hypothesis reflects the claim unless there is some semblance of equality with the claim, because equality by definition belongs to the null hypothesis. So if there's any semblance of equality with the claim, then we have to take the complement for our alternative hypothesis. Here, if we look here to Part A, what we have here --- notice how the symbol here is greater than. There's no or equal to; there's no semblance of equality here. So therefore we can just adopt this statement as our alternative hypothesis. Nice work! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use selection with replacement to approximate selection without replacement. Here's our problem statement: In a study of helicopter usage and patient survival, among the 54,115 patients transported by helicopter, 172 of them left the treatment center against medical advice, and the other 53,943 did not leave against medical advice. If 40 of the subjects transported by helicopter are randomly selected without replacement, what is the probability that none of them left the treatment center against medical advice? Solution OK, we're asked for a probability, and probability is a part over the whole. But we're looking at 40 subjects here and the probability that none of them left the treatment center. 
So this is going to be the probability that the first one didn't leave and the probability that the second one didn't leave and the probability that the third one didn't leave, so on and so forth until we get to the 40th subject. Well, "and," when we're dealing with probability calculations, means multiplication. So we're taking this probability that the randomly selected person did not leave the treatment center against medical advice and we're multiplying it by itself 40 times.
But that's when you're selecting with replacement here. Technically we would need to look at selecting without replacement, because once the person leaves, it's not like they come back into the hospital to get selected again. So we're going to have to --- if we want to do this technically correct, we will be selecting without replacement. Well, that means we're going to have to take this 53,943 and divide it by 54,115 --- there's your first probability. But then we got to multiply that by 53,942 over 54,114 to account for the one person that is no longer available to be selected. And we'd have to continue that on for 40 different numbers that we're going to multiply together. Gee, that's some pain. What we're going to do instead is we're actually going to use selection with replacement to approximate selection without replacement. And what allows us to do this is that the number of people that we're looking at is actually within 5% of the whole. So if we take, say, the 40 subjects that we're looking at out of the total that we can select from, so you would get a number here that's way less than one --- well, 1%. So it's definitely going to be less than 5%, which means we can use selection with replacement to approximate selection without replacement. So again, the selection with replacement, it's just going to be the part over the whole. So the 172 that left the treatment center is the part that left, but we want the probability that none of them are leaving. So we want the 53,943 that did not leave divided by the whole. So there's a probability that one randomly selected patient is not going to leave the hospital. I want this multiplied by itself 40 times for the 40 subjects that we're selecting from the --- from the total pool of candidates. So then there is our probability. I round it to three decimal places. Fantastic! Now this value that we get here is not that different from if we were to actually go and do the actual calculation itself in Excel. 
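To see the two approaches side by side, here is a minimal sketch in Python of both the with-replacement shortcut and the exact without-replacement product. It assumes Python 3.8 or later (for `math.prod`), and the variable names are my own.

```python
from math import prod

total = 54115    # patients transported by helicopter
stayed = 53943   # patients who did not leave against medical advice
n = 40           # subjects selected

# 5% guideline: the sample should be no more than 5% of the population
sample_fraction = n / total

# With replacement: the same probability on every draw, multiplied n times
p_with = (stayed / total) ** n

# Without replacement: both counts drop by one after each draw
p_without = prod((stayed - k) / (total - k) for k in range(n))

print(f"sample fraction:     {sample_fraction:.4%}")
print(f"with replacement:    {p_with:.6f}")
print(f"without replacement: {p_without:.6f}")
```

Both probabilities round to the same value at three decimal places, which is why the shortcut is good enough for this problem.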
And actually it wouldn't take me long to set it up in Excel. We can run through this very quickly. So if I've got the total number of patients, and I'm selecting from the 54,115, and I'm going to select one of them from the 53,943. So then the probability is going to be the part over the whole. So that's my probability for one. And now if I just go ahead and say I'm going to take one away from you, I'm going to take one away from you, and I'm going to select the same probability here. And we wanted to do that for the 40 patients that we're looking to select. So if I just bring this down, I get the 40. And there are my individual probabilities. Now to get the final probability, I've got to multiply all of those probabilities together, and look at the number that we get. It's not appreciably different from what we saw earlier. This is 0.880395, and this is 0.880435. So we can see that, you know, even though it's not the same number, it's close enough to where we can use it as an approximation. Again, you have to be within 5% of the whole in order for that approximation to work, because once you leave that 5% range, then the difference between your two numbers becomes great enough to be significant. So anyway, that's why we can actually use this selection with replacement to approximate selection without replacement. It makes the calculation much, much easier. And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video. Intro Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. 
Today we're going to learn how to perform hypothesis testing on means of right and left hand reaction times. Here's our problem statement: Several students were tested for reaction times (in thousandths of a second) using their right and left hands. Each value is the elapsed time between the release of a strip of paper and the instant that it is caught by the subject. Results from five of the students are included in the graph to the right. Use a 5% significance level to test the claim that there is no difference between the reaction times of the right and left hands. Part 1 OK, the first part of this problem is asking us for the hypotheses for the test. So in this first dropdown blank here, it says let μ-d be the blank of the right and left hand reaction times. μ-d is going to be the mean of the differences. That's how μ-d is defined. The null hypothesis will always carry an equal sign. So we select that there. And then for the alternative hypothesis, we look to our claim. The claim that we're testing here, it says, "Test the claim that there is no difference" --- in other words, that they're the same. Well, we can't really set this to be equal, because equality by definition belongs to the null hypothesis. So then we have to take the complement of that. And the complement of being equal to is being not equal to. So now we check our answer. Good job! Part 2 Now the next part asks us for the test statistic. And to do that, we're going to access the data here, and we're going to dump the data into StatCrunch. I resize the window here so we can get a better look at what's going on. And now with the data here in StatCrunch, I'm going to go up to Stat --> T Stats --> Paired (because for every left hand there, there's a corresponding right hand for the same student). Here in my options window, the first sample is just going to be the variable that's listed first. 
And the second sample is going to be the variable that's listed second. I look down here, and the default selection is for hypothesis testing, so I don't have to change that. And then I make sure these fields match what we got here earlier for our null and alternative hypothesis, and they do. So now I press Compute!, and here in my results window is my test statistic, the second to last value listed there in the results window. So I'm just going to go ahead and type that in. It says round to three decimal places --- -2.418. Well done! Part 3 Now the next part of the problem asks for the critical values. Notice the critical values are not listed here on the results window from the hypothesis testing that we just did. And so we're going to have to go to our distribution calculator in order to calculate the critical values. And the distribution that you want to look at is the t value --- is the t distribution. Because look at the values that were listed here. So your test statistic was a t score, and the critical value they're asking for is a t score. So we're going to go up to Stat --> Calculators --> T. Here in my distribution calculator, the first thing I want to look at is the alternative hypothesis because this tells me, "Do I have a one-tail test or a two-tail test?" And the alternative hypothesis is not equal to, so not equal to means we have a two-tail test. So I'm going to click the Between option, and I'm going to have two critical values: One on the left, one on the right. Now to get the correct critical value, I need first to enter in the corresponding value for degrees of freedom. We've got five data points you can see here, or you could actually --- if you wanted to, you could go back and count them here in your StatCrunch window. So we've got five pairs. Degrees of freedom is one less than that. So I'm going to put 4 here for my degrees of freedom. And then for --- remember the Between option here in StatCrunch lists the area in between the tails. 
So the area of the tails, the significance level, which here it says we want to use 5% --- so the area in between has to be the complement of 5%. So I subtract that from 1 and get 95%. So there's 95% in between the tails. Then I just press Compute!, and here are my critical values which I can now put here in my answer field. And it wants three decimal places. Nice work! Part 4 Now this last part of the problem is asking us for a conclusion. To conclude our hypothesis test, notice there's nothing here with P-values. So we're going to have to do it straight with the test statistic and the critical value, which is easy enough to do here in StatCrunch in the results window. Notice we have our critical values listed here. So those values are marking the boundaries of our tails there in our distribution. And what we're going to compare it with is the test statistic.
Here's the test statistic: -2.4. So -2.4 is going to put me here in the central region, because notice the boundary here is negative 2.7. So -2.4 is gonna put me just inside that central region in between the tails. So since I'm not in the tail, I am therefore not in the critical region. I'm not in the region of rejection, and therefore there's not sufficient evidence. We're going to fail to reject the null hypothesis. And whenever you fail to reject the null hypothesis, there's not sufficient evidence. Of course, the claim here is that there is no difference between the reaction times, so there's not sufficient evidence to warrant rejection of that claim. Excellent! And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
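The critical value lookup and the final comparison in the reaction time problem can also be sketched in Python. This is only an illustration under stated assumptions: it uses the standard closed-form CDF of the t distribution with 4 degrees of freedom (so it works for this problem's five pairs only), it finds the critical value by bisection rather than however StatCrunch does it, and it takes the test statistic as -2.418, which is assumed here to be the unrounded form of the -2.4 used in the conclusion. The function names are my own.

```python
from math import sqrt

def t_cdf_df4(t):
    """CDF of Student's t distribution with 4 degrees of freedom (closed form)."""
    w = 1 + t * t / 4
    s = t / sqrt(w)
    return 0.5 + 0.375 * (s - s ** 3 / 12)

def upper_critical_value(alpha, lo=0.0, hi=50.0):
    """Bisection for the t value that leaves alpha/2 in the upper tail."""
    target = 1 - alpha / 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_cdf_df4(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05       # 5% significance level, so 95% lies between the tails
crit = upper_critical_value(alpha)
t_stat = -2.418    # assumed value of the test statistic (see note above)

# Two-tail test: reject only if the statistic falls in either tail
reject = abs(t_stat) > crit

print(f"critical values: {-crit:.3f}, {crit:.3f}")
print("reject the null hypothesis" if reject else "fail to reject the null hypothesis")
```

Because the test statistic sits between the two critical values, the sketch reaches the same conclusion as the walkthrough: fail to reject the null hypothesis.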
Author: Frustrated with a particular MyStatLab/MyMathLab homework problem? No worries! I'm Professor Curtis, and I'm here to help.