Using bootstrap methods to construct a mean confidence interval estimate

6/26/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use bootstrap methods to construct a mean confidence interval estimate. Here's our problem statement: Use the Geyser eruption duration times in seconds and the accompanying 200 bootstrap samples to complete parts A through C below.

Part A

OK, Part A wants us to use the bootstrap method with 200 samples to find a 95% confidence interval estimate of the population mean. So to do that, the first thing we need to do is take these bootstrap samples and we need to stick them in StatCrunch. And it takes a while for this to load up because, yeah, there literally are so many samples. You've got 200 samples, and each of those samples contains 15 values. So there's a lot of data here, but that's not unlike what we see in a real world situation. So I'm going to put those here in StatCrunch. And I'm going to resize this window a little bit to help us to see a little bit better everything that's going on in our problem statement here.

And now here in StatCrunch, I'm going to calculate the mean value for each of these 200 samples. The reason why I'm calculating the mean is because that's the parameter that we're looking for in the population; we're looking for the population mean. So I want to calculate the mean for each of these different samples. To do that, I go to Stat --> Summary Stats --> Columns. Here I'm going to select all the columns that I have. So I select one of them, and then on my keyboard I'm going to press Ctrl+A; that selects all the different samples. I just need the mean, so I'm going to select just that one statistic to come out. I hit Compute!, and now here are all 200 mean values, one for each of the different samples.
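For anyone following along outside StatCrunch, here is a minimal Python sketch of the same step: one mean per bootstrap sample. The values in `original_sample` are made up for illustration and are not the geyser data, and the 200 samples are generated here with `random.choices` only because the real ones aren't reproduced in this transcript; in the problem they are handed to you. The names `original_sample`, `bootstrap_samples`, and `bootstrap_means` are my own.

```python
import random
from statistics import mean

# Made-up stand-in values, NOT the geyser data from the problem statement.
original_sample = [12, 15, 9, 20, 14]

random.seed(0)  # fixed seed so the sketch gives the same samples every run

# Assumption: the samples are generated here because the real ones are not
# available. Each bootstrap sample is drawn with replacement and has the
# same size as the original sample.
bootstrap_samples = [
    random.choices(original_sample, k=len(original_sample))
    for _ in range(200)
]

# One mean per bootstrap sample, like the Summary Stats --> Columns output.
bootstrap_means = [mean(sample) for sample in bootstrap_samples]

print(len(bootstrap_means))
```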

I need to sort this list from smallest to greatest, from lowest to highest. And I do that by clicking on this little double arrow up here. So now it's sorted automatically from lowest to highest. Now I have to figure out where's the actual cutoff between the numbers I need and the numbers I don't, because I want a 95% confidence interval. That's going to be the 95% of the numbers that are in the middle of this list. So that means there's going to be some numbers that are lower, some numbers that are higher than what we actually include in our confidence interval.

So where are the cutoffs? Where are the upper and lower limits? Well, to do that, I'm going to take my 95% confidence level, and I'm going to subtract that 95% from 100%. That gives me 5%. I divide that by 2 because half of this is on the left and half is on the right of my "distribution," which gives me 2.5%. So now I multiply that by the 200 samples, and that gives me 5. So I want to go down five values --- one, two, three, four, five. And I want to take these two numbers here, the fifth value and the one right after it. The cutoff value that I'm looking for is the average of these two values. So in my calculator, I'm going to take the first number and I'm going to add it to the second number. And then I divide by 2. There's my lower limit. And so I'm going to put that number here in my answer field.

And I'm going to do the same thing for the other end of the list. So I scroll down to the bottom and I count up one, two, three, four, five. So now I'm going to want this number and the number that comes before it. So I'm going to average those two numbers out and that's going to give me the upper limit that I need for my confidence interval. Stick that number in here. Fantastic!
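The counting we just did can be sketched in Python. This is only an illustration of the sort-count-and-average steps from the video, assuming the count (here 2.5% of 200, which is 5) comes out to a whole number. The function name `bootstrap_percentile_interval` is hypothetical, and the demo list is a placeholder (the numbers 1 through 200), not the actual bootstrap means.

```python
def bootstrap_percentile_interval(means, confidence=0.95):
    """Average the k-th and (k+1)-th sorted values at each end of the list."""
    ordered = sorted(means)
    n = len(ordered)
    # Number of values to count in from each end: (1 - 0.95) / 2 * 200 = 5.
    # Assumption: this product is a whole number, as it is in the video.
    k = round((1 - confidence) / 2 * n)
    # Lower limit: average of the k-th and (k+1)-th values from the bottom.
    lower = (ordered[k - 1] + ordered[k]) / 2
    # Upper limit: average of the k-th and (k+1)-th values from the top.
    upper = (ordered[n - k - 1] + ordered[n - k]) / 2
    return lower, upper


# Placeholder data: the numbers 1 through 200, NOT the real bootstrap means.
demo_means = list(range(1, 201))
print(bootstrap_percentile_interval(demo_means))  # (5.5, 195.5)
```

With the placeholder list, the fifth and sixth values are 5 and 6, and the 195th and 196th values are 195 and 196, so the limits come out as the averages of those pairs.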

Part B

Now Part B wants us to make a similar 95% confidence interval estimate using the T distribution. To use the T distribution, I can't be using all 200 of these samples; I just need one. And that's why they give you this one sample up here in the problem statement. So the first thing I'm going to do is I'm going to swap out samples in StatCrunch. And again, I'm going to resize this window so I can see better what's going on.

Now for this one sample of 15 values, I'm going to calculate the mean and the standard deviation. To do that, I go to Stat --> Summary Stats --> Columns and select the column (it's the only column that has data here). Then I want the mean and the standard deviation. The reason why I want both of these numbers is that the mean value is going to give me the center point for my confidence interval. The standard deviation is going to help me calculate the margin of error, which is the distance between the center point and the ends of the confidence interval, the lower and upper limits that I'm asked to provide here in my answer fields.

So to get the lower and upper limits, I'm going to take my standard deviation, put that in my calculator here, and I'm going to divide that by the square root of the number of values in my sample, which here you can see is 15. So I'm gonna divide that by the square root of 15. And now I've got to multiply this number by my T-score. To get the T-score, I have to go to my T calculator. So I go to Stat --> Calculators --> T. Here are my degrees of freedom; it's one less than the number of values I have in my sample, so 14. I want to go and hit the Between option up at the top because we've got a portion on the left and a portion on the right that's not included in our confidence interval. Here for the percentage, that's going to be the level of my confidence, which is 95%. [Hit] Compute!, and here's my T-score, 2.1448. So back in my calculator, I take this number that I calculated previously, and I'm going to multiply that by 2.1448. This then is my margin of error.
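Written out as a formula, the calculator steps above amount to the following. The symbols are my own shorthand rather than anything shown on screen: E for the margin of error, s for the sample standard deviation, n for the sample size, and t with subscript alpha/2 for the T-score.

```latex
E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}},
\qquad n = 15, \quad \text{df} = n - 1 = 14, \quad t_{\alpha/2} = 2.1448
```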

So if I come back here to StatCrunch, I can get my mean value. So from the center point, 238.2, I need to subtract the margin of error. So I take this --- if I make it negative and then add it to the mean, that has the same effect as subtracting it from the mean --- and there's my lower limit. I want one decimal place, so I gotta re-type that. And then we'll do the same thing to get the upper limit. So first I'm going to go back and get my margin of error, and I add that to my center point, which is the mean value. And there's my upper limit. Excellent!
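The whole Part B calculation can be sketched in Python as well. This is a rough illustration under a couple of assumptions: the T-score is typed in by hand (2.1448 for 14 degrees of freedom, the value read off the T calculator above) rather than computed, and `placeholder_sample` holds 15 made-up values, not the geyser durations. The function name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(sample, t_critical):
    """Return (lower, upper) limits: mean minus and plus the margin of error."""
    n = len(sample)
    center = mean(sample)  # center point of the interval
    # Margin of error: T-score times the sample standard deviation over sqrt(n).
    margin = t_critical * stdev(sample) / sqrt(n)
    return center - margin, center + margin


# Made-up stand-in for the 15 sample values, NOT the geyser data.
placeholder_sample = list(range(10, 25))

# 2.1448 is the T-score for 95% confidence with 14 degrees of freedom,
# taken from the T calculator step in the video.
lower, upper = t_interval(placeholder_sample, 2.1448)
print(round(lower, 1), round(upper, 1))  # one decimal place, as in the answer fields
```

Note that `stdev` is the sample standard deviation (it divides by n - 1), which is the one this kind of interval calls for.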

Part C

And now Part C asks us to compare the results. So if we look, the lower limits in this case are exactly the same. The upper limits, they're not exactly the same, but they're not that far apart either; they're actually pretty close to each other. So we could say that the results are reasonably close to one another. Well done!

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below, and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
