Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to find the best nonlinear regression model for a subway fare. Here's our problem statement: Construct a scatterplot and identify the mathematical model that best fits the data shown below. Let x represent the years since 1959 with 1960 coded as x = 1, 1973 coded as x = 14, 1986 coded as x = 27, and so on. Let y represent the subway fare.
OK, the first part is asking for a scatter plot. And we know the second part --- by looking at the problem statement --- we know the second part is going to ask us to make the actual model that best fits the data. And I find that that's actually easier to do in Excel. So I usually click on here on Open in Excel. Now I already have the data loaded in Excel, and here's the data from the problem loaded in Excel already.
Notice here that the problem statement asks us for coded years for our model. So even though they're giving us the actual calendar years that everybody thinks in, they're asking us to make a model in coded years. Now realistically, you would never make a model in coded years, because anyone that would then use that model to make a prediction would need to think in coded years, because that's how you would need to input the data into the model. Nobody thinks in coded years.
And so a more user friendly model would be one which you actually use calendar years that everybody thinks in. So my conclusion for the problem here is that the only reason they're making you use coded years is because they're trying to add some complexity to it so they can provide more rigor to the assignment, which to me is just absolutely ridiculous. I mean, let's just keep it simple, stupid.
But since they forcing us here to use coded years, we need to make our model with coded years. And the easiest way to do that is to come over here, and I select this column so I can then right click on my mouse and then insert a column. The reason why I want to insert a column is because Excel normally considers the values here on the left column that we select to be the variables for the horizontal axis and then the values here in the right column for the vertical axis. So we don't want to add it here right after the information we're given because then everything's in reverse from the way Excel normally thinks. So I'm gonna put my coded years in here. Then I'm going to expand this out so I can see the whole title.
And now to make the coded years, notice what we're doing is for each one of these years, we're subtracting out our zero year, which is 1959. So 1960 minus 1959 gives me the 1 that I'm supposed to be using as a coded year. So here on the cell, if I just type an equal sign and then select that cell next to it, then minus my zero year of 1959, I press Enter, and voila! There is a coded year.
Now notice if I take the cursor here on my mouse, and notice how when I put it over the right bottom corner that cell, notice how it changes to a smaller plus sign. If I leave it on a smaller plus sign and I push down the left button of my mouse and hold that button down while I select the cells below, I extend that same formula down to these cells so I don't have to type the formula into each of those individual cells; it's automatically copied and calculated for me.
So now I have got --- if I go here and select these cells, here are the ones I want to select from my model. So once I have those selected, I come up here to Insert, and then I'm going to click on this button here for the scatter plot. And I want this first scatter plot here that has just the points and nothing else. So there's my scatter plot that I made. And if I look at my answer options here, the one that seems to be the best fit is Answer option D. Nice work!
Now the second part asks us for the actual regression model. This is where Excel comes in really handy. You can do the same thing in StatCrunch, but StatCrunch is really clunky for this sort of thing. And unless you have a little cheat sheet that lets you know which model to make or, you know, which type of model to make, which equation type you use for your model, you're going to have to make each one one at a time. Excel is really good for this because we don't have to do any of that.
So what I want to do here in Excel is I'm going to click inside the plot area, and we'll click on one of these data points here. And then I'm going to right click, and I'm going to select Format data series. Actually I want to select to Add trendline. So if I move my cursor over here --- so now what I want to do is scroll down here to the bottom and click on Display equation and chart and Display R squared value on chart. So notice how those values came up here.
OK, now what I want to do is go back up to the top, and Linear is selected first by default. So we're going to compare R squared values here with a different model type. Exponential gives me a much better R value. You see that's 0.9709, so this is the number to beat. No, that's a lower number. That's a lower number. That's a lower number. That's a lower number. And you're never going to need to use Moving average. So it looks like Exponential is the one we want to use. It has the best R squared value to it. I click inside that label there and I select everything inside the label and then I go back to home. I look at my font size, I can blow that up so I can see that a whole lot better.
OK, so now over here in my answer fields, I need to select the right model type and the Exponential function. If you look at the model type here, see, the x is here in the actual exponent and there's a coefficient out front. So that looks like this model type here. And so I put the numbers in. So the coefficient is going to go here, and I'm asked around to three decimal places. And then for the number here, it's tempting to just put this number in here. But notice this is an e raised to this power. So I've got to whip out my calculator and calculate that value out --- 0.067, and I have to use that as an exponent, e to that power. So here's the button on the calculator for that. And this is the number I need to put into my answer field. Good job!
And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com, where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.
Frustrated with a particular MyStatLab/MyMathLab homework problem? No worries! I'm Professor Curtis, and I'm here to help.