Choosing the best multiple regression model

1/10/2020

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to choose the best multiple regression model. Here's our problem statement. The accompanying table shows results from regressions performed on data from a random sample of 21 cars. The response variable is CITY (fuel consumption in miles per gallon). The predictor variables are WT (weight in pounds), DISP (engine displacement in liters) and HWY (highway fuel consumption in miles per gallon). If only one predictor variable is used to predict the CITY fuel consumption, which single variable is best and why?

Solution

OK, to solve this problem, we first need to take a look at the table of regression equations that we have to select from. You'll notice we've got all sorts of different options here, but the only ones we can use are the ones at the bottom that have a single predictor variable in them, because the problem statement restricts us to one predictor variable.

So to pick the best model, we're going to have to balance out optimum values for three different items. The first is the P-value, the second is the adjusted R-squared value, and the third is the number of variables. The number of variables has already been taken care of for us by the restriction in the problem statement. So all we have to do is balance the P-value and the adjusted R-squared value for the three possibilities that have only one predictor variable.

Well, all of the equations have the same P-value, and it's the best P-value you could possibly have: zero, at least to the precision shown in the table. So we can't use the P-value to determine which model is best, and we have to look to the adjusted R-squared value instead. The reason you want to use the adjusted R-squared value and not the R-squared value is that the adjusted R-squared value is adjusted for the differing numbers of variables in the different models. That tends not to be a big deal with what we're looking at here because we're restricted to just one predictor variable. But normally you don't have that restriction. And so looking at the adjusted R-squared value is always preferred over the R-squared value.
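To see what "adjusted for the number of variables" means in practice, here is a minimal Python sketch of the standard adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k is the number of predictor variables. The function name and the R-squared value of 0.90 are made up for illustration and do not come from the table; only n = 21 comes from the problem statement.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fit to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than k + 1")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical R^2 of 0.90 (NOT a value from the table); n = 21 cars as in the problem.
one_predictor = adjusted_r_squared(0.90, n=21, k=1)
three_predictors = adjusted_r_squared(0.90, n=21, k=3)

# Same R^2, but the model with more predictors is penalized more.
print(round(one_predictor, 3), round(three_predictors, 3))
```

With the same R-squared, the three-predictor model ends up with the smaller adjusted value, which is why the adjusted figure gives a fairer comparison across models of different sizes.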

So here we've got 0.696 and 0.64; these are kind of in the same ballpark. And then right here, 0.92, a much better adjusted R-squared value for this last model. So this is the one that we're going to select. It has the best combination of a small P-value, which is zero, and a large adjusted R-squared value, which you can see there in the table is 0.92. Good job!
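The selection we just walked through can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` class, the `pick_best` function, the labels "first", "second", and "last", and the 0.05 significance cutoff are all hypothetical, while the adjusted R-squared values and the P-value of zero are the ones read from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str            # hypothetical label; stands for one row in the table
    n_predictors: int
    p_value: float
    adj_r_squared: float


def pick_best(candidates, max_predictors=1, alpha=0.05):
    """Keep significant models within the predictor limit, then take the
    one with the largest adjusted R^2. alpha = 0.05 is an assumed cutoff."""
    eligible = [
        c for c in candidates
        if c.n_predictors <= max_predictors and c.p_value < alpha
    ]
    if not eligible:
        raise ValueError("no candidate meets the restrictions")
    return max(eligible, key=lambda c: c.adj_r_squared)


# The three one-predictor rows, in the order they were read off the table.
candidates = [
    Candidate("first", 1, 0.000, 0.696),
    Candidate("second", 1, 0.000, 0.64),
    Candidate("last", 1, 0.000, 0.92),
]

best = pick_best(candidates)
print(best.label, best.adj_r_squared)
```

Because all three P-values tie, the adjusted R-squared value alone decides the outcome, just as it did in the walkthrough.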

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below and let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't want to help you learn stats, go to aspiremountainacademy.com where you can learn more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

1 Comment

Ana Carol

5/1/2023 09:42:53 am

Great explanation on how to choose the best multiple regression model based on P-value and adjusted R-squared value! Just out of curiosity, what would be the recommended approach to select the best model if all the adjusted R-squared values were similar?