Call:
lm(formula = Tondu ~ age + edu + income, data = TEDS_2016)
Residuals:
Min 1Q Median 3Q Max
-3.7780 -1.1841 -0.4322 1.1079 5.4157
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.302529 0.257369 20.603 < 2e-16 ***
age -0.004205 0.003194 -1.316 0.1882
edu -0.244608 0.037579 -6.509 9.96e-11 ***
income -0.031855 0.016357 -1.948 0.0516 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.725 on 1676 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.04287, Adjusted R-squared: 0.04115
F-statistic: 25.02 on 3 and 1676 DF, p-value: 7.771e-16
plot(lm.fit)
Question: What is the problem? Why?
Answer: Tondu is a 7-category nominal variable. Attempting to predict it with a linear regression does not work meaningfully. The scale is not consistent between categories and the higher numbers do not mean more of the variable than the lower numbers.
“No response” - Does not show up as a missing value but that’s what it is in practice. If there are a lot of “No responses” then that may impact prediction of the DV.
The results show that our current model does not have good fit (the above could be one possible reason to answer why).
Question: What can be done to improve prediction of the dependent variable?
Answer: A logistic regression is one option to potentially better predict a nominal categorical variable. However, a better first step is to test multiple models to determine which one has the best fit. We can do this by assessing their ROC curves.
“No response” issue - This could impact all models, so would want to find a way to either remove or treat these responses to that they do not become noise in your models.