Discussion:
[R] GAM: Overfitting
Jean G. Orelien
2004-12-22 01:37:23 UTC
I am analyzing particulate matter data (PM10) on a small data set (147
observations). I fitted a semi-parametric model and am worried about
overfitting. How can one check for model fit in GAM?



Jean G. Orelien
Frank E Harrell Jr
2004-12-22 03:25:39 UTC
Post by Jean G. Orelien
I am analyzing particulate matter data (PM10) on a small data set (147
observations). I fitted a semi-parametric model and am worried about
overfitting. How can one check for model fit in GAM?
Jean G. Orelien
It's good to separate 'model fit' (or lack of fit) from 'overfitting'.
Overfitting can cause the model fit to appear to be excellent, but there
is still a huge problem.
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
Simon Wood
2004-12-22 11:08:17 UTC
Post by Jean G. Orelien
I am analyzing particulate matter data (PM10) on a small data set (147
observations). I fitted a semi-parametric model and am worried about
overfitting. How can one check for model fit in GAM?
- Keeping a random subset of the data as a validation set, fitting
to the remaining data, and then comparing the R^2 / proportion of deviance
explained on the fit set and on the validation set is usually quite
diagnostic. If the fit data are much better predicted than the validation
data, then you probably have over-fitting.

- If your response is treated as Poisson then scale parameter estimates
<<1 are also diagnostic, but only if you are not expecting overdispersion,
of course.

- If you use gam from package mgcv then, by default, model
effective degrees of freedom are estimated from your data by GCV or an
approximation to AIC. mgcv::gam allows you to increase the penalty on each
model degree of freedom in these criteria, via the gam argument `gamma'. Some
work by Kim and Gu (2004, J.Roy.Statist.Soc.B) suggests that gamma around
1.4 can be a sensible choice for suppressing overfitting, without
much of a degradation in MSE performance.
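
The validation-set comparison and the `gamma' adjustment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated stand-ins for the 147 PM10 observations, the variable names (pm10, temp, wind) and the helpers r2() and compare_fit() are hypothetical, and a Gaussian response is assumed, so R^2 coincides with the proportion of deviance explained.

```r
library(mgcv)

set.seed(1)

# Simulated stand-in for the PM10 data: 147 observations, as in the question.
# The variable names (pm10, temp, wind) are hypothetical.
n <- 147
dat <- data.frame(temp = runif(n, 0, 30), wind = runif(n, 0, 10))
dat$pm10 <- 20 + 5 * sin(dat$temp / 5) - 0.8 * dat$wind + rnorm(n, sd = 3)

# Hold out a random third of the data as a validation set.
val_idx <- sample(n, size = round(n / 3))
fit_dat <- dat[-val_idx, ]
val_dat <- dat[val_idx, ]

# Proportion of variance explained: 1 - RSS / TSS
# (equal to the proportion of deviance explained for a Gaussian response).
r2 <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Fit on the fit set with a given gamma, then score both sets.
compare_fit <- function(gamma) {
  fit <- gam(pm10 ~ s(temp) + s(wind), data = fit_dat, gamma = gamma)
  c(gamma  = gamma,
    edf    = sum(fit$edf),  # total effective degrees of freedom
    r2_fit = r2(fit_dat$pm10, fitted(fit)),
    r2_val = r2(val_dat$pm10, predict(fit, newdata = val_dat)))
}

# gamma = 1 is the default; 1.4 is the value suggested by Kim and Gu.
res <- rbind(compare_fit(1), compare_fit(1.4))
print(round(res, 3))
```

A large gap between r2_fit and r2_val would point towards over-fitting. With only 147 observations the validation R^2 from a single random split is itself noisy, so repeating the split with several seeds is probably worthwhile before drawing conclusions.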


best,
Simon
