| Title: | Select Variables for Linear Models |
|---|---|
| Description: | Provides variable selection for linear models and generalized linear models using Bayesian information criterion (BIC) and model posterior probability (MPP). Given a set of candidate predictors, it evaluates candidate models and returns model-level summaries (BIC and MPP) and predictor-level posterior inclusion probabilities (PIP). For more details see Xu, S., Ferreira, M. A., & Tegge, A. N. (2025) <doi:10.48550/arXiv.2510.02628>. |
| Authors: | Shuangshuang Xu [aut, cre] |
| Maintainer: | Shuangshuang Xu <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-05-19 07:11:43 UTC |
| Source: | https://github.com/xss55/variableselection |
A data frame with seven columns. The independent variables are in the first six columns. The dependent variable is in the seventh column.
dat
| dat | A data frame. |
|---|---|
Description: glm.best fits a generalized linear model for the best model selected by modelselect.glm.
glm.best(object, family, method = "models", threshold = 0.95, x = FALSE, y = FALSE)
| object | the model selection result from |
|---|---|
| family | a character string naming a family function describing the error distribution to be used in the model. |
| method | the criterion used for model selection. The default is "models". |
| threshold | the threshold for variable selection. Variables whose posterior inclusion probability exceeds the threshold are selected in the best model. The default is 0.95. |
| x, y | logicals. If |
An object of class "glm", which is a list containing the following components:
| coefficients | a named vector of coefficients. |
|---|---|
| residuals | the working residuals, that is, the residuals in the final iteration of the IWLS fit. |
| fitted.values | the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. |
| rank | the numeric rank of the fitted linear model. |
| family | the family object used. |
| linear.predictors | the linear fit on the link scale. |
| deviance | up to a constant, minus twice the maximized log-likelihood. |
| aic | a version of Akaike's An Information Criterion (AIC): minus twice the maximized log-likelihood plus twice the number of parameters, computed by the aic component of the family. |
| null.deviance | the deviance for the null model, comparable with deviance. The null model will include the offset, and an intercept if there is one in the model. |
| iter | the number of iterations of IWLS used. |
| weights | the working weights, that is, the weights in the final iteration of the IWLS fit. |
| prior.weights | the weights initially supplied, a vector of 1s if none were. |
| df.residual | the residual degrees of freedom. |
| df.null | the residual degrees of freedom for the null model. |
| y | if requested, the response vector used. |
| converged | logical. Was the IWLS algorithm judged to have converged? |
| boundary | logical. Is the fitted value on the boundary of the allowable values? |
| model | if requested (the default), the model frame used. |
| call | the matched call. |
| formula | the formula supplied. |
| terms | the terms object used. |
| data | the data argument. |
| threshold | the threshold used for method = "variables". |
A data frame with seven columns. The independent variables are in the first six columns. The dependent variable is in the seventh column.
glmdat
| glmdat | A data frame. |
|---|---|
Description: lm.best fits a linear model for the best model selected by modelselect.lm.
lm.best(object, method = "models", threshold = 0.95, x = FALSE, y = FALSE)
| object | the model selection result from |
|---|---|
| method | the criterion used for model selection. The default is "models". |
| threshold | the threshold for variable selection. Variables whose posterior inclusion probability exceeds the threshold are selected in the best model. The default is 0.95. |
| x, y | logicals. If |
An object of class "lm", which is a list containing the following components:
| coefficients | a named vector of coefficients. |
|---|---|
| residuals | the residuals, that is, the response minus the fitted values. |
| fitted.values | the fitted mean values. |
| rank | the numeric rank of the fitted linear model. |
| df.residual | the residual degrees of freedom. |
| call | the matched call. |
| terms | the terms object used. |
| model | (if requested) the model frame used. |
| qr | (if requested) the QR decomposition of the design matrix. |
| xlevels | (if the model formula includes factors) a record of the levels of the factors. |
| contrasts | (if the model formula includes factors) the contrasts used. |
| offset | the offset used. |
| threshold | the threshold used for method = "variables". |
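The threshold rule described for method = "variables" can be illustrated with base R alone. The sketch below is not the package's implementation: the simulated data, the column names, and the posterior inclusion probabilities in `pip` are hypothetical values chosen only to show how a threshold of 0.95 turns inclusion probabilities into a fitted model.

```r
# Simulated data (illustrative only; not the package's `dat`).
set.seed(1)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

# Hypothetical posterior inclusion probabilities, standing in for the
# `variables` component that a model selection step would report.
pip <- c(x1 = 0.99, x2 = 0.97, x3 = 0.12)

# Keep the variables whose inclusion probability exceeds the threshold.
threshold <- 0.95
selected <- names(pip)[pip > threshold]

# Fall back to an intercept-only model if no variable passes.
rhs <- if (length(selected) > 0) selected else "1"
best_formula <- reformulate(rhs, response = "y")

# Fit the selected model with an ordinary linear regression.
fit <- lm(best_formula, data = sim)
print(best_formula)
print(coef(fit))
```

Raising the threshold makes the selected model sparser; lowering it admits variables with weaker support.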
Description: Uses BIC to perform variable selection for generalized linear models.
modelselect.glm(formula, data, family, GA_var = 16, maxiterations = 2000, runs_til_stop = 1000, monitor = TRUE, popSize = 100, verbose = TRUE)
| formula | an object of class "formula": a symbolic description of the model to be fitted. A typical model has the form |
|---|---|
| data | a data frame containing the variables in the model. |
| family | a character string naming a family function describing the error distribution to be used in the model. |
| GA_var | if the number of variables is smaller than |
| maxiterations | the maximum number of iterations to run before the GA search is halted. |
| runs_til_stop | the number of consecutive generations without any improvement in the best fitness value before the GA is stopped. |
| monitor | a logical, defaulting to TRUE, that shows the evolution of the search. If monitor = FALSE, any output is suppressed. |
| popSize | the population size. |
| verbose | logical; if TRUE, print a brief summary of results. |
modelselect.glm returns a list containing the following components:
| models | a data frame of the candidate models' BIC and posterior probabilities, sorted by decreasing posterior probability. |
|---|---|
| variables | a data frame of the candidate variables' posterior inclusion probabilities. |
| data | the data with the variables in the formula. |
Use glm.best to fit the best generalized linear model, chosen either by model posterior probability or by thresholding the variables' posterior inclusion probabilities.
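To make the relationship between BIC, model posterior probability (MPP), and posterior inclusion probability (PIP) concrete, here is a minimal base R sketch. It rests on two assumptions that the package may not share: every candidate model gets equal prior probability, and MPP is obtained from the standard BIC approximation, with weights proportional to exp(-BIC/2). It enumerates all models exhaustively on simulated data and does not use the package's genetic algorithm.

```r
# Simulated binary response that depends on x1 only (illustrative data).
set.seed(42)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(-0.5 + 1.5 * sim$x1))

predictors <- c("x1", "x2", "x3")

# All 2^3 inclusion patterns, one row per candidate model.
patterns <- expand.grid(rep(list(c(FALSE, TRUE)), length(predictors)))
names(patterns) <- predictors

# BIC of the logistic regression for each candidate model.
bic <- apply(patterns, 1, function(inc) {
  rhs <- if (any(inc)) predictors[inc] else "1"
  BIC(glm(reformulate(rhs, response = "y"), data = sim, family = binomial))
})

# Assumed BIC approximation: weights proportional to exp(-BIC / 2).
# Subtracting min(bic) avoids numerical underflow.
w <- exp(-0.5 * (bic - min(bic)))
mpp <- w / sum(w)

# PIP of a variable: total MPP of the models that include it.
pip <- colSums(as.matrix(patterns) * mpp)

# Model-level summary sorted by decreasing posterior probability.
models <- data.frame(patterns, BIC = bic, MPP = mpp)
models <- models[order(-models$MPP), ]
print(models)
print(pip)
```

Exhaustive enumeration doubles in cost with every added predictor, which is presumably why the function switches to a genetic algorithm search for larger problems.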
Description: Uses BIC to perform variable selection for linear models.
modelselect.lm(formula, data, GA_var = 16, maxiterations = 2000, runs_til_stop = 1000, monitor = TRUE, popSize = 100, verbose = TRUE)
| formula | an object of class "formula": a symbolic description of the model to be fitted. A typical model has the form |
|---|---|
| data | a data frame containing the variables in the model. |
| GA_var | if the number of variables is smaller than |
| maxiterations | the maximum number of iterations to run before the GA search is halted. |
| runs_til_stop | the number of consecutive generations without any improvement in the best fitness value before the GA is stopped. |
| monitor | a logical, defaulting to TRUE, that shows the evolution of the search. If monitor = FALSE, any output is suppressed. |
| popSize | the population size. |
| verbose | logical; if TRUE, print a brief summary of results. |
modelselect.lm returns a list containing the following components:
| models | a data frame of the candidate models' BIC and posterior probabilities, sorted by decreasing posterior probability. |
|---|---|
| variables | a data frame of the candidate variables' posterior inclusion probabilities. |
| data | the data with the variables in the formula. |
Use lm.best to fit the best linear model, chosen either by model posterior probability or by thresholding the variables' posterior inclusion probabilities.
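A possible end-to-end workflow, sketched from the usage signatures documented above. Several things here are assumptions: that the package installs under the name `variableselection` (taken from the source repository name), that both functions are exported, and that the simulated data frame `sim` with its column names is an acceptable input. The package calls are guarded so the script still runs when the package is not installed.

```r
# Simulated data with one truly active predictor (illustrative only).
set.seed(7)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + rnorm(n)

# Assumed package name; skip the package calls if it is unavailable.
if (requireNamespace("variableselection", quietly = TRUE)) {
  # Model selection step: returns `models`, `variables`, and `data`.
  sel <- variableselection::modelselect.lm(
    y ~ x1 + x2 + x3,
    data = sim,
    monitor = FALSE,   # suppress the search trace
    verbose = FALSE    # suppress the printed summary
  )
  print(head(sel$models))
  print(sel$variables)

  # Fit the model with the highest posterior probability.
  fit_models <- variableselection::lm.best(sel, method = "models")

  # Fit the model made of variables passing the inclusion threshold.
  fit_vars <- variableselection::lm.best(
    sel, method = "variables", threshold = 0.95
  )

  print(coef(fit_models))
  print(coef(fit_vars))
}
```

The two calls to lm.best can return different models: the first picks a single best model, while the second assembles a model from individually well-supported variables.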