The least absolute shrinkage and selection operator (lasso) estimates model coefficients, and these estimates can be used to select which covariates should be included in a model. The lasso is used for outcome prediction and for inference about causal parameters. In this post, we provide an introduction to the lasso and discuss using the lasso for prediction. In the next post, we discuss using the lasso for inference about causal parameters.

The lasso is most useful when a few out of many potential covariates affect the outcome and it is important to include only the covariates that have an effect. “Few” and “many” are defined relative to the sample size. In the example discussed below, we observe the most recent health-inspection scores for 600 restaurants, and we have 100 covariates that could potentially affect each restaurant’s score. We have too many potential covariates because we cannot reliably estimate 100 coefficients from 600 observations. We believe that only about 10 of the covariates are important, and we feel that 10 covariates are “a few” relative to 600 observations.

Given that only a few of the many covariates affect the outcome, the problem is that we don’t know which covariates are important and which are not. The lasso produces estimates of the coefficients and solves this covariate-selection problem.

There are technical terms for our example situation. A model with more covariates than you could reliably estimate coefficients for from the available sample size is known as a high-dimensional model. The assumption that the number of nonzero coefficients in the true model is small relative to the sample size is known as a sparsity assumption. More realistically, the approximate sparsity assumption requires that the number of nonzero coefficients in the model that best approximates the real world be small relative to the sample size. In these technical terms, the lasso is most useful when estimating the coefficients of a high-dimensional, approximately sparse model. High-dimensional models are nearly ubiquitous in prediction problems and in models that use flexible functional forms. In many cases, the many potential covariates are created from polynomials, splines, or other functions of the original covariates. In other cases, they come from administrative data, social media, or other sources that naturally produce huge numbers of potential covariates.

We use a series of examples to make our discussion of the lasso more accessible. These examples use some simulated data from the following problem.
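The setup above can be sketched in code. Here is a minimal illustration in Python using scikit-learn's `LassoCV`, with simulated data loosely matching the example in the post: 600 observations, 100 potential covariates, and roughly 10 covariates with nonzero coefficients. The seed, coefficient magnitudes, and variable names are our own assumptions, not the authors' data.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Simulated setting loosely mirroring the post's example (not the authors' data):
# 600 observations, 100 potential covariates, only 10 with nonzero coefficients.
rng = np.random.default_rng(12345)
n, p, k = 600, 100, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = rng.uniform(1.0, 2.0, size=k)  # the 10 "important" covariates
y = X @ beta + rng.standard_normal(n)

# Fit the lasso, choosing the penalty parameter by 5-fold cross-validation.
model = LassoCV(cv=5, random_state=0).fit(X, y)

# Covariates with nonzero estimated coefficients are the ones "selected".
selected = np.flatnonzero(model.coef_)
print(f"covariates selected: {selected.size} out of {p}")
```

Cross-validated lasso typically keeps all of the truly important covariates along with a handful of noise covariates, so the selected set is small relative to the 100 candidates but usually somewhat larger than 10.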