Induced Smoothed Lasso Regularization Path

Fits a sequence of penalized regression models using the Induced Smoothing Lasso approach over a grid of lambda values. Supports elastic-net penalties and generalized linear models: Gaussian, Binomial, Poisson, and Gamma.

Usage

islasso.path(
  formula,
  family = gaussian(),
  lambda = NULL,
  nlambda = 100,
  lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
  alpha = 1,
  data,
  weights,
  subset,
  offset,
  contrasts = NULL,
  unpenalized,
  control = is.control()
)

Arguments

formula: Model formula of type response ~ predictors.
family: Response distribution. Supported families: gaussian(), binomial(), poisson(), Gamma().
lambda: Optional numeric vector of lambda values. If not provided, a sequence is automatically generated.
nlambda: Integer. Number of lambda values to generate if lambda is missing. Default is 100.
lambda.min.ratio: Smallest lambda as a fraction of lambda.max. Default: 1e-2 if nobs < nvars, else 1e-3.
alpha: Elastic-net mixing parameter: alpha = 1 is lasso, alpha = 0 is ridge.
data: Data frame containing model variables.
weights: Optional observation weights.
subset: Optional logical or numeric vector to subset observations.
offset: Optional vector of prior known components for the linear predictor.
contrasts: Optional contrast settings for factor variables.
unpenalized: Optional vector of variable names or indices excluded from penalization.
control: A list of control parameters via is.control.

Value

A list with components:

call: Matched function call.
Info: Matrix with diagnostics: lambda, deviance, degrees of freedom, dispersion, iterations, convergence status.
GoF: Model goodness-of-fit metrics: AIC, BIC, AICc, GCV, GIC, eBIC.
Coef: Matrix of coefficients across lambda values.
SE: Matrix of standard errors.
Weights: Matrix of mixing weights for the smoothed penalty.
Gradient: Matrix of gradients for the smoothed penalty.
Linear.predictors, Fitted.values, Residuals: Matrices of fitted quantities across the path.
Input: List of input arguments and design matrix.
control, formula, model, terms, data, xlevels, contrasts: Standard model components.

Details

This function fits a regularization path of models using the induced smoothing paradigm, replacing the non-smooth L1 penalty with a differentiable surrogate. Standard errors are returned for all lambda points, allowing for Wald-based hypothesis testing. The regularization path spans a range of lambda values, either user-defined or automatically computed.

References

Cilluffo G., Sottile G., La Grutta S., Muggeo V.M.R. (2019). The Induced Smoothed Lasso: A practical framework for hypothesis testing in high dimensional regression. Statistical Methods in Medical Research. DOI: 10.1177/0962280219842890

Sottile G., Cilluffo G., Muggeo V.M.R. (2019). The R package islasso: estimation and hypothesis testing in lasso regression. Technical Report. DOI: 10.13140/RG.2.2.16360.11521

Author

Gianluca Sottile gianluca.sottile@unipa.it

Examples

n <- 100; p <- 30; p1 <- 10  # number of nonzero coefficients

beta.veri <- sort(round(c(seq(.5, 3, length.out = p1/2),
                         seq(-1, -2, length.out = p1/2)), 2))
beta <- c(beta.veri, rep(0, p - p1))
sim1 <- simulXy(n = n, p = p, beta = beta, seed = 1, family = gaussian())
o <- islasso.path(y ~ ., data = sim1$data,
                  family = gaussian(), nlambda = 30L)
o
#> 
#> Call:
#> islasso.path(formula = y ~ ., family = gaussian(), nlambda = 30L, 
#>     data = sim1$data)
#> 
#> Coefficients:
#>      lambda      df     phi  deviance    logLik
#> 1    0.0353 30.9790  1.4072   97.1265 -141.4361
#> 2    0.0485 30.9712  1.4070   97.1268 -141.4362
#> 3    0.0667 30.9605  1.4068   97.1274 -141.4365
#> 4    0.0916 30.9457  1.4066   97.1285 -141.4371
#> 5    0.1257 30.9254  1.4062   97.1305 -141.4381
#> 6    0.1727 30.8974  1.4057   97.1344 -141.4401
#> 7    0.2372 30.8587  1.4050   97.1417 -141.4439
#> 8    0.3257 30.8054  1.4041   97.1554 -141.4510
#> 9    0.4473 30.7317  1.4030   97.1812 -141.4642
#> 10   0.6143 30.6296  1.4016   97.2293 -141.4890
#> 11   0.8437 30.4878  1.4000   97.3192 -141.5351
#> 12   1.1587 29.4555  1.3818   97.4797 -141.6176
#> 13   1.5912 29.0905  1.3787   97.7607 -141.7615
#> 14   2.1853 28.0146  1.3652   98.2776 -142.0251
#> 15   3.0012 27.5673  1.3698   99.2183 -142.5015
#> 16   4.1217 26.9184  1.3810  100.9230 -143.3533
#> 17   5.6606 25.9228  1.4031  103.9412 -144.8266
#> 18   7.7740 22.3593  1.4034  108.9589 -147.1839
#> 19  10.6764 19.9483  1.4640  117.1917 -150.8259
#> 20  14.6625 17.5170  1.5821  130.4956 -156.2023
#> 21  20.1368 14.9479  1.7872  152.0027 -163.8302
#> 22  27.6549 13.3220  2.1816  189.0993 -174.7490
#> 23  37.9800 11.8667  2.8761  253.4760 -189.3988
#> 24  52.1599 10.9019  4.1500  369.7558 -208.2775
#> 25  71.6340  9.7464  6.4075  578.2957 -230.6396
#> 26  98.3787  8.2018 10.1444  931.2362 -254.4610
#> 27 135.1086  5.8536 15.3597 1446.0646 -276.4654
#> 28 185.5517  3.4090 21.7090 2096.8918 -295.0459
#> 29 254.8279  1.9548 27.3788 2684.3632 -307.3953
#> 30 349.9686  1.2498 32.0240 3162.3808 -315.5894
#> 

summary(o, lambda = 10, pval = 0.05)
#> 
#> Call:
#> islasso.path(formula = y ~ ., family = gaussian(), nlambda = 30L, 
#>     data = sim1$data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.99684 -0.74992 -0.07064  0.68113  2.75682 
#> 
#>     Estimate Std. Error z value Pr(>|z|)    
#> X1   -1.7574     0.1280 -13.732  < 2e-16 ***
#> X2   -1.5041     0.1290 -11.663  < 2e-16 ***
#> X3   -1.3583     0.1303 -10.427  < 2e-16 ***
#> X4   -1.1141     0.1360  -8.189 2.63e-16 ***
#> X5   -0.8420     0.1276  -6.601 4.08e-11 ***
#> X7    0.9960     0.1295   7.690 1.47e-14 ***
#> X8    1.6094     0.1318  12.211  < 2e-16 ***
#> X9    2.3739     0.1332  17.825  < 2e-16 ***
#> X10   2.7974     0.1362  20.540  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for  family taken to be 1.451452)
#> 
#>     Null deviance: 3311.24  on 99.00  degrees of freedom
#> Residual deviance: -150.07  on 79.55  degrees of freedom
#> AIC: 341.04
#> Lambda: 10
#> 
#> Number of Newton-Raphson iterations: 32
#> 
coef(o, lambda = 10)
#>   (Intercept)            X1            X2            X3            X4 
#>  1.419430e-01 -1.757353e+00 -1.504053e+00 -1.358269e+00 -1.114115e+00 
#>            X5            X6            X7            X8            X9 
#> -8.419879e-01  2.233492e-01  9.960151e-01  1.609426e+00  2.373855e+00 
#>           X10           X11           X12           X13           X14 
#>  2.797407e+00  9.426526e-03  1.047444e-01  4.157399e-03  6.098024e-02 
#>           X15           X16           X17           X18           X19 
#>  6.047714e-07  1.812838e-01  3.729639e-02 -7.646930e-02  7.589241e-07 
#>           X20           X21           X22           X23           X24 
#>  2.482677e-07 -4.254386e-02 -4.260865e-07  1.785697e-03  3.846848e-03 
#>           X25           X26           X27           X28           X29 
#>  2.485859e-02 -7.597015e-07 -1.467046e-03  2.050808e-01  4.678174e-07 
#>           X30 
#> -2.105243e-02 
fitted(o, lambda = 10)
#>            1            2            3            4            5            6 
#>  -7.33372135  -1.22751966   3.05928434  -4.61705978   1.30141466 -11.08718465 
#>            7            8            9           10           11           12 
#>  -5.18491838  -1.40646029   6.55306771  -5.62052384   0.34894354   1.63372128 
#>           13           14           15           16           17           18 
#>  -3.40300737   1.27310051   3.82308202  -3.48029065   1.62045665   0.46234652 
#>           19           20           21           22           23           24 
#>  -4.19240461   6.20954940   1.04068246   1.51369417   6.65138746   6.88709546 
#>           25           26           27           28           29           30 
#>   0.70908941  -8.69208753  -4.10513009  11.65909323   3.80705267   1.17755103 
#>           31           32           33           34           35           36 
#>  -0.12433214  14.36003621   0.60577280  -8.53343608   4.94045365   9.88682623 
#>           37           38           39           40           41           42 
#>   5.93981569   0.87948459  -3.26448745  -4.30682762   1.00443673  -7.35378177 
#>           43           44           45           46           47           48 
#>  12.91330858   1.41808107  -0.23925888   8.40869369   5.05172053   3.64497588 
#>           49           50           51           52           53           54 
#>  -1.58627191   3.78272123  -1.68651966   0.21730940  -0.27324986   0.96774503 
#>           55           56           57           58           59           60 
#>  -3.85317437 -11.74956161   7.58473990   5.36133383   2.15519334  -9.11074753 
#>           61           62           63           64           65           66 
#>  -3.03682786   2.26741932 -13.56657055   1.76699765  -3.92700269  -2.77588333 
#>           67           68           69           70           71           72 
#>   8.22694073  -6.75548073   3.21363663  -3.00933468   0.18196975   3.53387084 
#>           73           74           75           76           77           78 
#>  -1.86021587  -3.26539566  -4.76175555  -5.23304028   5.13892706  -0.12888707 
#>           79           80           81           82           83           84 
#>   5.39047078  -3.74756451   1.27469427   1.87393595  -4.69814434  10.55278794 
#>           85           86           87           88           89           90 
#>   0.74413618   2.36640852  -4.85420248   6.07682531   6.79852410   0.35144271 
#>           91           92           93           94           95           96 
#>   1.35933796  -5.95151395   2.19015159   1.31566559  -4.63570003   0.36480625 
#>           97           98           99          100 
#>  -8.80972385   2.86087944  -9.03175750  -0.02783179 
predict(o, type = "response", lambda = 10)
#>            1            2            3            4            5            6 
#>  -7.33372135  -1.22751966   3.05928434  -4.61705978   1.30141466 -11.08718465 
#>            7            8            9           10           11           12 
#>  -5.18491838  -1.40646029   6.55306771  -5.62052384   0.34894354   1.63372128 
#>           13           14           15           16           17           18 
#>  -3.40300737   1.27310051   3.82308202  -3.48029065   1.62045665   0.46234652 
#>           19           20           21           22           23           24 
#>  -4.19240461   6.20954940   1.04068246   1.51369417   6.65138746   6.88709546 
#>           25           26           27           28           29           30 
#>   0.70908941  -8.69208753  -4.10513009  11.65909323   3.80705267   1.17755103 
#>           31           32           33           34           35           36 
#>  -0.12433214  14.36003621   0.60577280  -8.53343608   4.94045365   9.88682623 
#>           37           38           39           40           41           42 
#>   5.93981569   0.87948459  -3.26448745  -4.30682762   1.00443673  -7.35378177 
#>           43           44           45           46           47           48 
#>  12.91330858   1.41808107  -0.23925888   8.40869369   5.05172053   3.64497588 
#>           49           50           51           52           53           54 
#>  -1.58627191   3.78272123  -1.68651966   0.21730940  -0.27324986   0.96774503 
#>           55           56           57           58           59           60 
#>  -3.85317437 -11.74956161   7.58473990   5.36133383   2.15519334  -9.11074753 
#>           61           62           63           64           65           66 
#>  -3.03682786   2.26741932 -13.56657055   1.76699765  -3.92700269  -2.77588333 
#>           67           68           69           70           71           72 
#>   8.22694073  -6.75548073   3.21363663  -3.00933468   0.18196975   3.53387084 
#>           73           74           75           76           77           78 
#>  -1.86021587  -3.26539566  -4.76175555  -5.23304028   5.13892706  -0.12888707 
#>           79           80           81           82           83           84 
#>   5.39047078  -3.74756451   1.27469427   1.87393595  -4.69814434  10.55278794 
#>           85           86           87           88           89           90 
#>   0.74413618   2.36640852  -4.85420248   6.07682531   6.79852410   0.35144271 
#>           91           92           93           94           95           96 
#>   1.35933796  -5.95151395   2.19015159   1.31566559  -4.63570003   0.36480625 
#>           97           98           99          100 
#>  -8.80972385   2.86087944  -9.03175750  -0.02783179 
plot(o, yvar = "coef")

residuals(o, lambda = 10)
#>            1            2            3            4            5            6 
#>  0.203500758 -0.070921088  0.655601438 -0.948292461 -1.292941380  0.031617959 
#>            7            8            9           10           11           12 
#>  0.534476652 -0.938054664  2.341107305  0.704135489  0.054296638  0.123963203 
#>           13           14           15           16           17           18 
#> -2.455919333  2.756824803  0.444845290 -1.077745651 -0.957225554 -2.996836042 
#>           19           20           21           22           23           24 
#> -0.268443807 -1.110295074 -0.175677909  0.756638957  0.732691516 -0.399924545 
#>           25           26           27           28           29           30 
#> -0.012697210 -1.808746364 -1.769115029  0.080492002 -0.705686605 -0.867954745 
#>           31           32           33           34           35           36 
#>  0.818314525 -0.539616946  0.387907768 -0.460102359  0.487185482  0.533349627 
#>           37           38           39           40           41           42 
#>  0.227232446 -0.070367646  1.400655737  1.235397972  0.108949067 -0.804937076 
#>           43           44           45           46           47           48 
#>  1.808331720 -1.353897571  0.641704960  1.260640190  0.354164189 -0.214549640 
#>           49           50           51           52           53           54 
#>  1.751864072 -0.162454929 -1.111939382  1.856458881 -0.073545689 -0.021959911 
#>           55           56           57           58           59           60 
#> -0.950707727 -0.496808471 -0.295716050 -0.007979055 -0.101287323 -0.220969297 
#>           61           62           63           64           65           66 
#> -0.738191013 -1.345008720  1.219241656 -1.186707655 -0.246818470  1.143297529 
#>           67           68           69           70           71           72 
#>  0.673460263  1.212373436  2.386493906  1.229814839  0.452816942 -0.009161309 
#>           73           74           75           76           77           78 
#>  1.433755696 -0.706450678  0.410774191 -0.160249470 -0.072069544  1.719784551 
#>           79           80           81           82           83           84 
#>  1.206996077 -1.544574482 -0.516246467 -0.478693632 -1.242234274  1.870383884 
#>           85           86           87           88           89           90 
#> -0.301511577  0.493521563 -1.312221684  1.075241477  1.195504244 -0.056009178 
#>           91           92           93           94           95           96 
#> -1.028586437 -1.009431470  1.159645432 -0.103428990 -0.530800976 -2.230500039 
#>           97           98           99          100 
#> -1.414122318  0.720231658 -0.785114819 -0.134236260 
deviance(o, lambda = 10)
#> [1] 115.4931
logLik(o, lambda = 10)
#> 
#> 'log Lik.' -150.1 (df = 20.44573)
#> 
GoF.islasso.path(o)

#> $gof
#>            AIC      BIC      AICc     eBIC      GCV      GIC
#>  [1,] 344.8300 425.5355 3257.6783 531.9168 593.7821 445.3355
#>  [2,] 344.8148 425.5000 3255.8926 531.8547 593.6491 445.2951
#>  [3,] 344.7940 425.4514 3253.4379 531.7694 593.4666 445.2397
#>  [4,] 344.7656 425.3845 3250.0441 531.6517 593.2149 445.1633
#>  [5,] 344.7270 425.2929 3245.3813 531.4903 592.8701 445.0587
#>  [6,] 344.6750 425.1678 3238.9644 531.2690 592.3976 444.9157
#>  [7,] 344.6052 424.9974 3230.1349 530.9658 591.7513 444.7206
#>  [8,] 344.5127 424.7659 3217.9846 530.5512 590.8690 444.4550
#>  [9,] 344.3917 424.4530 3201.2593 529.9852 589.6675 444.0950
#> [10,] 344.2372 424.0326 3178.2269 529.2144 588.0370 443.6094
#> [11,] 344.0460 423.4720 3146.4555 528.1668 585.8314 442.9581
#> [12,] 342.1461 418.8827 2922.0229 520.0324 569.1420 437.7090
#> [13,] 341.7039 417.4896 2845.9322 517.3859 563.8702 436.0826
#> [14,] 340.0794 413.0621 2630.2105 509.2637 548.1581 430.9675
#> [15,] 340.1376 411.9552 2545.0813 506.6210 543.2257 429.5747
#> [16,] 340.5434 410.6705 2425.7392 503.1081 536.8117 427.8753
#> [17,] 341.4989 409.0323 2251.5789 498.0510 527.8489 425.6007
#> [18,] 339.0863 397.3361 1702.0608 474.1175 488.3275 411.6269
#> [19,] 341.5484 393.5171 1398.7886 462.0194 470.7224 406.2670
#> [20,] 347.4386 393.0732 1143.5818 453.2263 459.1855 404.2691
#> [21,] 357.5562 396.4979  924.7907 447.8287 452.9535 406.0518
#> [22,] 376.1419 410.8480  821.5249 456.5956 465.1862 419.3627
#> [23,] 402.5310 433.4457  752.9937 474.1957 487.6711 441.0302
#> [24,] 438.3587 466.7598  732.9215 504.1967 524.7289 473.7277
#> [25,] 480.7720 506.1630  715.4700 539.6319 566.2844 512.3923
#> [26,] 525.3257 546.6929  691.5674 574.8579 603.9255 551.9351
#> [27,] 564.6379 579.8874  650.7769 599.9885 623.8253 583.6287
#> [28,] 596.9099 605.7910  628.3575 617.4976 632.4797 607.9699
#> [29,] 618.7001 623.7925  630.6034 630.5051 639.5494 625.0419
#> [30,] 633.6783 636.9342  639.4313 641.2260 647.2563 637.7330
#> 
#> $minimum
#>  AIC  BIC AICc eBIC  GCV  GIC 
#>   18   20   28   21   21   20 
#> 
#> $lambda.min
#>        AIC        BIC       AICc       eBIC        GCV        GIC 
#>   7.774004  14.662527 185.551724  20.136818  20.136818  14.662527 
#> 
#> $plot

#> 

if (FALSE) { # \dontrun{
##### binomial ######
beta <- c(1, 1, 1, rep(0, p - 3))
sim2 <- simulXy(n = n, p = p, beta = beta, interc = 1, seed = 1,
                size = 100, family = binomial())
o2 <- islasso.path(cbind(y.success, y.failure) ~ ., data = sim2$data,
                   family = binomial(), lambda = seq(0.1, 100, l = 50L))
temp <- GoF.islasso.path(o2)
summary(o2, pval = 0.05, lambda = temp$lambda.min["BIC"])

##### poisson ######
beta <- c(1, 1, 1, rep(0, p - 3))
sim3 <- simulXy(n = n, p = p, beta = beta, interc = 1, seed = 1,
                family = poisson())
o3 <- islasso.path(y ~ ., data = sim3$data, family = poisson(), nlambda = 30L)
temp <- GoF.islasso.path(o3)
summary(o3, pval = 0.05, lambda = temp$lambda.min["BIC"])

##### Gamma ######
beta <- c(1, 1, 1, rep(0, p - 3))
sim4 <- simulXy(n = n, p = p, beta = beta, interc = -1, seed = 1,
                family = Gamma(link = "log"))
o4 <- islasso.path(y ~ ., data = sim4$data, family = Gamma(link = "log"),
                   nlambda = 30L)
temp <- GoF.islasso.path(o4)
summary(o4, pval = .05, lambda = temp$lambda.min["BIC"])
} # }