Skip to contents

Fits a sequence of penalized regression models using the Induced Smoothing Lasso approach over a grid of lambda values. Supports elastic-net penalties and generalized linear models: Gaussian, Binomial, Poisson, and Gamma.

Usage

islasso.path(
  formula,
  family = gaussian(),
  lambda = NULL,
  nlambda = 100,
  lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
  alpha = 1,
  data,
  weights,
  subset,
  offset,
  contrasts = NULL,
  unpenalized,
  control = is.control()
)

Arguments

formula

Model formula of type response ~ predictors.

family

Response distribution. Supported families: gaussian(), binomial(), poisson(), Gamma().

lambda

Optional numeric vector of lambda values. If not provided, a sequence is automatically generated.

nlambda

Integer. Number of lambda values to generate if lambda is missing. Default is 100.

lambda.min.ratio

Smallest lambda as a fraction of lambda.max. Default: 1e-2 if nobs < nvars, else 1e-3.

alpha

Elastic-net mixing parameter: alpha = 1 is lasso, alpha = 0 is ridge.

data

Data frame containing model variables.

weights

Optional observation weights.

subset

Optional logical or numeric vector to subset observations.

offset

Optional vector of prior known components for the linear predictor.

contrasts

Optional contrast settings for factor variables.

unpenalized

Optional vector of variable names or indices excluded from penalization.

control

A list of control parameters via is.control.

Value

A list with components:

call

Matched function call.

Info

Matrix with diagnostics: lambda, deviance, degrees of freedom, dispersion, iterations, convergence status.

GoF

Model goodness-of-fit metrics: AIC, BIC, AICc, GCV, GIC, eBIC.

Coef

Matrix of coefficients across lambda values.

SE

Matrix of standard errors.

Weights

Matrix of mixing weights for the smoothed penalty.

Gradient

Matrix of gradients for the smoothed penalty.

Linear.predictors, Fitted.values, Residuals

Matrices of fitted quantities across the path.

Input

List of input arguments and design matrix.

control, formula, model, terms, data, xlevels, contrasts

Standard model components.

Details

This function fits a regularization path of models using the induced smoothing paradigm, replacing the non-smooth L1 penalty with a differentiable surrogate. Standard errors are returned for all lambda points, allowing for Wald-based hypothesis testing. The regularization path spans a range of lambda values, either user-defined or automatically computed.

References

Cilluffo G., Sottile G., La Grutta S., Muggeo V.M.R. (2019). The Induced Smoothed Lasso: A practical framework for hypothesis testing in high dimensional regression. Statistical Methods in Medical Research. DOI: 10.1177/0962280219842890

Sottile G., Cilluffo G., Muggeo V.M.R. (2019). The R package islasso: estimation and hypothesis testing in lasso regression. Technical Report. DOI: 10.13140/RG.2.2.16360.11521

Author

Gianluca Sottile gianluca.sottile@unipa.it

Examples

n <- 100; p <- 30; p1 <- 10  # number of nonzero coefficients

beta.veri <- sort(round(c(seq(.5, 3, length.out = p1/2),
                         seq(-1, -2, length.out = p1/2)), 2))
beta <- c(beta.veri, rep(0, p - p1))
sim1 <- simulXy(n = n, p = p, beta = beta, seed = 1, family = gaussian())
o <- islasso.path(y ~ ., data = sim1$data,
                  family = gaussian(), nlambda = 30L)
o
#> 
#> Call:
#> islasso.path(formula = y ~ ., family = gaussian(), nlambda = 30L, 
#>     data = sim1$data)
#> 
#> Coefficients:
#>      lambda      df     phi  deviance    logLik
#> 1    0.0353 30.9790  1.4072   97.1265 -141.4361
#> 2    0.0485 30.9712  1.4070   97.1268 -141.4362
#> 3    0.0667 30.9605  1.4068   97.1274 -141.4365
#> 4    0.0916 30.9457  1.4066   97.1285 -141.4371
#> 5    0.1257 30.9254  1.4062   97.1305 -141.4381
#> 6    0.1727 30.8974  1.4057   97.1344 -141.4401
#> 7    0.2372 30.8587  1.4050   97.1417 -141.4439
#> 8    0.3257 30.8054  1.4041   97.1554 -141.4510
#> 9    0.4473 30.7317  1.4030   97.1812 -141.4642
#> 10   0.6143 30.6296  1.4016   97.2293 -141.4890
#> 11   0.8437 30.4878  1.4000   97.3192 -141.5351
#> 12   1.1587 29.4555  1.3818   97.4797 -141.6176
#> 13   1.5912 29.0905  1.3787   97.7607 -141.7615
#> 14   2.1853 28.0146  1.3652   98.2776 -142.0251
#> 15   3.0012 27.5673  1.3698   99.2183 -142.5015
#> 16   4.1217 26.9184  1.3810  100.9230 -143.3533
#> 17   5.6606 25.9228  1.4031  103.9412 -144.8266
#> 18   7.7740 22.3593  1.4034  108.9589 -147.1839
#> 19  10.6764 19.9483  1.4640  117.1917 -150.8259
#> 20  14.6625 17.5170  1.5821  130.4956 -156.2023
#> 21  20.1368 14.9479  1.7872  152.0027 -163.8302
#> 22  27.6549 13.3220  2.1816  189.0993 -174.7490
#> 23  37.9800 11.8667  2.8761  253.4760 -189.3988
#> 24  52.1599 10.9019  4.1500  369.7558 -208.2775
#> 25  71.6340  9.7464  6.4075  578.2957 -230.6396
#> 26  98.3787  8.2018 10.1444  931.2362 -254.4610
#> 27 135.1086  5.8536 15.3597 1446.0646 -276.4654
#> 28 185.5517  3.4090 21.7090 2096.8918 -295.0459
#> 29 254.8279  1.9548 27.3788 2684.3632 -307.3953
#> 30 349.9686  1.2498 32.0240 3162.3808 -315.5894
#> 

summary(o, lambda = 10, pval = 0.05)
#> 
#> Call:
#> islasso.path(formula = y ~ ., family = gaussian(), nlambda = 30L, 
#>     data = sim1$data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.99684 -0.74992 -0.07064  0.68113  2.75682 
#> 
#>     Estimate Std. Error z value Pr(>|z|)    
#> X1   -1.7574     0.1280 -13.732  < 2e-16 ***
#> X2   -1.5041     0.1290 -11.663  < 2e-16 ***
#> X3   -1.3583     0.1303 -10.427  < 2e-16 ***
#> X4   -1.1141     0.1360  -8.189 2.63e-16 ***
#> X5   -0.8420     0.1276  -6.601 4.08e-11 ***
#> X7    0.9960     0.1295   7.690 1.47e-14 ***
#> X8    1.6094     0.1318  12.211  < 2e-16 ***
#> X9    2.3739     0.1332  17.825  < 2e-16 ***
#> X10   2.7974     0.1362  20.540  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for  family taken to be 1.451452)
#> 
#>     Null deviance: 3311.24  on 99.00  degrees of freedom
#> Residual deviance: -150.07  on 79.55  degrees of freedom
#> AIC: 341.04
#> Lambda: 10
#> 
#> Number of Newton-Raphson iterations: 32
#> 
coef(o, lambda = 10)
#>   (Intercept)            X1            X2            X3            X4 
#>  1.419430e-01 -1.757353e+00 -1.504053e+00 -1.358269e+00 -1.114115e+00 
#>            X5            X6            X7            X8            X9 
#> -8.419879e-01  2.233492e-01  9.960151e-01  1.609426e+00  2.373855e+00 
#>           X10           X11           X12           X13           X14 
#>  2.797407e+00  9.426526e-03  1.047444e-01  4.157399e-03  6.098024e-02 
#>           X15           X16           X17           X18           X19 
#>  6.047714e-07  1.812838e-01  3.729639e-02 -7.646930e-02  7.589241e-07 
#>           X20           X21           X22           X23           X24 
#>  2.482677e-07 -4.254386e-02 -4.260865e-07  1.785697e-03  3.846848e-03 
#>           X25           X26           X27           X28           X29 
#>  2.485859e-02 -7.597015e-07 -1.467046e-03  2.050808e-01  4.678174e-07 
#>           X30 
#> -2.105243e-02 
fitted(o, lambda = 10)
#>            1            2            3            4            5            6 
#>  -7.33372135  -1.22751966   3.05928434  -4.61705978   1.30141466 -11.08718465 
#>            7            8            9           10           11           12 
#>  -5.18491838  -1.40646029   6.55306771  -5.62052384   0.34894354   1.63372128 
#>           13           14           15           16           17           18 
#>  -3.40300737   1.27310051   3.82308202  -3.48029065   1.62045665   0.46234652 
#>           19           20           21           22           23           24 
#>  -4.19240461   6.20954940   1.04068246   1.51369417   6.65138746   6.88709546 
#>           25           26           27           28           29           30 
#>   0.70908941  -8.69208753  -4.10513009  11.65909323   3.80705267   1.17755103 
#>           31           32           33           34           35           36 
#>  -0.12433214  14.36003621   0.60577280  -8.53343608   4.94045365   9.88682623 
#>           37           38           39           40           41           42 
#>   5.93981569   0.87948459  -3.26448745  -4.30682762   1.00443673  -7.35378177 
#>           43           44           45           46           47           48 
#>  12.91330858   1.41808107  -0.23925888   8.40869369   5.05172053   3.64497588 
#>           49           50           51           52           53           54 
#>  -1.58627191   3.78272123  -1.68651966   0.21730940  -0.27324986   0.96774503 
#>           55           56           57           58           59           60 
#>  -3.85317437 -11.74956161   7.58473990   5.36133383   2.15519334  -9.11074753 
#>           61           62           63           64           65           66 
#>  -3.03682786   2.26741932 -13.56657055   1.76699765  -3.92700269  -2.77588333 
#>           67           68           69           70           71           72 
#>   8.22694073  -6.75548073   3.21363663  -3.00933468   0.18196975   3.53387084 
#>           73           74           75           76           77           78 
#>  -1.86021587  -3.26539566  -4.76175555  -5.23304028   5.13892706  -0.12888707 
#>           79           80           81           82           83           84 
#>   5.39047078  -3.74756451   1.27469427   1.87393595  -4.69814434  10.55278794 
#>           85           86           87           88           89           90 
#>   0.74413618   2.36640852  -4.85420248   6.07682531   6.79852410   0.35144271 
#>           91           92           93           94           95           96 
#>   1.35933796  -5.95151395   2.19015159   1.31566559  -4.63570003   0.36480625 
#>           97           98           99          100 
#>  -8.80972385   2.86087944  -9.03175750  -0.02783179 
predict(o, type = "response", lambda = 10)
#>            1            2            3            4            5            6 
#>  -7.33372135  -1.22751966   3.05928434  -4.61705978   1.30141466 -11.08718465 
#>            7            8            9           10           11           12 
#>  -5.18491838  -1.40646029   6.55306771  -5.62052384   0.34894354   1.63372128 
#>           13           14           15           16           17           18 
#>  -3.40300737   1.27310051   3.82308202  -3.48029065   1.62045665   0.46234652 
#>           19           20           21           22           23           24 
#>  -4.19240461   6.20954940   1.04068246   1.51369417   6.65138746   6.88709546 
#>           25           26           27           28           29           30 
#>   0.70908941  -8.69208753  -4.10513009  11.65909323   3.80705267   1.17755103 
#>           31           32           33           34           35           36 
#>  -0.12433214  14.36003621   0.60577280  -8.53343608   4.94045365   9.88682623 
#>           37           38           39           40           41           42 
#>   5.93981569   0.87948459  -3.26448745  -4.30682762   1.00443673  -7.35378177 
#>           43           44           45           46           47           48 
#>  12.91330858   1.41808107  -0.23925888   8.40869369   5.05172053   3.64497588 
#>           49           50           51           52           53           54 
#>  -1.58627191   3.78272123  -1.68651966   0.21730940  -0.27324986   0.96774503 
#>           55           56           57           58           59           60 
#>  -3.85317437 -11.74956161   7.58473990   5.36133383   2.15519334  -9.11074753 
#>           61           62           63           64           65           66 
#>  -3.03682786   2.26741932 -13.56657055   1.76699765  -3.92700269  -2.77588333 
#>           67           68           69           70           71           72 
#>   8.22694073  -6.75548073   3.21363663  -3.00933468   0.18196975   3.53387084 
#>           73           74           75           76           77           78 
#>  -1.86021587  -3.26539566  -4.76175555  -5.23304028   5.13892706  -0.12888707 
#>           79           80           81           82           83           84 
#>   5.39047078  -3.74756451   1.27469427   1.87393595  -4.69814434  10.55278794 
#>           85           86           87           88           89           90 
#>   0.74413618   2.36640852  -4.85420248   6.07682531   6.79852410   0.35144271 
#>           91           92           93           94           95           96 
#>   1.35933796  -5.95151395   2.19015159   1.31566559  -4.63570003   0.36480625 
#>           97           98           99          100 
#>  -8.80972385   2.86087944  -9.03175750  -0.02783179 
plot(o, yvar = "coef")

residuals(o, lambda = 10)
#>            1            2            3            4            5            6 
#>  0.203500758 -0.070921088  0.655601438 -0.948292461 -1.292941380  0.031617959 
#>            7            8            9           10           11           12 
#>  0.534476652 -0.938054664  2.341107305  0.704135489  0.054296638  0.123963203 
#>           13           14           15           16           17           18 
#> -2.455919333  2.756824803  0.444845290 -1.077745651 -0.957225554 -2.996836042 
#>           19           20           21           22           23           24 
#> -0.268443807 -1.110295074 -0.175677909  0.756638957  0.732691516 -0.399924545 
#>           25           26           27           28           29           30 
#> -0.012697210 -1.808746364 -1.769115029  0.080492002 -0.705686605 -0.867954745 
#>           31           32           33           34           35           36 
#>  0.818314525 -0.539616946  0.387907768 -0.460102359  0.487185482  0.533349627 
#>           37           38           39           40           41           42 
#>  0.227232446 -0.070367646  1.400655737  1.235397972  0.108949067 -0.804937076 
#>           43           44           45           46           47           48 
#>  1.808331720 -1.353897571  0.641704960  1.260640190  0.354164189 -0.214549640 
#>           49           50           51           52           53           54 
#>  1.751864072 -0.162454929 -1.111939382  1.856458881 -0.073545689 -0.021959911 
#>           55           56           57           58           59           60 
#> -0.950707727 -0.496808471 -0.295716050 -0.007979055 -0.101287323 -0.220969297 
#>           61           62           63           64           65           66 
#> -0.738191013 -1.345008720  1.219241656 -1.186707655 -0.246818470  1.143297529 
#>           67           68           69           70           71           72 
#>  0.673460263  1.212373436  2.386493906  1.229814839  0.452816942 -0.009161309 
#>           73           74           75           76           77           78 
#>  1.433755696 -0.706450678  0.410774191 -0.160249470 -0.072069544  1.719784551 
#>           79           80           81           82           83           84 
#>  1.206996077 -1.544574482 -0.516246467 -0.478693632 -1.242234274  1.870383884 
#>           85           86           87           88           89           90 
#> -0.301511577  0.493521563 -1.312221684  1.075241477  1.195504244 -0.056009178 
#>           91           92           93           94           95           96 
#> -1.028586437 -1.009431470  1.159645432 -0.103428990 -0.530800976 -2.230500039 
#>           97           98           99          100 
#> -1.414122318  0.720231658 -0.785114819 -0.134236260 
deviance(o, lambda = 10)
#> [1] 115.4931
logLik(o, lambda = 10)
#> 
#> 'log Lik.' -150.1 (df = 20.44573)
#> 
GoF.islasso.path(o)

#> $gof
#>            AIC      BIC      AICc     eBIC      GCV      GIC
#>  [1,] 344.8300 425.5355 3257.6783 531.9168 593.7821 445.3355
#>  [2,] 344.8148 425.5000 3255.8926 531.8547 593.6491 445.2951
#>  [3,] 344.7940 425.4514 3253.4379 531.7694 593.4666 445.2397
#>  [4,] 344.7656 425.3845 3250.0441 531.6517 593.2149 445.1633
#>  [5,] 344.7270 425.2929 3245.3813 531.4903 592.8701 445.0587
#>  [6,] 344.6750 425.1678 3238.9644 531.2690 592.3976 444.9157
#>  [7,] 344.6052 424.9974 3230.1349 530.9658 591.7513 444.7206
#>  [8,] 344.5127 424.7659 3217.9846 530.5512 590.8690 444.4550
#>  [9,] 344.3917 424.4530 3201.2593 529.9852 589.6675 444.0950
#> [10,] 344.2372 424.0326 3178.2269 529.2144 588.0370 443.6094
#> [11,] 344.0460 423.4720 3146.4555 528.1668 585.8314 442.9581
#> [12,] 342.1461 418.8827 2922.0229 520.0324 569.1420 437.7090
#> [13,] 341.7039 417.4896 2845.9322 517.3859 563.8702 436.0826
#> [14,] 340.0794 413.0621 2630.2105 509.2637 548.1581 430.9675
#> [15,] 340.1376 411.9552 2545.0813 506.6210 543.2257 429.5747
#> [16,] 340.5434 410.6705 2425.7392 503.1081 536.8117 427.8753
#> [17,] 341.4989 409.0323 2251.5789 498.0510 527.8489 425.6007
#> [18,] 339.0863 397.3361 1702.0608 474.1175 488.3275 411.6269
#> [19,] 341.5484 393.5171 1398.7886 462.0194 470.7224 406.2670
#> [20,] 347.4386 393.0732 1143.5818 453.2263 459.1855 404.2691
#> [21,] 357.5562 396.4979  924.7907 447.8287 452.9535 406.0518
#> [22,] 376.1419 410.8480  821.5249 456.5956 465.1862 419.3627
#> [23,] 402.5310 433.4457  752.9937 474.1957 487.6711 441.0302
#> [24,] 438.3587 466.7598  732.9215 504.1967 524.7289 473.7277
#> [25,] 480.7720 506.1630  715.4700 539.6319 566.2844 512.3923
#> [26,] 525.3257 546.6929  691.5674 574.8579 603.9255 551.9351
#> [27,] 564.6379 579.8874  650.7769 599.9885 623.8253 583.6287
#> [28,] 596.9099 605.7910  628.3575 617.4976 632.4797 607.9699
#> [29,] 618.7001 623.7925  630.6034 630.5051 639.5494 625.0419
#> [30,] 633.6783 636.9342  639.4313 641.2260 647.2563 637.7330
#> 
#> $minimum
#>  AIC  BIC AICc eBIC  GCV  GIC 
#>   18   20   28   21   21   20 
#> 
#> $lambda.min
#>        AIC        BIC       AICc       eBIC        GCV        GIC 
#>   7.774004  14.662527 185.551724  20.136818  20.136818  14.662527 
#> 
#> $plot

#> 

if (FALSE) { # \dontrun{
##### binomial ######
beta <- c(1, 1, 1, rep(0, p - 3))
sim2 <- simulXy(n = n, p = p, beta = beta, interc = 1, seed = 1,
                size = 100, family = binomial())
o2 <- islasso.path(cbind(y.success, y.failure) ~ ., data = sim2$data,
                   family = binomial(), lambda = seq(0.1, 100, l = 50L))
temp <- GoF.islasso.path(o2)
summary(o2, pval = 0.05, lambda = temp$lambda.min["BIC"])

##### poisson ######
beta <- c(1, 1, 1, rep(0, p - 3))
sim3 <- simulXy(n = n, p = p, beta = beta, interc = 1, seed = 1,
                family = poisson())
o3 <- islasso.path(y ~ ., data = sim3$data, family = poisson(), nlambda = 30L)
temp <- GoF.islasso.path(o3)
summary(o3, pval = 0.05, lambda = temp$lambda.min["BIC"])

##### Gamma ######
beta <- c(1, 1, 1, rep(0, p - 3))
sim4 <- simulXy(n = n, p = p, beta = beta, interc = -1, seed = 1,
                family = Gamma(link = "log"))
o4 <- islasso.path(y ~ ., data = sim4$data, family = Gamma(link = "log"),
                   nlambda = 30L)
temp <- GoF.islasso.path(o4)
summary(o4, pval = .05, lambda = temp$lambda.min["BIC"])
} # }