regression analysis in the case of censored data

 Posted by Cassano, Rossella at November 12, 2012

Hello,
I just want to ask: is there a routine in R to perform a linear regression analysis in the presence of censored data (i.e., upper limits in the y-variable)? The case I'm considering is the “classical” luminosity-luminosity correlation.
Thanks in advance,
Rossella

 Posted by Feigelson, Eric at April 8, 2013

Rossella, [sorry for the late response]

There is no regression for censored data in base R, but there is in (at least) two of the 4000+ CRAN packages.

  • The package emplik (Empirical likelihood ratio for censored/truncated data) has a function BJnoint that computes the Buckley-James line for multivariate (of course, including bivariate) data with censoring in the response variable.  This is a well-established regression procedure that allows non-Gaussian residuals, whose distribution is estimated with the Kaplan-Meier estimator.  Beware that BJnoint, like almost all available code, is written for right-censored data, so astronomers with left-censoring (i.e. upper limits) need to multiply their response values by -1 before fitting (upper limits then become lower limits) and flip the sign of the fitted coefficients afterwards.  The Buckley-James line was also implemented in the stand-alone Fortran code ASURV that we wrote 20 years ago.  The package also has a hypothesis test, bjtest, for testing whether the regression slope is compatible with a preselected value.

  • The package rms (Regression Modeling Strategies) is a large, well-maintained CRAN package for biostatistics and econometrics with a function bj implementing the Buckley-James regression model.  It has several useful ancillary functions.  The bootcov function, giving bootstrap confidence intervals for the regression coefficients, is recommended for nonstandard censoring patterns, as typically seen in astronomy.  The bjplot function gives specialized plots, and residuals.bj extracts the regression residuals so that (for example) their maximum-likelihood Kaplan-Meier estimator can be obtained.  The rms package has a variety of other regression tools, including Cox regression, variable selection tools, generalized least squares fitting that permits heteroscedastic errors (i.e. unequal measurement errors in the response variable), logistic regression (for binary Yes/No response variables), interfaces with other R packages, and extensive plotting and infrastructure capabilities.
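
To make the ingredients above concrete (Kaplan-Meier weights for the residuals, the sign flip for upper limits, and a bootstrap for the slope), here is a minimal Python/NumPy sketch. It is not the emplik, rms or ASURV code. The function names km_mass, buckley_james, bj_upper_limits and bootstrap_slope_ci are hypothetical. Several choices are assumptions made for illustration: starting from ordinary least squares with the limits treated as detections, treating the largest residual as uncensored, the stopping rule, and the simple percentile bootstrap over (x, y, is_limit) rows.

```python
import numpy as np


def km_mass(t, event):
    """Kaplan-Meier probability mass at each distinct uncensored value of t
    (right-censoring; censored values tied with an event count as at risk)."""
    t = np.asarray(t, float)
    event = np.asarray(event, bool)
    times = np.unique(t[event])
    surv, mass = 1.0, []
    for s in times:
        at_risk = np.sum(t >= s)
        deaths = np.sum((t == s) & event)
        new_surv = surv * (1.0 - deaths / at_risk)
        mass.append(surv - new_surv)
        surv = new_surv
    return times, np.array(mass)


def buckley_james(x, t, event, n_iter=100, tol=1e-8):
    """Buckley-James fit of t = a + b*x for right-censored t; returns array([a, b]).
    The iteration can oscillate, so it simply stops after n_iter passes."""
    x = np.asarray(x, float)
    t = np.asarray(t, float)
    event = np.asarray(event, bool)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, t, rcond=None)[0]  # start: OLS, limits used as values
    for _ in range(n_iter):
        resid = t - X @ beta
        ev = event.copy()
        ev[np.argmax(resid)] = True  # assumed convention: largest residual uncensored
        vals, mass = km_mass(resid, ev)
        y_star = t.copy()
        for i in np.flatnonzero(~ev):
            above = vals > resid[i]
            w = mass[above]
            if w.sum() > 0:
                # replace a censored value by its conditional expectation,
                # using the Kaplan-Meier distribution of the residuals
                y_star[i] = X[i] @ beta + np.sum(vals[above] * w) / w.sum()
        new_beta = np.linalg.lstsq(X, y_star, rcond=None)[0]
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


def bj_upper_limits(x, y, is_limit):
    """Intercept and slope of y on x when some y are upper limits.
    Multiplying y by -1 turns upper limits into lower limits (right-censoring);
    the fitted coefficients are then flipped back."""
    is_limit = np.asarray(is_limit, bool)
    return -buckley_james(x, -np.asarray(y, float), ~is_limit)


def bootstrap_slope_ci(x, y, is_limit, n_boot=500, level=0.95, seed=0):
    """Percentile interval for the slope, resampling (x, y, is_limit) rows."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    is_limit = np.asarray(is_limit, bool)
    slopes = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))
        if np.unique(x[idx]).size < 2:
            continue  # slope undefined when all resampled x are equal
        slopes.append(bj_upper_limits(x[idx], y[idx], is_limit[idx])[1])
    alpha = (1.0 - level) / 2.0
    return tuple(np.quantile(slopes, [alpha, 1.0 - alpha]))
```

With no upper limits the fit reduces to ordinary least squares, which makes an easy first check on real data before switching the limit flags on.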

I can imagine that astronomers with measurement errors and censoring (upper limits) in a response variable (Y), and no censoring in the independent variables (X), would find considerable benefit from the rms package.  Its methods are described in the book Regression Modeling Strategies by F. E. Harrell (Springer-Verlag, 2001) and the web page http://biostat.mc.vanderbilt.edu/rms.

Eric