Censored data with measurement errors

 Posted by Shivaei, Irene at May 29. 2014

Hi everyone,

 

I have a sample with censored data and measurement errors for the uncensored (detected) points. Is there a command in R that I can use for a regression fit that takes into account both the censored data and the errors on the uncensored data?

 

Thanks for your help.

 

Irene.

 Posted by Feigelson, Eric at September 16. 2014

Irene,

The simple answer is No, R does not have such a function. The reason is that statisticians have not treated this problem, despite its ubiquity in astronomy! Survival analysis, developed since the 1960s and very widely used in industry (with ~190 CRAN packages), mostly deals with datasets having right-censored values (lower limits) that are known with perfect precision. The date of death or failure in a survival experiment has essentially no uncertainty, and the censored value is known in advance from the experiment design. Consulting senior statisticians over the years, I have been told that there is essentially no way to retrofit the mathematics of established survival analysis to include measurement errors in both the observed and censored data values.

The situation is not hopeless, as statistical inference has vast potential capabilities, though little has been done. A promising start was made by Brandon Kelly in a 2007 Astrophysical Journal article. There he writes a complicated likelihood function for a bivariate linear regression model that includes heteroscedastic measurement errors as well as an intrinsic population scatter about the line. This approach leads to less biased regression fits than methods that ignore the measurement errors. The effects can be important; for example, an observed anti-correlation relating to infrared spectral energy distributions of interstellar dust was found to be a positive correlation when the new method was used.

Kelly mentions that his likelihood can be extended to the case of censored data, truncated data, and also to multivariate and nonlinear regression. But no one has actually pursued this idea, which I think would be very valuable to the astronomical community. I can imagine that his likelihood can be combined with the schematic likelihood for 1-dimensional (detected + censored + truncated) data given in the text by Klein & Moeschberger (2005) to form a generic likelihood for astronomical data subject to heteroscedastic noise and sensitivity limits. The astronomer would adapt this likelihood to their particular model and dataset, and then use maximum likelihood estimation and/or Bayesian inference to obtain best-fit parameters and their uncertainties.
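
To make the idea of a combined likelihood concrete, here is a minimal sketch in Python (standard library only). It is not Kelly's likelihood and not an existing R or Python routine; it is a deliberately simplified illustration under several assumptions: the covariate x has no measurement error, the censored points are upper limits (left-censored, the common astronomical case), the intrinsic scatter and the measurement noise are both Gaussian, and each upper limit is treated as a fixed threshold. All function and variable names are hypothetical. Detected points contribute a Gaussian density whose variance is the sum of the intrinsic variance and that point's own measurement variance; censored points contribute the probability that the measured value falls below the limit.

```python
import math


def norm_logpdf(y, mu, sd):
    """Log density of a Gaussian with mean mu and standard deviation sd."""
    z = (y - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(y, mu, sd):
    """Log of P(Y <= y) for a Gaussian; erfc keeps precision in the lower tail."""
    z = (y - mu) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(max(p, 1e-300))  # floor avoids log(0) far in the tail


def log_likelihood(params, x, y, yerr, detected):
    """Log-likelihood for y = alpha + beta * x with intrinsic scatter.

    params   : (alpha, beta, log_sigma_int), with sigma_int the intrinsic scatter
    y        : measured value if detected, otherwise the upper limit
    yerr     : per-point measurement standard deviation (heteroscedastic)
    detected : True for a detection, False for an upper limit
    """
    alpha, beta, log_sigma_int = params
    var_int = math.exp(2.0 * log_sigma_int)
    total = 0.0
    for xi, yi, ei, di in zip(x, y, yerr, detected):
        mu = alpha + beta * xi
        sd = math.sqrt(var_int + ei * ei)  # intrinsic + measurement variance
        if di:
            total += norm_logpdf(yi, mu, sd)   # detection: density at yi
        else:
            total += norm_logcdf(yi, mu, sd)   # upper limit: P(value < yi)
    return total


# Toy data (made up for illustration): four detections and one upper limit.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.1, 4.9, 7.2, 9.5]
yerr = [0.2, 0.2, 0.3, 0.2, 0.4]
detected = [True, True, True, True, False]

ll_line = log_likelihood((1.0, 2.0, math.log(0.3)), x, y, yerr, detected)
ll_flat = log_likelihood((0.0, 0.0, math.log(0.3)), x, y, yerr, detected)
print(ll_line > ll_flat)  # the sloped line fits these toy data far better
```

In practice one would pass the negative of `log_likelihood` to a numerical optimizer for maximum likelihood estimation, or use it inside an MCMC sampler for Bayesian inference. Extending the sketch to measurement errors in x, or to truncation, requires additional terms of the kind discussed above and is not attempted here.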

This is a great project for a statistically-inclined astronomer, or an astronomically-inclined statistician!

Eric Feigelson