An Astrostatistical Grand Challenge – Astrostatistics and Astroinformatics Portal

Astronomical data with measurement errors and non-detections. Comments on this article should be placed in the ASAIP Astrostatistics Forum.

The Problem

Astronomical data often consist of quantitative measurements of several properties for a sample of celestial objects observed at a telescope. Often the sample is predefined by a previous survey, and new properties are under examination. Two problems arise:

1. Every measurement is accompanied by a carefully evaluated measurement error. It may include statistical errors due to noise in the detector and other errors introduced by the instrument and data processing. The errors are obtained by examination of source-free regions of the raw data, calibration measurements at the telescope, and simulation. In statistical parlance, the dataset has heteroscedastic measurement errors with known variances.

2. Sometimes a celestial object is too faint in some property to be detected, and the measured value is indistinguishable from the noise. The astronomer then considers the property to be undetected and typically assigns a value such as 3 × noise as an upper limit. In statistical parlance, these are called left-censored data points.
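The reporting convention in item 2 can be sketched in a few lines of Python. This is only an illustration: the names `Measurement` and `to_measurement` are hypothetical, and the detection rule (a raw value below `n_sigma` times the noise counts as a nondetection) is an assumption, since practice varies between instruments and surveys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    value: float            # measured value, or the upper limit if undetected
    error: Optional[float]  # 1-sigma measurement error; None for an upper limit
    is_upper_limit: bool    # True marks a left-censored data point


def to_measurement(raw_value: float, noise: float, n_sigma: float = 3.0) -> Measurement:
    """Report a detection, or an upper limit of n_sigma * noise.

    Assumed rule: a raw value below n_sigma * noise is treated as
    indistinguishable from the noise.
    """
    if noise <= 0:
        raise ValueError("noise must be positive")
    limit = n_sigma * noise
    if raw_value < limit:
        return Measurement(value=limit, error=None, is_upper_limit=True)
    return Measurement(value=raw_value, error=noise, is_upper_limit=False)


detected = to_measurement(raw_value=10.0, noise=1.0)
undetected = to_measurement(raw_value=2.0, noise=1.0)
```

Note that the upper limit is itself derived from the noise, which is why the censoring value is not a precisely known quantity in the sense assumed by standard survival analysis.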

The result is a multivariate dataset like the following:

Object	Property #1	Meas Err #1	Property #2	Meas Err #2	Class
NGC_1111	23.32	0.4	7.5×10⁴²	0.1×10⁴²	B
NGC_2222	22.97	0.1	<2.3×10⁴¹	…	A
NGC_3333	<23.75	…	4.1×10⁴²	0.6×10⁴²	A
NGC_4444	21.88	1.2	<7.6×10⁴²	…	B
NGC_5555	22.55	0.3	9.1×10⁴¹	1.9×10⁴¹	B
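
One way to hold a table like this in code is to store each cell together with its error and a censoring flag. The sketch below is only illustrative: `Cell` and `parse_cell` are hypothetical names, scientific notation is written as `7.5e42`, and the missing errors of nondetections are kept as the placeholder `…` and never parsed.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    value: float            # measured value, or the upper limit
    error: Optional[float]  # known 1-sigma error; None for an upper limit
    censored: bool          # True when the value is an upper limit ("<")


def parse_cell(value_text: str, error_text: str) -> Cell:
    """Turn a (value, error) pair of table strings into a Cell."""
    censored = value_text.startswith("<")
    value = float(value_text.lstrip("<"))
    # The error column is a placeholder for nondetections, so skip it.
    error = None if censored else float(error_text)
    return Cell(value, error, censored)


# Rows transcribed from the table: object, prop 1, err 1, prop 2, err 2, class.
rows = [
    ("NGC_1111", "23.32", "0.4", "7.5e42", "0.1e42", "B"),
    ("NGC_2222", "22.97", "0.1", "<2.3e41", "…", "A"),
    ("NGC_3333", "<23.75", "…", "4.1e42", "0.6e42", "A"),
    ("NGC_4444", "21.88", "1.2", "<7.6e42", "…", "B"),
    ("NGC_5555", "22.55", "0.3", "9.1e41", "1.9e41", "B"),
]

table = {
    name: (parse_cell(v1, e1), parse_cell(v2, e2), cls)
    for name, v1, e1, v2, e2, cls in rows
}
```

Keeping the censoring flag separate from the numeric value lets later analysis steps treat detections and upper limits differently instead of silently mixing them.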

Available Statistical Methodology

Statistics has tools for treating parts of this problem, but not the combination shown above. While the treatment of homoscedastic measurement errors in least squares regression is a major field (Buonaccorsi 2010, Carroll et al. 2006), little methodology is available for heteroscedastic errors with known variances. A number of solutions have been proposed in the astronomical literature, of which the most promising is the likelihood-based approach to regression by Kelly (2007). Some research is available for univariate density estimation, that is, estimating the distribution of a column in the above dataset while accounting for the measurement errors but not the nondetections (Delaigle & Meister 2008, Staudenmayer et al. 2008, Delaigle et al. 2009, Apanasovich et al. 2009). Survival analysis is a well-established field for treating censored data, but the censoring times are assumed to be precisely known (as they are in biological and industrial failure-time contexts) rather than based on a probabilistic relationship to the known noise characteristics. One early attempt to unify measurement errors and nondetections (Marshall 1992) was judged to be related to Fisher's flawed 'fiducial distribution' approach to statistics. The estimation of nondetections in the context of Poisson data processes has been widely discussed in physics and astronomy (Cowan 2007, Kashyap et al. 2011).
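
To make the idea of a likelihood that covers both problems concrete, here is a minimal univariate sketch in Python. It is not Kelly's (2007) model, which treats regression with a more flexible distribution for the true values. It assumes instead that the true values follow a single normal distribution with mean `mu` and intrinsic scatter `tau`, that each measurement adds Gaussian noise of known standard deviation, and that the noise level is also known for nondetections. Detections contribute a normal density with variance `tau**2 + noise**2`; upper limits contribute the probability that the observed value falls below the limit. All function names and numbers are hypothetical.

```python
import math


def normal_logpdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(x: float, mean: float, sd: float) -> float:
    # Normal CDF written with erfc, which is more accurate in the lower tail.
    p = 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))
    return math.log(p) if p > 0.0 else float("-inf")


def log_likelihood(mu, tau, values, noises, censored):
    """Log-likelihood of (mu, tau) given detections and upper limits.

    values[i]   : measured value, or the upper limit when censored[i] is True
    noises[i]   : known 1-sigma measurement error of object i
    censored[i] : True for a nondetection (left-censored point)
    """
    total = 0.0
    for y, s, is_limit in zip(values, noises, censored):
        sd = math.hypot(tau, s)  # intrinsic scatter and noise add in quadrature
        if sd <= 0.0:
            raise ValueError("tau and noise cannot both be zero")
        if is_limit:
            total += normal_logcdf(y, mu, sd)  # P(observed value < limit)
        else:
            total += normal_logpdf(y, mu, sd)  # density at the measured value
    return total


# Illustrative values only (not taken from the table above).
values = [22.9, 23.3, 21.9, 23.75]
noises = [0.1, 0.4, 1.2, 0.3]
censored = [False, False, False, True]

# Crude grid search for the mean at a fixed, assumed intrinsic scatter.
grid = [21.0 + 0.05 * k for k in range(61)]
best_mu = max(grid, key=lambda m: log_likelihood(m, 0.5, values, noises, censored))
```

A real analysis would fit `mu` and `tau` jointly and quantify their uncertainty; the grid search here only shows how the two kinds of data points enter one objective function.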

Desiderata

What is needed is a full-scope suite of statistical methods that treat heteroscedastic measurement errors and nondetections in a self-consistent and integrated fashion. It might be based on the likelihood written by Kelly (2007). Tools needed include:

Univariate and multivariate density estimation (data smoothing)
Univariate and multivariate two-sample tests
Bivariate correlation coefficients
Bivariate and multivariate linear and nonlinear regression
Principal components analysis
Unsupervised multivariate hierarchical clustering
Supervised multivariate classification

References

Apanasovich, T. V., Carroll, R. J. & Maity, A. (2009) Density estimation in the presence of heteroscedastic measurement error, Electronic J. Statist., 3, 318-348

Buonaccorsi, J. P. (2010) Measurement Error: Models, Methods, and Applications, Chapman & Hall

Carroll, R. J., Ruppert, D., Stefanski, L. A. & Crainiceanu, C. M. (2006) Measurement Error in Nonlinear Models: A Modern Perspective, 2nd ed., Chapman & Hall

Cowan, G. (2007) The small-N problem in high energy physics, in Statistical Challenges in Modern Astronomy IV, ASP Conf 371, 75

Delaigle, A. & Meister, A. (2008) Density estimation with heteroscedastic error, Bernoulli, 14, 562-579

Delaigle, A., Fan, J. & Carroll, R. J. (2009) A design-adaptive local polynomial estimator for the errors-in-variables problem, J. Amer. Stat. Assoc., 104, 348-359

Kelly, B. C. (2007) Some aspects of measurement error in linear regression of astronomical data, Astrophys. J., 665, 1489-1506

Marshall, H. (1992) Detecting and measuring sources at the noise limit, in Statistical Challenges in Modern Astronomy, 247, Springer (with commentary by L. Gleser)

Staudenmayer, J., Ruppert, D. & Buonaccorsi, J. P. (2008) Density estimation in the presence of heteroscedastic measurement error, J. Amer. Stat. Assoc., 103, 726-736
