# An Astrostatistical Grand Challenge

**The Problem**

Astronomical data often consist of quantitative measurements of several properties for a sample of celestial objects observed at a telescope. Often the sample is predefined by a previous survey, and new properties are under examination. Two problems arise:

1. Every measurement is accompanied by a carefully evaluated measurement error. It may include statistical errors due to noise in the detector and other errors introduced by the instrument and data processing. The errors are obtained by examination of source-free regions of the raw data, calibration measurements at the telescope, and simulation. In statistical parlance, the dataset has heteroscedastic measurement errors with known variances.

2. Sometimes the property of a celestial object is too faint to be detected, and the measured value is indistinguishable from the noise. The astronomer then considers this value to be undetected and typically assigns a value such as (3 x noise) as an upper limit. In statistical parlance, these are called left-censored data points.
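
The two points above can be illustrated with a minimal Python sketch. It assumes a simple rule in which a raw value below `k` times its own noise level is reported as an upper limit equal to that threshold; the names `Measurement` and `classify`, the default `k = 3`, and the example numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    value: float    # the measurement if detected, otherwise the upper limit
    noise: float    # known 1-sigma measurement error for this object
    censored: bool  # True when `value` is an upper limit (left-censored)


def classify(raw_value: float, noise: float, k: float = 3.0) -> Measurement:
    """Apply a k x noise detection threshold (k = 3 follows the text's example)."""
    limit = k * noise
    if raw_value >= limit:
        return Measurement(value=raw_value, noise=noise, censored=False)
    # Below threshold: report the limit itself rather than the noisy raw value.
    return Measurement(value=limit, noise=noise, censored=True)


# Hypothetical fluxes in arbitrary units, each with its own noise level.
detected = classify(raw_value=5.0, noise=1.0)
undetected = classify(raw_value=1.2, noise=0.5)
print(detected)
print(undetected)
```

Because each object carries its own `noise`, the threshold differs from object to object, which is why the censoring levels in such a dataset are tied to the heteroscedastic errors.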

The result is a multivariate dataset like the following:

Object | Property #1 | Meas Err #1 | Property #2 | Meas Err #2 | Class |
---|---|---|---|---|---|
NGC_1111 | 23.32 | 0.4 | 7.5x10^{42} | 0.1x10^{42} | B |
NGC_2222 | 22.97 | 0.1 | <2.3x10^{41} | ... | A |
NGC_3333 | <23.75 | ... | 4.1x10^{42} | 0.6x10^{42} | A |
NGC_4444 | 21.88 | 1.2 | <7.6x10^{42} | ... | B |
NGC_5555 | 22.55 | 0.3 | 9.1x10^{41} | 1.9x10^{41} | B |
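
The notation used in the table (a leading `<` for an upper limit, `x10^{n}` for a power of ten) can be converted into a numeric value plus a censoring indicator. The following Python sketch shows one way this might be done; the function name `parse_cell` and the assumption that cells follow exactly this string format are illustrative and not part of any existing package. Elided entries (`...`) are deliberately rejected rather than guessed.

```python
import re
from typing import Tuple

# Mantissa, optionally followed by 'x10^{exponent}', as written in the table.
_CELL = re.compile(r"(\d+(?:\.\d+)?)(?:x10\^\{(-?\d+)\})?")


def parse_cell(cell: str) -> Tuple[float, bool]:
    """Return (value, is_upper_limit) for a cell such as '23.32' or '<2.3x10^{41}'."""
    text = cell.strip()
    censored = text.startswith("<")
    if censored:
        text = text[1:]
    match = _CELL.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    mantissa = float(match.group(1))
    exponent = int(match.group(2)) if match.group(2) else 0
    return mantissa * 10.0 ** exponent, censored


# Property #2 column from the table above, one entry per object.
column = ["7.5x10^{42}", "<2.3x10^{41}", "4.1x10^{42}", "<7.6x10^{42}", "9.1x10^{41}"]
parsed = [parse_cell(c) for c in column]
n_limits = sum(1 for _, is_limit in parsed if is_limit)
print(n_limits, "of", len(parsed), "entries are upper limits")
```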


**Available Statistical Methodology**

Statistics has tools for treating parts of this problem, but not the combination shown above. While the treatment of homoscedastic measurement errors in least squares regression is a major field (Buonaccorsi 2010, Carroll et al. 2006), little methodology is available for heteroscedastic errors whose variances are known. A number of similar solutions have been proposed in the astronomical literature, but the most promising is the likelihood-based approach to regression by Kelly (2007). Some research is available for univariate density estimation; that is, estimating the distribution of a single column in the above dataset while accounting for the measurement errors but not the nondetections (Delaigle & Meister 2008, Staudenmayer et al. 2008, Delaigle et al. 2009, Apanasovich et al. 2009). Survival analysis is a well-established field for treating censored data, but the censoring times are assumed to be precisely known (as they are in biological and industrial failure-time contexts) rather than based on a probabilistic relationship to the known noise characteristics. One early attempt to unify measurement errors and nondetections (Marshall 1992) was evaluated as related to Fisher's flawed "fiducial distribution" approach to statistics. The estimation of nondetections in the context of Poisson data processes has been widely discussed in physics and astronomy (Cowan 2007, Kashyap et al. 2011).
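
To make the gap concrete, the following Python sketch writes down a textbook-style log-likelihood for a single property that combines known heteroscedastic Gaussian errors with left-censoring. It is an illustration under strong assumptions, not the likelihood of Kelly (2007): the true values are assumed to follow a Gaussian population with mean `mu` and standard deviation `tau`, and each upper limit is treated as a fixed, exactly known threshold, which is precisely the survival-analysis simplification criticized above. All function names and the toy numbers are hypothetical.

```python
import math


def normal_logpdf(y, mean, var):
    """Log density of a Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def normal_logcdf(y, mean, var):
    """Log of P(Y < y) for a Gaussian; may fail if the probability underflows to 0."""
    z = (y - mean) / math.sqrt(2.0 * var)
    return math.log(0.5 * math.erfc(-z))


def log_likelihood(mu, tau, values, noises, censored):
    """Detections contribute a density term, upper limits a cumulative term.

    Each object has total variance tau**2 + sigma_i**2, where sigma_i is its
    own known measurement error (assumed Gaussian).
    """
    total = 0.0
    for y, sigma, is_limit in zip(values, noises, censored):
        var = tau ** 2 + sigma ** 2
        if is_limit:
            total += normal_logcdf(y, mu, var)  # y is the upper limit
        else:
            total += normal_logpdf(y, mu, var)  # y is the measured value
    return total


# Toy data in arbitrary units (hypothetical, not taken from the table).
values = [1.0, 2.0, 1.5]
noises = [0.5, 0.2, 0.5]
censored = [False, False, True]

ll_near = log_likelihood(1.5, 0.5, values, noises, censored)
ll_far = log_likelihood(10.0, 0.5, values, noises, censored)
print(ll_near > ll_far)
```

A fuller treatment would replace the fixed threshold with a model of how the detection decision depends on the noise, which is the piece that existing methodology does not supply.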

**Desiderata**

What is needed is a full-scope suite of statistical methods that treat heteroscedastic measurement errors and nondetections in a self-consistent and integrated fashion. It might be based on the likelihood written by Kelly (2007). Tools needed include:

- Univariate and multivariate density estimation (data smoothing)
- Univariate and multivariate two-sample tests
- Bivariate correlation coefficients
- Bivariate and multivariate linear and nonlinear regression
- Principal components analysis
- Unsupervised multivariate hierarchical clustering
- Supervised multivariate classification


**References**

Apanasovich, T. V., Carroll, R. J. & Maity, A. (2009) Density estimation in the presence of heteroscedastic measurement error, Electronic J. Statist., 3, 318-348

Buonaccorsi, J. P. (2010) Measurement Error: Models, Methods, and Applications, Chapman & Hall

Carroll, R. J., Ruppert, D., Stefanski, L. A. & Crainiceanu, C. M. (2006) Measurement Error in Nonlinear Models: A Modern Perspective, 2nd ed., Chapman & Hall

Cowan, G. (2007) The small-N problem in high energy physics, in Statistical Challenges in Modern Astronomy IV, ASP Conf 371, 75

Delaigle, A. & Meister, A. (2008) Density estimation with heteroscedastic error, Bernoulli, 14, 562-579

Delaigle, A., Fan, J. & Carroll, R. J. (2009) A design-adaptive local polynomial estimator for the errors-in-variables problem, J. Amer. Stat. Assoc., 104, 348-359

Kelly, B. C. (2007) Some aspects of measurement error in linear regression of astronomical data, Astrophys. J., 665, 1489-1506

Marshall, H. (1992) Detecting and measuring sources at the noise limit, in Statistical Challenges in Modern Astronomy, 247, Springer (with commentary by L. Gleser)

Staudenmayer, J., Ruppert, D. & Buonaccorsi, J. P. (2008) Density estimation in the presence of heteroscedastic measurement error, J. Amer. Stat. Assoc., 103, 726-736

*Comments on this article should be made on the ASAIP Astrostatistics Discussion Forum*