How do measurement errors affect my data analysis?

This article is by Brandon Kelly, University of California, Santa Barbara. Comments should be made on the ASAIP Astrostatistics Forum.

 

Measurement errors are ubiquitous in astronomy. Common sources of measurement error are the Poissonian nature of photon counts, instrumental noise, and calibration. Measurement errors also arise when one derives physical quantities (e.g., mass, temperature) by fitting, say, a source’s spectral energy distribution (SED). While not a measurement error problem in the traditional sense, this can still be treated as one, in that the physical quantities are ‘measured’ by fitting a model to the SED. Because the SED is contaminated by measurement error, the quantities derived from it will also be in error.

Accounting for measurement error when it is the only source of variability in a data set is straightforward, and there are many well-known tools for doing so. A common example of this situation is fitting a spectrum. In this case, the only source of variability is the measurement error due to, for example, the Poissonian nature of photon counts, instrumental noise, and calibration. However, when the measurement errors are not the only source of variability, it is not always obvious how to account for them. Moreover, if ignored, they can have a significant effect on the scientific conclusions. For example, measurement errors are not the only source of variability when analyzing the distribution of a quantity or set of quantities, such as luminosity and redshift. In this case, there exists real physical source-to-source variation that would be present even in the absence of measurement error. For studies like this, common scientific questions to be answered are:

  • What is the real physical dispersion among the quantities in my data set? For example, what is the dispersion in stellar mass values for the population of galaxies I have selected?
  • Are two quantities in my data set correlated? How does one quantity depend on another? For example, how does luminosity depend on black hole mass for the quasars in my sample?

If the quantities in the above examples were known exactly (i.e., mass and luminosity were ‘measured’ without error), then the analysis would be straightforward. However, in reality this is not the case, especially for physical quantities derived from, say, SED fits. Because of this, it is important to understand how measurement error affects the data analysis, and consequently the scientific conclusions.

When the measurement error is additive and independent of the true values, the distribution of the measured quantities is the convolution of the true distribution of these quantities with their error distribution. In this sense the error distribution acts like a ‘PSF’, distorting the ‘image’ of the true distribution. This is illustrated in the figure below. In this example, the researcher fits an SED model by least squares independently for each source in her sample. The SED model consists of a blackbody at some temperature, modified by a power law in frequency with power-law index beta. The free parameters of interest for each source are temperature and beta. For this illustration, the errors are not strictly additive, but the PSF analogy is still useful for understanding their effects. The true joint distribution of temperature and beta is illustrated in the left panel, while their error distribution is shown in the middle panel, centered at some arbitrary reference value. In this illustration, suppose that for each source only a small number of SED data points are available to the researcher for constraining the values of temperature and beta, and as a result their uncertainties are very large. Moreover, their errors are strongly anti-correlated, because similar model SEDs are obtained by increasing temperature while decreasing beta.

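[Figure, three panels. Left: true joint distribution of temperature and beta (true_beta_temp.jpg). Middle: error distribution of temperature and beta (beta_temp_error.jpg). Right: measured distribution of temperature and beta (estimated_beta_temp.jpg).]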

The right panel shows the measured distribution of temperature and beta. The measurement error ‘PSF’ has completely distorted the image of the temperature-beta distribution, and even reversed the sign of the correlation. One can think of the temperature-beta distribution as being more-or-less ‘unresolved’ due to the large measurement errors. From this illustration, we can conclude that in general measurement errors have the following effects:

  1. Measurement errors artificially broaden the distribution of the quantities of interest, making their physical dispersion appear larger than it is, and
  2. Measurement errors bias the covariance of two quantities toward the covariance in their error distribution. This can make correlations appear stronger or weaker, or even cause them to reverse sign.
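
The two effects listed above can be reproduced with a small simulation. The following is only a sketch under stated assumptions: the errors are additive, Gaussian, and independent of the true values, and the covariance matrices are made-up numbers chosen for illustration (they are not the temperature-beta values from the figure).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical true distribution: two quantities with a positive correlation.
true_cov = np.array([[1.0, 0.5],
                     [0.5, 1.0]])
# Hypothetical error distribution: large variances, strongly anti-correlated.
err_cov = np.array([[4.0, -3.6],
                    [-3.6, 4.0]])

true_vals = rng.multivariate_normal([0.0, 0.0], true_cov, size=n)
errors = rng.multivariate_normal([0.0, 0.0], err_cov, size=n)

# Additive errors, drawn independently of the true values.
measured = true_vals + errors

measured_cov = np.cov(measured, rowvar=False)
true_corr = np.corrcoef(true_vals, rowvar=False)[0, 1]
measured_corr = np.corrcoef(measured, rowvar=False)[0, 1]

print("measured covariance:\n", measured_cov)
print("true correlation:    ", true_corr)
print("measured correlation:", measured_corr)
```

Under these assumptions the measured covariance matrix is close to the sum of the true and error covariance matrices: the variances are inflated (effect 1), and the covariance is pulled toward that of the errors strongly enough to flip the sign of the correlation (effect 2).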

Strictly speaking, these conclusions will not necessarily hold if the measurement errors are not additive and statistically independent of the true values, but they should still provide a general guide for understanding how measurement error will affect one’s scientific conclusions. It should be clear that measurement errors begin to have a significant effect on one’s data analysis once their variance becomes a non-negligible fraction of the variance of the true values of the quantities of interest.
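
For the additive, independent case, these statements can be written down explicitly. The notation below is introduced here for illustration and is not taken from the article: x is the vector of true values, e the measurement error with zero mean, and the Sigma symbols are the corresponding covariance matrices.

```latex
% Additive error model: measured value = true value + error,
% with e independent of x and E[e] = 0.
\begin{align}
  x_{\rm obs} &= x + e, \\
  % The measured distribution is the true distribution convolved with the error distribution.
  p(x_{\rm obs}) &= \int p_e(x_{\rm obs} - x)\, p(x)\, dx, \\
  % Covariances add: variances are broadened, and off-diagonal terms are
  % pulled toward the covariance of the errors.
  \Sigma_{\rm obs} &= \Sigma_{\rm true} + \Sigma_e.
\end{align}
% For a single quantity, the fraction of the measured variance due to error is
\begin{equation}
  f_e = \frac{\sigma_e^2}{\sigma_{\rm true}^2 + \sigma_e^2}.
\end{equation}
```

When f_e is close to zero the measured distribution is a good proxy for the true one; as f_e approaches one, the measured distribution mostly reflects the errors.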

For linear regression, there are a number of methods that have been developed to handle measurement errors (some astronomical references are Akritas & Bershady 1996, Tremaine et al. 2002, Kelly 2007). Kelly (2013) provides a review of measurement error models for astronomical audiences. Fuller (1987) is a good reference for measurement error in linear models. For nonlinear models, a good reference is Carroll et al. (2006). For more complicated models, such as when one derives (i.e., ‘measures’) physical quantities from fitting, say, an SED model, Bayesian hierarchical modeling provides a flexible and powerful way of accounting for error in the physical quantities. Gelman et al. (2004) is a good reference for Bayesian methods, including hierarchical modeling; Carroll et al. (2006) also discuss these within the context of measurement errors. A few examples of hierarchical modeling applied to astronomical problems containing measurement error can be found in Loredo (2004), Hogg et al. (2010), Mandel et al. (2011), and Kelly et al. (2012).
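
To give a sense of why such methods are needed, the sketch below shows how measurement error in the independent variable biases an ordinary least-squares slope toward zero, and how a simple moment-based correction recovers it. This is a textbook illustration under strong assumptions (a known, constant error variance in x, no error in y beyond intrinsic scatter, simulated data with made-up parameter values); it is not a reproduction of any of the methods cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical regression parameters, chosen for illustration.
true_slope, true_intercept = 2.0, 1.0
sigma_x_err = 1.0  # assumed known, identical for every data point

x_true = rng.normal(0.0, 1.0, size=n)
# y has intrinsic scatter around the line but no measurement error here.
y = true_intercept + true_slope * x_true + rng.normal(0.0, 0.5, size=n)
# Only the noisy x values are available to the analyst.
x_obs = x_true + rng.normal(0.0, sigma_x_err, size=n)

cov_xy = np.cov(x_obs, y)[0, 1]
var_x_obs = np.var(x_obs, ddof=1)

# Ordinary least squares on the noisy x: biased toward zero.
naive_slope = cov_xy / var_x_obs
# Moment correction: subtract the known error variance from the observed variance.
corrected_slope = cov_xy / (var_x_obs - sigma_x_err**2)

print("true slope:     ", true_slope)
print("naive slope:    ", naive_slope)
print("corrected slope:", corrected_slope)
```

In this setup the error variance equals the true variance of x, so the naive slope comes out near half the true value, while the corrected slope is close to the true one. With small samples or errors that differ from point to point, the simple subtraction becomes unstable, which is one motivation for the more careful methods in the references.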

References:

Akritas, M., & Bershady, M., Linear Regression for Astronomical Data with Measurement Errors and Intrinsic Scatter, 1996, ApJ, 470, 706

Carroll, R.J., Ruppert, D., Stefanski, L.A., & Crainiceanu, C.M., Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edn. (Chapman & Hall/CRC, Boca Raton, 2006)

Fuller, W.A., Measurement Error Models (John Wiley & Sons, New York, 1987)

Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B., Bayesian Data Analysis, 2nd edn. (Chapman & Hall/CRC, Boca Raton, 2004)

Hogg, D.W., Myers, A.D., & Bovy, J., Inferring the Eccentricity Distribution, 2010, ApJ, 725, 2166

Kelly, B.C., Some Aspects of Measurement Error in Linear Regression of Astronomical Data, 2007, ApJ, 665, 1489

Kelly, B.C., Measurement Error Models in Astronomy, in Statistical Challenges in Modern Astronomy V (eds: E.D. Feigelson & G.J. Babu, Springer Lecture Notes in Statistics, 2013, p. 147)

Kelly, B.C., Shetty, R., Stutz, A., Kauffmann, J., Goodman, A., & Launhardt, R., Dust Spectral Energy Distributions in the Era of Herschel and Planck: A Hierarchical Bayesian Fitting Technique, 2012, ApJ, 752, 55

Loredo, T.J., Accounting for Source Uncertainties in Analyses of Astronomical Survey Data, in Bayesian Inference and Maximum Entropy Methods in Science and Engineering: 24th International Workshop, 2004 (ed. V. Dose, et al., AIP Conference Proceedings, 735, 195)

Mandel, K., Narayan, G., & Kirshner, R.P., Type Ia Supernova Light Curve Inference: Hierarchical Models in the Optical and Near-infrared, 2011, ApJ, 731, 120

Tremaine, S., et al., The Slope of the Black Hole Mass versus Velocity Dispersion Correlation, 2002, ApJ, 574, 740