estimating class tallies across a population

 Posted by Broos, Patrick on December 08, 2012

Hello ASAIP Experts,

Thanks very much for donating your time to help your colleagues.

 

My questions live in the realm of Bayesian classification.  Suppose you have a sample of objects that have some measured properties, and you wish to evaluate the probability that each object belongs to each of four classes.  You’ve built a Bayesian classifier to do this: you’ve estimated four prior class probabilities for the sample as a whole, and you’ve built four class likelihood functions (PDFs for the measured properties, conditioned on each class).  You turn the Bayesian crank and calculate four posterior class probabilities for each object in the sample.

 

Now, suppose you’re not actually interested in the true classification of individual objects, but are instead interested in the fraction of objects in your sample that belong to each class.  The intuitive approach is to simply sum (across the sample of objects) the posterior probabilities for class 1, sum the posteriors for class 2, etc.  One might informally refer to those four sums as the “estimated number of objects” of a given class that lie in the sample.  Together, the four sums add up to the total number of objects in the sample.
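
For concreteness, here is a minimal numerical sketch of what I mean (the array names and the gamma-distributed stand-in likelihoods are made up purely for illustration; they are not our actual classifier):

```python
import numpy as np

# Toy setup: N objects, K = 4 classes.  `like[i, k]` stands in for the class
# likelihood p(x_i | class k); real values would come from the likelihood PDFs.
rng = np.random.default_rng(0)
N, K = 1000, 4
prior = np.array([0.4, 0.3, 0.2, 0.1])        # prior class probabilities for the sample
like = rng.gamma(shape=2.0, size=(N, K))      # stand-in class likelihoods

# Turn the Bayesian crank: posterior[i, k] = P(class k | object i's data).
joint = like * prior
posterior = joint / joint.sum(axis=1, keepdims=True)

# The intuitive tally: sum the posteriors down the sample, one sum per class.
expected_counts = posterior.sum(axis=0)
print(expected_counts, expected_counts.sum())  # the four sums add up to N
```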

 

My first question may seem simple-minded—I don’t know what terminology a statistician would use to refer to such a sum of posteriors.  What is that beast called?

My second question is whether this intuitive calculation is “legitimate”, i.e. backed by some sort of theory.  Is there any alternative method for estimating class tallies in a sample of objects?

 

 

In case you’re interested in **why** I’m asking these questions, I’ll now explain what we’re actually trying to do.  We’re trying to evaluate how well our classification results agree with our class priors.  Those priors are an estimate (from simulations) of the fraction of the sample expected to belong to each class.  We’d be rather happy if the classification results were consistent with those predictions.  Tallying the hard class assignments we make to our objects is not very helpful, because such tallies depend on the particular “decision rule” we use to interpret the class posteriors, and because our decision rule produces a 5th outcome, “not classified”, when the posteriors are ambiguous.
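
To illustrate the kind of decision rule I mean, continuing the sketch above (the 0.8 threshold is an arbitrary illustrative value, not our actual cutoff):

```python
# A hypothetical decision rule: assign the most probable class only when its
# posterior clears a threshold; otherwise the object is "not classified".
THRESHOLD = 0.8                                  # illustrative value only
best = posterior.argmax(axis=1)
confident = posterior.max(axis=1) >= THRESHOLD
hard_counts = np.bincount(best[confident], minlength=K)
n_unclassified = (~confident).sum()

# Unlike the posterior sums, these hard tallies change with THRESHOLD, and the
# "not classified" objects drop out of them entirely.
print(hard_counts, n_unclassified)
```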

Thanks again,

Patrick Broos, Penn State

 Posted by Loredo, Thomas on January 04, 2013

Hi Patrick,

 

It sounds to me like your intuition is essentially approximating a multilevel (aka hierarchical) Bayesian model, in which the class prior probabilities, usually considered given in basic Bayesian classification, are promoted to “hyperparameters,” assigned priors themselves, and treated as uncertain in subsequent calculations.  For determining class memberships, you would marginalize over these hyperparameters, producing membership probabilities that (1) estimate the population-level probabilities for membership from the distributions of likelihood ratios, and (2) take into account uncertainty in the population-level probabilities when estimating membership probabilities for each classified object.  (As an approximation, an “Empirical Bayes” approach would just plug in an estimate of the hyperparameters rather than marginalize over them.)  Alternatively, if the population properties are of interest, you would focus on the hyperparameters, marginalizing over the class assignments of the objects.
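
To sketch it in symbols (my notation here, not drawn from Patrick’s post or from the papers below): let f = (f_1, …, f_4) be the unknown population-level class fractions (the hyperparameters) and ℓ_ik = p(x_i | class k) the class-k likelihood for object i.  Then, roughly,

```latex
% Posterior for the population-level class fractions, with the (unknown) class
% assignments of the individual objects summed out:
p(\vec{f} \mid D) \;\propto\; p(\vec{f}) \prod_{i=1}^{N} \sum_{k=1}^{4} f_k\, \ell_{ik}

% Membership probability for object i, marginalized over the fractions
% (Empirical Bayes would instead plug a point estimate \hat{f} into the ratio):
P(i \in \mathrm{class}\ k \mid D) \;=\; \int \! d\vec{f}\;\, p(\vec{f} \mid D)\,
    \frac{f_k\, \ell_{ik}}{\sum_{j} f_j\, \ell_{ij}}
```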

 

Here’s a short paper with a simple calculation showing how this can be done in a binary classification setting:

 

Commentary on Bayesian coincidence assessment (cross-matching)

http://adsabs.harvard.edu/abs/2012arXiv1206.4278L

 

Fig. 1 from the following paper (the fig. was omitted from the published version due to page constraints) gives an illustration of this kind of calculation in action in an even simpler setting:

 

On the future of astrostatistics: statistical foundations and statistical practice

http://adsabs.harvard.edu/abs/2012arXiv1208.3035L

 

You don’t end up summing the posterior probabilities calculated from a given set of prior probabilities; if you want to estimate the population-level probabilities, you don’t know what they are in the first place.  Instead, what the first paper shows is that you end up estimating the population-level probabilities in such a way that sums like the ones you describe end up being self-consistent.  It’s probably easier to understand from the equations than from this explanation!
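
If a numerical illustration of that self-consistency helps, here is a minimal sketch (an EM-style fixed-point iteration; the array names and the made-up gamma likelihoods are illustrative assumptions, not anything from the papers).  At convergence, the fraction estimates equal the average of the posteriors computed using those same fractions as priors, which is the sense in which the “sum of posteriors” tally is self-consistent:

```python
import numpy as np

# Toy class likelihoods p(x_i | k) for N objects and K classes (made-up numbers).
rng = np.random.default_rng(1)
N, K = 1000, 4
like = rng.gamma(shape=2.0, size=(N, K))

f = np.full(K, 1.0 / K)                      # starting guess for the class fractions
for _ in range(1000):
    post = like * f
    post /= post.sum(axis=1, keepdims=True)  # P(k | x_i, f): posteriors given f
    f_new = post.mean(axis=0)                # fractions implied by those posteriors
    if np.allclose(f_new, f, atol=1e-10):
        break
    f = f_new

# At the fixed point, the "sum of posteriors" tally is N * f, and the fractions
# used as priors equal the average of the posteriors they produce.
print(f, post.mean(axis=0))
```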

When I first had to think about this kind of problem (back in the 90s) I made a similar intuitive “calculation” to yours, and then came up with this type of multilevel model (MLM) after the fact to justify it.  I, too, suspected this must be a well-known result with some standard name or nomenclature associated with it.  But I haven’t been able to find it discussed in the literature on Bayesian classification or mixture models (where the same type of calculation appears).  I’ve asked some statistics and machine-learning colleagues about it, and although they considered the result fairly obvious, none of them could point me to a place in the information sciences literature where it is discussed.  I recently found a fairly old paper, on maximum likelihood estimation of mixture models if I recall correctly, where equations essentially equivalent to those in the first paper above are derived, but with no significant discussion or interpretation of the results.  I can’t find it at the moment, but if I turn it up, I’ll post a reference.

 

I hope this helps,

Tom Loredo