Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning


Abstract

Neighborhood Components Analysis (NCA) is a popular method for learning a distance metric to be used within a k-nearest neighbors (kNN) classifier. A key assumption built into the model is that each point stochastically selects a single neighbor, which makes the model well-justified only for kNN with k = 1. However, kNN classifiers with k > 1 are more robust and usually preferred in practice. Here we present kNCA, which generalizes NCA by learning distance metrics that are appropriate for kNN with arbitrary k. The main technical contribution is showing how to efficiently compute and optimize the expected accuracy of a kNN classifier. We apply similar ideas in an unsupervised setting to yield kSNE and kt-SNE, generalizations of Stochastic Neighbor Embedding (SNE, t-SNE) that operate on neighborhoods of size k, which provide an axis of control over embeddings that allow for more homogeneous and interpretable regions. Empirically, we show that kNCA often improves classification accuracy over state of the art methods, produces qualitative differences in the embeddings as k is varied, and is more robust with respect to label noise.
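The stochastic neighbor selection that kNCA generalizes can be illustrated with standard NCA for k = 1: each point picks a single neighbor with probability given by a softmax over negative squared distances in the learned metric space, and the objective is the expected leave-one-out accuracy. The sketch below (a hypothetical helper, not code from the paper) computes that expectation with NumPy for a linear metric A:

```python
import numpy as np

def nca_expected_accuracy(X, y, A):
    """Expected leave-one-out accuracy of a stochastic 1-NN classifier:
    point i selects neighbor j with probability proportional to
    exp(-||A x_i - A x_j||^2), and never selects itself."""
    Z = X @ A.T                                           # project with metric A
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-selection
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                     # p[i, j] = P(i picks j)
    same = (y[:, None] == y[None, :]).astype(float)       # 1 where labels agree
    return (p * same).sum(axis=1).mean()                  # mean P(correct label)
```

kNCA replaces this single draw with the selection of a k-element neighborhood, optimizing the probability that a majority of the k chosen neighbors carry the correct label; the paper's contribution is computing that expectation efficiently.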

Author

Daniel Tarlow, Kevin Swersky, Ilya Sutskever, Richard S. Zemel

Journal

Proceedings of the 30th International Conference on Machine Learning

Paper Publication Date

2013

Paper Type

Astroinformatics