A Data-Dependent Distance for Regression
We develop a new data-dependent distance for regression problems to compare two regressors (hyperplanes that fits or divides a data set). Most distances between objects attempt to capture the intrinsic geometry of these objects and measure how similar that geometry is. However, we argue that existing measures are inappropriate for regressors to a data set. We introduce a family of new distances that measure how similarly two regressors interact with the data they attempt to fit. For the variant we advocate we show it is a metric (under mild assumptions), induces metric balls with bounded VC-dimension, it is robust to changes in the corresponding data, and can be approximated quickly. We show a simple extension to trajectories that inherits these properties, as well as several other algorithmic applications. Moreover, in order to develop efficient approximation algorithms for this distance we formalize the relationship between sensitivity and leverage scores. This may be of independent interest.
READ FULL TEXT