In the 18th century, Hume (A Treatise of Human Nature): ‘even after the observation of the frequent or constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience’
In the 20th, Mitchell (1980), Schaffer (1994), and Wolpert (1996): no free lunch; bias-free learning is futile.
"Wolpert proved NFL assuming uniform problem distributions
-for learning (1996)
-for optimization (in 1997. with Macready)"
English (2001)
for Supervised Machine Learning (statistical inference): in a noise-free scenario where the loss function is the misclassification rate, if one is interested in off-training-set error, then there are no a priori distinctions between learning algorithms (a formal sketch follows below).
for Search/Optimization: the theorem applies to finite spaces and to algorithms that never resample previously evaluated points (see the exhaustive toy check below).
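A sketch of both statements in Wolpert and Macready's notation (the symbols d^y_m, Phi, C, f, and d follow their papers and are not defined elsewhere in these notes):

```latex
% Search/optimization NFL: for any two non-resampling algorithms a, b
% and any number of distinct evaluations m, the observed cost sequence
% d^y_m has the same distribution once summed over all cost functions f:
\sum_{f} P(d^y_m \mid f, m, a) \;=\; \sum_{f} P(d^y_m \mid f, m, b)

% Hence any performance measure \Phi(d^y_m) has the same all-f average:
\sum_{f} E\big[\Phi(d^y_m) \mid f, m, a\big]
  \;=\; \sum_{f} E\big[\Phi(d^y_m) \mid f, m, b\big]

% Supervised-learning analogue (Wolpert, 1996): with zero-one loss and
% off-training-set error C, conditioning on training set d, the sum over
% all targets f is the same for any two learning algorithms A and B:
\sum_{f} E\big[C \mid f, d, A\big] \;=\; \sum_{f} E\big[C \mid f, d, B\big]
```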
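The averaged-performance claim can be checked exhaustively on a toy problem. A minimal sketch, assuming a three-point search space with binary costs; the `fixed_order` and `adaptive` strategies are invented for illustration:

```python
from itertools import product

X = [0, 1, 2]  # finite search space
Y = [0, 1]     # possible cost values

def run(algorithm, f, steps=3):
    """Run a deterministic non-resampling search on cost function f;
    return the best (minimum) cost seen after each evaluation."""
    history, best = [], []
    for _ in range(steps):
        x = algorithm(history)
        assert x not in [p for p, _ in history], "no resampling allowed"
        history.append((x, f[x]))
        best.append(min(v for _, v in history))
    return best

def fixed_order(history):
    """Strategy A: evaluate 0, 1, 2 in a fixed order."""
    return len(history)

def adaptive(history):
    """Strategy B: branch on the values observed so far."""
    if not history:
        return 2
    if len(history) == 1:
        return 0 if history[0][1] == 1 else 1
    return ({0, 1, 2} - {p for p, _ in history}).pop()

# Average best-so-far cost over ALL 2^3 cost functions f: X -> Y.
for alg in (fixed_order, adaptive):
    totals = [0.0] * 3
    for values in product(Y, repeat=len(X)):  # enumerate every f
        f = dict(zip(X, values))
        for i, b in enumerate(run(alg, f)):
            totals[i] += b
    print(alg.__name__, [t / 2 ** len(X) for t in totals])
# Both strategies print the same averaged curve [0.5, 0.25, 0.125],
# exactly as the theorem predicts.
```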
"We show that all algorithms that search for an extremum of a cost function perform exactly the same, when averaged over all possible cost functions. In particular, if algorithm A outperforms algorithm B on some cost functions, then loosely speaking there must exist exactly as many other functions where B outperforms A. Starting from this we analyze a number of the other a priori characteristics of the search problem, like its geometry and its information-theoretic aspects. This analysis allows us to derive mathematical benchmarks for assessing a particular search algorithm's performance. We also investigate minimax aspects of the search problem, the validity of using characteristics of a partial search over a cost function to predict future behavior of the search algorithm on that cost function, and time-varying cost functions. We conclude with some discussion of the justifiability of biologically-inspired search methods."
Wolpert and Macready (1995)
Free Lunches?
- second and higher moments of algorithms' generalization error (the NFL theorems equalize only the mean)
- coevolution: for self-play one can construct a pair of search algorithms such that one provably performs at least as well as the other for all possible payoff functions (Wolpert and Macready, 2005)
- confidence intervals can give a priori distinctions between algorithms
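All of these escape hatches sidestep the uniform-problem-distribution assumption noted above. As a toy illustration of why that assumption matters, here is an invented structured class of costs f_k(x) = |x - k| (nothing below comes from the cited papers): once problems are drawn from this non-uniform class, a strategy matched to the structure beats blind enumeration on average, so "no a priori distinctions" fails.

```python
from statistics import mean

X = range(5)
# Invented structured problem class: one unimodal cost per hidden
# target k, namely f_k(x) = |x - k|.
problems = [(lambda x, k=k: abs(x - k)) for k in X]

def blind_scan(f):
    """Evaluate x = 0, 1, 2, ... until the minimum (cost 0) is found;
    return the number of evaluations used."""
    return next(n for n, x in enumerate(X, 1) if f(x) == 0)

def exploit_structure(f):
    """In this class, f(0) = k reveals the minimizer's location,
    so one extra evaluation at x = k always suffices."""
    k = f(0)
    return 1 if k == 0 else 2

print(mean(blind_scan(f) for f in problems))         # 3.0 evaluations
print(mean(exploit_structure(f) for f in problems))  # 1.8 evaluations
```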