Representation Learning: A Review and New Perspectives
Yoshua Bengio†, Aaron Courville, and Pascal Vincent†
Department of computer science and operations research, U. Montreal
† also, Canadian Institute for Advanced Research (CIFAR)
An AI must fundamentally understand the world around us, and the authors argue that this can only be achieved if it learns to identify and disentangle the underlying explanatory factors hidden in the observed milieu of low-level sensory data. Representation learning also means less human ingenuity is needed to extract features from raw data.
Representation learning has had a great impact on speech recognition, in both academic and industrial labs; Microsoft released speech recognition software based on deep learning.
It has also advanced object recognition (the state of the art on the MNIST digit classification problem is held by convolutional network models) and NLP (word embeddings, which learn a distributed representation for each word).
Representation learning models have also shown strong results on multi-task transfer learning and domain adaptation, which means their strengths have been confirmed empirically.
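As a toy sketch of what a distributed word representation looks like (the vocabulary, dimensionality, and random vectors here are made up for illustration; real embeddings such as word2vec are learned from corpora):

```python
import numpy as np

# Hypothetical tiny vocabulary; in practice the embedding matrix is learned.
vocab = {"king": 0, "queen": 1, "man": 2, "woman": 3}
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))  # one dense 8-dim vector per word

def embed(word):
    # Look up the distributed representation of a word.
    return E[vocab[word]]

v = embed("king")
print(v.shape)  # (8,)
```

Each word maps to a dense vector rather than a one-hot index; after training, words used in similar contexts end up with nearby vectors.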
But smoothness alone is not enough: we have to use the data to learn the function, not merely interpolate smoothly between neighboring training examples and hope the outcome generalizes.
Distributed representations: with a given number of parameters, a model can distinguish a huge number of input configurations, going beyond purely local generalization.
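A toy illustration of this point (my own construction, not an experiment from the paper): k hyperplanes jointly carve the input space into far more than k regions, so a k-bit distributed code distinguishes many more configurations than a local model with k prototypes.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10
W = rng.normal(size=(k, 2))       # k hyperplanes in a 2-D input space
b = rng.normal(size=k)
X = rng.normal(size=(100_000, 2))  # many sampled input points

# Each point gets a k-bit distributed code: which side of each hyperplane it lies on.
codes = X @ W.T + b > 0
n_regions = len(np.unique(codes, axis=0))
print(n_regions)  # many more distinct regions than k, from only O(k) parameters
```

A local representation (e.g. k cluster centroids) with the same parameter budget can only tell k regions apart.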
More abstract concepts are defined in terms of less abstract ones. This leads to re-use of extracted features and to invariance to most local changes, which covers more varied phenomena.
Hence representations that are useful for P(X) tend to be useful when learning P(Y |X), allowing sharing of statistical strength between the unsupervised and supervised learning tasks.
With many targets Y of interest, or many learning tasks in general, each task (e.g., the corresponding P(Y|X, task)) is explained by factors that are shared with other tasks, allowing representations to be shared across tasks.
Probability mass concentrates near regions that have a much smaller dimensionality than the original space where the data lives.
Small changes still preserve a data point's category; different categorical variables are associated with separate manifolds.
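A minimal sketch of the manifold hypothesis (an illustrative construction of mine, not an experiment from the paper): data generated from a single underlying factor but observed in 10 dimensions concentrates its probability mass along one direction, which the singular value spectrum reveals.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0, 1, size=(1000, 1))           # 1 underlying explanatory factor
A = rng.normal(size=(1, 10))                     # fixed embedding into 10-D
X = t @ A + 0.01 * rng.normal(size=(1000, 10))   # 10-D observations, small noise

# Singular values of the centered data: almost all variance lies in one direction.
X = X - X.mean(axis=0)
s = np.linalg.svd(X, compute_uv=False)
print(s[0] / s.sum())  # close to 1: mass concentrates near a 1-D subspace
```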
Different factors change at different temporal and spatial scales, and many categorical concepts of interest change slowly.
Most of the extracted features should be insensitive to small variations of a given observation x.
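One common way to obtain such insensitivity is pooling; here is a toy sketch (my own example, not from the paper) where max-pooling over width-3 windows makes the feature vector identical under a one-step shift of the input:

```python
import numpy as np

x = np.zeros(12); x[4] = 1.0              # a spike at position 4
x_shift = np.zeros(12); x_shift[5] = 1.0  # the same spike, shifted by one step

def pooled(v, width=3):
    # Non-overlapping max pooling over windows of the given width.
    return v.reshape(-1, width).max(axis=1)

print(pooled(x))        # identical pooled features...
print(pooled(x_shift))  # ...despite the shift in the raw input
```

The invariance only holds while the shift stays within a pooling window; larger shifts change the pooled code, which is the desired behavior (small variations ignored, large ones preserved).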
In good high-level representations, the factors are related to each other through simple, typically linear dependencies.