Lee Jimin, Dept. Information System 18 Hanyang Univ. syjmlove@hanyang.ac.kr
Kim Joonhee, Dept. Information System 18 Hanyang Univ. pjoonheeq@hanyang.ac.kr
Oh Yunseok, Dept. Information System 17 Hanyang Univ. grade854@hanyang.ac.kr
Using the historical values of a time series, training data are created to predict the value one step ahead. The prediction error vectors are then modeled with a multivariate Gaussian distribution: if the error between the predicted and actual value falls in the tail of this distribution, the value is generally considered to lie outside the predictable range, i.e., to be an anomaly.
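A minimal sketch of this scoring step is shown below. It assumes hypothetical arrays y_true and y_pred standing in for a forecaster's output (not data from this project), fits a multivariate Gaussian to the one-step prediction errors, and flags error vectors that fall in the low-likelihood tail.

```python
# Sketch only: y_true / y_pred are hypothetical stand-ins for a forecaster's output.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
y_true = rng.normal(size=(500, 3))                      # actual values (3 variables, 500 steps)
y_pred = y_true + rng.normal(scale=0.1, size=(500, 3))  # one-step-ahead predictions

errors = y_true - y_pred                  # prediction error vectors
mu = errors.mean(axis=0)                  # mean of the error distribution
cov = np.cov(errors, rowvar=False)        # covariance of the error distribution

# Log-likelihood of each error vector; very low likelihood = tail = likely anomaly.
log_pdf = multivariate_normal(mean=mu, cov=cov).logpdf(errors)
threshold = np.percentile(log_pdf, 1)     # e.g. bottom 1% of likelihoods
print("anomalous steps:", np.where(log_pdf < threshold)[0])
```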
Extreme outliers must be removed for the model to perform well. Novelties, on the other hand, help the model only if they are kept.
Typical ways to handle a detected outlier (a short sketch follows this list):
- Replace it with the true value, if it is known.
- Replace it with an interpolated value.
- Remove it.
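The sketch below illustrates these three options on a tiny hypothetical series; the outlier rule and the replacement value 10.6 are invented for illustration, not taken from this project.

```python
# Sketch only: toy data and a crude MAD-style rule chosen for illustration.
import pandas as pd

s = pd.Series([10.0, 11.0, 10.5, 500.0, 10.8, 11.2])   # 500.0 is an extreme outlier
abs_dev = (s - s.median()).abs()
is_outlier = abs_dev > 3 * abs_dev.median()             # crude MAD-style rule

replaced = s.mask(is_outlier, 10.6)                     # 1) replace with the known true value
interpolated = s.mask(is_outlier).interpolate()         # 2) replace with an interpolated value
removed = s[~is_outlier]                                # 3) simply drop the outlying points
print(replaced.tolist(), interpolated.tolist(), removed.tolist(), sep="\n")
```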
One approach compares different sources of the same data. Premise: most of the data is correct, and only a small fraction is anomalous.
- Find different sources that record the same data and compare them against one another.
- Data that agree across sources are classified as normal.
- Data that look 'weirder' than the rest are classified as outliers.
- An outlier detection model can then be trained on these labeled classification results (a short sketch of this idea follows).
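The sketch below uses two hypothetical sensors measuring the same quantity (all values are made up): points where the sources strongly disagree receive an "outlier" label, which could later be used to train a supervised detector.

```python
# Sketch only: two invented sensors recording the same quantity.
import numpy as np

rng = np.random.default_rng(1)
truth = np.sin(np.linspace(0.0, 10.0, 200))
sensor_a = truth + rng.normal(0.0, 0.05, 200)   # source 1
sensor_b = truth + rng.normal(0.0, 0.05, 200)   # source 2 of the same data
sensor_a[[20, 90, 150]] += 3.0                  # corrupt a few readings in one source

disagreement = np.abs(sensor_a - sensor_b)      # compare the sources against each other
labels = disagreement > 1.0                     # the 'weirder' disagreements become outlier labels
print("labeled outlier indices:", np.where(labels)[0])
```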
Another approach leverages the data's own characteristics:
- Find outliers on the premise that most of the data is correct.
- Analyze the statistical characteristics of the data itself.
- Classify data 'weirder' than a specific criterion as outliers (see the sketch below).
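A minimal sketch of this approach with synthetic data: only the data's own statistics are used (a simple z-score here), and anything 'weirder' than the cutoff is called an outlier.

```python
# Sketch only: synthetic data with two injected outliers.
import numpy as np

rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(0.0, 1.0, size=1000), [8.0, -7.5]])

z_scores = (data - data.mean()) / data.std()   # how unusual each point is relative to the data itself
outliers = np.abs(z_scores) > 3.0              # criterion: more than 3 standard deviations away
print("outlier values:", data[outliers])
```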
💡 Supervised learning: normal data + abnormal data + labels (correct answers) exist
💡 Unsupervised learning: normal data only, no labels -> the model learns the features on its own
Algorithms such as CNNs are used in supervised learning, while algorithms such as the Autoencoder and GAN belong to unsupervised learning. Although a model trained with supervised learning is usually more accurate than one trained without labels, the unsupervised approach is being studied relatively actively because supervised learning has the following disadvantages:
- It is difficult to obtain abnormal samples.
- Whenever a new abnormal pattern appears, the model must be retrained.
Autoencoder
An autoencoder consists of an encoder and a decoder. The encoder extracts the important information (a compressed feature vector) from the input data; in this process, a representation more compressed than the input is obtained. The decoder then generates a form similar to the input data from this compressed information.
The autoencoder is an algorithm that reconstructs (restores) its input data. In other words, the model learns the features of normal data; when data is then fed into the trained model, the difference between the reconstruction and the learned normal features is compared to decide whether the data is abnormal. If the encoder extracts the important information well, the decoder can generate output almost identical to the input for normal data, whereas abnormal data remains difficult for the decoder to reconstruct, so its reconstruction error is large.
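A minimal sketch of this idea, assuming the Keras API and randomly generated stand-in data (not the model or data used in this project): a small dense autoencoder is trained on normal samples only, and inputs with a large reconstruction error are flagged as abnormal.

```python
# Sketch only: random stand-in data and an illustrative threshold.
import numpy as np
from tensorflow.keras import layers, models

rng = np.random.default_rng(0)
normal_data = rng.normal(0.0, 1.0, size=(1000, 20)).astype("float32")   # "normal" samples
abnormal = rng.uniform(5.0, 8.0, size=(5, 20)).astype("float32")        # clearly different samples
test_data = np.vstack([normal_data[:5], abnormal])

autoencoder = models.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(4, activation="relu"),       # compressed feature vector
    layers.Dense(8, activation="relu"),
    layers.Dense(20, activation="linear"),    # reconstruction of the input
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal_data, normal_data, epochs=20, batch_size=32, verbose=0)

# Per-sample reconstruction error; abnormal inputs reconstruct poorly.
errors = np.mean((test_data - autoencoder.predict(test_data, verbose=0)) ** 2, axis=1)
train_errors = np.mean((normal_data - autoencoder.predict(normal_data, verbose=0)) ** 2, axis=1)
threshold = np.percentile(train_errors, 99)   # e.g. 99th percentile of normal errors
print("abnormal:", errors > threshold)
```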
Among these unsupervised learning techniques, we decided to practice with an algorithm called the LSTM Autoencoder (a minimal sketch is shown below).
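The sketch below assumes the Keras API and a synthetic sine wave as stand-in data (it is not our final model or data set): the encoder LSTM compresses each window, RepeatVector copies the code across the time steps, and the decoder LSTM reconstructs the original window; windows with a high reconstruction error would then be treated as anomalous.

```python
# Sketch only: synthetic sine-wave windows and an illustrative LSTM Autoencoder.
import numpy as np
from tensorflow.keras import layers, models

timesteps, n_features = 30, 1
signal = np.sin(np.arange(0, 60, 0.1)).astype("float32")
windows = np.array([signal[i:i + timesteps] for i in range(len(signal) - timesteps)])
windows = windows[..., np.newaxis]                     # shape: (samples, timesteps, features)

lstm_ae = models.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    layers.LSTM(16),                                   # encoder: compress the whole window
    layers.RepeatVector(timesteps),                    # repeat the code for every time step
    layers.LSTM(16, return_sequences=True),            # decoder
    layers.TimeDistributed(layers.Dense(n_features)),  # reconstruct each time step
])
lstm_ae.compile(optimizer="adam", loss="mse")
lstm_ae.fit(windows, windows, epochs=5, batch_size=64, verbose=0)

# Windows with a high reconstruction error would be treated as anomalous.
recon_errors = np.mean((windows - lstm_ae.predict(windows, verbose=0)) ** 2, axis=(1, 2))
print("mean / max reconstruction error:", recon_errors.mean(), recon_errors.max())
```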