Timeseries Anomaly Detection (1): Introduction

Yunkun · May 27, 2022

Team Member


Lee Jimin, Dept. of Information Systems '18, Hanyang Univ. syjmlove@hanyang.ac.kr
Kim Joonhee, Dept. of Information Systems '18, Hanyang Univ. pjoonheeq@hanyang.ac.kr
Oh Yunseok, Dept. of Information Systems '17, Hanyang Univ. grade854@hanyang.ac.kr


  • Github

Introduction


Anomaly Detection

  • Anomaly detection refers to finding objects or data that show a pattern different from what is expected. In other words, it is a method of building a model, based on training data, that finds data whose characteristics differ from the existing data.
  • Timeseries anomaly detection flags cases where the value predicted for the future deviates from the value actually observed. The learning steps are as follows.

    Using the historical values of the time series, training data is created to predict the value one step ahead. The prediction errors are modeled with a multivariate Gaussian distribution. If the error between the predicted and actual value falls in the tail of that distribution, the value is considered to lie outside the predictable range.
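The steps above can be sketched in Python. This is a minimal, univariate simplification: the post describes a multivariate Gaussian over the error vector, but here the error is a single number, and the residuals, predictor values, and 3-sigma threshold are all illustrative assumptions.

```python
import statistics

def fit_error_model(errors):
    # Fit a Gaussian to the one-step-ahead prediction errors seen on
    # training data (a univariate stand-in for the multivariate case).
    return statistics.mean(errors), statistics.stdev(errors)

def is_anomalous(predicted, actual, mu, sigma, k=3.0):
    # A point is anomalous if its error falls in the tail of the fitted
    # distribution, i.e. more than k standard deviations from the mean.
    return abs((actual - predicted) - mu) > k * sigma

# toy residuals from a hypothetical one-step-ahead predictor
mu, sigma = fit_error_model([0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.02])
print(is_anomalous(predicted=10.0, actual=10.05, mu=mu, sigma=sigma))  # False
print(is_anomalous(predicted=10.0, actual=12.0, mu=mu, sigma=sigma))   # True
```

In practice the predictor would be a trained model (e.g. an LSTM), and the error would be a vector scored against the full multivariate Gaussian.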

Distinguishing Outliers

  • Outlier = Extreme outliers + Novelties

    Extreme outliers must be removed for the model to benefit.
    Novelties benefit the model only if they are left in.

  • Extreme Outlier
    Observations whose values are extremely small or large, far outside the range of the rest of the data. They distort estimates of the population mean or sum and excessively inflate the variance, reducing the accuracy of analysis or modeling, so removing them is recommended.
    Correction methods

    Replace with the true value
    Replace with an interpolated value
    Elimination
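One way to implement the interpolation-based correction is sketched below. The modified z-score (median/MAD) criterion and the 3.5 threshold are common rules of thumb chosen here for illustration, not something the post prescribes; a robust criterion is used because the extreme value itself would inflate an ordinary mean/standard deviation.

```python
import statistics

def remove_extreme_outliers(series, threshold=3.5):
    # Flag points using the modified z-score (based on the median and the
    # median absolute deviation, which the outliers themselves cannot
    # inflate), then replace them with the average of their neighbours.
    med = statistics.median(series)
    mad = statistics.median(abs(x - med) for x in series)
    cleaned = list(series)
    for i, x in enumerate(series):
        score = 0.6745 * abs(x - med) / mad if mad else 0.0
        if score > threshold:
            left = cleaned[i - 1] if i > 0 else med
            right = series[i + 1] if i + 1 < len(series) else med
            cleaned[i] = (left + right) / 2  # linear interpolation
    return cleaned

print(remove_extreme_outliers([1, 2, 1, 2, 100, 2, 1, 2, 1]))
# → [1, 2, 1, 2, 2.0, 2, 1, 2, 1]
```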

  • Novelties
    Outliers produced by the normal collection process (patterns or data not previously seen).
    Unlike extreme outliers, novelties do not lead us to modify the original data: if another novelty appears later, the model will have to handle it. For example, in the KOSPI data, the fall in stocks caused by COVID-19 can be seen as a kind of novelty.

How to Find Outliers

Premise: most of the data is normal, and only a small fraction is anomalous.

  • Remeasurement
    Due to the nature of time series data, it is impossible to go back to the past and measure again.
  • Supervised Learning

    Find different sources of the same data and compare them.
    Data that agree across sources are classified as normal.
    Data that look 'weirder' than the rest are labeled as outliers.
    Train an outlier detection model on the labeled results.

  • Unsupervised Learning

    Discover outliers by leveraging the characteristics of the data itself.
    Assume that most of the data is normal.
    Analyze the data's own characteristics.
    Classify data 'weirder' than a given criterion as outliers.

💡 Supervised learning: normal data + abnormal data + correct answers (labels) exist

💡 Unsupervised learning: normal data + no correct answers (labels) -> the model learns the features by itself
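A minimal sketch of the unsupervised premise: without labels, we derive a criterion from the data itself and call anything beyond it an outlier. The IQR fence used here is one common such criterion, chosen as an illustrative assumption rather than the post's own method.

```python
import statistics

def iqr_outliers(data, k=1.5):
    # Points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] are
    # classified as outliers -- no labels needed, only the data itself.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lo or x > hi]

print(iqr_outliers([10, 12, 11, 13, 12, 11, 50]))  # → [50]
```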

  • Supervised learning includes algorithms such as CNNs, while unsupervised learning includes algorithms such as autoencoders and GANs. Although a supervised model is usually more accurate than an unsupervised one, the unsupervised field is being studied relatively actively because supervised learning has the following disadvantages.

    1. It is difficult to obtain abnormal samples.
    2. Whenever a new abnormal pattern appears, the model must be retrained.
  • Autoencoder

    An autoencoder consists of an encoder and a decoder.
    The encoder extracts the important information (a compressed feature vector) from the input data.
    In this process, a representation more compressed than the input data is obtained.
    The decoder generates a form similar to the input data from that important information.

The autoencoder is an algorithm that reconstructs (restores) its input data. It learns the features of normal data; when data is fed into the trained model, the reconstruction is compared with the input, and the size of the difference determines whether the data is abnormal. If the encoder extracts the important information well, the decoder can generate output almost identical to normal input data, while abnormal data remains just as difficult for the decoder to reconstruct.
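The reconstruction-error rule just described can be sketched as follows. The trained autoencoder is stubbed out with a toy function, and the MSE metric and threshold value are illustrative assumptions, not the post's actual model.

```python
def reconstruction_error(x, x_hat):
    # Mean squared error between the input and its reconstruction.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def detect(model, x, threshold):
    # The autoencoder reproduces normal inputs well, so a large
    # reconstruction error suggests the input is anomalous.
    x_hat = model(x)  # model: a trained encoder+decoder (assumed)
    return reconstruction_error(x, x_hat) > threshold

# toy stand-in for a model trained only on values near 1.0
toy_model = lambda x: [1.0 for _ in x]
print(detect(toy_model, [1.0, 1.1, 0.9], threshold=0.05))  # False (normal)
print(detect(toy_model, [5.0, 4.8, 5.2], threshold=0.05))  # True (anomalous)
```

In a real setup the threshold is usually chosen from the distribution of reconstruction errors on held-out normal data.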

We decided to practice with an algorithm called the LSTM Autoencoder, one of the unsupervised learning techniques.
