A survey of network anomaly detection techniques — Summary

daeungdaeung · October 13, 2021
  • As the title suggests, this post summarizes a survey paper on techniques for detecting anomalies in networks.

  • While reading, I wrote down only the parts that seemed important or worth revisiting later.

1. Introduction

  • Computer security has become a necessity.
  • The following are the research challenges:
    • A lack of a universally applicable anomaly detection technique
    • Data containing noise (which tends to be similar to actual anomalies and is therefore hard to distinguish and remove)
    • A lack of publicly available labeled datasets
    • Current intrusion detection techniques may not be useful in the future.

2. Preliminary discussion

2.1 Types of anomalies

  • Anomalies are referred to as patterns in data that do not conform to a well-defined characteristic of normal patterns.
  • An anomaly can be categorized in the following ways.
    • Point anomaly
      • ex) A person's normal car fuel usage is five liters per day, but if it becomes fifty liters on some random day, that is a point anomaly.
    • Contextual anomaly
      • When a data instance behaves anomalously in a particular context, it is termed a contextual anomaly.
    • Collective anomaly
      • A collection of similar data instances behave anomalously with respect to the entire dataset.
      • ex) In an electrocardiogram (ECG) output, the existence of abnormally low values for a long period of time is a collective anomaly (each individual low value may not be anomalous on its own).

2.2 Output of anomaly detection techniques

  • One important issue is how anomalies are represented as output.
  • Scores: each instance is assigned a score quantifying how anomalous it is, and anomalies are selected by ranking or thresholding.
  • Label/Binary: each instance is directly labeled as either normal or anomalous.

2.3 Types of network attacks

  • Denial of service (DoS)
  • Probe: It is used to gather information about a targeted network or host.
  • User to Root (U2R): aiming to gain illegal access to an administrative account
  • Remote to User (R2U): launched when an attacker who can send packets over the network wants to gain local access as a user of the targeted machine.

2.4 Mapping of network attack with anomalies

  • Identifying the relationship between attacks and anomalies

  • DoS - collective anomalies

  • Probe - contextual anomalies

  • U2R, R2U - point anomalies

3. Classification based network anomaly detection

  • Classification-based techniques are vulnerable to new attacks.
  • It is extremely difficult to keep a normal profile up-to-date.
  • We discuss four major techniques.

3.1 Support vector machine

  • The key idea of an SVM is to derive a hyperplane that maximizes the separating margin between the positive and negative classes.
  • In a paper "A geometric framework for unsupervised anomaly detection", the unsupervised SVM is used to detect anomalous events.
  • Using a similar concept to that of the One-class SVM but in a supervised manner, Registry Anomaly Detection (RAD) is developed to monitor Windows registry queries.
  • In a paper "Robust Support Vector Machines for Anomaly Detection in Computer Security", an anomaly detection method which ignores noisy data is developed using the Robust SVM.
  • In practice, training data often contain noise which invalidates the main assumption of the SVM that all the sample data for training are independently and identically distributed.
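
As a rough illustration of the one-class SVM idea used in the unsupervised approaches above (a minimal sketch, not the exact setup of any cited paper; the synthetic features and the `nu` value are assumptions for illustration):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for network traffic features (e.g., packet counts, durations).
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))   # mostly normal traffic
attacks = rng.normal(loc=5.0, scale=1.0, size=(10, 4))   # a few anomalous records
X = np.vstack([normal, attacks])

# nu upper-bounds the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(X)

pred = model.predict(X)   # +1 = normal, -1 = anomaly
print("flagged anomalies:", int(np.sum(pred == -1)))
```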

3.2 Bayesian network

  • A Bayesian network is an efficient approach for modeling a domain containing uncertainty.
  • As a Bayesian network can be used for event classification, it is also applicable to network anomaly detection.
  • In the paper "Bayesian event classification for intrusion detection", two major problems that cause high false positives in anomaly detection techniques are identified.
    • The first problem: anomaly detection systems must aggregate the outputs of multiple models, and simplistic aggregation results in high false positives.
    • The second problem: Anomaly detection systems cannot handle behaviors which are unusual but legitimate.
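
To make the event-classification idea concrete, here is a minimal sketch that scores an event by a posterior probability instead of an ad hoc sum of model outputs. A naive Bayes classifier stands in for the full Bayesian network of the paper, and the features and numbers are hypothetical:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical per-event features: [bytes sent, duration (s), distinct ports touched].
X_train = np.array([
    [500, 1.2, 2], [300, 0.8, 1], [450, 1.0, 2],   # normal events
    [9000, 0.1, 40], [8000, 0.2, 35],              # attack events
])
y_train = np.array([0, 0, 0, 1, 1])                # 0 = normal, 1 = attack

clf = GaussianNB().fit(X_train, y_train)

# The posterior P(attack | event) replaces a simple weighted sum of model outputs.
event = np.array([[7000, 0.15, 30]])
print(clf.predict_proba(event))                    # [[P(normal), P(attack)]]
```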

3.3 Neural network

  • For network anomaly detection, a neural network has been merged with other techniques, such as a statistical approach and variants of it.

  • The outlier factor of the $i$th data record is defined using the trained RNN (replicator neural network) as its average reconstruction error (a short computational sketch appears at the end of this subsection):

    • $OF_i = \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - o_{ij})^2$
    • $x_{ij}$: the input value

    • $o_{ij}$: the output (reconstructed) value

  • Self-organizing Maps (SOM) are used for network anomaly detection.

  • Ramadas et al. (2003) suggested that, using SOM, network traffic can be classified in real time.
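
The outlier factor above is simply the mean squared reconstruction error per record, so it is easy to compute once a trained replicator network's outputs are available. In this sketch the reconstructions are hard-coded for illustration:

```python
import numpy as np

def outlier_factor(X, X_reconstructed):
    """OF_i = (1/n) * sum_j (x_ij - o_ij)^2, averaged over the n features."""
    return np.mean((X - X_reconstructed) ** 2, axis=1)

# Toy illustration: pretend a trained replicator network reproduces the normal
# rows almost exactly but fails badly on the last (anomalous) row.
X = np.array([[1.0, 2.0], [1.1, 2.1], [9.0, -3.0]])
O = np.array([[1.0, 2.0], [1.1, 2.0], [1.0, 2.0]])   # hypothetical reconstructions

print(outlier_factor(X, O))   # the anomalous row gets a much larger OF
```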

3.4 Rule-based

  • Rule-based anomaly detection techniques are widely used and typically rely on supervised learning algorithms.

4. Statistical anomaly detection

  • chi-square theory

    • $X^2 = \sum_{i=1}^{n} \frac{(X_i - E_i)^2}{E_i}$
    • $X_i$: the observed value of the $i$th variable

    • $E_i$: the expected value of the $i$th variable
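
A worked sketch of the chi-square statistic: the expected values $E_i$ would be learned from normal traffic, and a large $X^2$ flags an anomalous observation (all numbers here are made up):

```python
import numpy as np

def chi_square_stat(observed, expected):
    """X^2 = sum_i (X_i - E_i)^2 / E_i over the n monitored variables."""
    return float(np.sum((observed - expected) ** 2 / expected))

# Hypothetical expected per-variable values learned from normal traffic.
expected = np.array([100.0, 50.0, 10.0])
print(chi_square_stat(np.array([105.0, 48.0, 11.0]), expected))  # small: normal
print(chi_square_stat(np.array([400.0, 5.0, 90.0]), expected))   # large: anomalous
```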

4.1 Mixture model

  • A mixture model was proposed for detecting anomalies from noisy data.
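
A minimal sketch of the mixture-model idea, assuming scikit-learn and synthetic data (the single-component model and the 3% threshold are illustrative choices, not the cited method's exact settings): fit a probabilistic model to the noisy data and flag the least likely points.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 2)),   # majority (normal) data
    rng.normal(6.0, 0.5, size=(15, 2)),    # sparse anomalous points mixed in
])

# Fit the majority model; points with very low likelihood under it are anomalies.
gmm = GaussianMixture(n_components=1, random_state=0).fit(X)
log_lik = gmm.score_samples(X)
threshold = np.percentile(log_lik, 3)      # flag the least likely 3% of points
print("anomalies flagged:", int(np.sum(log_lik < threshold)))
```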

4.2 Signal processing technique

  • Using signal processing for anomaly detection has hardly been explored.
  • In Thottan and Ji (2003), a statistical signal processing technique based on abrupt change detection is presented: management information base (MIB) variables are used to produce a network health function that raises alarms for anomalous network conditions.

4.3 Principal component analysis (PCA)

  • Shyu et al. (2003) presented an easier way to analyze high-dimensional network traffic datasets using PCA. The principal components are linear combinations of the $p$ random variables $(A_1, A_2, ..., A_p)$ and can be characterized as:

    1. uncorrelated
    2. with their variances sorted in order from high to low
    3. with their total variance equal to the variance of the original data
  • A brief mathematical formulation of PCA:

    • $\mathbf{A}$: data matrix ($n \times p$), i.e., $n$ observations on each of $p$ variables
    • $\mathbf{S}$: covariance matrix ($p \times p$)
    • If $(\lambda_1, e_1), ..., (\lambda_p, e_p)$ are the $p$ eigenvalue-eigenvector pairs of $\mathbf{S}$, where $\lambda_1 \ge \lambda_2 \ge ... \ge \lambda_p \ge 0$, then the $i$th principal component ($i = 1, 2, ..., p$) is:
    • $y_i = e_i'(x - \bar{x}) = e_{i1}(x_1 - \bar{x}_1) + ... + e_{ip}(x_p - \bar{x}_p)$
  • An anomaly detection technique based on PCA (Shyu et al., 2003) has the benefits of:

    • being free from any assumption of statistical distribution
    • being able to reduce the dimension of the data without losing any important information
    • having minimal computational complexity which supports real-time anomaly detection
  • PCA explanation link
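
A minimal PCA-based sketch, assuming training traffic that is mostly normal: project test records onto the top principal components and score them by reconstruction error. (Shyu et al. actually use distances in the major and minor component subspaces, so this is a simplified variant.)

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 10))    # training traffic, assumed normal

pca = PCA(n_components=3).fit(X_train)  # keep only the high-variance components

X_test = rng.normal(size=(5, 10))
X_test[0] += 8.0                        # inject one anomalous record

# A record lying far from the principal subspace reconstructs poorly.
recon = pca.inverse_transform(pca.transform(X_test))
errors = np.sum((X_test - recon) ** 2, axis=1)
print(errors.round(1))                  # the first error dwarfs the rest
```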

5. Information theory

  • Information-theoretic measures can be used to create an appropriate anomaly detection model.

  • definitions of several measures:

    • Entropy is a basic concept of information theory which measures the uncertainty of a collection of data items.

      • What does it actually mean to measure the uncertainty of data...?

      • $H(D) = \sum_{x \in C_D} P(x) \log \frac{1}{P(x)}$
      • where $P(x)$ is the probability of $x$ in $D$

    • Conditional entropy of $D$ given $Y$ is the entropy of the probability distribution $P(x|y)$:

      • $H(D|Y) = \sum_{x, y \in C_D, C_Y} P(x, y) \log \frac{1}{P(x|y)}$
    • Information gain measures the reduction in entropy of a dataset $D$ achieved by splitting it on an attribute or feature $A$.

      • $Gain(D, A) = H(D) - \sum_{v \in Values(A)} \frac{|D_v|}{|D|} H(D_v)$
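
All three measures are straightforward to compute; here is a small sketch over a hypothetical audit dataset (the protocol and label values are made up for illustration):

```python
import numpy as np
from collections import Counter

def entropy(values):
    """H(D) = sum_x P(x) * log2(1 / P(x)) over the distinct items in D."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

def information_gain(labels, feature):
    """Gain(D, A) = H(D) - sum_v (|D_v| / |D|) * H(D_v)."""
    gain = entropy(labels)
    for v in set(feature):
        subset = [l for l, f in zip(labels, feature) if f == v]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

# Hypothetical audit records: protocol type vs. normal/attack label.
labels = ["normal", "normal", "attack", "attack", "normal", "attack"]
proto  = ["tcp",    "udp",    "icmp",   "icmp",   "tcp",    "icmp"]
print(entropy(labels))                   # 1.0 bit: labels are maximally uncertain
print(information_gain(labels, proto))   # 1.0: protocol fully determines the label here
```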

5.1 Correlation analysis

  • In Ambusaidi et al. (2014), a nonlinear correlation coefficient-based (NCC) similarity measure is suggested to extract both linear and nonlinear correlations from network traffic.
  • In Tan et al. (2014a), for DoS attack detection a system is proposed that uses multivariate correlation analysis (MCA) for accurate network traffic characterization by extracting the geometrical correlations between network traffic features.
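
As a simplified illustration of correlation-based traffic characterization (only plain linear Pearson correlations on synthetic features; NCC and MCA build nonlinear and geometric variants of this idea):

```python
import numpy as np

rng = np.random.default_rng(3)
pkts = rng.poisson(100, size=200).astype(float)     # packets per flow
byts = pkts * 500 + rng.normal(0, 1000, size=200)   # bytes closely track packets
dur = rng.exponential(1.0, size=200)                # roughly independent of both

# Pairwise correlations between traffic features; deviations from the profile
# learned on normal traffic would indicate anomalous behavior.
features = np.column_stack([pkts, byts, dur])
print(np.corrcoef(features, rowvar=False).round(2))
```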

6. Clustering-based

  • The difference between regular clustering and co-clustering is the processing of rows and columns. Regular clustering techniques such as k-means (Ahmed and Naser, 2013) cluster the data considering only the rows of the dataset, whereas co-clustering considers both rows and columns simultaneously to produce clusters (Ahmed et al., 2015d).
  • Three key assumptions (when using clustering to detect anomalies)
    1. Any subsequent new data that do not fit well with existing clusters of normal data are considered anomalies.
    2. The normal data lie close to their nearest cluster centroid, but anomalies are far away from any centroid.
    3. Smaller, sparser clusters can be considered anomalous and thicker, denser clusters normal.

6.1 Regular clustering

  • The approach used by Munz et al. (2007) is quite straightforward. (k-means clustering)
  • Petrovic et al. (2006) proposed a cluster-labeling strategy. The corresponding cluster in the case of a massive attack is extremely compact.
  • Oldmeadow et al. (2004) proposed a solution for time-varying network intrusion detection and also demonstrated how feature weighting can improve classification accuracy.
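
A minimal sketch of the k-means approach, using assumption 2 from the list in Section 6 (anomalies lie far from the nearest centroid); the cluster count and the 99th-percentile threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = np.vstack([
    rng.normal(0, 1, size=(300, 2)),   # normal cluster 1
    rng.normal(8, 1, size=(300, 2)),   # normal cluster 2
    np.array([[4.0, 20.0]]),           # a point far from both clusters
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance to the nearest centroid is the anomaly score.
dists = np.min(km.transform(X), axis=1)
threshold = np.percentile(dists, 99)
print(np.where(dists > threshold)[0])  # indices of flagged points
```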

6.2 Co-clustering

  • Unlike other clustering algorithms, co-clustering defines a clustering criterion and then optimizes it.

  • It simultaneously finds the subsets of rows and columns of a data matrix using a specified criterion.

  • The benefits of co-clustering over the regular clustering are the following:

    • Simultaneous grouping of both rows and columns can provide a more compressed representation and it preserves information contained in the original data.
    • Co-clustering can be considered as a dimensionality reduction technique.
    • Significant reduction in computational complexity.
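
A minimal co-clustering sketch using scikit-learn's SpectralCoclustering (an illustrative algorithm choice; the survey does not prescribe this one). Rows (records) and columns (features) of a toy data matrix are grouped simultaneously:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Toy data matrix with planted block structure: rows = records, columns = features.
rng = np.random.default_rng(5)
X = rng.random((8, 6)) * 0.1
X[:4, :3] += 1.0   # block 1
X[4:, 3:] += 1.0   # block 2

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)
print(model.row_labels_)      # simultaneous grouping of the records
print(model.column_labels_)   # ... and of the features
```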