Paper link: https://arxiv.org/abs/1607.01152

When sufficient labeled data are available, classical criteria based on Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves can be used to compare the performance of un-supervised anomaly detection algorithms. However, in many situations, few or no data are labeled. This calls for alternative criteria one can compute on non-labeled data. In this paper, two criteria that do not require labels are empirically shown to discriminate accurately (w.r.t. ROC or PR based criteria) between algorithms. These criteria are based on existing Excess-Mass (EM) and Mass-Volume (MV) curves, which generally cannot be well estimated in large dimension. A methodology based on feature sub-sampling and aggregating is also described and tested, extending the use of these criteria to high-dimensional datasets and solving major drawbacks inherent to standard EM and MV curves.
When sufficient labeled data are available,
-> If enough labeled examples are available,
classical criteria based on Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves can be used to compare the performance of un-supervised anomaly detection algorithms.
-> standard ROC- or PR-based criteria can be used to compare how well unsupervised anomaly detection algorithms perform.
However, in many situations, few or no data are labeled.
-> In many cases, however, little or none of the data is labeled.
This calls for alternative criteria one can compute on non-labeled data.
-> This creates a need for alternative criteria that can be computed on unlabeled data.
In this paper, two criteria that do not require labels are empirically shown to discriminate accurately (w.r.t. ROC or PR based criteria) between algorithms.
-> The paper shows empirically that two label-free criteria tell algorithms apart accurately, in agreement with ROC- or PR-based criteria.
These criteria are based on existing Excess-Mass (EM) and Mass-Volume (MV) curves, which generally cannot be well estimated in large dimension.
-> These criteria build on the already existing EM and MV curves, which generally cannot be estimated well in high dimension.
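For reference, the two curves can be written as below. This follows the usual definitions in the EM/MV literature. The notation is not part of the abstract and is introduced only for this note: $s$ is a scoring function (higher means more normal), $X$ is a random observation, and $\mathrm{Leb}$ is the Lebesgue measure (volume).

```latex
\begin{align*}
MV_s(\alpha) &= \inf_{u \ge 0} \ \mathrm{Leb}(s \ge u)
  \quad \text{subject to} \quad \mathbb{P}(s(X) \ge u) \ge \alpha, \\
EM_s(t) &= \sup_{u \ge 0} \ \bigl\{ \mathbb{P}(s(X) \ge u) - t \, \mathrm{Leb}(s \ge u) \bigr\}.
\end{align*}
```

Both curves depend on the volume $\mathrm{Leb}(s \ge u)$ of a level set of $s$. In practice this volume usually has to be approximated, for example by Monte Carlo sampling, and that approximation becomes unreliable as the dimension grows. This is one plausible reading of why the curves "cannot be well estimated in large dimension".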
A methodology based on feature sub-sampling and aggregating is also described and tested, extending the use of these criteria to high-dimensional datasets and solving major drawbacks inherent to standard EM and MV curves.
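-> The paper also describes and tests a method based on feature sub-sampling and aggregation, which extends these criteria to high-dimensional datasets and addresses major drawbacks inherent to standard EM and MV curves.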
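The feature sub-sampling and aggregating idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's reference implementation. The function names (`em_curve`, `em_criterion_subsampled`, `centroid_scorer`, `inverted_scorer`), the `fit_scorer` interface, the grid of `t` values, and the Monte Carlo volume estimate over the bounding box are all choices made for this note. The EM curve is taken to be EM_s(t) = sup_u [ P(s(X) >= u) - t * Leb(s >= u) ], with higher scores meaning "more normal".

```python
import numpy as np

def em_curve(t, scores_data, scores_unif, volume):
    """Empirical EM_s(t) = max_u [ P_n(s(X) >= u) - t * Leb(s >= u) ]."""
    # Candidate thresholds u: the distinct scores observed on the data.
    thresholds = np.unique(scores_data)
    # Mass term: fraction of data points whose score is at least u.
    mass = (scores_data[None, :] >= thresholds[:, None]).mean(axis=1)
    # Volume term: Monte Carlo estimate of Leb(s >= u) inside the bounding box.
    vol = volume * (scores_unif[None, :] >= thresholds[:, None]).mean(axis=1)
    em = (mass[None, :] - t[:, None] * vol[None, :]).max(axis=1)
    # An empty level set (u = +inf) always gives 0, so EM is never negative.
    return np.maximum(em, 0.0)

def em_criterion_subsampled(X, fit_scorer, d_sub=2, n_draws=10,
                            n_unif=2000, seed=0):
    """Average an EM-based score over random feature subsets (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    areas = []
    for _ in range(n_draws):
        # Draw a random subset of features without replacement.
        feats = rng.choice(n_features, size=min(d_sub, n_features), replace=False)
        X_sub = X[:, feats]
        # Bounding box of the sub-sampled data and its volume.
        low, high = X_sub.min(axis=0), X_sub.max(axis=0)
        volume = float(np.prod(high - low))
        uniform = rng.uniform(low, high, size=(n_unif, X_sub.shape[1]))
        # Fit the scoring function on the selected features only.
        score = fit_scorer(X_sub)
        # Assumed grid of t values, scaled by the box volume.
        t = np.linspace(0.0, 10.0 / volume, 100)
        em = em_curve(t, score(X_sub), score(uniform), volume)
        # Trapezoidal area under the EM curve as a scalar summary.
        areas.append(float(np.sum((em[1:] + em[:-1]) / 2.0 * np.diff(t))))
    return float(np.mean(areas))

def centroid_scorer(X_train):
    """Toy scorer: points close to the centroid are 'more normal'."""
    center = X_train.mean(axis=0)
    return lambda Z: -np.linalg.norm(Z - center, axis=1)

def inverted_scorer(X_train):
    """Deliberately bad toy scorer: points far from the centroid score highest."""
    center = X_train.mean(axis=0)
    return lambda Z: np.linalg.norm(Z - center, axis=1)

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))  # unlabeled toy data
em_good = em_criterion_subsampled(X, centroid_scorer)
em_bad = em_criterion_subsampled(X, inverted_scorer)
print(em_good, em_bad)
```

No labels are used anywhere: the two toy scorers are compared only through the averaged area under their EM curves, where a larger value is read as better. Keeping `d_sub` small keeps each Monte Carlo volume estimate in a low-dimensional box, which is the motivation for sub-sampling features before aggregating.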