<Feature Selection: A Data Perspective>

남연하·2021년 3월 22일

Feature Selection의 종류에 대한 논문

Data mining : 여러 경우에서 판단 및 결정을 위해 데이터로부터 사물, 사건 등의 관계를 탐색, 모형화 하는 기법이다.

문제점 : Data of high dimensionality 메모리 용량 requirements를 증가시키고, data analytics 하는데 비용을 증가시킨다.

해결방안 : Dimensionality reduction 1. Feature extraction 2. Feature selection
1. Feature extraction : 원래의 고차원 features 새로운 저차원 feature space으로 만든다.
2. Feature selection : 주어진 dataset에 대해 더 나은 model을 만드는 subset of feature을 찾기 위한 과정

Feature Selection Algorithm with label perspective (label info가 selection phase동안 포함 되어 있는지의 여부에 따라)

2-1 supervised : for classification or regression problems/ feature 관련성은 class label이나 regression target과의 연관 관계로 평가된다.
Data를 training set, testing set으로 나눈 뒤, classifier나 regression model은 supervised feature selection으로 선택된 subset of features에 기반해 train된다. Train된 classifier나 regression model이 selected된 feature로 test set에서 안 보였던 sample의 class label이나 regression target을 예측한다.
2-2 unsupervised : for clustering problems / label information없이 feature 연관성을 규정할 다른 기준을 찾는다. Feature selection에서 가능한 모든 경우를 사용한다.

Feature selection algorithm with selection strategy perspective (features가 어떻게 selected되는지에 따라)

2-1 Wrapper methods : 가장 이상적인 feature의 조합을 찾는 방식이다. Feature set이 첫번째로 subset of features을 만들어내면, 어떤 기준이 만족되기 전까지 learning algorithm이 learning performance에 의거한 feature의 quality에 대해 평가한다.
2-2 Filter methods: learning algorithm이 이용되지 않는다. 그래서 selected된 feature은 target learning algorithm에 적합하지 않는다. 첫번째로 어떤 기준에 따라 feature importance 순위가 매겨지면 낮은 순위에 feature들은 filter out된다.
2-3 Embedded method : filter와 wrapper method의 tradeoff이다. Feature selection을 model learning에 끼워 넣은 것이다. Learning algorithm과 interact하지만 wrapper methods처럼 반복적으로 feature set을 평가할 필요가 없다.

Feature selection algorithm을 적용할 때의 major concerns

streaming data와 feature들이 real-word app에서 만연해지고 있는데 traditional feature selection algorithm을 적용하기 힘들다.
feature selection의 대부분 algorithm은 single data source로 task들을 다루는 것으로 design되어 있고, data는 항상 독립적이고, 잘 분포되어 있다고 추측한다. 그러나 data는 많은 app들의 다양한 sources로부터 올 수 있다.
feature selection을 할 때, feature selection이 고려되지 않는다면 intrinsic dependency는 포착되지 않고 selected feature는 target application에 적합하지 않게 된다.

Feature selection algorithm with conventional data

Similarity-based method : data similarity를 보존하는 능력으로 feature importance를 평가하는 방법
1-1 Laplacian score :Unsupervised feature selection algorithm/data manifold structure을 가장 잘 보존할 수 있는 feature들을 고른다.
1-2 SPEC : Supervised, unsupervised feature selection algorithm/Laplacian score의 연장선, data manifold structure와 일치하는 feature는 서로 근처에 있는 instances에게 비슷한 값을 할당해야 한다.
1-3 Fisher score : supervised feature selection algorithm/같은 class에 있는 sample들의 feature value는 비슷, 다른 class에 있는 sample들의 feature values는 서로 다름 이 경우와 같은 feature을 선택한다.
1-4 Trace Ratio Criterion : trace ratio norm에 의해 계산된 corresponding score에 의거한 global optimal feature subset을 선택한다.
1-5 ReliefF : 특별한 형식의 data similarity matrix를 보존하는 feature을 고르는 것
Info-theoretical based methods
Feature의 중요성을 측정하기 위해 다른 heuristic filter을 이용한다. 대부분 info-theoretical concepts는 discrete variable에만 적용 가능하다. 그러므로 feature selection algorithm은 discrete data만 연구할 수 있다. 이 method의 목적은 optimal feature들을 찾는 것이다.
2-1 MIM : class label과의 관계를 통해 feature의 importance를 측정한다. Feature가 class label과 강한 상관관계를 갖고 있을 때 좋은 classification performance를 갖는다. 모든 feature에 대해 MIM feature scores을 얻은 뒤, 높은 feature score을 갖고 있는 features을 고른 뒤 selected feature set에 추가한다. 원하는 숫자의 selected feature가 될 때까지 process를 반복한다.
2-2 MIFS : MIM의 한계점은 feature들이 서로 독립적이라 가정되어 있는 것이다. Feature들 사이에 상관관계는 최소화 되어야 한다. MIFS는 selection 할 때 feature relevance와 redundancy를 고려한다.
2-3 MRMR : selected features가 늘어날수록 feature redundancy의 효과는 점차 감소한다.
2-4 CIFE : feature redundancy를 감소시키기 위하여 selected되지 않은 feature와 class label이 주어진 이미 selected된 feature사이의 conditional redundancy는 최대화 되어야 한다.
2-5 JMI : class label이 주어진 selected feature와 unselected feature 사이에 공유된 보완적인 정보를 증가시키기 위해 제안되었다. JMI의 기본적인 idea는 class label이 주어진 현존하는 feature에 보완적인 feature들을 새로운 feature들을 포함해야한다.
2-6 CMIM : 지금까지 selected features가 있는 class label로 mutual information을 최대화하는 features을 반복적으로 선택한다.
2-7 IF (Informative Fragments)
2-8 Interaction capping
2-9 DISR : mutual info를 normalize 하기 위해 normalization technique를 사용한다.
2-10 FCBF : feature-class correlation과 feature-feature correlation을 동시에 이용하는 예시이다.
Sparse-Learning-Based method
Some sparse regularization term과 함께 fitting error를 최소화하는 것을 목표로 한다. Sparse regularization은 많은 feature coefficient를 작게, 거의 0이 되도록 만들고, 그런 다음 corresponding feature을 간단히 제거할 수 있다.
3-1 REFS : 각각의 instance가 하나의 class label을 갖고있는 multi-class classification problems를 위해 design된 방식
3-2 MCFS : feature selection process를 안내하는 class label없이, MCFS는 spectral analysis을 이용하여 다른 feature들 사이의 correlation을 측정하는 데이터의 multi-cluster 구조를 포함 할 수 있는 feature들을 고르기를 제안한다.
Statistical-Based method
Feature relevance를 평가하는데 learning algorithm대신에 다양한 statistical measure들을 이용한다. 따라서 대부분이 filter-based method이다. 또, 대부분의 statistical-based method는 feature들을 개별적으로 분석하기 때문에, feature redundancy가 selection phase동안 무시된다.
4-1 Low Variance : Low Variance는 predefined threshold 보다 낮은 variance의 feature들을 제거하는 방식이다.
4-2 T-Score : T-score은 binary classification problem에 사용된다. T-score가 높을수록, Feature의 중요도는 높아진다.
4-3 Chi-Square Score : feature이 class label으로부터 독립적인지를 평가하기위해 test of independence를 이용한다. Chi-Square이 높을수록, feature의 중요도는 높아진다.
4-4 Gini Index : feature가 다른 classes로부터 instance들을 분리할 수 있는지 여부를 quantify하기 위해 널리 사용되는 statistical measure이다.
4-5 CFS : feature subset의 가치를 평가하기 위해 correlation-based heuristic을 사용한다.
Other methods
5-1 hybrid method : set of different feature selection results로 구성한다. 그 후, 서로 다른 결과를 종합하여 일치된 결과를 도출한다.
5-2 Reconstruction based Feature Selection : selected feature로 data의 reconstruction error를최소화 시킨다. Reconstruction function은 linear와 nonlinear 둘 다 가능하다.
with structured feature
앞에서 소개된 by conventional data은 inherent feature structure들을 무시하고, features이 서로 독립적이라고 가정한 방법들이다. 그러나 실제로 많은 features는 다양한 종류의 구조들로 보일 수 있다. 그러므로 feature structure을 고려 해야 한다.
Feature selection with group feature structures
1-1 Group Lasso : Group을 전체적으로 선택하거나 선택하지 않는다.
1-2 Sparse Group Lasso : 특정 app의 경우, 선택된 group에서 representative features을 select하는 것이 바람직하다. Group selection과 feature selection을 동시에 수행한다.
1-3 Overlapping Sparse Group Lasso : 위의 방법들은 features 사이에 분리된 group structures를 고려한다. 그러나 group들은 서로 overlap 될 수 있다.
Feature selection with tree feature structures
Features는 tree structure(hierarchical)로 보일 수 있다.
2-1 Tree guided Group Lasso : Leaf nodes는 individual features이고, Internal nodes는 a group of features이다. Each internal node의 가중치는 subtree 안에서 feature들이 얼마나 밀접하게 관련되어 있는지를 나타낸다.
Feature selection with graph feature structures
많은 경우에서 features이 강력한 pairwise interaction을 갖고 있다.
3-1 Graph Lasso : Graph Lasso는 함께 연결된 feature는 비슷한 feature coefficients을 가지고 있다고 가정한다.
3-2 GFLasso : features은 negatively correlated되어 있을 수도 있다. GFLasso는 positive와 negative feature correlations 모두를 고려한다.
3-3 GOSCAR
with heterogeneous feature
Traditional feature selection algorithm은 data가 독립적이라는 가정에 기반을 두고 있다. 그러나 big data시대에서 heterogeneous data는 점점 만연해지고 있다.
Feature selection for linked data
Linked data는 Twitter, biological system, Facebook 같은 real-world application에서 아주 흔하다.
1-1 Feature selection on Networks : Content information과 class labels사이에 관계를 포착하기 위해 linear classifier을 사용하는 것을 제안하고, link information을 포함한다.
1-2 Feature selection for social Media Data : 다양한 type의 social relation을 갖고 있는 social media data에 대한 feature selection을 조사한다.
1-3 LUFS : linked data를 위한 unsupervised feature selection framework 이다. LUFS는 network structure modeling과 feature selection을 따로 따로 수행한다.
1-4 Robust Unsupervised Feature Selection for Networked Data
Multi-source feature selection
Multi view feature selection
Relations을 이용하여 서로 다른 feature spaces에서 features을 동시에 고르는 것을 목표로 한다.
with streaming feature
이전까지의 방법들은 모든 data instances와 features이 사전에 이미 알려졌다는 가정에 기반을 두었다. 그러나 많은 real-world app안에서는 data stream과 feature stream과 직면하는 경우가 더 많다. 최악의 경우에는 data나 feature의 크기가 안 알려지거나 무한대이다. 그러므로, 모든 data instances나 feature들이 feature selection에 사용할 수 있을 때까지 기다리는 것을 practical하지 않다.
Feature selection Algorithms with Feature Streams
1-1 Grafting Algorithm : Lasso와 같이, norm regularization에 따라 feature weight vector에 의해 parameterized된 다양한 model을 다룰 수 있는 기법이다.
1-2 Alpha-Investing Algorithm : 새로운 feature을 수용하는데 필수적인 error reduction의 threshold를 dynamically 변경하는 complexity penalty 방법이다.
1-3 OSFS : (1) – online relevance analysis (2) – inline redundancy analysis 이 2단계를 통해 non-redundant하고 밀접하게 연관된 features을 포착하는 방식이다.
1-4 USFS : social media의 많은 unlabeled data를 다루기 위하여 제안된 방식이다. Link info와 같은 source info를 이용한다.
Feature selection algorithm with data stream
2-1 OFS : binary classification을 위한 online feature selection algorithm이다.
2-2 FSDS: unlabeled data가 지속적으로 발생되고 있을 때, subset of relevant feature을 고르기 위해 제시되었다.

Feature weighting: 각각의 feature을 ranking을 매긴다
Feature subset selection: features가 select 되었는지 안 되었는지 만 보여준다.
Scalability : data size의 증가로, most feature selection algorithms의 확장성이 위태로워졌다.
TB scale의 data는 memory에 쉽게 load될 수 없으면 FS algorithm의 사용을 제한한다. 많은 경우에서, one pass of data만 바람직하고 the second or more pass는 비현실적이다.
Stability : training data의 작은 변화에 대한 feature selection algorithm의 민감도
feature selection의 stability는 domain experts이 selected feature에 자부심을 갖도록 만든다. Many feature selection algorithms은 small perturbation으로 인해 stability가 작아진다.
Model selection : feature weighting methods에서 selected feature의 number를 specify해야 한다. 그러나 “optimal” number을 찾는 것은 어렵다. 큰 number은 관련 없는 중복 features을 포함시켜 learning performance를 저해하는 위험을 증가시킨다. 작은 number some relevant features을 놓칠 것이다.
Solution: apply heuristics such as “grid search” strategy같은 heuristics을 적용한다.
결론적으로, 이 survey는 feature selection이 data dimensionality를 줄이고, data를 분석하는데 효과적이므로 성공적인 machine-learning과 data-mining에 필수성과 feature selection의 최근 발전에 관한 포괄적인 리뷰들을 제시했습니다. Feature selection의 다양한 방법에 대해서 공부 해볼 수 있어서 흥미로웠습니다.

남연하

Ewha Electric

이전 포스트

<AI enable wireless network for 5G>

다음 포스트

<Feature Selection: A Data Perspective>

<AI enable wireless network for 5G>

<A survey of network anomaly detection techniques>

0개의 댓글