[Paper Review] Learning from positive and unlabeled data: a survey

Hanseok Jo · August 2, 2021

Bekker, Jessa, and Jesse Davis. "Learning from positive and unlabeled data: A survey." Machine Learning 109.4 (2020): 719-760.

This post is a work in progress.

PU learning methods

Two-step techniques

| Method | Step 1 | Step 2 | Step 3 |
|---|---|---|---|
| S-EM, Liu et al. (2002) | Spy | EM NB | $\Delta E$ |
| Roc-SVM, Li and Liu (2003) | Rocchio | Iterative SVM | FNR > 5% |
| Roc-Clu-SVM, Li and Liu (2003) | Rocchio | Iterative SVM | FNR > 5% |
| PEBL, Yu et al. (2002); Yu et al. (2004) | 1-DNF | Iterative SVM | Last |
| A-EM, Li and Liu (2005) | Augmented Negatives | EM NB | $\Delta F$ |
| LGN, Li et al. (2007) | Single Negative | BN | / |
| PE_PUC, Yu and Li (2007) | PE | (EM) NB | Unspecified |
| WVC/PSOC, Peng et al. (2007) | 1-DNF | Iterative SVM | Vote |
| CR-SVM, Li et al. (2010) | Rocchio | SVM | / |
| MCLS, Chaudhari and Shevade (2012) | k-means | Iterative LS-SVM | Last |
| C-CRNE, Liu and Peng (2014) | C-CRNE | TFIPNDF | / |
| Pulce, Ienco and Pensa (2016) | DILCA | DILCA-KNN | / |
| PGPU, He et al. (2018) | PGPU | Biased SVM | / |

  • Assumption: separability and smoothness
    - Positive examples are expected to be similar to the labeled examples, while negative examples are expected to differ substantially from them.

Based on these assumptions, PU learning proceeds as follows:

  1. Identify reliable negative examples. Optionally, additional positive examples can also be generated.
  2. Use (semi-)supervised learning techniques with the positively labeled examples, reliable negatives, and, optionally, the remaining unlabeled examples.
  3. (When applicable) Select the best classifier generated in step 2.

Step 1: Identifying Reliable Negatives (and Positives)

Many studies define a new distance measure that separates positive and negative examples well (especially for text classification problems). I introduce a few of them here, picked at my own discretion.

PGPU

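Uses the probabilistic gap assumption for labeling: an unlabeled example is judged positive if its probabilistic gap is greater than or equal to that of the observed positives, and negative if it is smaller.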

k-means

Runs k-means clustering and treats the clusters farthest from the positive examples as negative samples.
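
As a rough illustration of the idea above, here is a minimal sketch in plain Python. It is not the MCLS implementation: the helper names (`kmeans`, `reliable_negatives`), the choice of `k`, and the use of the positive mean as the reference point are all my own assumptions for illustration.

```python
import math
import random


def nearest(points, centroids):
    """Index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        assign = nearest(points, centroids)
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, nearest(points, centroids)


def reliable_negatives(positives, unlabeled, k=2):
    """Unlabeled points in the cluster farthest from the positive mean.

    Assumption (mine): "farthest" is measured from the mean of the
    labeled positives to each cluster centroid.
    """
    centroids, assign = kmeans(unlabeled, k)
    pos_mean = tuple(sum(col) / len(positives) for col in zip(*positives))
    used = set(assign)  # ignore clusters that ended up empty
    far = max(used, key=lambda c: math.dist(centroids[c], pos_mean))
    return [p for p, a in zip(unlabeled, assign) if a == far]
```

Calling `reliable_negatives(positives, unlabeled)` with lists of coordinate tuples returns the unlabeled points in the farthest cluster, which would then be passed to step 2 as negatives.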

GPU

Builds a generative model from the labeled set of positives (i.e., learns the positive distribution) and treats the examples that have low probability under this model as reliable negatives.
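
To make the "low probability under a model of the positives" idea concrete, here is a toy sketch. A single 1-D Gaussian stands in for the generative model, which is far simpler than what GPU actually uses; the function name and the `n_negatives` cutoff are assumptions of mine.

```python
from statistics import NormalDist


def lowest_density_unlabeled(positives, unlabeled, n_negatives):
    """Pick the unlabeled 1-D values with the lowest density under the positives.

    Assumption: a Gaussian fitted to the labeled positives plays the role
    of the generative model; n_negatives is a hypothetical cutoff.
    """
    model = NormalDist.from_samples(positives)  # "learn the positive distribution"
    ranked = sorted(unlabeled, key=model.pdf)   # lowest density first
    return ranked[:n_negatives]
```

In practice the cutoff would more likely be a density threshold or a fraction of the unlabeled set; a fixed count is used here only to keep the sketch short.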

Step 2: (Semi-)Supervised Learning

A classifier is trained with techniques such as support vector machines (SVM), Naive Bayes (NB), or Expectation Maximization on top of Naive Bayes (EM NB).

Iterative SVM

A method that keeps training while adding more reliable negatives at every iteration.
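
The loop can be sketched as below. To stay dependency-free, a nearest-centroid rule is swapped in for the SVM, so this only shows the control flow (retrain, move newly negative examples, stop when nothing changes), not an actual iterative SVM. All names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def iterative_negatives(positives, reliable_neg, remaining):
    """Grow the negative set until no remaining example is classified negative."""
    negatives, remaining = list(reliable_neg), list(remaining)
    pos_c = centroid(positives)
    while remaining:
        neg_c = centroid(negatives)  # "retrain" on the current negative set
        newly_neg = [x for x in remaining
                     if math.dist(x, neg_c) < math.dist(x, pos_c)]
        if not newly_neg:
            break  # converged: nothing new was classified as negative
        negatives += newly_neg
        remaining = [x for x in remaining if x not in newly_neg]
    return negatives, remaining
```

With an SVM as the base learner, one classifier is produced per iteration, which is why the two-step methods in the table need a step 3 (e.g. "Last" or "FNR > 5%") to choose among them.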

Biased learning

  • Biased learning treats the unlabeled examples as negative examples with class label noise.
  • Assumption: SCAR (Selected completely at random)
  • The noise is taken into account by, for example, placing a penalty on misclassified positive examples or tuning hyperparameters.

Classification

SVM based

  • biased SVM: a standard SVM method that penalizes misclassified positive and negative examples differently (Liu et al. 2003)
  • biased SVM + extra penalty on misclassified unlabeled examples (Ke et al. 2012)
  • Weighted unlabeled samples SVM (WUS-SVM) (Liu et al. 2005)
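
For reference, the biased SVM idea of penalizing the two kinds of errors differently is usually written as a soft-margin SVM with two cost parameters. The notation below is my own assumption ($P$ for the labeled positives, $U$ for the unlabeled examples treated as negatives, $C_{+}$ and $C_{-}$ for the two costs) and may differ in details from the original paper.

```latex
\min_{w,\,b,\,\xi}\quad
  \frac{1}{2}\lVert w \rVert^2
  + C_{+} \sum_{i \in P} \xi_i
  + C_{-} \sum_{i \in U} \xi_i
\qquad \text{s.t.}\quad
  y_i \,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,
\qquad
  y_i = +1 \ (i \in P),\quad y_i = -1 \ (i \in U)
```

Choosing $C_{+} > C_{-}$ reflects that a labeled positive is known to be positive, whereas an unlabeled example may only look negative because of label noise.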

The noisiness of the negative data makes the learning harder

  • bagging techniques or least-squares SVMs (LS-SVM) (Suykens and Vandewalle 1999)
  • Bagging SVM: each SVM is trained on the positive examples and a random subset of the unlabeled examples treated as negatives, and the resulting classifiers are aggregated (Mordelet and Vert 2014)
  • Robust Ensemble SVM (RESVM): bagging SVM + resampling the positive examples and using a bootstrap approach (Claesen et al. 2015d)
  • Biased least squares SVM (BLSSVM): biased version of LS-SVM (Ke et al. 2017)
  • MD-BLSSVM: using the Mahalanobis distance instead of the Euclidean distance (Ke et al. 2018)

    Note: bagging SVM and RESVM look like ideas worth borrowing for the research I am currently working on. Must read.
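
Here is a small sketch of the bagging idea. A nearest-centroid score replaces the SVM base learner, and scoring only the examples left out of each bag is my own simplification, so treat this as an illustration of the aggregation scheme rather than the published method. Parameter names and defaults are hypothetical.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def bagging_scores(positives, unlabeled, n_rounds=50, sample_size=3, seed=0):
    """Average out-of-bag score per unlabeled example (higher = more positive)."""
    rng = random.Random(seed)
    pos_c = centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Draw a random subset of unlabeled examples to act as negatives.
        bag = set(rng.sample(range(len(unlabeled)), sample_size))
        neg_c = centroid([unlabeled[i] for i in bag])
        for i, x in enumerate(unlabeled):
            if i in bag:
                continue  # only score examples left out of this bag
            # Base-learner score: > 0 when x is closer to the positives.
            totals[i] += math.dist(x, neg_c) - math.dist(x, pos_c)
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```

Because each bag contains only part of the unlabeled set, hidden positives contaminate only some of the base learners, and averaging over many bags dampens their effect.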

RankSVM based

Minimizing a regularized margin-based pairwise loss

  • Biased Twin SVMs (Xu et al. 2014)
  • nonparallel SVM (NPSVM) (Zhang et al. 2014)
  • Laplacian Unit-Hyperplane classifier (LUHC) (Shao et al. 2015)

Weighted logistic regression

  • larger weights on correctly classified positive examples (Lee and Liu 2003)
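
A minimal sketch of a class-weighted logistic regression on 1-D inputs, trained by plain gradient descent. The weights `w_pos` and `w_neg` are left as free parameters here; how the paper actually sets them is not covered in these notes, so the values in any usage are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logreg(xs, ys, w_pos, w_neg, lr=0.1, epochs=2000):
    """Gradient descent on a class-weighted log loss for 1-D inputs.

    ys uses 1 for labeled positives and 0 for unlabeled examples
    (treated as negatives). Returns (slope, intercept).
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            weight = w_pos if y == 1 else w_neg
            err = weight * (sigmoid(a * x + b) - y)  # weighted residual
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b
```

Raising `w_pos` relative to `w_neg` pushes the predicted probability up in regions where labeled positives and unlabeled examples overlap, which is where hidden positives are most likely to sit.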

Clustering

  • Topic-Sensitive pLSA (probabilistic latent semantic analysis) (Zhou et al. 2010)

Matrix completion

  • Binary matrix completion can also be seen as a PU learning problem: the ones in the matrix are the known positives and the zeros are unlabeled (Hsieh et al. 2015)
    • Assumption: there exists a probability matrix that generates the complete binary matrix

Non-deterministic setting

The complete binary matrix was generated by sampling from the probability matrix.

  • Shifted Matrix Completion (ShiftMC): Minimizing an unbiased estimator for the mean square error loss.
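
To see what "unbiased estimator" means here, the sketch below uses the standard label-noise correction for one-sided noise: a true 1 is observed as 0 with probability `rho`, and a true 0 is always observed as 0. This form is my reconstruction under that assumption and may differ in details from the ShiftMC paper.

```python
def squared_loss(pred, target):
    return (pred - target) ** 2


def unbiased_loss(pred, observed, rho):
    """Loss on an observed entry whose expectation equals the clean squared loss.

    rho is the assumed probability that a true 1 is observed as 0.
    """
    if observed == 1:
        return (squared_loss(pred, 1) - rho * squared_loss(pred, 0)) / (1 - rho)
    return squared_loss(pred, 0)


def expected_loss(pred, true_label, rho):
    """Expectation of unbiased_loss over the one-sided masking noise."""
    if true_label == 1:
        return ((1 - rho) * unbiased_loss(pred, 1, rho)
                + rho * unbiased_loss(pred, 0, rho))
    return unbiased_loss(pred, 0, rho)  # true zeros are always observed as 0
```

For any prediction, `expected_loss(pred, y, rho)` matches `squared_loss(pred, y)` up to floating-point error, so minimizing the corrected loss on the observed matrix targets the same objective as minimizing the squared loss on the unobserved complete matrix.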

Deterministic setting

The complete binary matrix was generated by thresholding the probability matrix. The observed matrix is generated by uniform sampling from the complete binary matrix.

  • Biased Matrix Completion (BiasMC): penalizing misclassified positives more than misclassified negatives.
  • BiasMC for graphs (Natarajan et al. 2015): additional information that neighbors are likely similar
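
The asymmetric penalty can be sketched as a weighted squared loss. The weight name `alpha` and the plain sum over entries are assumptions for illustration; the regularization that a real matrix completion objective needs is left out.

```python
def biased_loss(pred, observed, alpha):
    """Weighted squared loss: alpha on observed ones, 1 - alpha on observed zeros."""
    if observed == 1:
        return alpha * (pred - 1) ** 2
    return (1 - alpha) * pred ** 2


def total_biased_loss(pred_matrix, observed_matrix, alpha):
    """Sum of biased_loss over all entries of two equally shaped nested lists."""
    return sum(biased_loss(p, a, alpha)
               for pred_row, obs_row in zip(pred_matrix, observed_matrix)
               for p, a in zip(pred_row, obs_row))
```

With `alpha` above 0.5, predicting 0 for a known positive costs more than predicting 1 for an unlabeled zero, which matches the intuition that the zeros are only unlabeled and may hide positives.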

Incorporation of the class prior

Postprocessing

Preprocessing

Method modification

Relational approaches

Other methods

Comparison of PU learning methods
