🔮 A Unified Approach to Interpreting Model Predictions

ukkikkiai · March 25, 2024

Euron paper review

๋ชฉ๋ก ๋ณด๊ธฐ
1/13

ABSTRACT

๋ชจ๋ธ์ด ์™œ ํ•ด๋‹น prediction์„ ๋งŒ๋“œ๋Š”์ง€ ์ดํ•ดํ•˜๋Š” ๊ฒƒ์€ ๋งค์šฐ ์ค‘์š”ํ•จ. ๊ทธ๋Ÿฌ๋‚˜ Ensemble ๋˜๋Š” ๋”ฅ๋Ÿฌ๋‹์ฒ˜๋Ÿผ ๋ณต์žก๋„๊ฐ€ ๋†’์€ ๋ชจ๋ธ์€ ํ•ด์„ํ•˜๊ธฐ๊ฐ€ ์–ด๋ ค์šด ๊ฒฝ์šฐ๊ฐ€ ๋Œ€๋‹ค์ˆ˜์ž„. ๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์€ SHAP(SHapely Additive exPlanations) ๋ถ„์„์„ ์ œ์•ˆํ•จ. SHAP์€ ๊ฐ feature์— ๋Œ€ํ•œ importance ๊ฐ’์„ ๋ถ€์—ฌํ•จ. ๊ธฐ์กด์— ์กด์žฌํ–ˆ๋˜ 6๊ฐ€์ง€ ๋ฐฉ๋ฒ•๋ก ์„ ํ†ต์ผํ•จ์œผ๋กœ์„œ ์ƒˆ๋กœ์šด ๋ถ€๊ฐ€์ ์ธ feature importance๋ฅผ ๊ฐ€์ง„ ํด๋ž˜์Šค๋ฅผ ํ™•์ธํ•จ. ์ง๊ด€์ ์œผ๋กœ ์™€๋‹ฟ๋Š” ํ•ด๋‹น ๋ฐฉ๋ฒ•๋ก ์€ ์„ฑ๋Šฅ ์ธก๋ฉด๊ณผ ์ง€์†์„ฑ ์ธก๋ฉด์—์„œ๋„ ๋›ฐ์–ด๋‚œ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์ž„.

1. INTRODUCTION

  • Simple models are easy to interpret, but the more complex a model becomes, the harder it is to understand intuitively. There is therefore a trade-off between accuracy and interpretability.

๋”ฐ๋ผ์„œ ํ•ด๋‹น ๋…ผ๋ฌธ์€:
1) Explanation model: ๋ชจ๋ธ์˜ prediction์— ๋Œ€ํ•œ ์–ด๋–ค ์„ค๋ช…๋„ ๋ชจ๋ธ ๊ทธ ์ž์ฒด๋กœ ๋ณด๋Š” ๊ด€์ ์„ ์ œ์•ˆํ•จ. ๋˜ํ•œ, ํ•ด๋‹น ์ ‘๊ทผ์€ additive feature attribution method๋ผ๋Š” ํด๋ž˜์Šค๋ฅผ ์ •์˜ํ•˜๊ฒŒ ๋จ.
2) ๊ฒŒ์ž„ ์ด๋ก  ๊ฒฐ๊ณผ: additive feature attribution method์˜ ์ „์ฒด ํด๋ž˜์Šค๊ฐ€ uniqueํ•œ ์†”๋ฃจ์…˜์ด ์žˆ์Œ์„ ๋ณด์žฅํ•จ. SHAP value๋ผ๋Š” ๊ฐ’์œผ๋กœ ํ†ตํ•ฉ๋œ feature importance๋ฅผ ํ‘œํ˜„ํ•จ.
3) SHAP value ์ถ”์ • ๋ฐฉ๋ฒ•: ์ด๋ฏธ ์กด์žฌํ•˜๋Š” ๋ฐฉ๋ฒ•๋ณด๋‹ค ๋” ์ง๊ด€์ ์ด๋ผ๋Š” ๊ฒƒ์„ ๋น„๊ตํ•จ.

2. ADDITIVE FEATURE ATTRIBUTION METHODS

  • ๋ชจ๋ธ์˜ prediction์„ ์ดํ•ดํ•˜๋Š” ๊ฐ€์žฅ ์ข‹์€ ๋ฐฉ๋ฒ•์€ '๋ชจ๋ธ ๊ทธ ์ž์ฒด๋ฅผ ์ดํ•ด'ํ•˜๋Š” ๊ฒƒ์ด๋‚˜ ensemble์ด๋‚˜ ๋”ฅ๋Ÿฌ๋‹ ์ˆ˜์ค€์˜ ๋ชจ๋ธ์€ ๊ทธ ์ž์ฒด๋กœ ์ดํ•ด๊ฐ€ ์–ด๋ ต๋‹ค๋Š” ํ•œ๊ณ„๊ฐ€ ์žˆ์Œ.

=> Instead, use a simpler explanation model, defined as an interpretable approximation of the original model.

  • Additive feature attribution method: an explanation model that is a linear function of binary variables.

ํŒŒ์ด๋“ค์ด ๊ฐ feature์— ์˜ํ–ฅ์„ ์ฃผ๊ณ , ์ด ๊ฐ’๋“ค์„ ๋ชจ๋‘ ๋”ํ•œ ๊ฒƒ์˜ ๊ทผ์‚ฌ๊ฐ’์ด original ๋ชจ๋ธ์˜ f(x)๋ผ๋Š” ๊ฐ€์ •์„ ๋‘” ๋ชจ๋ธ์ž„.

2.1 LIME

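  • LIME interprets individual model predictions by locally approximating the model around a given prediction.
    => It is also an additive feature attribution method. To find the φ values, LIME solves $\xi = \arg\min_{g \in \mathcal{G}} L(f, g, \pi_{x'}) + \Omega(g)$, where $L$ is a loss weighted by the local kernel $\pi_{x'}$ and $\Omega$ penalizes the complexity of $g$.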
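A minimal sketch of the LIME idea, under assumptions that are not from the paper: simplified features are binary "keep / replace with a baseline" switches, the proximity weight is a hypothetical exponential kernel on the fraction of removed features (`kernel_width` is an illustrative parameter), and the complexity penalty $\Omega(g)$ is dropped. It is not the reference LIME implementation; it only fits a weighted linear surrogate $g$ to $f$ around $x$.

```python
import numpy as np

def lime_sketch(f, x, baseline, n_samples=500, kernel_width=0.75, seed=0):
    """Fit g(z') = phi0 + sum_i phi_i z'_i to f around x (illustrative assumptions).

    - z'_i = 1 keeps feature i from x; z'_i = 0 replaces it with baseline[i]
    - proximity weight pi(z') = exp(-d^2 / width^2), d = fraction of removed features
    - no complexity penalty Omega(g)
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    rng = np.random.default_rng(seed)
    m = len(x)
    z = rng.integers(0, 2, size=(n_samples, m))
    z[0] = 1  # always include the unperturbed instance
    inputs = np.where(z == 1, x, baseline)  # h_x(z'): map coalitions back to input space
    y = np.array([f(row) for row in inputs])
    d = 1.0 - z.mean(axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # Weighted least squares: scale rows by sqrt(w), solve with an intercept column
    design = np.hstack([np.ones((n_samples, 1)), z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]  # phi0, phi

if __name__ == "__main__":
    f = lambda v: 2 * v[0] + 3 * v[1] - v[2]  # toy linear model
    phi0, phi = lime_sketch(f, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
    print(phi0, phi)  # phi should be close to [2, 3, -1]
```

For a linear toy model with a zero baseline the surrogate recovers the coefficients exactly; for nonlinear models the result depends on the kernel and sampling choices, which is the heuristic part SHAP later replaces.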

2.2, 2.3 DeepLIFT and Layer-Wise Relevance Propagation

  • DeepLIFT is a recursive prediction explanation method for deep learning. It attributes to each input the effect of setting that input to a reference value instead of its original value, and these attributions satisfy the 'summation-to-delta' property: they add up to the difference between the model's output and its output at the reference. This is also an additive feature attribution method.
  • Layer-wise relevance propagation interprets the predictions of deep networks. It is equivalent to DeepLIFT with the reference activations of all neurons fixed to zero.
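To make "summation-to-delta" concrete, here is a small sketch of DeepLIFT's Rescale rule for a single hypothetical ReLU neuron $y = \mathrm{ReLU}(w \cdot x + b)$; the weights, input, and reference are made-up values. Each input receives $w_i (x_i - r_i)$ scaled by the multiplier $\Delta y / \Delta z$, so the contributions sum to $f(x) - f(r)$.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def deeplift_rescale(w, b, x, r):
    """Rescale-rule contributions for one neuron y = ReLU(w.x + b).

    C[i] = w[i] * (x[i] - r[i]) * (dy / dz), where dz and dy are the differences
    of the pre-activation and the output between input x and reference r.
    """
    z_x, z_r = w @ x + b, w @ r + b
    dz = z_x - z_r
    dy = relu(z_x) - relu(z_r)
    # Multiplier for the nonlinearity; fall back to the gradient when dz is ~0
    mult = dy / dz if abs(dz) > 1e-12 else float(z_x > 0)
    return w * (x - r) * mult

# Hypothetical toy values
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([2.0, 0.5, 1.0])
r = np.zeros(3)

contrib = deeplift_rescale(w, b, x, r)
delta = relu(w @ x + b) - relu(w @ r + b)
print(contrib, contrib.sum(), delta)  # contributions ≈ [2, -1, 0.5]; both sums ≈ 1.5
```

Because $\sum_i w_i (x_i - r_i) = \Delta z$, multiplying by $\Delta y / \Delta z$ makes the total exactly $\Delta y$, even when the ReLU clips the output.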

2.4 Classic Shapley Value Estimation

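  • Shapley regression values: feature importances for linear regression models. Each feature is assigned an importance value; to measure its effect, one model is trained with the feature present and another with it withheld, and their predictions are compared. This requires retraining on every feature subset $S \subseteq F$, where $F$ is the set of all features.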

  • ํŒŒ์ด๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ์œ„์˜ ์ˆ˜์‹์—์„œ ์˜ค๋ฅธ์ชฝ ์ฐจ์ด ๊ฐ’์ด ํ•ด๋‹น feature๊ฐ€ ์žˆ์„ ๋•Œ์™€ ์—†์„ ๋•Œ์˜ ๋น„๊ต๊ฐ’์ž„.
    +) ํ•ด๋‹น ๋น„๊ต๊ฐ’์„ weighted average๊ฐ’์„ ๋ถ€์—ฌํ•˜์—ฌ ๋ชจ๋‘ ๋”ํ•œ ๊ฒƒ์ด ํŒŒ์ด๊ฐ’.
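A brute-force sketch of Shapley regression values, assuming ordinary least squares with an intercept as the model class and synthetic data (the helper names `fit_predict` and `shapley_regression_values` are hypothetical). Every subset model is retrained, which is exactly why this approach becomes expensive as $|F|$ grows.

```python
import itertools
import math
import numpy as np

def fit_predict(X, y, cols, x):
    """Fit OLS with an intercept on the feature subset `cols` and predict at x."""
    idx = np.asarray(cols, dtype=int)
    design = np.hstack([np.ones((X.shape[0], 1)), X[:, idx]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0] + x[idx] @ coef[1:]

def shapley_regression_values(X, y, x):
    """Retrain a linear model on every feature subset and weight the differences."""
    n = X.shape[1]
    cache = {}

    def v(subset):
        key = tuple(sorted(subset))
        if key not in cache:
            cache[key] = fit_predict(X, y, key, x)
        return cache[key]

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in itertools.combinations(others, k):
                phi[i] += weight * (v(S + (i,)) - v(S))
    return phi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)  # nearly collinear with feature 0
    y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)
    phi = shapley_regression_values(X, y, X[0])
    # The phi values sum to (full-model prediction) - (intercept-only prediction = mean of y)
    print(phi, phi.sum(), fit_predict(X, y, [0, 1, 2], X[0]) - y.mean())
```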

3. SIMPLE PROPERTIES UNIQUELY DETERMINE ADDITIVE FEATURE ATTRIBUTIONS

  • A notable property of additive feature attribution methods: there is a single unique solution in this class that satisfies all three of the following properties.
    => These properties were known for the classic Shapley value estimation methods in 2.4, but not for the methods in 2.1 to 2.3.

PROPERTY 01. Local Accuracy

  • Original ๋ชจ๋ธ f๋ฅผ input x๊ฐ’์— ๋Œ€ํ•ด ๊ทผ์‚ฌํ•  ๋•Œ, local accuracy๋Š” x'(๋‹จ์ˆœํ™”๋œ input x๊ฐ’)์— ๋Œ€ํ•ด์„œ ์ตœ์†Œํ•œ ๊ฒฐ๊ณผ๊ฐ’์„ Original ๋ชจ๋ธ๊ณผ ๋™์ผํ•˜๋„๋ก ๋งž์ถ”๋Š” ๊ฒƒ์ž„.

That is, when $x = h_x(x')$, the estimate $g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i$ equals the original model's output $f(x)$.

PROPERTY 02. Missingness

  • Features with $x'_i = 0$ are constrained to $\phi_i = 0$, so features missing from the original input receive no attributed impact.

PROPERTY 03. Consistency

  • z'์— ๋Œ€ํ•ด Original ๊ฐ’๊ณผ, ๋‹จ์ˆœํ™”ํ•œ z'์˜ output๊ฐ’์ด ๋™์ผํ•จ์„ ๊ฐ€์ •ํ•˜๊ณ , zi' = 0์ผ ๊ฒฝ์šฐ๋ฅผ z'\ ๋ผ๊ณ  ์„ค์ •ํ•˜์˜€์„ ๋•Œ, f' ๋ชจ๋ธ์˜ ๋‘ ๊ฐ’์— ๋Œ€ํ•œ ์ฐจ๊ฐ€ f ๋ชจ๋ธ์˜ ๋‘ ๊ฐ’์— ๋Œ€ํ•œ ์ฐจ๋ณด๋‹ค ํฌ๋ฉด, f'์˜ ํŒŒ์ด ๊ฐ’์ด f์˜ ํŒŒ์ด ๊ฐ’๋ณด๋‹ค ํผ.

THEOREM 01

  • Theorem 1 shows that exactly one additive feature attribution method satisfies Properties 1 to 3. It follows that methods not based on Shapley values violate local accuracy and/or consistency.

=> ๊ฒฐ๋ก ์€ ํ•ด๋‹น Additive feature attribution method๊ฐ€ ์ด ๋‘ ๊ฐ€์ง€ ์š”์†Œ๋ฅผ violateํ•จ์œผ๋กœ classic shapley method์ชฝ์„ ๋” ๋ฐœ์ „์‹œ์ผœ์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ

4. SHAP Values

  • SHAP values are proposed as a unified measure of feature importance. They are the unique additive feature importance measure that satisfies Properties 1 to 3.
  • Computing exact SHAP values is very expensive, so they are approximated.
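To show where the cost comes from, here is a brute-force sketch that evaluates the Shapley definition behind Theorem 1 by enumerating every coalition, which takes $O(M \cdot 2^M)$ model calls. It assumes a simplified value function in which "missing" features are replaced by a single hypothetical baseline vector, rather than the conditional expectation the paper uses to define SHAP values; the toy model `f` is made up.

```python
import itertools
import math
import numpy as np

def exact_shap(f, x, baseline):
    """Exact Shapley values for v(S) = f(x on S, baseline elsewhere).

    Enumerates every coalition, so it needs O(M * 2^M) model evaluations.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    m = len(x)

    def v(S):
        z = baseline.copy()
        idx = np.array(S, dtype=int)
        z[idx] = x[idx]  # features in S take their observed values
        return f(z)

    phi = np.zeros(m)
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for k in range(m):
            weight = math.factorial(k) * math.factorial(m - k - 1) / math.factorial(m)
            for S in itertools.combinations(others, k):
                phi[i] += weight * (v(S + (i,)) - v(S))
    return f(baseline), phi  # phi0 = f(baseline) acts as the base value

if __name__ == "__main__":
    f = lambda z: z[0] * z[1] + 2.0 * z[2]  # toy model with an interaction term
    phi0, phi = exact_shap(f, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
    print(phi0, phi)         # phi ≈ [1, 1, 6]: the interaction credit is split evenly
    print(phi0 + phi.sum())  # ≈ 8.0 = f(x), i.e. local accuracy holds
```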

1) Model-agnostic approximation: Shapley sampling values
2) Model-specific approximations: Max SHAP, Deep SHAP

=> To simplify the computation with these methods, one assumes feature independence and, for further simplification, model linearity.

Max SHAP

  • ๊ฐ input ๊ฐ’์ด ์–ผ๋งˆ๋‚˜ maximum value๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๋Š”์ง€์— ๋Œ€ํ•œ ํ™•๋ฅ ์„ Shapley value๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ตฌํ•  ์ˆ˜ ์žˆ์Œ.

Deep SHAP

  • Deep SHAP combines DeepLIFT with Shapley values. If DeepLIFT's reference value is interpreted as the expected input E[x], DeepLIFT approximates SHAP values under the assumptions that the input features are independent of one another and that the deep model is linear.
    +) Since DeepLIFT is an additive feature attribution method that satisfies local accuracy and missingness, Shapley values are the only attributions that also satisfy consistency.

5. COMPUTATIONAL AND USER STUDY EXPERIMENTS

  • Computational efficiency: Kernel SHAP produces more accurate estimates with fewer evaluations of the original model than earlier sampling-based estimates. Measured against SHAP values, which satisfy local accuracy and consistency, the experiments show a clear gap in sample efficiency between Kernel SHAP and LIME.

  • Consistency with human intuition: the tests assume that a good explanation should agree with the explanations of humans who understand the model. Participants were shown simple models (a sickness score and a max allocation problem) and asked to assign credit to the inputs. SHAP values matched the participants' intuition more closely than the other methods compared.
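The Kernel SHAP estimator mentioned under computational efficiency can be sketched as a LIME-style weighted linear regression that uses the Shapley kernel $\pi_{x'}(z') = \frac{M-1}{\binom{M}{|z'|}\,|z'|\,(M-|z'|)}$, squared loss, and no regularizer. Assumptions in this sketch: all $2^M$ coalitions are enumerated instead of sampled, the infinite kernel weights on the empty and full coalitions are approximated by a large constant `big_weight` (the paper treats them as constraints), and missing features are replaced by a hypothetical baseline.

```python
import itertools
import math
import numpy as np

def kernel_shap_exact(f, x, baseline, big_weight=1e6):
    """Kernel SHAP via weighted least squares over all 2^M coalitions (no sampling)."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    m = len(x)
    rows, weights, outputs = [], [], []
    for mask in itertools.product([0, 1], repeat=m):
        z = np.array(mask)
        k = int(z.sum())
        if k == 0 or k == m:
            w = big_weight  # stands in for the infinite Shapley kernel weight
        else:
            w = (m - 1) / (math.comb(m, k) * k * (m - k))
        rows.append(np.concatenate([[1.0], z]))
        weights.append(w)
        outputs.append(f(np.where(z == 1, x, baseline)))
    A = np.array(rows)
    sw = np.sqrt(np.array(weights))
    coef, *_ = np.linalg.lstsq(A * sw[:, None], np.array(outputs) * sw, rcond=None)
    return coef[0], coef[1:]  # phi0, phi

if __name__ == "__main__":
    f = lambda z: z[0] * z[1] + 2.0 * z[2]  # toy model with an interaction term
    phi0, phi = kernel_shap_exact(f, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
    print(phi0, phi)  # phi ≈ [1, 1, 6], matching the exact Shapley values
```

The only change from a LIME-style fit is the choice of kernel and loss; with the Shapley kernel the regression coefficients become Shapley values instead of heuristic local weights.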

6. CONCLUSION

  • SHAP addresses the trade-off between the accuracy and interpretability of model predictions: it identifies the class of additive feature importance methods, which includes several existing methods, and shows that this class has a unique solution with desirable properties. Future work aims at faster, model-type-specific SHAP value estimation methods and at defining new classes of explanation models.

** Reviewer's note: I found it remarkable that this analysis could offer real insight into explaining AI that used to be a black box. However, it seemed a little odd that, to explain DNN or ensemble models, the analysis assumes that the input features are all independent and that the model is linear. I wonder whether it can be applied to very complex neural networks or Transformer models.

유정민
