[DL/AD]ANOMALY TRANSFORMER: TIME SERIES ANOMALY DETECTION WITH ASSOCIATION DISCREPANCY

๊ตฌ๋ง · September 18, 2024

[Paper Review]


📄 Reference

📄 Original paper

Abstract

Time series anomaly detection has to deal with complex dynamics, so learning only pointwise representations or pairwise associations, as existing methods do, is not enough.

Transformers can model pointwise representations and associations in a unified way. The authors observe that the self-attention weight distribution of each time point carries rich associations with the whole series.

Because anomalies are rare, they can hardly build associations with the whole series. Their associations therefore concentrate mainly on adjacent time points.

This bias toward adjacent time points can serve as an association-based criterion for identifying anomalies. Building on it, the authors find that Association Discrepancy can separate normal points from abnormal ones. To exploit this, they propose Anomaly Transformer, which uses a new Anomaly-Attention mechanism and a minimax strategy to maximize anomaly detection performance. Anomaly Transformer achieves SOTA results in three applications.

Introduction

Anomalies are rare and hard to spot, so labeling data in practice is prohibitively expensive. The paper therefore addresses time series anomaly detection in the unsupervised setting.

Classical unsupervised anomaly detection methods include density-estimation methods such as LOF and clustering-based methods such as OC-SVM and SVDD. However, these classical methods ignore temporal information and generalize poorly to unseen real-world scenarios.

More recently, representation learning with deep networks has reached strong performance. These methods mostly learn pointwise representations with RNNs and are trained in a self-supervised way through reconstruction.

However, because anomalies are rare, pointwise representations carry too little information about complex temporal patterns and are dominated by normal time points, which makes anomalies hard to distinguish. In addition, the reconstruction error is computed point by point, so it cannot give a comprehensive description of the temporal context.

reconstruction error

  • A concept used frequently in time series anomaly detection
  • The model reconstructs the input data, and the difference between the original data and the reconstructed data is then computed
  • For normal data the gap between reconstructed and actual values stays small, while anomalies are hard to reconstruct and produce a large reconstruction error → used as a criterion for detecting anomalies
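As a rough illustration of the bullets above, the sketch below computes a per-time-point squared error and flags points above a threshold. The function name `reconstruction_error`, the toy arrays, and the threshold of 1.0 are all assumptions made for this example; no model is trained here, and `x_hat` simply stands in for a model's output.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Per-time-point squared error between a series and its reconstruction.

    x, x_hat: arrays of shape (length, channels).
    Returns an array of shape (length,).
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.sum((x - x_hat) ** 2, axis=-1)

# Toy data: the stand-in reconstruction follows the series except at index 3,
# where the input contains a spike that the "model" fails to reproduce.
x = np.array([[0.0], [0.1], [0.0], [5.0], [0.1]])
x_hat = np.array([[0.0], [0.1], [0.0], [0.2], [0.1]])

errors = reconstruction_error(x, x_hat)
flagged = np.flatnonzero(errors > 1.0)  # 1.0 is an arbitrary, hypothetical threshold
print(flagged)
```

Each error value depends only on its own time point, which is the limitation the introduction points out.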

Another major family of methods is based on explicit association modeling. However, these classical methods have difficulty capturing fine-grained associations.

To overcome these limitations, the paper applies the Transformer to unsupervised time series anomaly detection and proposes Anomaly Transformer for association learning.

< Summary >

  • Proposes Anomaly Transformer with an Anomaly-Attention mechanism that models prior-association (local) and series-association (global) at the same time in order to realize the Association Discrepancy
  • Proposes a MiniMax optimization strategy to amplify the normal-abnormal distinguishability of the Association Discrepancy and to derive a new association-based detection criterion
  • Anomaly Transformer achieves state-of-the-art anomaly detection results on six benchmarks across three real-world applications, backed by extensive ablations and insightful case studies



  1. UNSUPERVISED TIME SERIES ANOMALY DETECTION
  2. TRANSFORMERS FOR TIME SERIES ANALYSIS


Method

💡 The key to unsupervised time series anomaly detection is learning informative representations and finding a distinguishable criterion
Anomaly Transformer is proposed to address this by discovering more informative associations and learning the Association Discrepancy
This Association Discrepancy is inherently able to distinguish normal from abnormal points
Technically, an Anomaly-Attention mechanism is proposed to realize the prior-association and the series-association, together with a MiniMax optimization strategy that yields a more distinguishable Association Discrepancy
Co-designed with this architecture, an association-based criterion is derived from the learned Association Discrepancy

  1. Anomaly Transformer

< Overall architecture >

(1) prior-association (2) series-association (3) minimax / (4) anomaly attention block

< Anomaly-Attention >

  • prior-association

    • Uses a learnable Gaussian kernel defined over the relative temporal distance
    • Thanks to the shape of the Gaussian, this design inherently puts more attention on the adjacent time region
    • The Gaussian also has a learnable scale parameter (σ), which lets the prior-association adapt to various time series patterns, for example anomaly segments of different lengths
  • series-association
    - Learns associations from the raw series and adaptively finds the most effective ones

    Both forms of association keep the temporal dependencies of each time point, which is more informative than a pointwise representation. They reflect the adjacent-concentration prior and the learned real associations respectively, and the discrepancy between them becomes a criterion that separates normal from abnormal points
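The two associations can be sketched in NumPy as follows. This is a minimal single-head, single-layer illustration under assumptions: the function names are hypothetical, σ is a fixed array rather than a learned parameter, and the queries and keys are random stand-ins for learned projections of the input.

```python
import numpy as np

def prior_association(sigma):
    """Row-normalized Gaussian kernel over the relative distance |j - i|.

    sigma: shape (length,), one positive scale per time point
    (learned in the paper, fixed here for illustration).
    """
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.shape[0]
    idx = np.arange(n)
    dist = np.abs(idx[None, :] - idx[:, None])  # entry (i, j) holds |j - i|
    s = sigma[:, None]
    kernel = np.exp(-dist**2 / (2.0 * s**2)) / (np.sqrt(2.0 * np.pi) * s)
    return kernel / kernel.sum(axis=1, keepdims=True)  # each row sums to 1

def series_association(q, k):
    """Row-wise softmax of scaled dot-product scores between queries and keys."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
length, d_model = 8, 4
P = prior_association(np.full(length, 1.5))
S = series_association(rng.normal(size=(length, d_model)),
                       rng.normal(size=(length, d_model)))
```

Every row of `P` peaks at its own index and decays with distance, while `S` is free to place weight anywhere in the window.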

    < Association Discrepancy >

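  • The discrepancy is large at normal time points and small at abnormal ones: an anomaly's series-association also concentrates on adjacent points, so it stays close to the prior-association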
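One way to compute such a discrepancy is a symmetrized KL divergence between matching rows of the two association matrices, which is how the paper defines it (there it is also averaged over layers). The sketch below covers a single layer only; the function name, the `eps` smoothing term, and the toy rows are assumptions for illustration.

```python
import numpy as np

def association_discrepancy(P, S, eps=1e-12):
    """Symmetrized KL divergence between matching rows of P and S.

    P, S: arrays of shape (length, length) whose rows are distributions.
    Returns one value per time point, shape (length,).
    """
    P = np.asarray(P, dtype=float) + eps  # eps avoids log(0); an assumption
    S = np.asarray(S, dtype=float) + eps
    kl_ps = np.sum(P * np.log(P / S), axis=1)
    kl_sp = np.sum(S * np.log(S / P), axis=1)
    return kl_ps + kl_sp

# Both rows share the same locally concentrated prior.
P = np.array([[0.70, 0.20, 0.05, 0.05],
              [0.70, 0.20, 0.05, 0.05]])
# Row 0: series-association identical to the prior (locally concentrated).
# Row 1: series-association spread evenly over the window.
S = np.array([[0.70, 0.20, 0.05, 0.05],
              [0.25, 0.25, 0.25, 0.25]])

dis = association_discrepancy(P, S)
```

Here `dis[0]` is zero because the two rows coincide, and `dis[1]` is clearly positive because the series-association has moved away from the prior.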


    2. MINIMAX ASSOCIATION LEARNING

< Minimax Strategy >

  • Minimize phase
    - Drives the prior-association to approximate the series-association
    - Through this step the prior-association learns to adapt to various temporal patterns
    - This phase uses the Gaussian kernel to learn associations among adjacent time points and works in the direction that reduces the association discrepancy
  • Maximize phase
    - Optimizes the series-association to enlarge its discrepancy from the prior-association
    - In other words, the series-association pays more attention to non-adjacent time points, which makes the association discrepancy larger
    - This pushes the model toward a small reconstruction loss at normal time points and a larger error at abnormal time points
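The two phases can be written compactly with the loss used in the original paper. The symbols below are not defined in these notes and follow the paper's notation as I understand it: \hat{X} is the reconstruction, \lambda > 0 is a trade-off weight, and the "detach" subscript means gradients are stopped through that association.

```latex
% Total loss: reconstruction term minus a weighted association-discrepancy term.
\mathcal{L}_{\mathrm{Total}}(\hat{X}, P, S, \lambda; X)
  = \lVert X - \hat{X} \rVert_F^2
  - \lambda \, \lVert \mathrm{AssDis}(P, S; X) \rVert_1

% Minimize phase: update P with S detached; the sign of \lambda is flipped,
% so minimizing the loss shrinks the discrepancy.
\mathcal{L}_{\mathrm{Total}}(\hat{X}, P, S_{\mathrm{detach}}, -\lambda; X)

% Maximize phase: update S with P detached; minimizing the loss now
% enlarges the discrepancy while still reconstructing the input.
\mathcal{L}_{\mathrm{Total}}(\hat{X}, P_{\mathrm{detach}}, S, \lambda; X)
```

Flipping the sign of \lambda is what lets a single loss function serve both phases.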

    < Association-based Anomaly Criterion >


  • -AssDis
    • With the minus sign attached, the value is small at normal time points and large at abnormal ones
  • Reconstruction loss
    • Measures how accurately the model can reconstruct the time series
    • Normal data is easy for the model to reconstruct, whereas abnormal data shows a large reconstruction error
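In the paper the two terms are combined by multiplying a softmax over time of the negated discrepancy with the per-point reconstruction error. The sketch below assumes both inputs are already available as 1-D arrays; the function name and the toy numbers are made up for illustration.

```python
import numpy as np

def anomaly_score(ass_dis, recon_err):
    """Element-wise product of softmax(-AssDis) over time and reconstruction error.

    ass_dis, recon_err: arrays of shape (length,).
    """
    z = -np.asarray(ass_dis, dtype=float)
    z = z - z.max()                      # numerical stability
    weights = np.exp(z) / np.exp(z).sum()
    return weights * np.asarray(recon_err, dtype=float)

# Hypothetical values: index 2 has a low discrepancy and a high reconstruction error.
ass_dis = np.array([2.0, 2.1, 0.1, 1.9])
recon_err = np.array([0.1, 0.1, 0.5, 0.1])

score = anomaly_score(ass_dis, recon_err)
print(int(np.argmax(score)))
```

Because the softmax is taken over the negated discrepancy, a point with a small discrepancy receives a large weight, so the two signals reinforce each other at index 2.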

Experiments

  1. Dataset
  2. Results - Quantitative results: precision / recall / F1-score



    The results show Anomaly Transformer catching anomalies that previous models fail to detect