🤡 BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer

ukkikkiai · April 1, 2024

Euron Paper Review


Notes from the paper-review session...

  • Predicting only the last item of a sequence => no "spoilers" (the model never sees the answer it has to predict)

ex) camera -> SD card -> computer -> _ _ _ _ _ (what comes next?)

  • GELU function

  • Unlike ReLU, it has a region that dips below zero (convex downward) for small negative inputs
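A minimal numeric sketch of that difference, using the exact GELU defined through the Gaussian CDF (the code is illustrative, not from the paper):

```python
import math

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x):
    return max(0.0, x)

# Unlike ReLU, GELU dips slightly below zero for negative inputs.
for x in (-2.0, -0.5, 0.0, 1.0):
    print(f"x={x:5.1f}  relu={relu(x):.3f}  gelu={gelu(x):.3f}")
```

For example, gelu(-0.5) is about -0.154 while relu(-0.5) is exactly 0; this small negative region is the "convex dip" noted above.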


ABSTRACT

Existing recommendation models encode a user's historical interactions as a unidirectional sequence (left -> right). However,
(a) unidirectional architectures limit the expressive power of the hidden representations of the user's behavior sequence;
(b) a strictly ordered sequence is not always practical.

To address these limitations, the paper proposes BERT4Rec, which models user behavior with deep bidirectional self-attention. It adopts the Cloze objective: items in the sequence are randomly masked and predicted using both left and right context.


1. INTRODUCTION

Key limitation of unidirectional models: they restrict the hidden representation of each item in the historical sequence, because each item can only encode information from the items before it.

=> In real applications, user behavior may not follow a strict order, since interactions are influenced by various unobservable external factors.

  • It is therefore important to consider context from both directions of a user's behavior sequence. Inspired by BERT, which succeeded at understanding textual context, the paper applies a bidirectional self-attention model to recommendation.
  • Problem: conventional sequential models are trained to predict the next item from the previous ones. If a bidirectional model conditions on both left and right context, it can directly see the very item it is supposed to predict => information leakage, which can leave the model with nothing to learn.

Introducing the Cloze task

  • Instead of sequentially predicting the next item, the Cloze task replaces the unidirectional objective: some items in the input sequence are randomly masked, and the model predicts the IDs of the masked items from their surrounding context.
    => This avoids information leakage and produces more training samples.

However, this objective is inconsistent with the final sequential recommendation task => at inference time, a special token '[mask]' is appended to the end of the input sequence to mark the item to predict, and the recommendation is made from its final hidden vector.
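The two uses of the mask token described above can be sketched as follows (the item names and masking rate are illustrative, not from the paper):

```python
import random

MASK = "[mask]"

def cloze_mask(seq, mask_prob, rng):
    """Training input: randomly replace items with [mask]; return the
    masked sequence plus (position, original item) labels to predict."""
    masked, labels = [], []
    for i, item in enumerate(seq):
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append((i, item))
        else:
            masked.append(item)
    return masked, labels

def inference_input(seq):
    # Test-time input: append [mask] at the end; the model's prediction
    # for this position is the next-item recommendation.
    return seq + [MASK]

history = ["camera", "sd_card", "computer"]
print(cloze_mask(history, 0.5, random.Random(0)))
print(inference_input(history))  # ['camera', 'sd_card', 'computer', '[mask]']
```

Note how the same [mask] token bridges training and inference: the append-at-the-end input looks like just another Cloze instance, so the training objective transfers to next-item prediction.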


3. BERT4REC

3.1 Problem Statement

Given a user u's interaction history S_u, sequential recommendation models the probability that user u interacts with item v at time step n_u + 1.
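In symbols, with the history written as an ordered list of the user's past items, the statement above corresponds to modeling:

```latex
% S_u = [v_1^{(u)}, ..., v_{n_u}^{(u)}]: user u's interaction history
% Goal: probability that u interacts with item v at step n_u + 1
p\!\left(v_{n_u+1}^{(u)} = v \,\middle|\, \mathcal{S}_u\right)
```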

3.2 Model Architecture

BERT4Rec: Bidirectional Encoder Representations from Transformers for Sequential Recommendation

  • Composed of L stacked bidirectional Transformer layers; at each layer, every position exchanges information with all positions in the previous layer, iteratively revising its representation.

=> The self-attention mechanism can directly capture dependencies between positions at any distance.
+) Earlier models: CNNs have a limited receptive field, and RNNs are hard to parallelize.

3.3 Transformer Layer

  • Input sequence of length t => a hidden representation h_i is computed for every position at each layer (since the attention function is applied to all positions simultaneously, the h_i are stacked into a single matrix H).

  • Multi-Head Self-Attention: it can capture the dependency between any pair of representations regardless of their distance, so it is used in many tasks.

=> Jointly attending to information from different positions in different representation subspaces is known to work better, which is why multi-head attention is adopted.

The matrix H from above => projected into h subspaces => h attention functions in parallel => outputs => concatenated and projected once more.

  • W^Q, W^K, W^V, and W^O are all parameters to be learned. The attention values are computed from the projections obtained with these matrices.
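The project-per-head, attend, concatenate, re-project pipeline above can be sketched with NumPy (random weights stand in for the learned W^Q, W^K, W^V, W^O; the sizes are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(H, num_heads, rng):
    """H: (t, d) matrix of hidden states for a length-t sequence.
    Each head projects H into a (d / num_heads)-dim subspace with its own
    W^Q, W^K, W^V, applies scaled dot-product attention there, and the
    concatenated head outputs are projected once more with W^O."""
    t, d = H.shape
    dk = d // num_heads
    heads = []
    for _ in range(num_heads):
        WQ, WK, WV = (rng.standard_normal((d, dk)) / np.sqrt(d) for _ in range(3))
        Q, K, V = H @ WQ, H @ WK, H @ WV
        A = softmax(Q @ K.T / np.sqrt(dk))  # (t, t): a weight for every position pair
        heads.append(A @ V)                 # (t, dk)
    WO = rng.standard_normal((d, d)) / np.sqrt(d)
    return np.concatenate(heads, axis=-1) @ WO  # back to (t, d)

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))  # t = 5 positions, d = 8
out = multi_head_self_attention(H, num_heads=2, rng=rng)
print(out.shape)  # (5, 8)
```

The (t, t) attention matrix is what gives self-attention its any-distance reach: position 0 and position t-1 interact through a single matrix product, with no recurrence in between.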

3.4 Embedding Layer

  • The Transformer layer Trm cannot by itself capture the order of the input sequence, so positional embeddings are added to inject that information into the input items.

  • Instead of fixed sinusoidal embeddings, learnable positional embeddings are used, which perform better.
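A sketch of that input representation: one learned table per item ID, one learned table per position, summed elementwise (the table sizes here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, max_len, d = 1000, 50, 8  # illustrative sizes

# Learned lookup tables: one row per item ID and one row per position.
item_emb = rng.standard_normal((num_items, d)) * 0.02
pos_emb = rng.standard_normal((max_len, d)) * 0.02  # learnable positional table

def embed(item_ids):
    """Input representation h^0: item embedding + positional embedding.
    The positional table is what injects sequence order; it also caps the
    input length the model can handle at max_len."""
    t = len(item_ids)
    assert t <= max_len, "longer sequences must be truncated to the last max_len items"
    return item_emb[item_ids] + pos_emb[:t]

h0 = embed([3, 17, 42])
print(h0.shape)  # (3, 8)
```

One consequence of a learnable (rather than sinusoidal) table: the model cannot extrapolate to positions beyond max_len, which is why longer histories are truncated to their most recent items.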

3.5 Output Layer

After the L layers have hierarchically exchanged information across all positions through the preceding layers, the final output H^L is produced for every position.

  • If item v_t at time step t is masked, the model is trained to predict that item from h_t. A two-layer feed-forward network with GELU activation produces the output distribution over items.
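A sketch of that two-layer head, assuming (as the paper does) that the output scores reuse the input item embedding matrix; the weights and sizes here are random placeholders:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, num_items = 8, 100  # illustrative sizes
E = rng.standard_normal((num_items, d)) * 0.02  # item embeddings, shared with the input layer
WP = rng.standard_normal((d, d)) * 0.02
bP, bO = np.zeros(d), np.zeros(num_items)

def output_distribution(h_t):
    """Distribution over all items for one masked position: a GELU
    feed-forward projection of h_t, scored against the shared item
    embedding table, then normalized with softmax."""
    return softmax(gelu(h_t @ WP + bP) @ E.T + bO)

probs = output_distribution(rng.standard_normal(d))
print(probs.shape)  # (100,)
```

Sharing E between input and output reduces the parameter count and ties the two views of each item together, a choice BERT4Rec inherits from language-model practice.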

3.7 Discussion

Comparison with existing recommendation models

  • SASRec: a left-to-right version of BERT4Rec that uses single-head attention and predicts the next item.
  • CBOW & SG: like BERT4Rec, they use both left and right context, but they correspond to a single self-attention layer that assigns the same weight to every item.
  • BERT: a pretraining model, whereas BERT4Rec is an end-to-end model. Unlike BERT, BERT4Rec drops the next-sentence loss and the segment embeddings.

4. Experiments

  • Datasets

1) Amazon Beauty
2) Steam
3) MovieLens (ML-1m, ML-20m)

  • Evaluation Metrics

1) Hit Ratio
2) Normalized Discounted Cumulative Gain
3) Mean Reciprocal Rank
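For a single held-out ground-truth item ranked among candidates, all three metrics reduce to simple functions of its 1-based rank (a sketch; the cutoff k is illustrative):

```python
import math

def hr_at_k(rank, k):
    """Hit Ratio: 1 if the held-out item appears in the top-k, else 0."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank, k):
    """With a single relevant item the ideal DCG is 1, so NDCG@k reduces
    to 1 / log2(rank + 1) when the item lands inside the top-k."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

def mrr(rank):
    """Mean Reciprocal Rank contribution of one test instance."""
    return 1.0 / rank

print(hr_at_k(3, 10), ndcg_at_k(3, 10), mrr(3))  # 1.0 0.5 0.333...
```

Each user's scores are averaged over the test set; unlike HR, the NDCG and MRR variants reward placing the true item higher within the top-k, not just inside it.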

  • BERT4Rec and the baseline models are evaluated on the four datasets.

5. Conclusion and Future Work

Future research direction: integrating rich item features (e.g., a product's category and price, a movie's cast) into BERT4Rec, rather than modeling items by their IDs alone.
