[๋…ผ๋ฌธ]DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems(2020, Ruoxi Wang)

Carvinยท2021๋…„ 2์›” 13์ผ

0. ํฌ์ŠคํŒ… ๊ฐœ์š”

์ตœ๊ทผ์€ ์•„๋‹ˆ์ง€๋งŒ ์ถ”์ฒœ ์‹œ์Šคํ…œ์—์„œ๋Š” user behavior or feedback ์— ๋Œ€ํ•œ ์ •๊ตํ•˜๊ณ  ๋‹ค์–‘ํ•œ feature์˜ ์ƒํ˜ธ์ž‘์šฉ์„ ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•ด์„œ DeepFM๊ณผ Wide & Deep๊ณผ ๊ฐ™์€ ๋ชจ๋ธ์ด ์ œ์•ˆ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ํ•ด๋‹น ๋ชจ๋ธ๋“ค์˜ ํ•ต์‹ฌ์€ feature๊ฐ„์˜ ์˜๋ฏธ์žˆ๋Š” latent vector๋ฅผ ์ฐพ๊ฑฐ๋‚˜ cross-product์˜ ์ž๋™ํ™”๋ฅผ feature engineering๋ฅผ ์ ์šฉํ•จ์œผ๋กœ์จ ๊ธฐ์กด์˜ ์ž…๋ ฅ์ด์—ˆ๋˜ user์™€ rating๊ณผ์˜ ๊ด€๊ณ„ ์ด์ƒ์˜ ํ•™์Šต์ž…๋‹ˆ๋‹ค.

ํ™œ๋™ํ•˜๊ณ  ์žˆ๋Š” ๋™์•„๋ฆฌ์˜ ์ปจํผ๋Ÿฐ์Šค์˜ ์™€์ธ ์ถ”์ฒœ ํ”„๋กœ์ ํŠธ์—์„œ ๋‹ค์–‘ํ•œ ๋ชจ๋ธ์„ ์‹คํ—˜ํ•˜๋Š” ๊ณผ์ •์—์„œ DCN์ด๋ž€ ๋ชจ๋ธ์„ ์•Œ๊ฒŒ ๋˜์—ˆ๊ณ  ์ ์šฉํ•˜๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. DCN, Deep Cross Network ๋˜ํ•œ feature๊ฐ„์˜ ์ƒํ˜ธ์ž‘์šฉ์— ์ดˆ์ ์„ ๋งž์ถ˜ ๋ชจ๋ธ์ด๋ฉฐ cross-product๋ฅผ ๋ช…์‹œ์ ์ด๊ณ  ํšจ๊ณผ์ ์œผ๋กœ ํ•™์Šตํ•˜๋„๋ก ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

๋ณธ๋ž˜ CTR(ํด๋ฆญ๋ฅ )์„ ์œ„ํ•œ ๋ชจ๋ธ์ธ DCN์ด์ง€๋งŒ ์ง„ํ–‰ํ–ˆ๋˜ ์™€์ธ ์ถ”์ฒœ ํ”„๋กœ์ ํŠธ์—์„œ๋Š” ์™€์ธ ์„ ํ˜ธ๋„(0/1)๋ฅผ ์˜ˆ์ธกํ•จ์— ์žˆ์–ด ํ•ด๋‹น ๋ชจ๋ธ์„ ์ ์šฉํ•˜๊ฒŒ ๋˜์—ˆ๊ณ  ์ด๋ฅผ ์œ„ํ•ด ๋…ผ๋ฌธ์„ ์ •๋ฆฌํ•˜๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

ํ•ด๋‹น ํ”„๋กœ์ ํŠธ์— ๋Œ€ํ•ด์„œ๋Š” github๊ณผ ๋ฐœํ‘œ์˜์ƒ์—์„œ ํ™•์ธํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


ABSTRACT

  • Feature engineering plays an important role in many predictive models, but it has the drawback of being complex and manual, often requiring something close to an exhaustive search.

  • Deep learning models do reflect feature interactions implicitly during training, but they struggle to learn feature crosses effectively.

  • This paper introduces Deep & Cross Network (DCN), a deep-learning-based model that enables the effective learning of feature crosses, which is key to the success of recommender systems.

    • A feature cross is a synthetic feature created by combining (crossing) two or more features.
    • Feature engineering through crossing features can improve predictive performance beyond the effect each original feature contributes individually.
  • However, the features typically used in these models are very sparse and large, so finding meaningful feature crosses requires an exhaustive search over the possible combinations, which is naturally inefficient.

  • The original Deep & Cross Network (DCN) proposed an architecture that learns feature interactions automatically and efficiently (2017, Deep & Cross Network for Ad Click Predictions).

    • However, from the perspective of massive training data and model serving, the 2017 DCN showed limitations in learning feature interactions.

    • Many DNN models in production still rely on traditional feed-forward neural networks, which are rather inefficient at learning feature crosses.

  • In conclusion, the 2020 paper proposes DCN-V2, which can be applied at large scale in environments with massive amounts of data.

  • Although DCN-V2 still has some cost inefficiency when learning feature interactions, it achieved state-of-the-art (SOTA) performance on most representative datasets.

1. INTRODUCTION

  • Learning to rank (LTR) has grown into a very interesting problem in information retrieval in an era overflowing with data, and it keeps expanding as machine learning and deep learning are combined in search, recommender systems, and advertising.

    • One of the key components of an LTR model is effective feature crosses, which contribute to model performance both mathematically and practically.
  • Effective feature crosses are a crucial key to the high performance of many models, because they carry additional interaction information beyond the effect each individual feature passes to the model.

    • For example, the combination of 'country' and 'language' can convey more information than each feature does on its own. (Admittedly, this is not immediately intuitive..)

    • As another example, when designing a model that recommends blenders to customers, the features for whether a customer bought bananas and whether they bought a cookbook each convey separate pieces of information, but the feature cross of the two provides additional signal about whether the customer will buy a blender.

y = f(x_1, x_2, x_3) = 0.1x_1 + 0.4x_2 + 0.7x_3 + 0.1x_1x_2 + 3.1x_2x_3 + 0.1x_1x_3
Source: https://www.tensorflow.org/recommenders/examples/dcn
  • ์„ ํ˜• ๋ชจ๋ธ์—์„œ๋Š” ๋ณดํ†ต hand-craft๊ฐ€ ์š”๊ตฌ๋˜๋Š” feature engineering์„ ํ†ตํ•ด feature crosses๋ฅผ ์ง„ํ–‰ํ•˜๊ณ  ๋ชจ๋ธ์˜ expressiveness๋ฅผ ํ™•๋Œ€ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

    • ๋˜ํ•œ ๋Œ€๋ถ€๋ถ„์˜ ๋ฐ์ดํ„ฐ๊ฐ€ categoricalํ•œ web-scale applications ์ƒ์—์„œ๋Š” large and sparse์˜ ํŠน์ง•์ธ combinatorial search space(์ˆ˜๋งŽ์€ ์กฐํ•ฉ)๋ฅผ ์ˆ˜๋ฐ˜ํ•˜๊ธฐ ๋•Œ๋ฌธ์— exhaustive ํ•ฉ๋‹ˆ๋‹ค.(๋น„ํšจ์œจ์ , ์˜ค๋ž˜๊ฑธ๋ฆฌ๋Š”....)

    • ๊ทธ๋ ‡๊ธฐ ๋•Œ๋ฌธ์— ๋ณดํ†ต domain ์ง€์‹์ด ์š”๊ตฌ๋˜๊ณ  ๋ชจ๋ธ์„ generalize ํ•˜๊ฒŒ ์„ค๊ณ„ํ•˜๊ธฐ ๊ต‰์žฅํžˆ ์–ด๋ ต์Šต๋‹ˆ๋‹ค.

  • ์‹œ๊ฐ„์ด ์ง€๋‚˜๋ฉด์„œ, ๊ณ ์ฐจ์› ๋ฒกํ„ฐ๋ฅผ ์ €์ฐจ์› ๋ฒกํ„ฐ๋กœ ํˆฌ์˜ํ•˜๋Š” ์ž„๋ฒ ๋”ฉ ๊ธฐ์ˆ ์ด ๋ฐœ์ „ํ•˜๊ฒŒ ๋˜๋ฉด์„œ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์—์„œ ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

  • Factorization Machines (FMs) extract feature interactions through the inner product of two latent vectors.

    • Compared to the traditional feature crosses of linear models, FMs generalize better (the 2nd-order FM prediction is shown just below).
  • ์ง€๋‚œ ๋ช‡ ์‹ญ๋…„๊ฐ„, ์ปดํ“จํŒ… ํŒŒ์›Œ์™€ ๋ฐ์ดํ„ฐ์˜ ํญ๋ฐœ์  ์ฆ๊ฐ€๋กœ, ์‚ฐ์—…์—์„œ LTR ๋ชจ๋ธ์€ linear model๊ณผ FM-๊ธฐ๋ฐ˜ ๋ชจ๋ธ์—์„œ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ๋กœ ์ ์ง„์ ์œผ๋กœ ๋ณ€ํ™”ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ์ด๋Ÿฌํ•œ ๋ณ€ํ™” ๋ฐ ๋ฐœ์ „์€ ์ถ”์ฒœ ์‹œ์Šคํ…œ๊ณผ ๊ฒ€์ƒ‰ ๋ถ„์•ผ์˜ ์ „๋ฐ˜์ ์ธ ๋ชจ๋ธ ํ–ฅ์ƒ์„ ์ผ๊ถˆ๋ƒˆ์Šต๋‹ˆ๋‹ค.

    • ๋ณดํ†ต ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์€ feature interaction์„ ์ฐพ๋Š” ๊ฒƒ์— ๊ดœ์ฐฎ์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๋Š” ๊ฒƒ์œผ๋กœ ์—ฌ๊ฒจ์กŒ์ง€๋งŒ, ์‚ฌ์‹ค ์ตœ๊ทผ ์—ฐ๊ตฌ๋ฅผ ํ†ตํ•ด์„œ ๋‹จ์ˆœ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ๋กœ๋Š” 2nd, 3rd feature cross๋ฅผ ์ถ”์ •ํ•˜๋Š” ๊ฒƒ์—๋Š” ๋น„ํšจ์œจ์ ์ด๋ผ๊ณ  ๋ฐํ˜€์กŒ์Šต๋‹ˆ๋‹ค.
  • ํšจ๊ณผ์ ์ธ feature crosses๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•œ ๋ณด๋‹ค ์ •ํ™•ํ•˜๊ณ  ํ†ต์ƒ์ ์ธ ํ•ด๊ฒฐ์ฑ…์œผ๋กœ๋Š” widerํ•˜๊ณ  deeperํ•œ Network๋ฅผ ํ†ตํ•ด model capacity๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

    • model capacity๊ฐ€ ์ฆ๊ฐ€ํ•˜๊ฒŒ ๋˜๋ฉด ์ผ๋ฐ˜์ ์œผ๋กœ ๋ฌด๊ฑฐ์šด ๋ชจ๋ธ์ด๋ผ๊ณ  ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•์€ ๋ชจ๋ธ์„ servingํ•˜๊ณ  productionํ•˜๋Š” ๊ณผ์ •์—์„œ ๋ฌธ์ œ์ ์ด ๋ฐœ๊ฒฌ๋˜๊ธฐ ๋•Œ๋ฌธ์— ์–‘๋‚ ์˜ ๊ฒ€์ด๋ผ๊ณ  ํ‘œํ˜„ํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. (๋ฌด๊ฑฐ์šด ๋ชจ๋ธ์ด๋ž€, ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ๋งŽ๊ณ  ์ถ”๋ก ํ•˜๋Š”๋ฐ ์‹œ๊ฐ„์ด ์˜ค๋ž˜ ๊ฑธ๋ฆฌ๋Š” ๊ฒƒ ๋“ฑ์œผ๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.)
  • DCN ๋˜ํ•œ widerํ•˜๊ณ  deeperํ•œ Network๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ feature cross๋ฅผ ์ ์šฉํ•˜๋Š” ๊ฒƒ์—๋Š” ํšจ๊ณผ์ ์ด์—ˆ์ง€๋งŒ, large-scale ํ™˜๊ฒฝ์—์„œ์˜ production ๊ณผ์ •์—์„œ๋Š” ๊ต‰์žฅํžˆ ๋งŽ์€ ์–ด๋ ค์›€์„ ๋งˆ์ฃผ์ณค์Šต๋‹ˆ๋‹ค.

  • ๊ทธ๋ž˜์„œ 2017๋…„์— ๊ฑธ์ณ 2020๋…„์— ๋ฐœํ‘œํ•œ DCV-V2 ์—์„œ๋Š” ๊ธฐ์กด DCN์„ ๊ฐœ์„ ํ•˜๊ฒŒ ๋˜์—ˆ์œผ๋ฉฐ, ์ด๋ฏธ ์‹คํ—˜๋“ค์„ ํ†ตํ•ด learning to rank ์‹œ์Šคํ…œ์— ์„ฑ๊ณต์ ์œผ๋กœ ๋ฐฐํฌํ•˜์˜€์Šต๋‹ˆ๋‹ค.

    • DCN-V2์˜ ํ•ต์‹ฌ์€ cross layer ๋ฅผ ํ†ตํ•œ explicitํ•œ feature interaction์„ ํ•™์Šตํ•˜๊ณ  deep layer ๋ฅผ ํ†ตํ•ด ์ถ”๊ฐ€์ ์ธ implicitํ•œ interaction์„ ํ•™์Šตํ•จ์— ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์ตœ๊ทผ์˜ feature interaction์„ ํ•™์Šตํ•˜๋Š” ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š” ๋”ฅ๋Ÿฌ๋‹์„ ํ†ตํ•ด explicit, implicit feature crosses๋ฅผ ๋ชจ๋‘ ๋ฝ‘์•„๋‚ด๋Š” ๊ฒƒ์ด์—ˆ์Šต๋‹ˆ๋‹ค.

  • explicit crosses๋ฅผ ๋ชจ๋ธ๋งํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋Œ€๋ถ€๋ถ„์˜ ์ตœ๊ทผ ์—ฐ๊ตฌ๋“ค์€ ๋”ฅ๋Ÿฌ๋‹์—์„œ ๋น„ํšจ์œจ์ ์ธ multiplicative operations(๊ณฑ์…ˆ)์„ ์ œ์•ˆํ–ˆ์œผ๋ฉฐ, feature x1๊ณผ x2์˜ interaction์„ ํšจ๊ณผ์ ์ด๊ณ  ๋ช…๋ฐฑํ•˜๊ฒŒ ๋ชจ๋ธ๋งํ•˜๊ธฐ ์œ„ํ•ด function f(x1, x2)๋ฅผ ์„ค๊ณ„ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

  • ๋‹ค์Œ์€ ์–ด๋–ป๊ฒŒ explicit, implicit ํ•œ feature crosses๋ฅผ ์กฐํ•ฉํ•˜๊ธฐ ์œ„ํ•ด ์ œ์•ˆํ–ˆ๋˜ ์—ฐ๊ตฌ ๋ฐฉ๋ฒ•๋ก ๋“ค์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

Parallel Structure

  • Wide & Deep Model:

Inspired by the Wide & Deep model, where the Wide part takes crosses of raw features as input and the Deep part is a DNN, this structure trains both parts together in parallel.

  • However, selecting the cross features for the Wide part runs into the feature engineering problem.

  • Nevertheless, the parallel structure of the Wide & Deep model has been adopted and extended in many subsequent studies.

Stacked Structure

  • ๋˜ ํ•˜๋‚˜์˜ ์—ฐ๊ตฌ๋Š” explicitํ•œ feature crosses๋ฅผ ๋ฝ‘์•„๋‚ด๋Š” interaction layer๋ฅผ ์ž„๋ฒ ๋”ฉ layer์™€ DNN ๋ชจ๋ธ ์ค‘๊ฐ„์— ์œ„์น˜๋˜์–ด ์žˆ๋Š” stacked(์Œ“์—ฌ ์žˆ๋Š”)์œผ๋กœ ํ•™์Šตํ•˜๋Š” ๊ตฌ์กฐ์ž„

  • ์ด ๊ตฌ์กฐ๋Š” ์ดˆ๋ฐ˜์— feature interaction์„ ๋ฝ‘์•„๋‚ด๊ณ  DNN ๋ชจ๋ธ์—์„œ ์ถฉ๋ถ„ํ•œ ํ•™์Šต์„ ์ง„ํ–‰ํ•˜๊ฒŒ ๋จ

3. PROPOSED ARCHITECTURE: DCN-V2

  • ์ด๋ฒˆ ๋ชฉ์ฐจ์—์„œ๋Š” explicit, implicit feature interactions๋ฅผ ๋ชจ๋‘ ํ•™์Šตํ•˜๋Š” ์ƒˆ๋กœ์šด ๋ชจ๋ธ์ธ DCN-V2์˜ ๊ตฌ์กฐ์— ๋Œ€ํ•ด์„œ ๋‹ค๋ฃจ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

  • DCN-V2๋Š” embedding layer๋ฅผ ์‹œ์ž‘์œผ๋กœ explicit feature interaction์„ ๋ฝ‘์•„๋‚ด๋Š” ๋ณต์ˆ˜์˜ cross layer๋ฅผ ํฌํ•จํ•˜๋Š” cross network, ๊ทธ๋ฆฌ๊ณ  implicit feature interactions๋ฅผ ๋ฝ‘์•„๋‚ด๋Š” deep network๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

  • ๊ธฐ์กด DCN์— ๋น„ํ•ด DCN-V2๊ฐ€ ๊ฐœ์„ ๋œ ์ ์€ ์‰ฌ์šด ๋ฐฐํฌ๋ฅผ ์œ„ํ•ด elegant formula๋ฅผ ์œ ์ง€ํ•จ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  web-scale production data ๊ด€์ ์—์„œ ๋ณต์žกํ•œ explicit cross๋ฅผ ๋ชจ๋ธ๋งํ•จ์œผ๋กœ์จ ํ’๋ถ€ํ•œ ํ‘œํ˜„๋ ฅ์„ ๊ฐ€์ง€๊ฒŒ ๋˜์—ˆ๊ณ  production ๊ณผ์ •์— ๋ณด๋‹ค ์ตœ์ ํ™” ๋˜์–ด ์žˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

  • cross network์™€ deep network๋ฅผ ๊ฒฐํ•ฉํ•˜๋Š” ๋ฐฉ์‹์˜ ์ฐจ์ด๋กœ stacked๊ณผ parallelํ•œ 2๊ฐ€์ง€ ๊ตฌ์กฐ๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

3-1. Embedding Layer

  • embedding layer์—๋Š” categorical๊ณผ dense feature๊ฐ€ ์ž…๋ ฅ์œผ๋กœ ๋“ค์–ด๊ฐ€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

  • ์ตœ์ข… embedded vector์€ categorical features์˜ ์ž„๋ฒ ๋”ฉ๊ณผ dense features์˜ ์ •๊ทœํ™”๋œ ๊ฐ’์ด concat๋˜์–ด ์ถœ๋ ฅ๋ฉ๋‹ˆ๋‹ค.

3-2. Cross Network

  • DCN-V2์˜ ํ•ต์‹ฌ์€ explicit feature crosses๋ฅผ ๋ฝ‘์•„๋‚ด๋Š” cross layer์— ์žˆ์œผ๋ฉฐ, xl+1x_{l+1} layer๊ฐ€ ๊ฐ€์ง€๋Š” ์ˆ˜์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

    • x0x_{0}์€ ๊ธฐ๋ณธ feature๋ฅผ ์˜๋ฏธํ•˜๋Š” embedding layer์˜ ์ถœ๋ ฅ ๋ฒกํ„ฐ๋ผ๊ณ  ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

    • xl,xl+1x_{l}, x_{l+1}์€ ๋ฐ˜๋ณต์ ์œผ๋กœ ๊ณ„์‚ฐ๋˜๋Š” ๊ฐ layer์˜ ์ž…๋ ฅ ๋ฐ ์ถœ๋ ฅ ๋ฒกํ„ฐ๋ฅผ ์˜๋ฏธํ•˜๋ฉฐ Wl,blW_{l}, b_{l}๋Š” ๊ฐ layer์—์„œ ํ•™์Šต๋˜๋Š” weight์™€ bias ๋ฒกํ„ฐ์ž…๋‹ˆ๋‹ค.

xl+1=x0โŠ™(Wlxl+bl)+xlx_{l+1} = x_{0} \odot (W_{l}x_{l} + b_{l}) + x_{l}
  • ๊ฐ layer์—์„œ ๋ฐœ์ƒํ•˜๋Š” cross layer function์„ ์‹œ๊ฐํ™”ํ•˜์—ฌ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ดํ•ดํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

    • ๊ฐ layer์˜ ์ž…๋ ฅ์ธ xlx_{l}์€ ๋˜‘๊ฐ™์ด weight์™€ bias๋กœ linear ๊ณ„์‚ฐ์ด ์ด๋ค„์ง€๊ฒŒ ๋œ ๋’ค์—, embedding layer์˜ ์ถœ๋ ฅ์ธ x0x_{0}์„ element-wise product ํ•ด์คŒ์œผ๋กœ์จ feature๊ฐ„ interaction์ด ๋ฐ˜๋ณต์ ์œผ๋กœ ๊ณ„์‚ฐ๋ฉ๋‹ˆ๋‹ค.

    • ์ฒซ๋ฒˆ์งธ cross layer์˜ ๊ณ„์‚ฐ ๊ณผ์ •์€, x0x_{0}๊ฐ€ linear ๊ณ„์‚ฐ์„ ํ†ต๊ณผํ•œ ๊ฒฐ๊ณผ์™€ x0x_{0}๊ฐ€ element-wise product๋˜๋ฉด์„œ update๋˜๋Š” weight๊ฐ€ ๊ธฐ์กด feature๊ฐ„์˜ interaction ์ •๋ณด๋ฅผ ๋‹ด๊ณ  ์žˆ๋‹ค๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • 1๊ฐœ์˜ cross layer๊ฐ€ ๊ณ„์‚ฐ๋˜๋Š” ๊ณผ์ •์„ ๊ฐ„๋‹จํ•˜๊ฒŒ ์‚ดํŽด๋ณด๋ฉด, 2nd feature interaction์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” weight๋ฅผ ํ™•๋ณดํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋˜๋ฉฐ weight๋ฅผ 2์ฐจ์›์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
x_0 = [v1, v2, v3]            # feature embedding vector

W = [[w11, w12, w13],
     [w21, w22, w23],
     [w31, w32, w33]]         # weight matrix of the first cross layer

# pairwise terms produced by x_0 ⊙ (W x_0): each weight w_ij scales the
# 2nd-order interaction v_i * v_j (bias and residual terms omitted);
# row i sums to the i-th element of x_0 ⊙ (W x_0)
x_1_terms = [[w11*v1*v1, w12*v1*v2, w13*v1*v3],
             [w21*v2*v1, w22*v2*v2, w23*v2*v3],
             [w31*v3*v1, w32*v3*v2, w33*v3*v3]]

  • ํ•ด๋‹น heatmap์€ ์ฒซ๋ฒˆ์งธ cross layer๊ฐ€ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” weight๋ฅผ ์‹œ๊ฐํ™”ํ•œ ๊ฒƒ์ด๋ฉฐ, ๊ฐ weight๊ฐ€ feature๊ฐ„์˜ 2์ฐจ์› ์กฐํ•ฉ์„ ๊ธฐ๋ฐ˜์œผ๋กœ update๋จ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • ์ด ๋•Œ, weight๊ฐ€ ๋” ๋†’์„์ˆ˜๋ก ํ•ด๋‹น feature ์กฐํ•ฉ์ด ํ•ด๋‹น ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์— ์žˆ์–ด ๊ฐ•ํ•œ ์ƒํ˜ธ์ž‘์šฉ์„ ์˜๋ฏธํ•œ๋‹ค๊ณ  ํ•ด์„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

3-3. Deep Network

  • Deep Network๋Š” ์ „ํ˜•์ ์ธ feed-forward neural network๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์€ linear ๊ณ„์‚ฐ๊ณผ activation fuction์œผ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Œ
hl+1=f(Wlhl+bl)h_{l+1} = f(W_{l}h_{l} + b_{l})
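A matching sketch of the deep network, a plain MLP with ReLU activations (the layer sizes here are arbitrary):

import numpy as np

def deep_network(h0, layer_sizes=(64, 32), seed=0):
    # h_{l+1} = f(W_l h_l + b_l), with f = ReLU, applied layer by layer
    rng = np.random.default_rng(seed)
    h = h0
    for out_dim in layer_sizes:
        W = rng.normal(scale=0.1, size=(out_dim, h.shape[0]))
        b = np.zeros(out_dim)
        h = np.maximum(0.0, W @ h + b)
    return h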

3-4. Deep and Cross Combination

  • ์•ž ์„œ ์–ธ๊ธ‰ํ–ˆ๋“ฏ์ด, cross network์™€ deep network๋ฅผ ๊ฒฐํ•ฉํ•˜๋Š” ๋ฐฉ์‹์˜ ์ฐจ์ด๋กœ Stacked Structure๊ณผ Parallel Structure๋กœ ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋ฐ์ดํ„ฐ์— ๋”ฐ๋ผ์„œ ์„ฑ๋Šฅ์˜ ์ฐจ์ด๋ฅผ ๋ณด์ธ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

  • Stacked Structure์€ x0x_{0}๊ฐ€ cross network๋ฅผ ํ†ต๊ณผํ•œ ํ›„์— deep network๋ฅผ ํ†ต๊ณผํ•˜๋Š” ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค.

    • cross network์˜ ์ถœ๋ ฅ๊ฐ’์ธ xLcx_{L_c}๊ฐ€ deep network์˜ ์ž…๋ ฅ๊ฐ’์ธ h0h_0์— ํ•ด๋‹นํ•ฉ๋‹ˆ๋‹ค.

(Figure: stacked structure)

  • Parallel Structure์€ x0x_{0}๊ฐ€ ๋™์‹œ์— cross network์™€ deep network์— ์ž…๋ ฅ๋˜๋Š” ๊ตฌ์กฐ๋กœ cross network์˜ ์ถœ๋ ฅ์ธ xLcx_{L_c}์™€ deep network์˜ ์ถœ๋ ฅ์ธ hLdh_{L_d}์˜ ๊ฒฐํ•ฉ(concatenate)์œผ๋กœ ์ตœ์ข… output layer๋ฅผ ํ†ต๊ณผํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

(Figure: parallel structure)
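Reusing the cross_network and deep_network sketches above, the two combination modes differ only in how the blocks are wired; the final logit layer here is a made-up illustration.

import numpy as np

# assumes cross_network and deep_network from the sketches above

def stacked_dcn(x0, w_out):
    # stacked: x_0 -> cross network -> deep network -> logit
    x_cross = cross_network(x0)            # x_{L_c}
    h_deep  = deep_network(x_cross)        # h_0 = x_{L_c}
    return w_out @ h_deep

def parallel_dcn(x0, w_out):
    # parallel: x_0 feeds both networks; their outputs are concatenated
    x_cross = cross_network(x0)            # x_{L_c}
    h_deep  = deep_network(x0)             # h_{L_d}
    return w_out @ np.concatenate([x_cross, h_deep])

x0 = np.array([0.2, -1.0, 0.5])
logit = parallel_dcn(x0, w_out=np.full(3 + 32, 0.01))
prob = 1.0 / (1.0 + np.exp(-logit))        # preference / click probability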

  • ํ•™์Šต์„ ์œ„ํ•œ ์†์‹คํ•จ์ˆ˜, loss function์œผ๋กœ๋Š” binary label์˜ learning to rank system์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉ๋˜๋Š” Log Loss๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
loss=โˆ’1Nโˆ‘i=1Nyilog(yi^)+(1โˆ’yi)log(1โˆ’yi^)+ฮปโˆ‘lโˆฃโˆฃWlโˆฃโˆฃ22loss = - \frac {1}{N} \sum^{N}_{i = 1} y_{i} log(\hat{y_i}) + (1-y_i)log(1-\hat{y_i}) + \lambda \sum_l ||W_l||^2_2

โ—๏ธLog Loss ์„ค๋ช…: ์ถœ์ฒ˜ - [๋ฐ์ด์ฝ˜ ํ‰๊ฐ€์‚ฐ์‹] log loss์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์ž, ๋ฐ์ด์ฝ˜

์ฐธ๊ณ  ์ž๋ฃŒ

0๊ฐœ์˜ ๋Œ“๊ธ€