🔥[KHUDA_RecSys] Project Prep (7)🔥

nothingisme · November 27, 2022

[KHUDA_RecSys]


🗓️ 1126 ~ Attempt log

❇️ Yesterday I burned through all the compute units that Colab Pro provides. I had been running on the premium GPU and wondering why it was so fast; turns out there was a reason. So now I'm completely screwed. On the standard GPU it says there isn't enough.
🗓️ 1126
📌 Attempt 0 (base): 0.5646211434906451

📌 Attempt 1: 0.5627
Applied np.flip() to the logspace values in weights
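
A minimal sketch of what attempt 1 changes, assuming the weights come from a `np.logspace` call over the items in a session, ordered oldest to newest. The `logspace` arguments and the `aids` list are illustrative assumptions, not values taken from these notes.

```python
import numpy as np

# Hypothetical session of 5 item ids, oldest first (illustrative only).
aids = [11, 22, 33, 44, 55]

# Assumed baseline: weights grow from the oldest item to the newest one.
weights = np.logspace(0.1, 1, len(aids), base=2, endpoint=True) - 1

# Attempt 1: reverse the weights so the oldest item gets the largest one.
flipped = np.flip(weights)

for aid, w, fw in zip(aids, weights, flipped):
    print(aid, round(float(w), 3), round(float(fw), 3))
```

If the weights are flipped this way, the oldest items dominate the ranking, which would be consistent with the score dropping below the base.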

📌 Attempt 2: 0.5647 (base + 0.000091)
"Buy2Buy" Co-visitation Matrix: window 14 days -> 10 days

📌 Attempt 3: 0.5647 (base + 0.000094)
"Buy2Buy" Co-visitation Matrix: window 14 days -> 7 days

📌 Attempt 4: 0.5647 (base + 0.000110)
"Buy2Buy" Co-visitation Matrix: window 14 days -> 7 days
"Clicks" Co-visitation Matrix - Time Weighted: window 1 day -> 12 hours

📌 Attempt 5: 0.5646 -> validation test of whether a shorter window really is better
"Buy2Buy" Co-visitation Matrix: window 14 days -> 7 days
"Clicks" Co-visitation Matrix - Time Weighted: window 1 day -> 48 hours

Kept running through attempt 5 until 5:30 a.m. ..^^.. so happy
🗓️ 1127
📌 Attempt 6: 0.5647
Tweaked this and that to raise cart recall, but it flopped.

📌 Attempt 7: 0.5646
Put the type weight I had changed in attempt 6 back to the original. Flopped even harder.

📌 Attempt 8: 0.5646
Maybe I should just leave the type weight alone. Flopped again.

Spent the whole day doing nothing but flopping. Today's lesson: don't touch the type weight.
🗓️ 1128
📌 Attempt 9: 0.5649
Brought back attempt 4, my best-scoring problem child so far, and tweaked it a little more. Today's problem child, off we go~!
"Buy2Buy" Co-visitation Matrix: window 14 days -> 7 days
"Clicks" Co-visitation Matrix - Time Weighted: window 1 day -> 12 hours
"Cart-Orders" Co-visitation Matrix: window 1 day -> 12 hours
You crazy kid!!!!!!!!!! Our 9th problem child did so well T_T

📌 Attempt 10: 0.5650
Starting from attempt 9, I changed the time weighting. Reflecting recent data more seems to help, so I set the weight higher the more recent the event. It got a tiny bit smarter.
df['wgt'] = 1 + 5*(df.ts_x - 1659304800)/(1662328791-1659304800)

📌 Attempt 11: 0.5649
Gave recent data an even higher weight. Ran it expecting the score to drop. Sure enough, it dropped.

📌 Attempt 12: 0.5648
To raise cart recall, let's try applying type weighting to the (2) Buy2Buy matrix.
It doesn't matter if order recall drops. Let's just check whether cart recall goes up.
Instead, click went up and cart and order both dropped. I really can't figure this out.


🗓️ 1127

❇️ Before attempt 6: notes from Kaggle comments and so on

✅ The co-visitation matrices currently in use
The reference uses the following three kinds of co-visitation matrix.

  • (1) Uses Click/Cart/Order data with type weighting to predict Cart/Order
  • (2) Uses Cart/Order data with no weighting to predict Cart/Order
  • (3) Uses Click/Cart/Order data with time weighting to predict Click

So prediction splits into two broad cases: Click prediction and Buy prediction.

  • Click is predicted using matrix (3) only.
  • Buy (Cart, Order) is predicted using matrices (1) and (2).

In other words, Cart and Order are predicted with the same set of aids, while Click gets a different one.
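
The matrices above can be sketched roughly as follows. This is a simplified pandas version under stated assumptions, not the reference notebook's code: the helper `covisitation`, the column names, the toy `events` table, and the type encoding 0=click, 1=cart, 2=order are all hypothetical. It shows matrix (1) with the 1:6:3 type weights and matrix (2) with every pair weighted 1.

```python
import pandas as pd

# Assumed type encoding (not stated in the notes): 0=click, 1=cart, 2=order.
TYPE_WEIGHT = {0: 1, 1: 6, 2: 3}

def covisitation(df, window_sec, weight_fn, top_k=2):
    """Map each aid to its top_k co-visited aids (hypothetical helper)."""
    pairs = df.merge(df, on="session")
    # Keep pairs of different items seen within window_sec of each other.
    near = (pairs.ts_x - pairs.ts_y).abs() < window_sec
    pairs = pairs[near & (pairs.aid_x != pairs.aid_y)]
    # Count each pair at most once per session.
    pairs = pairs.drop_duplicates(["session", "aid_x", "aid_y"]).copy()
    pairs["wgt"] = weight_fn(pairs)
    scores = pairs.groupby(["aid_x", "aid_y"], as_index=False)["wgt"].sum()
    scores = scores.sort_values(["aid_x", "wgt"], ascending=[True, False])
    top = scores.groupby("aid_x")["aid_y"].apply(lambda s: s.head(top_k).tolist())
    return top.to_dict()

# Toy event log (illustrative only).
events = pd.DataFrame({
    "session": [1, 1, 1, 2, 2],
    "aid":     [10, 20, 30, 10, 20],
    "ts":      [0, 100, 200, 0, 50],
    "type":    [0, 1, 2, 0, 0],
})

# (1) All events, pairs weighted by the type of the co-visited item.
type_weighted = covisitation(
    events, 24 * 3600, lambda p: p.type_y.map(TYPE_WEIGHT))

# (2) Buy2Buy: carts and orders only, every pair weighted 1.
buys = events[events["type"].isin([1, 2])]
buy2buy = covisitation(buys, 7 * 24 * 3600, lambda p: 1)

print(type_weighted)
print(buy2buy)
```

The `window_sec` argument is the knob the attempts above keep turning (14 days -> 7 days, 1 day -> 12 hours). A self-merge like this is fine for a toy table but would need chunking on the full dataset.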


✅ How the time weights are assigned

df['wgt'] = 1 + 3*(df.ts_x - 1659304800)/(1662328791-1659304800)

Minimum timestamp across all data: 1659304800
Maximum timestamp across all data: 1662328791
Desired maximum weight 4, minimum weight 1

Solve the point-slope equation.
x2 = 1662328791
x1 = 1659304800
y2 = 4
y1 = 1

The maximum value determines how much importance we place on recent data.

What people click, add to cart, and order will shift with current trends. So the scheme gives more weight the closer an event is to the present.
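
The point-slope formula above can be written as a small function. This is a sketch: `time_weight` and its parameter names are hypothetical, while the two timestamps and the weight range 1 to 4 come from the notes. Raising `w_max` to 6 reproduces the `1 + 5*(...)` variant tried in attempt 10.

```python
import pandas as pd

TS_MIN = 1659304800  # earliest timestamp in the data (from the notes)
TS_MAX = 1662328791  # latest timestamp in the data (from the notes)

def time_weight(ts, w_min=1.0, w_max=4.0):
    """Linear map with TS_MIN -> w_min and TS_MAX -> w_max."""
    slope = (w_max - w_min) / (TS_MAX - TS_MIN)
    return w_min + slope * (ts - TS_MIN)

df = pd.DataFrame({"ts_x": [TS_MIN, (TS_MIN + TS_MAX) // 2, TS_MAX]})
df["wgt"] = time_weight(df.ts_x)                # same as 1 + 3*(...)
df["wgt_max6"] = time_weight(df.ts_x, w_max=6)  # same as 1 + 5*(...)
print(df)
```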


✅ Why only the tail 30 are used
The main reason is to use less memory. That said, the closer an event is to the tail, the more it helps predict the future, and the more meaningfully it reflects recent trends.
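
As a rough illustration of the tail-30 truncation, assuming a pandas event log with `session` and `ts` columns. The toy data is hypothetical, and the reference notebook may implement the cut differently.

```python
import pandas as pd

# Hypothetical log: session 1 has 40 events, session 2 has 5.
df = pd.DataFrame({
    "session": [1] * 40 + [2] * 5,
    "aid": list(range(40)) + list(range(5)),
    "ts": list(range(40)) + list(range(5)),
})

# Sort so that "tail" means the most recent events,
# then keep at most 30 events per session.
df = df.sort_values(["session", "ts"])
recent = df.groupby("session").tail(30)

print(recent.groupby("session").size().to_dict())
```

Long sessions shrink to their 30 newest events while short sessions are left untouched, which bounds the size of the self-join used to build co-visitation pairs.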


✅ Possible User/Item Features

  • User Features
    how many items the user has clicked
    how many items the user has purchased
    the average hour at which the user clicks
    the average hour at which the user orders
    how many real sessions the user has (a real session is defined by the time gaps between activities)
    the average number of items per real session of the user
    the day of week of the user's last activity (e.g. Monday, Tuesday)
    the day of week of the user's first activity
    the average time between clicks
  • Item Features
    has this item already been clicked by user
    has this item already been added to cart by user
    if already clicked, what is its relative order? 1 means last clicked, 2 means second to last clicked etc
    has user clicked this item multiple times already? how many
    how many items (that user has already clicked) have recommended this item with their co-visitation matrix
    on what date was this item first seen in train
    how many times was this item clicked in train
    what is the average hour of day that this item is clicked
    what is the average hour of day that this item is ordered
    how popular is this item on monday (i.e. what percentage of monday clicks are this item)
    how popular is this item on tuesday
    what is the most common day of week this item is clicked
    count up all unique items that were clicked immediately before and after. How many unique items have been clicked immediately before and after. (For example, maybe item only has 10 unique items that get clicked before and after. Whereas another item has 1000 unique items clicked before and after)
    what percentage of users click this item more than once
    has this item ever been bought in train data
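
A few of the features listed above could be computed along these lines. This is only a sketch under assumptions: the toy `events` table, the column names, and the type encoding 0=click, 1=cart, 2=order are hypothetical, hours are taken in UTC, and each `session` is treated as one user.

```python
import pandas as pd

# Hypothetical event log; type encoding 0=click, 1=cart, 2=order is assumed.
events = pd.DataFrame({
    "session": [1, 1, 1, 2, 2],
    "aid": [10, 20, 10, 10, 30],
    "ts": [1659304800, 1659308400, 1659312000, 1659391200, 1659394800],
    "type": [0, 0, 2, 0, 1],
})
events["hour"] = pd.to_datetime(events["ts"], unit="s").dt.hour  # UTC hour
events["is_click"] = events["type"] == 0
events["is_order"] = events["type"] == 2

# User features: click count, order count, average click hour.
user_feats = events.groupby("session").agg(
    n_clicks=("is_click", "sum"),
    n_orders=("is_order", "sum"),
)
clicks = events[events["is_click"]]
user_feats["mean_click_hour"] = clicks.groupby("session")["hour"].mean()

# Item features: click count in train, whether it was ever bought.
item_feats = events.groupby("aid").agg(
    n_clicks=("is_click", "sum"),
    ever_bought=("is_order", "any"),
)

print(user_feats)
print(item_feats)
```

A plain mean of hours ignores the wrap-around at midnight, so a real feature might need a circular average instead.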

✅ Why the click:cart:order weights are 1:6:3
1) Comparing click vs cart, cart is more important. (This one is obvious.)
The basic idea is this. We want to predict future behavior, so the question is what is more important "someone previously clicked an item OR someone previously put an item in their cart". I would say the second is more important. That means the user will most likely click this item again or order this item. So we give lots of weight to previous behavior of "cart".

2) Why cart is more important than order when comparing cart vs order
Next, we wonder what is more important "someone previously put an item in their cart OR someone previously ordered an item". In both cases, the user might buy the item. But it is more likely that a user will buy an item if they put it in their cart versus buy an item if they have already bought the item. People do buy items multiple times so previously buying an item is more important than previously clicking an item (when predicting a future purchase).
= Both items previously added to the cart and items previously ordered are likely to be purchased. However, a user is more likely to buy an item they put in the cart than to buy again an item they already bought. That order matters more than click is obvious.
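
To make the 1:6:3 reasoning concrete, here is a small sketch that scores the items in one session history by the type of each past event. The `history` list and the scoring loop are illustrative assumptions, not the reference implementation.

```python
from collections import Counter

# Weights from the notes: click 1, cart 6, order 3.
TYPE_WEIGHT = {"clicks": 1, "carts": 6, "orders": 3}

# Hypothetical session history as (aid, event type) pairs, oldest first.
history = [(10, "clicks"), (20, "carts"), (30, "orders"), (10, "clicks")]

# Each past event adds its type weight to the item's score.
scores = Counter()
for aid, event_type in history:
    scores[aid] += TYPE_WEIGHT[event_type]

ranked = [aid for aid, _ in scores.most_common()]
print(ranked)  # [20, 30, 10]
```

The carted item outranks the ordered one, and both outrank the item that was only clicked, even though it was clicked twice.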
