💯 10/13 R Practice Summary

김태준 · December 4, 2022
R-Studio

Practice solutions (10/13)

Use the mpg data to solve the following problems.
1. We want to find out whether highway fuel economy differs by engine displacement. Comparing cars whose displ (displacement) is 4 or less with cars whose displ is 5 or more, find out which group has the higher hwy (highway fuel economy) on average.
2. We want to find out whether city fuel economy differs by manufacturer. Find out which manufacturer, "audi" or "toyota", has the higher cty (city fuel economy) on average.
3. We want to know the average highway fuel economy of "chevrolet", "ford", and "honda" cars. Extract the data for these manufacturers and compute the overall mean of hwy.

< Solution code >

library(dplyr)  # provides %>%, filter(), select(), mutate(), summarise(), left_join()

1. 
mpg <- as.data.frame(ggplot2::mpg)
head(mpg)
df_displ4 <- mpg %>% filter(displ <= 4)
df_displ5 <- mpg %>% filter(displ >= 5)
mean(df_displ4$hwy)
mean(df_displ5$hwy)

2. 
df_audi <- mpg %>% filter(manufacturer == 'audi')
df_toyota <- mpg %>% filter(manufacturer == 'toyota')
mean(df_audi$cty)
mean(df_toyota$cty)

3. 
df_mean <- mpg %>% filter(manufacturer %in% c('chevrolet', 'ford', 'honda'))
mean(df_mean$hwy)	
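
The solutions above build a separate data frame for each group and then call mean() on each. As a hedged alternative, the sketch below answers the first question in one pipeline by labeling each car with a displacement group and summarising per group. The names displ_group and displ_summary are made up for this illustration, and the code assumes the dplyr and ggplot2 packages are installed.

```r
library(dplyr)

mpg <- as.data.frame(ggplot2::mpg)

# Keep only the two groups being compared (cars with 4 < displ < 5 are dropped),
# label each row with its group, then average hwy within each group.
displ_summary <- mpg %>%
  filter(displ <= 4 | displ >= 5) %>%
  mutate(displ_group = ifelse(displ <= 4, "4 or less", "5 or more")) %>%
  group_by(displ_group) %>%
  summarise(mean_hwy = mean(hwy), n = n())

print(displ_summary)
```

The n column shows how many cars each mean is based on, which helps when judging how much weight to give the comparison.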

Use the mpg data to solve the following problems.
1. Create a new data set by extracting the class (type of car) and cty (city fuel economy) variables from the mpg data.
2. Print part of the new data to check that it consists of only those two variables.
3. We want to find out whether city fuel economy differs by type of car. Using the data above, find out whether cars whose class is "suv" or cars whose class is "compact" have the higher cty.

< Solution code >

1.
df_mpg_new <- mpg %>% select(class, cty)
2.
head(df_mpg_new,10)
3.
df_mpg_class_suv <- df_mpg_new %>% filter(class == 'suv')
mean(df_mpg_class_suv$cty)
df_mpg_class_compact <- df_mpg_new %>% filter(class == 'compact')
mean(df_mpg_class_compact$cty)

Use the mpg data to solve the following problem.
1. We want to see which models (class) produced by "audi" have high hwy (highway fuel economy). Print the data for the "audi" cars that rank 1st to 5th in hwy.

< Solution code >

mpg %>% filter(manufacturer == 'audi') %>% arrange(desc(hwy)) %>% head(5)
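
head(5) after arrange() returns exactly five rows, so a car tied with the fifth-ranked one would be cut off silently. Below is a minimal sketch of a tie-aware variant, assuming dplyr 1.0.0 or later (where slice_max() is available) and an installed ggplot2. The name audi_top5 is chosen only for this example.

```r
library(dplyr)

mpg <- as.data.frame(ggplot2::mpg)

# slice_max() keeps ties by default (with_ties = TRUE), so it can return
# more than five rows if several cars share the fifth-highest hwy.
audi_top5 <- mpg %>%
  filter(manufacturer == "audi") %>%
  slice_max(hwy, n = 5)

print(audi_top5)
```

Passing with_ties = FALSE gives exactly five rows, which matches the behavior of the arrange() and head() version.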

Create a single combined fuel economy variable from the hwy and cty variables in the mpg data and analyze it.
1. Make a copy of the mpg data and add a "total fuel economy" variable that sums cty and hwy.
2. Add an "average fuel economy" variable that divides the "total fuel economy" variable by 2.
3. Print the data for the three cars with the highest "average fuel economy".
4. Combine the steps above into a single chained statement and run it.

< Solution code >

1. 
df <- mpg
df <- df %>% mutate(total = cty + hwy)
2. 
df <- df %>% mutate(average = total / 2)
3. 
df %>% arrange(desc(average)) %>% head(3)
4. 
mpg %>% mutate(total = cty + hwy, average = total / 2) %>% arrange(desc(average)) %>% head(3)

From mpg, compute the mean combined city and highway fuel economy of "suv" cars for each manufacturer, sort the result in descending order, and print ranks 1 to 5.

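< Solution code >

mpg <- as.data.frame(ggplot2::mpg)
# Keep only "suv" cars before averaging; total = cty + hwy as in the previous exercise
mpg %>% filter(class == 'suv') %>% mutate(total = cty + hwy) %>% group_by(manufacturer) %>% summarise(mean_total = mean(total)) %>% arrange(desc(mean_total)) %>% head(5)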

Use the mpg data.
1. class is a variable that classifies cars into 7 types, such as "suv" and "compact", according to their characteristics. To compare which type of car has the higher city fuel economy, compute the mean cty for each class.
2. In the previous result the class values are sorted alphabetically. Sort the output by mean cty in descending order so that it is easy to see which type has the higher city fuel economy.
3. We want to find out which manufacturer's cars have the highest hwy. Print the three manufacturers with the highest mean hwy.
4. We want to find out which manufacturer produces the most "compact" cars. Print the number of "compact" cars for each manufacturer, sorted in descending order.

< Solution code >

mpg <- as.data.frame(ggplot2::mpg)
1. mpg %>% group_by(class) %>% summarise(mean_cty = mean(cty))
2. mpg %>% group_by(class) %>% summarise(mean_cty = mean(cty)) %>% arrange(desc(mean_cty))
3. mpg %>% group_by(manufacturer) %>% summarise(mean_hwy = mean(hwy)) %>% arrange(desc(mean_hwy)) %>% head(3)
4. mpg %>% group_by(manufacturer) %>% filter(class == 'compact') %>% summarise(freq = n()) %>% arrange(desc(freq))
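
The group_by(), summarise(n()), arrange(desc()) chain used for item 4 can be shortened with count(). This is only a sketch of that shortcut, assuming dplyr and ggplot2 are installed. The name compact_counts is illustrative.

```r
library(dplyr)

mpg <- as.data.frame(ggplot2::mpg)

# count() groups by manufacturer and tallies rows;
# sort = TRUE orders the result by n in descending order.
compact_counts <- mpg %>%
  filter(class == "compact") %>%
  count(manufacturer, sort = TRUE)

print(compact_counts)
```

Manufacturers with no compact cars do not appear in the result, the same as in the longer version.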

Use the mpg data.
1. The fl variable in the mpg data indicates the fuel the car uses. The fuel types and their prices are as follows: c 2.35, d 2.38, e 2.11, p 2.76, r 2.22. Complete the fuel data frame.
2. Using the fl variable of mpg and the fl variable of fuel, add the fuel price variable price_fl.
3. To check the result, extract the model, fl, and price_fl variables and print the first 5 rows.

< Solution code >

mpg <- as.data.frame(ggplot2::mpg)
1. fuel <- data.frame(fl = c('c', 'd', 'e', 'p', 'r'), price_fl = c(2.35, 2.38, 2.11, 2.76, 2.22), stringsAsFactors = FALSE)
2. df <- left_join(mpg, fuel, by = 'fl')
3. df %>% select(model, fl, price_fl) %>% head(5)
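
left_join() leaves price_fl as NA for any fl value that has no match in fuel, so it is worth checking the join before using the prices. Below is a small sketch of such a check, assuming dplyr and ggplot2 are installed. The name mpg_priced is used only here.

```r
library(dplyr)

mpg <- as.data.frame(ggplot2::mpg)
fuel <- data.frame(
  fl = c("c", "d", "e", "p", "r"),
  price_fl = c(2.35, 2.38, 2.11, 2.76, 2.22),
  stringsAsFactors = FALSE
)

mpg_priced <- left_join(mpg, fuel, by = "fl")

# A left join against a lookup table with unique keys should keep the row count.
print(nrow(mpg_priced) == nrow(mpg))

# Number of cars whose fuel code found no price.
print(sum(is.na(mpg_priced$price_fl)))

# Fuel codes present in mpg but missing from fuel
# (an empty result means every code was matched).
print(anti_join(distinct(mpg, fl), fuel, by = "fl"))
```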

Use the midwest data included in the ggplot2 package.
1. popadults is the adult population of the area and poptotal is the total population. Add a "percentage of minors relative to total population" variable to the midwest data.
2. Print the minor percentage of the 5 counties with the highest percentage of minors.
3. Add a minor-grade variable according to the classification criteria below, and print how many areas fall into each grade.
   large: 40% or more, middle: 30% to less than 40%, small: less than 30%
4. popasian is the Asian population of the area. Add a "percentage of Asians relative to total population" variable and print the state, county, and Asian percentage of the bottom 10 areas.

< Solution code >

midwest <- as.data.frame(ggplot2::midwest)
1. midwest <- midwest %>% mutate(minority = (poptotal - popadults) / poptotal * 100)
2. midwest %>% arrange(desc(minority)) %>% select(county, minority) %>% head(5)
3. midwest %>% mutate(grade = ifelse(minority >= 40, 'large', ifelse(minority >= 30, 'middle', 'small'))) %>% count(grade)
4. midwest %>% mutate(perasian = popasian / poptotal * 100) %>% arrange(perasian) %>% select(state, county, perasian) %>% head(10)
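
Nested ifelse() calls get harder to read as more grades are added. The sketch below expresses the same grading with case_when() and counts the areas per grade. It computes the share of minors as a percentage inside the pipeline, so it does not depend on variables created earlier. pct_minor and grade_counts are names introduced for this example, and the code assumes dplyr and ggplot2 are installed.

```r
library(dplyr)

midwest <- as.data.frame(ggplot2::midwest)

grade_counts <- midwest %>%
  mutate(
    pct_minor = (poptotal - popadults) / poptotal * 100,
    # Conditions are checked top to bottom; the first one that is TRUE wins.
    grade = case_when(
      pct_minor >= 40 ~ "large",
      pct_minor >= 30 ~ "middle",
      TRUE ~ "small"
    )
  ) %>%
  count(grade)

print(grade_counts)
```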