Kubernetes, QoS & ResourceQuota

Jeonghak Cho · June 22, 2025


📗 Overview - QoS

🏳️‍🌈 [Questions]

  • Why use QoS?
  • How do QoS and ResourceQuota differ?


QoS (Quality of Service) Overview

In Kubernetes, QoS (Quality of Service) is the mechanism that determines a Pod's priority on a node and how firmly its resources are secured, based on the Pod's resource requests and limits. It is the basis for deciding which Pods are removed or restricted first when resources run short: when a node is under resource pressure, the QoS class influences the eviction order. Choosing an appropriate QoS class is therefore necessary to guarantee Pod performance and stability.

QoS classes and their criteria

| QoS class | CPU/memory criteria | Description |
| --- | --- | --- |
| Guaranteed | Every container sets requests == limits for both CPU and memory | Resources are guaranteed most reliably; evicted last when the node runs short of resources. |
| Burstable | Does not qualify as Guaranteed, and at least one container sets a request or limit (requests ≠ limits, or only some values set) | The baseline guarantee is requests; usage can grow up to limits when capacity allows. Medium priority. |
| BestEffort | No container sets any requests or limits | Can use whatever is left over, but is evicted first and has low stability. Suited to experimental workloads. |
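To make the three criteria concrete, here is a minimal sketch of one Pod per class. The Pod names, the image, and the resource values are illustrative assumptions, not taken from the original post.

```yaml
# Guaranteed: every container sets requests == limits for CPU and memory.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
---
# Burstable: requests are set, but limits are higher than requests.
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable           # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "250m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
---
# BestEffort: no requests or limits on any container.
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
```

After the Pods are created, the class Kubernetes assigned can be read from the status.qosClass field, for example with kubectl get pod qos-guaranteed -o jsonpath='{.status.qosClass}'.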

QoS summary comparison

| Item | Guaranteed | Burstable | BestEffort |
| --- | --- | --- | --- |
| Stability | Highest | Medium | Lowest |
| Resource guarantee | Full | Partial | None |
| Eviction priority (who is evicted first) | Lowest | Medium | Highest |
| Condition | requests == limits | Only partly set, or requests and limits differ | Neither set |

How resource settings determine QoS (Quality of Service)

  • A resource guarantee level assigned per Pod
  • When node resources run short, lower-priority Pods are forcibly terminated first
  • Pods are classified automatically from how requests and limits are set

| Resource settings | QoS class | Description |
| --- | --- | --- |
| requests == limits (all specified) | Guaranteed | Strongest guarantee; among the last to be terminated |
| Some requests set; limits differ or are absent | Burstable | Moderate guarantee |
| Neither requests nor limits set | BestEffort | Weakest guarantee; terminated first when resources run short |

Main uses: ordering eviction (forced termination) and deciding priority when Pods contend for node resources. Scheduling decisions themselves are based on requests, not on the QoS class.

Why QoS matters

QoS is not just a label; it is a core principle of cluster resource management. Kubernetes assigns a class to every Pod whether or not you plan for it, so if requests and limits are left unset or set carelessly, overconsumption can end with a critical service being killed first.

Deciding priority when resources run short

When a node runs short of memory or another resource that cannot be throttled, such as disk (a CPU shortage leads to throttling instead), the kubelet must forcibly terminate (evict) some Pods. The QoS class largely determines which Pods go first.

  • BestEffort → removed first
  • Burstable → in between
  • Guaranteed → protected until last

To keep services stable, run critical services as Guaranteed.
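The point at which the kubelet starts evicting is configurable per node. The following is a minimal sketch of a kubelet configuration file with hard eviction thresholds; the threshold values are illustrative assumptions and should be tuned to the node size.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard thresholds: when a signal drops below its value,
# the kubelet evicts Pods immediately, without a grace period.
evictionHard:
  memory.available: "500Mi"     # assumed value
  nodefs.available: "10%"       # assumed value
```

Once a threshold is crossed, Pods whose usage exceeds their requests are the first candidates, which is why BestEffort and over-request Burstable Pods tend to go before Guaranteed ones.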

Predicting and controlling resource usage

The QoS class is derived from the requests and limits settings. Those settings let you control how much a Pod is guaranteed and the maximum it may use, which reduces unpredictable problems such as CPU throttling and memory OOM kills.

Operators can manage the cluster efficiently without wasting resources.

Resource isolation and fair distribution

  • Without deliberate requests and limits, a single Pod can take an excessive share of resources → other Pods' performance degrades
  • QoS settings isolate cluster resources and distribute them fairly

This prevents interference between services in multi-tenant environments.

Support for autoscaling and auto-recovery

  • The HPA (Horizontal Pod Autoscaler) computes resource utilization relative to requests
  • QoS + Pod Priority + HPA form the basis of an automatic recovery strategy

Setting QoS is essential for stable auto-healing and scaling.
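As a hedged illustration of why requests matter to the HPA, the sketch below scales a Deployment on CPU utilization. The Deployment name api-server, the replica bounds, and the 70% target are assumptions made for this example.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server            # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # Percentage of the containers' CPU *requests*, averaged over Pods.
        averageUtilization: 70
```

With a Utilization target, a Pod whose containers request 500m CPU is considered 70% utilized at 350m. If the containers set no CPU request, this metric cannot be computed for them, so the HPA has nothing to scale on.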

Standardizing operations and applying policy

  • Managing separate QoS templates per service type (e.g. backend, frontend, batch) keeps operations consistent and predictable
  • Combining them with cluster policies (e.g. PodDisruptionBudget, eviction policy) allows fine-grained control
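As one example of such a cluster policy, here is a minimal PodDisruptionBudget sketch. The name, the app: api-server label, and the minAvailable value are assumptions for illustration.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb          # hypothetical name
spec:
  minAvailable: 2               # keep at least 2 matching Pods running
  selector:
    matchLabels:
      app: api-server           # hypothetical label
```

A PodDisruptionBudget limits voluntary disruptions such as node drains. It does not stop the kubelet from evicting Pods under node resource pressure, so it complements QoS rather than replacing it.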

Guidelines for applying QoS in production

In production, Kubernetes QoS (Quality of Service) is a key element for guaranteeing service stability and optimizing resources. The following strategies and practices describe how to set QoS from a hands-on operations perspective.

  • Run production services as Guaranteed, or at least Burstable
  • BestEffort is acceptable for tests and temporary jobs, but it is unstable
  • For memory, take care: a container whose usage exceeds its limit can be OOMKilled
  • The HPA (horizontal autoscaling) works from the requests values

Tiered strategy by service criticality

| Service type | Examples | Recommended QoS | Configuration strategy |
| --- | --- | --- | --- |
| Core services | DB, payment, authentication | Guaranteed | requests == limits (resources pinned explicitly) |
| General microservices | Web servers, API servers | Burstable | Minimal requests, generous limits |
| Non-critical background | Log collection, batch jobs | BestEffort or Burstable | Leave resources unset or set them low |

ResourceQuota Overview

A Kubernetes ResourceQuota is a policy that sets an upper bound on resource consumption per namespace. It is used when multiple teams or applications share a cluster, to divide resources fairly and prevent abuse.

Why ResourceQuota is needed

| Situation | Problem | Role of ResourceQuota |
| --- | --- | --- |
| Several teams share one cluster | One team takes an excessive share of resources | Caps the resources each team can use |
| Too many Pods are created | Node resources are exhausted → cluster-wide outage | Limits the number of Pods |
| Storage such as PVCs is created without bound | Cluster disk space runs out | Limits total PVC capacity |

Main ResourceQuota limit items

| Item | Description | Example |
| --- | --- | --- |
| pods | Number of Pods that can be created | pods: 20 |
| requests.cpu | Total CPU that can be requested | requests.cpu: "2" |
| limits.memory | Total of all memory limits | limits.memory: "4Gi" |
| persistentvolumeclaims | Number of PVCs | persistentvolumeclaims: 10 |
| requests.storage | Total PVC capacity | requests.storage: "500Gi" |
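Put together, the items in the table could form a manifest like the sketch below. The quota name and the spark namespace are assumptions chosen for illustration; the values are the ones from the table.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota              # hypothetical name
  namespace: spark              # assumed namespace
spec:
  hard:
    pods: "20"
    requests.cpu: "2"
    limits.memory: 4Gi
    persistentvolumeclaims: "10"
    requests.storage: 500Gi
```

One consequence to plan for: once a quota covers requests.cpu or limits.memory, Pods in that namespace that do not specify those values are rejected at creation unless a LimitRange supplies defaults.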

Risks of not setting a ResourceQuota

If no ResourceQuota is set on a namespace in Kubernetes, the following risks and side effects can occur.

Node resource exhaustion from overconsumption

If a namespace (e.g. spark) has no resource cap, it can create Pods or consume memory without bound, exhausting node resources across the whole cluster.

As a result, other services (Pods) fail to schedule, or are hit by OOMKilled containers, Pending Pods, and evictions. For example, if a Spark job creates dozens of executors and consumes more than 100Gi of memory, Trino and Airflow Pods that need to be scheduled can be left in the Pending state.

"Resource contention" in multi-team/multi-service environments

  • If one team takes too much of the cluster's resources, other teams' workloads may not run properly for lack of resources
  • Namespaces provide a logical boundary, but the underlying resources are shared, so without a ResourceQuota one namespace can monopolize them

API server and scheduler load from abnormal Pod surges

If a faulty manifest, for example one caught in a runaway creation loop, creates hundreds of Pods, API server load rises, kube-scheduler is overloaded, and the whole cluster becomes unstable.
Without a limit on the number of Pods, there is no defense against this.

SLA conflicts between service levels

  • Example: an Airflow DAG run starves for resources because Spark executors have taken them all
    → business-critical workflows can fail
  • Guaranteeing an SLA per service or per namespace requires a resource cap

Boundary between production and test environments breaks down

Even with separate prod and test namespaces, without a ResourceQuota test jobs can consume all the capacity that prod depends on
→ this can cause production outages

Benefits of setting a ResourceQuota

| Benefit | Description |
| --- | --- |
| Prevents runaway resource use | Sets a resource cap per namespace |
| Fair resource use across teams | Teams do not affect one another |
| Stable service operation | Predictable resource allocation |
| Control over Pod count, PVC count, etc. | Object counts can be controlled too |
| Safety net for HPA/autoscaling | Prevents unbounded scale-out |

Recommended minimum ResourceQuota items

  • requests.cpu, limits.cpu
  • requests.memory, limits.memory
  • pods
  • persistentvolumeclaims
  • As needed: services, secrets, configmaps, replicationcontrollers

ResourceQuota vs. QoS

| Item | QoS (Quality of Service) | ResourceQuota |
| --- | --- | --- |
| Purpose | Pod priority and resource reclamation policy | Namespace-level resource cap |
| Applies to | Individual Pods | Individual namespaces |
| Applied automatically? | Kubernetes classifies Pods automatically | Must be configured explicitly by the user |
| Basis | Combination of the Pod's requests/limits settings | Total amount allotted to the namespace |
| Kinds/levels | Guaranteed, Burstable, BestEffort | CPU, memory, Pods, PVCs, ConfigMap counts, etc. |
| Behavior when resources run short | Pods with lower QoS are removed first | New Pods cannot be created once the quota is exceeded |
| Removal order | BestEffort → Burstable → Guaranteed | N/A (requests over quota are rejected at creation) |
| Related resources | resources.requests/limits in the PodSpec | ResourceQuota, LimitRange |
| Operational use | Deciding priority during eviction | Resource control and usage caps |

Other units of resource limiting

The namespace is the only unit on which an aggregate resource cap can be set directly.
Beyond namespaces, however, several mechanisms control resource usage indirectly.

LimitRange (defaults and caps per Pod/Container within a namespace)

A LimitRange defines default and limit values for resources at the Pod or container level. They are applied automatically to workloads in the namespace that do not specify resources.

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: trino
spec:
  limits:
  - default:
      cpu: "1000m"
      memory: "1Gi"
    defaultRequest:
      cpu: "500m"
      memory: "512Mi"
    type: Container
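The manifest above sets only defaults. A LimitRange can also bound what a container is allowed to ask for, using min and max. The sketch below shows that variant; the name and all values are assumptions for illustration.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-bounds        # hypothetical name
  namespace: trino
spec:
  limits:
  - type: Container
    min:                        # smallest request a container may declare
      cpu: "100m"
      memory: "128Mi"
    max:                        # largest limit a container may declare
      cpu: "2"
      memory: "4Gi"
```

Pods whose containers fall outside these bounds are rejected when they are created. Pods that already exist are not changed retroactively.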

Pod-level resource limits (resources.requests / resources.limits)

Even within the same namespace, resource limits can be set individually for each Pod/Container.

Node-level resource limits

Use taints and tolerations so that only specific workloads are placed on specific nodes

kubectl taint nodes node1 role=spark:NoSchedule

→ Only Spark jobs with a matching toleration can be scheduled onto this node
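A toleration matching the role=spark:NoSchedule taint might look like the following sketch. The Pod name and image are placeholders assumed for this example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-executor          # hypothetical name
spec:
  tolerations:
  - key: "role"
    operator: "Equal"
    value: "spark"
    effect: "NoSchedule"        # must match the taint's effect
  containers:
  - name: executor
    image: apache/spark         # placeholder image
```

A toleration only allows the Pod onto the tainted node; it does not force it there. To pin Spark Pods to these nodes, combine the toleration with a nodeSelector or node affinity.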

Control placement with nodeSelector or affinity

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-type
            operator: In
            values:
            - spark
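For a single exact label match like this one, nodeSelector is a shorter way to express the same constraint. The fragment below is a sketch that assumes nodes were labeled beforehand, for example with kubectl label nodes node1 node-type=spark.

```yaml
spec:
  # Schedules only onto nodes that carry the label node-type=spark.
  nodeSelector:
    node-type: spark
```

Node affinity is the better fit when you need operators such as In with several values, or soft preferences rather than hard requirements.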

PriorityClass + Pod Disruption + Preemption

  • Lets higher-priority Pods push out (preempt) lower-priority Pods to obtain resources
  • Enables resource preemption between services with different SLAs
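A minimal sketch of a PriorityClass and a Pod that uses it follows. The class name, the priority value, and the Pod are assumptions chosen for illustration.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical       # hypothetical name
value: 100000                   # higher value means higher priority
globalDefault: false
description: "For workloads that may preempt lower-priority Pods."
---
apiVersion: v1
kind: Pod
metadata:
  name: airflow-scheduler       # hypothetical name
spec:
  priorityClassName: business-critical
  containers:
  - name: scheduler
    image: apache/airflow       # placeholder image
```

When no node has room for this Pod, the scheduler may remove lower-priority Pods to make space. Pod priority is separate from the QoS class: priority is set explicitly, while QoS is derived from requests and limits.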

Cgroups + RuntimeClass (node OS-level resource isolation, advanced)

  • A specific RuntimeClass can provide resource separation at the cgroup and Linux namespace level
  • Mostly used to harden security with gVisor, Kata Containers, etc.
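As a hedged example, the sketch below defines a RuntimeClass for gVisor and a Pod that opts into it. It assumes the nodes' container runtime already has a handler named runsc configured; the handler name depends on your node setup, and the Pod name and image are placeholders.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                  # assumed handler name in the CRI config
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app           # hypothetical name
spec:
  runtimeClassName: gvisor      # run this Pod with the sandboxed runtime
  containers:
  - name: app
    image: nginx                # placeholder image
```

If no node provides the named handler, the Pod fails to start, so RuntimeClass is usually paired with node labels or taints that steer such Pods to prepared nodes.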

Summary of resource-limiting methods

| Unit | Can resources be limited? | Main tools |
| --- | --- | --- |
| Namespace | Yes, directly | ResourceQuota, LimitRange |
| Pod/Container | Via resource requests/limits | QoS, autoscaler |
| Node | Not directly (but placement can be controlled) | taint/toleration, affinity |
| Whole cluster | Not directly | Conventions or policy-based tooling required |
| Virtual cluster | Isolation via vCluster, HNC, etc. | vCluster, Loft |
