💻 Scraping every Programmers problem title & link with Puppeteer (CSV export included!)

dev_log · July 9, 2025

The problems in the Programmers coding test section have no numbers.
That made it hard to keep track of how far I had gotten, or to plan where to start and stop on a given day.

I want to spend exactly 30 minutes a day solving problems...
The Coding Basics Training set only has 124 problems in total,
so at first I thought about just typing the titles out one by one to build a list.

Then it hit me.

“Couldn't I just pull this in with web scraping?”

To be honest, all I knew about web scraping was
“you grab something off a page?”
But after talking it through with ChatGPT and writing the code together,
I had the result I wanted in no time.
So this is... vibe coding? 😎


🎯 Goal

  • Scrape all 7 pages (124 problems in total) of the Programmers Coding Basics Training list
  • Collect each title + link and save them to a CSV file

🧑‍💻 Final code

const puppeteer = require("puppeteer");
const fs = require("fs");

(async () => {
  // headless: false opens a visible browser window so the run can be watched.
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  const allTitles = [];
  const totalPages = 7; // 7 pages cover all 124 problems

  for (let currentPage = 1; currentPage <= totalPages; currentPage++) {
    const url = `https://school.programmers.co.kr/learn/challenges/training?order=acceptance_desc&languages=javascript&page=${currentPage}`;
    console.log(`📄 Loading page ${currentPage}...`);

    // networkidle2: navigation counts as done once no more than
    // 2 network connections have been open for at least 500 ms.
    await page.goto(url, {
      waitUntil: "networkidle2",
      timeout: 120000,
    });

    // Extra 2-second pause so the problem table has time to render.
    await new Promise((resolve) => setTimeout(resolve, 2000));

    // Runs inside the page: rendered text and absolute URL of each title link.
    const titles = await page.$$eval("td.title a", (elements) =>
      elements.map((el) => ({
        title: el.innerText.trim(),
        link: el.href,
      }))
    );

    console.log(`✅ Collected ${titles.length} from page ${currentPage}`);
    allTitles.push(...titles);
  }

  console.log(`\n📦 All problem titles (${allTitles.length} total):`);
  allTitles.forEach((title, index) => {
    console.log(`${index + 1}. ${title.title}`);
  });

  await browser.close();

  // Build the CSV: quote the text fields and double any embedded quotes.
  const csvHeader = "No.,Title,Link\n";
  const csvContent = allTitles
    .map((p, i) => `${i + 1},"${p.title.replace(/"/g, '""')}","${p.link}"`)
    .join("\n");
  fs.writeFileSync("programmers_problems.csv", csvHeader + csvContent, "utf-8");

  console.log(`💾 CSV saved: programmers_problems.csv`);
})();

😅 Where the trouble started

🐛 npm init blows up

The first time I ran npm init -y, I got this.

npm ERR! Invalid name: "코딩202507"

npm init -y uses the folder name as the default package name, and npm sulked over the Korean characters in mine...
Renaming the folder to coding-2507 fixed it right away.
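
A rough version of that name check can be sketched in plain JavaScript. looksLikeValidNpmName is a hypothetical helper, and it only approximates npm's real validation (lowercase, URL-safe, no leading dot or underscore, at most 214 characters). Scoped names such as @scope/name are not handled.

```javascript
// Hypothetical helper: a simplified approximation of npm's package-name rules.
// This is not npm's actual validator, and scoped names are out of scope here.
function looksLikeValidNpmName(name) {
  if (typeof name !== "string" || name.length === 0 || name.length > 214) return false;
  if (name !== name.toLowerCase()) return false; // no uppercase letters
  if (/^[._]/.test(name)) return false;          // no leading dot or underscore
  if (/[~'!()*\s]/.test(name)) return false;     // special characters and whitespace
  return encodeURIComponent(name) === name;      // must be URL-safe as written
}

console.log(looksLikeValidNpmName("코딩202507"));  // false
console.log(looksLikeValidNpmName("coding-2507")); // true
```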


🐌 TimeoutError while loading the page

The first run failed with this error.

TimeoutError: Navigation timeout of 60000 ms exceeded

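Puppeteer timed out because it was waiting for every request on the page to finish: with "networkidle0", navigation only counts as done once there have been no open network connections for at least 500 ms.
The fix was simple.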

  • waitUntil: "networkidle0" → "networkidle2"
  • Raised timeout to 120000 (ms)
  • Also set headless: false to open a real browser window and watch what it was doing
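
Another option, instead of tuning waitUntil, is to wait for the element the script actually needs. The sketch below assumes the problem links still match the "td.title a" selector used in the final code. gotoAndWaitForList is a hypothetical helper name, and page is a Puppeteer Page created elsewhere.

```javascript
// Hypothetical helper: navigate, then wait until the problem links exist.
// `page` is expected to be a Puppeteer Page; the default selector is an
// assumption carried over from the final code.
async function gotoAndWaitForList(page, url, selector = "td.title a", timeout = 120000) {
  // "domcontentloaded" returns once the HTML is parsed,
  // without waiting for the network to go quiet.
  await page.goto(url, { waitUntil: "domcontentloaded", timeout });
  // Resolves once a matching element is in the DOM, or throws on timeout.
  await page.waitForSelector(selector, { timeout });
}
```

Called as await gotoAndWaitForList(page, url) inside the loop, this could stand in for the page.goto call and the fixed 2-second pause. I have not measured whether it is faster on this site.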

📄 Only page 1 gets scraped

No matter how many times I ran it, I only got the 20 problems on page 1.
I found that page=1 was hardcoded in the URL, so I changed the code to loop over pages 1 to 7 with a for statement.

for (let currentPage = 1; currentPage <= totalPages; currentPage++) {
  const url = `...page=${currentPage}`;
  await page.goto(url, ...);
}
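
The same URL can also be assembled with the built-in URL class, which keeps the query parameters readable. This is just a sketch: buildPageUrl is a hypothetical helper, and the base URL and parameter values are copied from the final code.

```javascript
const BASE_URL = "https://school.programmers.co.kr/learn/challenges/training";

// Hypothetical helper: build the problem-list URL for one page number.
function buildPageUrl(pageNumber) {
  const url = new URL(BASE_URL);
  url.searchParams.set("order", "acceptance_desc");
  url.searchParams.set("languages", "javascript");
  url.searchParams.set("page", String(pageNumber));
  return url.toString();
}

console.log(buildPageUrl(3));
// https://school.programmers.co.kr/learn/challenges/training?order=acceptance_desc&languages=javascript&page=3
```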

❌ CSV has a function where the link should be

This time the CSV came out looking like this:

"problem title","function link() { [native code] }"

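As far as I could tell, there were two causes. (function link() { [native code] } is how a native function prints as a string, so the code was writing a function, not a URL.)

  • el.textContent did not give me the title text I expected (I put that down to the page being dynamic)
  • The link column got a function reference instead of the el.getAttribute("href") value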

The fix:

title: el.innerText.trim(),
link: el.href,

After that change, the title + link came out cleanly.
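
The buggy line itself is not shown here, so the following is only a guess at how that output can arise. Every JavaScript string inherits a legacy link method, so reading .link from a string (rather than from the object that holds the URL) yields a native function, which prints exactly like the text in the CSV. The variable names and sample values are made up for illustration.

```javascript
// Illustration only: one way to end up with "function link() { [native code] }".
const scraped = { title: "Sample problem", link: "https://example.com/lessons/1" };

const titleText = scraped.title;     // a plain string
const wrong = `"${titleText.link}"`; // String.prototype.link, a native function
const right = `"${scraped.link}"`;   // the URL stored on the object

console.log(wrong); // "function link() { [native code] }"
console.log(right); // "https://example.com/lessons/1"
```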


๐Ÿšจ ReferenceError: fs is not defined

Finally, when I added fs.writeFileSync():

ReferenceError: fs is not defined

I had simply forgotten to import the fs module. Adding this line at the top fixed it.

const fs = require("fs");

📦 Result

The final programmers_problems.csv:

| No. | Title | Link |
| --- | --- | --- |
| 1 | 문자 리스트를 문자열로 변환하기 | https://school.programmers.co.kr/learn/courses/30/lessons/181941 |
| 2 | 홀짝에 따라 다른 값 반환하기 | https://school.programmers.co.kr/learn/courses/30/lessons/181944 |

Open it in Excel and the links are clickable right away 👍
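
One caveat, stated as an assumption rather than something observed in this run: Excel often misreads a UTF-8 CSV that has no byte order mark (BOM), which can garble Korean titles. A minimal sketch of a CSV builder that prepends a BOM and quotes every field follows. csvField, toCsv, and the sample row are hypothetical.

```javascript
// Quote one field and double any embedded double quotes.
function csvField(value) {
  return `"${String(value).replace(/"/g, '""')}"`;
}

// Hypothetical helper: turn rows of { title, link } into CSV text
// with a leading UTF-8 BOM so spreadsheet apps can detect the encoding.
function toCsv(rows) {
  const header = ["No.", "Title", "Link"].map(csvField).join(",");
  const lines = rows.map((p, i) => [i + 1, p.title, p.link].map(csvField).join(","));
  return "\uFEFF" + [header, ...lines].join("\r\n") + "\r\n";
}

// Made-up sample row with a comma and quotes in the title.
const sample = [{ title: 'Say "hi", twice', link: "https://example.com/lessons/1" }];
const csv = toCsv(sample);
console.log(csv.split("\r\n")[1]);
// "1","Say ""hi"", twice","https://example.com/lessons/1"
```

The returned string could be passed to fs.writeFileSync in place of csvHeader + csvContent.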


✨ What I learned

  • On this dynamic page, reading el.innerText (the rendered text) and el.href (the absolute URL) in Puppeteer was the safe choice
  • networkidle2 + setTimeout together made page loading stable
  • Pagination can be handled simply through a query parameter
  • npm package names must be lowercase and URL-safe; lowercase English letters, digits, and hyphens are a safe choice
