Implementing Book Info Crawling from Yes24 (Spring Boot + Jsoup)

윤서 · June 24, 2025

I built a book management system with React and Spring Boot, and I wanted users to be able to enter just a book title and have the rest of the information (author, publisher, price, genre) filled in automatically. To do that, I implemented a web scraping feature that pulls book data from Yes24, a major Korean online bookstore.

🤢 Problem

Manually entering every book detail during registration is inconvenient for users, so I decided to build a feature that:

  • Takes only the book title as input
  • Scrapes the top result from Yes24's search results
  • Populates the form with book metadata automatically
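
The first step of that flow, turning a raw title into a Yes24 search URL, can be sketched on its own. This is only a minimal sketch: the class name `Yes24SearchUrl` and the method `forTitle` are hypothetical names introduced for illustration, and the URL pattern is the one the controller later in this post uses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of the original project): builds the search URL for a title.
final class Yes24SearchUrl {
    // Same base URL the controller in this post concatenates the query onto.
    private static final String SEARCH_BASE =
            "https://www.yes24.com/Product/Search?domain=BOOK&query=";

    private Yes24SearchUrl() {}

    static String forTitle(String title) {
        if (title == null || title.isBlank()) {
            throw new IllegalArgumentException("title must not be blank");
        }
        // URLEncoder applies form encoding: spaces become '+', reserved characters become %XX.
        return SEARCH_BASE + URLEncoder.encode(title.trim(), StandardCharsets.UTF_8);
    }
}
```

Calling `Yes24SearchUrl.forTitle("Clean Code")` yields a URL ending in `query=Clean+Code`. Rejecting blank titles up front is a design choice of this sketch; it avoids sending an empty search to Yes24.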

💻 Tech Stack

Jsoup - HTML parsing and web scraping library in Java
Spring Boot - REST API backend
React - Frontend interface (not covered here)
REST API - Used to request and return scraped book data
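
The data the REST API returns is a `BookDTO`, which this post does not show. Below is a rough plain-Java sketch of what it might look like, assuming only the five fields the controller's builder calls set (`title`, `author`, `publisher`, `price`, `genre`). The getter names and the hand-written builder are assumptions; the actual project may well generate the builder with a library such as Lombok.

```java
// Assumed shape of the DTO; field names are taken from the builder calls in the controller.
final class BookDTO {
    private final String title;
    private final String author;
    private final String publisher;
    private final double price;
    private final String genre;

    private BookDTO(Builder b) {
        this.title = b.title;
        this.author = b.author;
        this.publisher = b.publisher;
        this.price = b.price;
        this.genre = b.genre;
    }

    static Builder builder() {
        return new Builder();
    }

    // Getters (assumed names) so a JSON serializer can expose the fields.
    public String getTitle() { return title; }
    public String getAuthor() { return author; }
    public String getPublisher() { return publisher; }
    public double getPrice() { return price; }
    public String getGenre() { return genre; }

    // Hand-written builder supporting the fluent calls used in the controller.
    static final class Builder {
        private String title;
        private String author;
        private String publisher;
        private double price;
        private String genre;

        Builder title(String value) { this.title = value; return this; }
        Builder author(String value) { this.author = value; return this; }
        Builder publisher(String value) { this.publisher = value; return this; }
        Builder price(double value) { this.price = value; return this; }
        Builder genre(String value) { this.genre = value; return this; }

        BookDTO build() { return new BookDTO(this); }
    }
}
```

With getters like these, Spring Boot's default JSON serialization would return an object with `title`, `author`, `publisher`, `price`, and `genre` keys, which the React form can map directly onto its input fields.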

📜 Final Controller

@GetMapping("/search-from-yes24")
public ResponseEntity<BookDTO> searchFromYes24(@RequestParam String title) {
    try {
        String encodedQuery = URLEncoder.encode(title, StandardCharsets.UTF_8);
        String searchUrl = "https://www.yes24.com/Product/Search?domain=BOOK&query=" + encodedQuery;

        // Step 1: Parse search result page
        Document searchDoc = Jsoup.connect(searchUrl)
                .userAgent("Mozilla/5.0")
                .get();

        Element firstItem = searchDoc.selectFirst("div.itemUnit");
        if (firstItem == null) return ResponseEntity.notFound().build();

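        // Guard against a result card without a link; absUrl returns "" (never null) when it cannot resolve
        Element link = firstItem.selectFirst("a[href]");
        String detailUrl = link != null ? link.absUrl("href") : "";
        if (detailUrl.isEmpty()) return ResponseEntity.status(502).build();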

        // Step 2: Parse book detail page
        Document detailDoc = Jsoup.connect(detailUrl)
                .userAgent("Mozilla/5.0")
                .get();

        String bookTitle = detailDoc.selectFirst("h2.gd_name") != null
                ? detailDoc.selectFirst("h2.gd_name").text().trim()
                : "Unknown Title";

        String author = detailDoc.selectFirst("span.gd_auth a") != null
                ? detailDoc.selectFirst("span.gd_auth a").text().trim()
                : "Unknown Author";

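        Element priceEl = detailDoc.selectFirst("em.yes_m");
        String priceText = priceEl != null ? priceEl.text().replaceAll("[^0-9]", "") : "";
        // Fall back to "0" so Double.parseDouble never sees an empty string
        if (priceText.isEmpty()) priceText = "0";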

        String publisher = detailDoc.selectFirst("span.gd_pub a") != null
                ? detailDoc.selectFirst("span.gd_pub a").text().trim()
                : "Unknown Publisher";

        String genre = "Unknown Genre";
        Elements genreEls = detailDoc.select("div#infoset_goodsCate dl.yesAlertDl dt:contains(Category) + dd ul.yesAlertLi li a");
        if (!genreEls.isEmpty()) {
            genre = genreEls.last().text().trim();
        }

        // Build DTO
        BookDTO dto = BookDTO.builder()
                .title(bookTitle)
                .author(author)
                .publisher(publisher)
                .price(Double.parseDouble(priceText))
                .genre(genre)
                .build();

        return ResponseEntity.ok(dto);

    } catch (Exception e) {
        e.printStackTrace();
        return ResponseEntity.status(500).build();
    }
}
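
The price handling in the controller strips everything except digits and then hands the result to `Double.parseDouble`, which throws on an empty string. One way to isolate that logic is a small helper like the one below. `PriceParser` and `parse` are hypothetical names, and the sample input in the usage note is illustrative rather than captured from a real Yes24 page.

```java
// Hypothetical helper: converts a scraped price label into a number, defaulting to 0.
final class PriceParser {
    private PriceParser() {}

    static double parse(String raw) {
        if (raw == null) {
            return 0.0;
        }
        // Keep digits only, dropping thousands separators and currency text.
        String digits = raw.replaceAll("[^0-9]", "");
        // An empty result means no price was found; avoid NumberFormatException.
        return digits.isEmpty() ? 0.0 : Double.parseDouble(digits);
    }
}
```

For example, `PriceParser.parse("16,200원")` returns `16200.0`, while `PriceParser.parse("")` and `PriceParser.parse(null)` both return `0.0`. Keeping this outside the controller also makes it easy to unit test without hitting the network.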
