I developed a book management system using React and Spring Boot, and I wanted users to be able to enter only a book title and have the rest of the information (author, publisher, price, genre) filled in automatically. To achieve this, I implemented a web scraping feature that fetches book data from Yes24, a major Korean online bookstore.
Manually entering all book details during registration is inconvenient for users, so I decided to build a feature that:
- Takes only the book title as input
Jsoup - HTML parsing and web scraping library in Java
Spring Boot - REST API backend
React - Frontend interface (not covered here)
REST API - Used to request and return scraped book data
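The scraped book data travels through the REST API as one small, flat object. Here is a minimal sketch of that shape, assuming only the five fields named above (title, author, publisher, price, genre). The class name `BookInfo` and its hand-written builder are hypothetical stand-ins for the project's `BookDTO`, whose real definition may differ (for example, it may use a generated builder).

```java
// Hypothetical stand-in for the project's BookDTO: an immutable value class
// with a hand-written builder. Field defaults mirror the "Unknown ..." fallbacks.
final class BookInfo {
    private final String title;
    private final String author;
    private final String publisher;
    private final double price;
    private final String genre;

    private BookInfo(Builder b) {
        this.title = b.title;
        this.author = b.author;
        this.publisher = b.publisher;
        this.price = b.price;
        this.genre = b.genre;
    }

    static Builder builder() {
        return new Builder();
    }

    public String getTitle() { return title; }
    public String getAuthor() { return author; }
    public String getPublisher() { return publisher; }
    public double getPrice() { return price; }
    public String getGenre() { return genre; }

    // Fluent builder: every setter returns the builder so calls can be chained.
    static final class Builder {
        private String title = "Unknown Title";
        private String author = "Unknown Author";
        private String publisher = "Unknown Publisher";
        private double price = 0.0;
        private String genre = "Unknown Genre";

        Builder title(String value) { this.title = value; return this; }
        Builder author(String value) { this.author = value; return this; }
        Builder publisher(String value) { this.publisher = value; return this; }
        Builder price(double value) { this.price = value; return this; }
        Builder genre(String value) { this.genre = value; return this; }

        BookInfo build() { return new BookInfo(this); }
    }
}
```

Any field that is never set keeps its default, so a partially scraped page still produces a complete object.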
// Controller method; place it inside a @RestController class.
// Requires: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
// org.jsoup.select.Elements, java.net.URLEncoder, java.nio.charset.StandardCharsets.
@GetMapping("/search-from-yes24")
public ResponseEntity<BookDTO> searchFromYes24(@RequestParam String title) {
    try {
        String encodedQuery = URLEncoder.encode(title, StandardCharsets.UTF_8);
        String searchUrl = "https://www.yes24.com/Product/Search?domain=BOOK&query=" + encodedQuery;

        // Step 1: Parse the search result page and take the first hit
        Document searchDoc = Jsoup.connect(searchUrl)
                .userAgent("Mozilla/5.0")
                .get();
        Element firstItem = searchDoc.selectFirst("div.itemUnit");
        if (firstItem == null) return ResponseEntity.notFound().build();

        // selectFirst returns null when the item has no link;
        // absUrl returns "" (never null) when the URL cannot be resolved
        Element link = firstItem.selectFirst("a[href]");
        String detailUrl = (link != null) ? link.absUrl("href") : "";
        if (detailUrl.isEmpty()) return ResponseEntity.status(502).build();

        // Step 2: Parse the book detail page
        Document detailDoc = Jsoup.connect(detailUrl)
                .userAgent("Mozilla/5.0")
                .get();

        String bookTitle = textOrDefault(detailDoc, "h2.gd_name", "Unknown Title");
        String author = textOrDefault(detailDoc, "span.gd_auth a", "Unknown Author");
        String publisher = textOrDefault(detailDoc, "span.gd_pub a", "Unknown Publisher");

        // Keep digits only; fall back to "0" so parseDouble never sees an empty string
        String priceText = textOrDefault(detailDoc, "em.yes_m", "0").replaceAll("[^0-9]", "");
        if (priceText.isEmpty()) priceText = "0";

        // Use the last link in the category list as the genre
        String genre = "Unknown Genre";
        Elements genreEls = detailDoc.select("div#infoset_goodsCate dl.yesAlertDl dt:contains(Category) + dd ul.yesAlertLi li a");
        if (!genreEls.isEmpty()) {
            genre = genreEls.last().text().trim();
        }

        // Build DTO
        BookDTO dto = BookDTO.builder()
                .title(bookTitle)
                .author(author)
                .publisher(publisher)
                .price(Double.parseDouble(priceText))
                .genre(genre)
                .build();
        return ResponseEntity.ok(dto);
    } catch (Exception e) {
        // Network failures and unexpected markup both end up here
        e.printStackTrace();
        return ResponseEntity.status(500).build();
    }
}

// Returns the trimmed text of the first element matching the selector,
// or the fallback when nothing matches
private static String textOrDefault(Document doc, String cssQuery, String fallback) {
    Element el = doc.selectFirst(cssQuery);
    return (el != null) ? el.text().trim() : fallback;
}
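Two steps of the endpoint above are pure string handling: building the search URL and turning the scraped price text into a number. Pulling them into small functions makes them testable without any network access. The following is only a sketch; the class name `Yes24Text` and both method names are hypothetical and not part of the original project.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative only.
final class Yes24Text {
    // Same search endpoint the controller method uses.
    private static final String SEARCH_BASE =
            "https://www.yes24.com/Product/Search?domain=BOOK&query=";

    private Yes24Text() {
        // Static helpers only; no instances.
    }

    // Builds the search URL. URLEncoder percent-encodes non-ASCII characters
    // as UTF-8 bytes and encodes spaces as '+'.
    static String buildSearchUrl(String title) {
        return SEARCH_BASE + URLEncoder.encode(title, StandardCharsets.UTF_8);
    }

    // Strips everything except digits and parses the remainder.
    // Returns 0.0 when the text is null or contains no digits,
    // instead of letting Double.parseDouble throw NumberFormatException.
    static double parsePrice(String rawText) {
        if (rawText == null) {
            return 0.0;
        }
        String digits = rawText.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0.0 : Double.parseDouble(digits);
    }
}
```

For example, `Yes24Text.parsePrice("16,200")` yields `16200.0`, while an empty string yields `0.0` rather than an exception, and `Yes24Text.buildSearchUrl("Clean Code")` ends with `query=Clean+Code`.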