[Baekjoon/Java] Problem 22859: HTML Parsing

리리 · November 18, 2024

Problem

https://www.acmicpc.net/problem/22859

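Approach

I solved this problem with regular expressions.
I am not yet comfortable with regular expressions, so I relied heavily on the links below and treated the problem as a study exercise.
- Solution reference
- Concept reference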

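Newly learned concepts

How to use .*?

. : any single character (by default, anything except a newline)
* : repeat the preceding element zero or more times
? : placed after a quantifier such as *, makes it non-greedy (matches the shortest possible range)

What is the difference between .* and .*?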

<p>Text1</p><p>Text2</p>

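.* : (greedy matching) matches the longest possible range.
- Used as <p>(.*)</p>, a single match runs from the first <p> to the last </p>, so the group captures Text1</p><p>Text2, covering both paragraphs.
.*? : (non-greedy matching) matches the shortest possible range.
- Used as <p>(.*?)</p>, the match stops at the first </p>, so the group captures only Text1.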
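
The difference can be checked directly with Python's re module. This is a minimal sketch, assuming each quantifier is wrapped in a `<p>...</p>` pattern with one capturing group; the variable names `text`, `greedy`, and `lazy` are illustrative only.

```python
import re

text = "<p>Text1</p><p>Text2</p>"

# Greedy: .* extends to the LAST </p>, so one match swallows both paragraphs.
greedy = re.findall(r"<p>(.*)</p>", text)

# Non-greedy: .*? stops at the FIRST </p>, so each paragraph is matched separately.
lazy = re.findall(r"<p>(.*?)</p>", text)

print(greedy)  # ['Text1</p><p>Text2']
print(lazy)    # ['Text1', 'Text2']
```

This is why a non-greedy pattern is the safer choice when several `<p>` elements sit on the same line.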

Solution

Python solution

import sys
import re
input = lambda: sys.stdin.readline().rstrip()

html = input()

# 1. main parsing
s = len('<main>')
e = len('</main>')
html = html[s : -e]

# 2. div parsing
html = re.sub(r'<div +title="([\w ]*)">', r'title : \1\n', html)
html = re.sub(r'</div>', '', html)

# 3. p parsing
html = re.sub(r'<p>(.*?)</p>', r'\1\n', html)

# 4. p parsing - remove all remaining tags
html = re.sub(r'<([\w /]*)>', '', html)

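# 5. p parsing - strip spaces at the start and end of each line
#    (' *' rather than ' ?' so that several spaces left behind by removed tags are also stripped)
html = re.sub(r' *\n *', r'\n', html)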

# 6. p parsing - collapse two or more consecutive spaces into one
html = re.sub(r' {2,}', ' ', html)

print(html)
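
To see what each substitution contributes, the same steps can be wrapped in a function and run on a small input. This is a sketch only: the function name `parse_html` and the `sample` string are hypothetical, written in the style of the problem's input rather than copied from it. Step 5 is written as ` *\n *` here so that any number of spaces around a line break is removed.

```python
import re

def parse_html(html: str) -> str:
    """Hypothetical wrapper around the six substitution steps."""
    # 1. Drop the surrounding <main> ... </main>.
    html = html[len('<main>'):-len('</main>')]
    # 2. Turn each opening div into a "title : ..." line and drop closing divs.
    html = re.sub(r'<div +title="([\w ]*)">', r'title : \1\n', html)
    html = re.sub(r'</div>', '', html)
    # 3. Put each paragraph on its own line (non-greedy, one <p> at a time).
    html = re.sub(r'<p>(.*?)</p>', r'\1\n', html)
    # 4. Remove every remaining tag such as <i>, </b>, <br >.
    html = re.sub(r'<([\w /]*)>', '', html)
    # 5. Strip spaces on both sides of each line break.
    html = re.sub(r' *\n *', r'\n', html)
    # 6. Collapse runs of two or more spaces into one.
    html = re.sub(r' {2,}', ' ', html)
    return html

# Made-up input for illustration.
sample = (
    '<main><div title="title_name_1"><p>paragraph 1</p>'
    '<p>paragraph 2 <i>Italic Tag</i> <br > </p></div>'
    '<div title="title_name_2"><p>paragraph 3 <b>Bold Tag</b> end.</p></div></main>'
)

result = parse_html(sample)
print(result, end='')
```

In the second paragraph, removing `<br >` leaves two trailing spaces before the line break, which is the case step 5 has to handle.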

Java solution

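import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class Main {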
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String html = br.readLine();

        // 1. main parser
        int s = "<main>".length();
        int e = "</main>".length();
        html = html.substring(s, html.length() - e);

        // 2. div parser
        html = html.replaceAll("<div +title=\"([\\w ]*)\">", "title : $1\n");
        html = html.replaceAll("</div>", "");

        // 3. p parser
        html = html.replaceAll("<p>(.*?)</p>", "$1\n");

        // 4. p parser - remove all remaining tags
        html = html.replaceAll("<([\\w /]*)>", "");

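        // 5. p parser - strip spaces at the start and end of each line
        //    (" *" rather than " ?" so that several spaces left behind by removed tags are also stripped)
        html = html.replaceAll(" *\n *", "\n");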
        
        // 6. p parser - two space -> one space
        html = html.replaceAll(" {2,}", " ");

        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(System.out));
        bw.write(html);
        bw.flush();
        bw.close();
    }
}
