Using a variable in a regex pattern

개발공부를해보자 · May 21, 2025

Related problem: LeetCode 771. Jewels and Stones

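Regex again.
There is still a lot about regular expressions that confuses me.
re.sub(r'[abc]', '', 'cat') replaces every a, b, or c in 'cat' with an empty string, which leaves 't'.
Sometimes, though, the characters abc have to come from a variable, such as pattern = 'abc'.
In that case you can write re.sub(rf'[{pattern}]', '', 'cat')
or re.sub(rf'[{re.escape(pattern)}]', '', 'cat'). The second form is the safer one when the variable may contain characters that are special inside brackets, such as ], ^, - or \.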
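
For the related LeetCode 771 problem, the same idea could be sketched as follows. This is only one possible approach, not necessarily the intended solution. The function name `count_jewels` is made up for illustration, and `re.escape` is used on the assumption that `jewels` might contain arbitrary characters.

```python
import re

def count_jewels(jewels: str, stones: str) -> int:
    """Count the characters of `stones` that also appear in `jewels`."""
    if not jewels:
        # An empty character class '[]' is not a valid pattern, so return early.
        return 0
    # Build a character class from the variable; re.escape keeps every
    # character literal even if it is special inside brackets.
    pattern = f"[{re.escape(jewels)}]"
    return len(re.findall(pattern, stones))

print(count_jewels("aA", "aAAbbbb"))  # 3
print(count_jewels("z", "ZZ"))        # 0 (matching is case-sensitive)
```

For this particular problem a set lookup would be simpler; the regex version is shown only because it exercises the variable-in-pattern technique.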

Using a variable in a regex pattern

import re

# (1) Removing several characters
# Remove every 'a', 'b', and 'c' listed in the variable
char_to_del = 'abc'
pattern = f"[{re.escape(char_to_del)}]"  # → '[abc]' (each character is removed individually)
text = "abcdefg"
print(re.sub(pattern, '', text))  # prints: 'defg'

# (2) Removing exactly one word
# Remove only the whole word 'hello'
word_to_remove = "hello"
pattern = fr"\b{re.escape(word_to_remove)}\b"  # → '\bhello\b' (uses word boundaries)
text = "say hello to the world"
print(re.sub(pattern, '', text))  # prints: 'say  to the world' (the two surrounding spaces remain)

# (3) Removing several words at once
# Remove all of the words 'a', 'the', and 'an'
words_to_remove = ['a', 'the', 'an']
pattern = r'\b(?:' + '|'.join(map(re.escape, words_to_remove)) + r')\b'
text = "a quick brown fox jumps over the lazy dog"
print(re.sub(pattern, '', text))  # prints: ' quick brown fox jumps over  lazy dog'

# (4) Handling a variable that contains special characters
# Remove the special characters '-', ']', and '^'
special_chars = '-]^'
pattern = f"[{re.escape(special_chars)}]"  # → '[\-\]\^]' (every character is escaped, so all are literal)
text = "a-b]c^d"
print(re.sub(pattern, '', text))  # prints: 'abcd'
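
To see why `re.escape` matters inside brackets, here is a small sketch that compares the escaped and unescaped versions of the same variable. The input strings are made up for illustration.

```python
import re

chars = "a-c"      # intended meaning: the three characters 'a', '-', 'c'
text = "abc-xyz"

# Unescaped: '[a-c]' is read as the range a..c, so 'b' is removed and '-' is kept.
unsafe = re.sub(f"[{chars}]", "", text)

# Escaped: the hyphen is literal, so exactly 'a', '-', and 'c' are removed.
safe = re.sub(f"[{re.escape(chars)}]", "", text)

print(unsafe)  # -xyz
print(safe)    # bxyz
```

Both calls run without error, which is what makes the unescaped version easy to miss: it silently removes a different set of characters.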

What are the `r` prefix and `re.escape()` in front of a pattern?

import re

# ✅ 1. r"..." (raw string)
# - Purpose: write a regex string that contains backslashes (\) exactly as typed
# - Example use: handy for regex tokens such as \b (word boundary) or \d (digit)
pattern_raw = r"\bhello\b"  # \b is a word boundary
text1 = "hello world"
text2 = "ahello world"

# With r"..." the backslashes do not have to be doubled
print(re.search(pattern_raw, text1))  # ✅ matches 'hello'
print(re.search(pattern_raw, text2))  # ❌ no word boundary before 'hello' → None

# Without a raw string, every backslash has to be written twice
pattern_no_raw = "\\bhello\\b"  # same meaning, just noisier
print(re.search(pattern_no_raw, text1))  # ✅ same result

# ✅ 2. re.escape(...)
# - Purpose: when a user-supplied string contains regex special characters (., *, +, [, ], ^, ...),
#            escape them so they are treated as literal characters
user_input = "a+b*c"
escaped_pattern = re.escape(user_input)  # → a\+b\*c (repr: 'a\\+b\\*c')

text3 = "xxx a+b*c yyy"
print(re.search(escaped_pattern, text3))  # ✅ matches exactly 'a+b*c'

# Without re.escape the string is interpreted as a regex:
# - a+ is one or more 'a', b* is zero or more 'b', and c is a literal 'c'
wrong_pattern = "a+b*c"
print(re.search(wrong_pattern, text3))  # ❌ None: in text3 the 'a' is followed by '+', not by 'b' or 'c'

# So: protect user input with re.escape before putting it into a regex!
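
The effect of the `r` prefix can also be checked directly. In a normal string literal, Python turns `"\b"` into a single backspace character before the regex engine ever sees it, so the pattern no longer means "word boundary". A minimal sketch, with variable names made up for illustration:

```python
import re

plain = "\bhello\b"   # "\b" here is the backspace character (0x08)
raw = r"\bhello\b"    # the backslash and 'b' stay as two separate characters

print(len(plain), len(raw))                       # 7 9
print(re.search(plain, "hello world"))            # None
print(re.search(raw, "hello world") is not None)  # True
```

The non-raw pattern fails quietly instead of raising an error, so using raw strings for every regex literal is a reasonable habit.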