[Python] Regular Expression Basics: the re module

메린 · December 27, 2022

📌 Importing the library and compiling a pattern

import re
com = re.compile(r'regex')  # 'regex' is a placeholder for your own pattern
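
A compiled pattern object offers the same operations as the module-level functions. A minimal sketch, assuming an illustrative pattern and input string that are not from the original post:

```python
import re

pattern = r'[0-9]'          # hypothetical example pattern
com = re.compile(pattern)   # compile once, reuse many times

via_compiled = com.findall('a1b2')          # method on the compiled object
via_module = re.findall(pattern, 'a1b2')    # module-level function, same result

print(via_compiled, via_module)
```

Compiling is mostly worthwhile when the same pattern is reused many times.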

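📌 re functions

1) re.match(rgx, text) : checks whether the regex matches at the beginning of the string

2) re.search(rgx, text) : scans the whole string for the first place where the regex matches

3) re.findall(rgx, text) : finds every match and returns them as a list

4) re.finditer(rgx, text) : finds every match and returns them as an iterator of match objects

5) re.sub(pattern, replace, text) : replaces the parts that match the regex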
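
The five functions above can be compared on one string. This is a rough sketch; the sample text and variable names are assumptions made for illustration:

```python
import re

text = 'cat 12, dog 345'   # hypothetical sample input

# match only succeeds at the very start of the string
match_digits = re.match(r'\d+', text)             # None: text starts with 'cat'
match_word = re.match(r'[a-z]+', text).group()    # 'cat'

# search scans the whole string and returns the first hit
first_number = re.search(r'\d+', text).group()    # '12'

# findall returns every hit as a list of strings
all_numbers = re.findall(r'\d+', text)            # ['12', '345']

# finditer yields match objects, which also carry positions
spans = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

# sub replaces every hit
masked = re.sub(r'\d+', '#', text)                # 'cat #, dog #'

print(match_digits, match_word, first_number, all_numbers, spans, masked)
```

Because match and search return None when nothing matches, check the result before calling .group() on real input.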

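+ Compile flags : pass them as the second argument, as in re.compile(pattern, re.DOTALL)

(1) re.DOTALL or re.S : makes . match every character, including newline

(2) re.IGNORECASE or re.I : case-insensitive matching

(3) re.MULTILINE or re.M : makes ^ and $ match at the start and end of each line

(4) re.VERBOSE or re.X : verbose mode, where whitespace and # comments inside the pattern are ignored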
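
A small sketch of re.IGNORECASE and re.DOTALL, and of combining flags with |, using a made-up two-line string (re.VERBOSE is covered by the charref pattern in this section):

```python
import re

text = 'Hello\nWORLD'   # hypothetical sample input

# IGNORECASE: 'world' also matches 'WORLD'
ignore_case = re.findall(r'world', text, re.IGNORECASE)

# Without DOTALL, . stops at the newline, so nothing matches
no_dotall = re.search(r'Hello.WORLD', text)

# With DOTALL, . also matches '\n'
with_dotall = re.search(r'Hello.WORLD', text, re.DOTALL)

# Flags can be combined with |
combined = re.compile(r'hello.world', re.I | re.S)
combined_hit = combined.search(text)

print(ignore_case, no_dotall, bool(with_dotall), bool(combined_hit))
```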

charref = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)

📌 Metacharacters

1. []

Character class / matches any single character listed inside []

com = re.compile('[apple]')
com.findall('I ate apple.')
# ==> ['a', 'e', 'a', 'p', 'p', 'l', 'e']

2. -

[From-To] / a range of characters, such as all letters or all digits / [a-z] or [a-zA-Z] or [0-9]

com = re.compile('[1-3]')
com.findall('123456789')
# ==> ['1', '2', '3']
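
Several ranges can sit inside one pair of brackets. A quick sketch with an invented sample string:

```python
import re

text = 'Room B12, floor 3'   # hypothetical sample input

lower = re.findall('[a-z]', text)         # lowercase letters only
upper = re.findall('[A-Z]', text)         # uppercase letters only
alnum = re.findall('[a-zA-Z0-9]', text)   # letters and digits together

print(lower, upper, alnum)
```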

3. \d \D \s \S \w \W

digit / non-digit / whitespace / non-whitespace / word character / non-word character

# digits
com = re.compile(r'\d')
com.findall('I am 30 years old!')
# ==> ['3', '0']

# anything that is not a digit
com = re.compile(r'\D')
com.findall('I am 30 years old!')
# ==> ['I', ' ', 'a', 'm', ' ', ' ', 'y', 'e', 'a', 'r', 's', ' ', 'o', 'l', 'd', '!']

# whitespace
com = re.compile(r'\s')
com.findall('I am 30 years old!')
# ==> [' ', ' ', ' ', ' ']

# anything that is not whitespace (letters, special characters, digits)
com = re.compile(r'\S')
com.findall('I am 30 years old!')
# ==> ['I', 'a', 'm', '3', '0', 'y', 'e', 'a', 'r', 's', 'o', 'l', 'd', '!']

# word characters (letters, digits, _)
com = re.compile(r'\w')
com.findall('I am 30 years old!')
# ==> ['I', 'a', 'm', '3', '0', 'y', 'e', 'a', 'r', 's', 'o', 'l', 'd']

# anything that is not a word character
com = re.compile(r'\W')
com.findall('I am 30 years old!')
# ==> [' ', ' ', ' ', ' ', '!']
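
For plain ASCII text, the shorthand classes behave like explicit bracket classes. A sketch under that assumption (on Unicode strings, \d and \w also match non-ASCII digits and letters unless re.ASCII is passed):

```python
import re

text = 'I am 30 years old!'

digits_short = re.findall(r'\d', text)
digits_class = re.findall(r'[0-9]', text)          # same result on ASCII text

words_short = re.findall(r'\w', text)
words_class = re.findall(r'[a-zA-Z0-9_]', text)    # same result on ASCII text

print(digits_short == digits_class, words_short == words_class)
```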

4. .

Plain . -> matches any character except \n (newline) / [.] -> matches a literal dot

# any character (.)
com = re.compile('.')
com.findall('a2@.')
# ==> ['a', '2', '@', '.']

# literal dot ([.])
com = re.compile('[.]')
com.findall('a2@.')
# ==> ['.']
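
The newline exception can be seen directly. A minimal sketch with an invented two-line string, which also shows \. as an alternative to [.] for a literal dot:

```python
import re

text = 'a\nb'   # hypothetical sample input

default_dot = re.findall('.', text)              # newline is skipped
dotall_dot = re.findall('.', text, re.DOTALL)    # newline is included
escaped_dot = re.findall(r'\.', 'a2@.')          # backslash form of a literal dot

print(default_dot, dotall_dot, escaped_dot)
```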

5. ^

Plain ^ -> matches the start of the string / [^] -> means negation (not)

# start of the string (^)
com = re.compile('^choco')
com.findall('choco cake')
# ==> ['choco']

# start of the string (^)
com = re.compile('^choco')
com.findall('ice choco')
# ==> [] (not matched anywhere other than the start)

# start of each line (^ with re.MULTILINE)
com = re.compile('^choco', re.MULTILINE)
com.findall('''choco cake.
choco latte''')
# ==> ['choco', 'choco'] (['choco'] without re.MULTILINE)

# negation, not ([^])
com = re.compile(r'[^\d]')
com.findall('abc123')
# ==> ['a', 'b', 'c']

6. $

Matches the end of the string

# end of the string ($)
com = re.compile('choco$')
com.findall('ice choco')
# ==> ['choco']

# end of the string ($)
com = re.compile('choco$')
com.findall('choco cake')
# ==> []

# end of each line ($ with re.MULTILINE)
com = re.compile('choco$', re.MULTILINE)
com.findall('''ice choco
white choco''')
# ==> ['choco', 'choco'] (['choco'] without re.MULTILINE)
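
Using ^ and $ together pins the pattern to the whole string, or to a whole line with re.MULTILINE. A sketch with invented inputs:

```python
import re

exact = re.compile('^choco$')

only_choco = exact.findall('choco')         # the whole string is 'choco'
with_suffix = exact.findall('choco cake')   # extra text after: no match
with_prefix = exact.findall('ice choco')    # extra text before: no match

# With MULTILINE, each line is checked on its own
exact_ml = re.compile('^choco$', re.MULTILINE)
per_line = exact_ml.findall('choco\nice choco\nchoco')

print(only_choco, with_suffix, with_prefix, per_line)
```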

7. *

Repeats the preceding character (0 to unlimited times)

com = re.compile('mizy*')
com.findall('miz')
# ==> ['miz']

com = re.compile('mizy*')
com.findall('mizyyyy')
# ==> ['mizyyyy']

8. +

Repeats the preceding character (1 to unlimited times)

com = re.compile('mizy+')
com.findall('miz')
# ==> []

com = re.compile('mizy+')
com.findall('mizyyyy')
# ==> ['mizyyyy']

9. ?

Optional: the preceding character may or may not be present (0 or 1 time)

com = re.compile('mizy?')
com.findall('miz')
# ==> ['miz']

com = re.compile('mizy?')
com.findall('mizyyy')
# ==> ['mizy']
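
The three quantifiers *, + and ? are easier to compare side by side on the same input. A sketch with an invented string:

```python
import re

words = 'miz mizy mizyy'   # hypothetical sample input

star = re.findall('mizy*', words)       # y zero or more times
plus = re.findall('mizy+', words)       # y one or more times
optional = re.findall('mizy?', words)   # y zero or one time

print(star)       # ['miz', 'mizy', 'mizyy']
print(plus)       # ['mizy', 'mizyy']
print(optional)   # ['miz', 'mizy', 'mizy']
```

With ?, the last word contributes only 'mizy' because the second y is left unmatched.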

10. {m}

Repeats exactly m times

com = re.compile('mizy{3}')
com.findall('mizyyyyyyyyy')
# ==> ['mizyyy']

11. {x,y}

Repeats x to y times (no space after the comma)

com = re.compile('miz{3,5}y')
com.findall('mizy mizzzy mizzzzzy mizzzzzzzy')
# ==> ['mizzzy', 'mizzzzzy']
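
Either bound of the braces can be left out: {m,} means m or more, and {,n} means 0 to n. A sketch with an invented string:

```python
import re

text = 'mizy mizzy mizzzy mizzzzy'   # hypothetical sample input

exactly_two = re.findall('miz{2}y', text)    # z exactly 2 times
two_or_more = re.findall('miz{2,}y', text)   # z at least 2 times
up_to_two = re.findall('miz{,2}y', text)     # z at most 2 times

print(exactly_two)   # ['mizzy']
print(two_or_more)   # ['mizzy', 'mizzzy', 'mizzzzy']
print(up_to_two)     # ['mizy', 'mizzy']
```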

12. \

Escape: used to treat a metacharacter as an ordinary character

com = re.compile('[mizy]')
com.findall('[mizy]')
# ==> ['m', 'i', 'z', 'y']

com = re.compile(r'\[mizy\]')
com.findall('[mizy]')
# ==> ['[mizy]']
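
When the literal text comes from a variable, re.escape can add the backslashes automatically. A minimal sketch, with a made-up search string:

```python
import re

literal = '[mizy]'
escaped = re.escape(literal)   # backslash-escapes the brackets

found = re.findall(escaped, 'a [mizy] b')   # hypothetical sample input

print(escaped, found)
```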
