[Machine Learning Online Course] 1. Python Programming Basics (4)

Uno · February 27, 2021

FastCampus Machine Learning and Data Analysis

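Classes and Instances

class

A data type that models something from the real world, with attributes (data) and methods (behavior)
In Python, str, int, list, dict, and so on are all classes
For example, a Student class can define both the attributes that describe a student and the actions a student performs
In other words, a class encapsulates the data (variables) and the operations on that data (functions) into a single unit
Which attributes and behaviors a class has depends on what the model needs to emphasize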

object

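A concrete object (instance) created from a class
Everything in Python (int, str, list, etc.) is an object
It refers to the state in which a class has been instantiated and lives in memory
If a class is a baking mold, an object is the bread actually made with that mold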
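
As a rough sketch of the mold-and-bread idea, the hypothetical Student class here (its name, attributes, and promote method are illustrative assumptions, not part of the lecture) bundles data and behavior, and two separate objects are made from it. The syntax is covered in the sections that follow.

```python
# Hypothetical example: Student, grade, and promote are illustrative names only.
class Student:
    def __init__(self, name, grade):
        # Attributes: the data that describes one student
        self.name = name
        self.grade = grade

    def promote(self):
        # Method: an operation on this student's own data
        self.grade += 1
        return self.grade


# Two objects (instances) made from the same class (mold)
amy = Student('Amy', 2)
ben = Student('Ben', 5)

amy.promote()
print(amy.name, amy.grade)  # Amy 3
print(ben.name, ben.grade)  # Ben 5 (unaffected by amy.promote())
```

Each instance keeps its own copy of the attributes, which is why promoting one student leaves the other unchanged.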

Declaring a class

To create an object, the class it is built from must be declared first

class Person:
    pass
bob = Person()
cathy = Person()

a = list()
b = list()

print(type(bob), type(cathy))
print(type(a), type(b))
# <class '__main__.Person'> <class '__main__.Person'>
# <class 'list'> <class 'list'>

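__init__(self)

The initializer (commonly called the constructor), invoked when an instance of the class is created
The self parameter always comes first and refers to the instance itself
It does not have to be named self, but self is the convention
__init__ defines the data that the class manages
- This data is called a member variable or an attribute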

class Person:
    def __init__(self):
        print(self, 'is generated')
        self.name = 'Kate'
        self.age = 10

p1 = Person()
p2 = Person()

p1.name = 'aaron'
p1.age = 20

print(p1.name, p1.age)
# <__main__.Person object at 0x000001FD7398DDC8> is generated
# <__main__.Person object at 0x000001FD7398D848> is generated
# aaron 20

class Person:
    def __init__(self, name, age=10):
        # print(self, 'is generated')
        self.name = name
        self.age = age
        
p1 = Person('Bob', 30)
p2 = Person('Kate', 20)
p3 = Person('Aaron')

print(p1.name, p1.age) # Bob 30
print(p2.name, p2.age) # Kate 20
print(p3.name, p3.age) # Aaron 10

self

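An instance method in Python always receives self as its first argument
self refers to the object on which the method is being called
It corresponds to this in C++, C#, and Java
Again, the name does not have to be self, but it must be the first parameter, and self is the convention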

class Person:
    def __init__(self, name, age):
        print('self: ', self)
        self.name = name
        self.age = age
        
    def sleep(self):
        print('self:', self)
        print(self.name, 'is sleeping.')
        
a = Person('Aaron', 20)
b = Person('Bob', 30)

print(a)
print(b)

a.sleep()
b.sleep()
"""
self:  <__main__.Person object at 0x000001FD739C2C08>
self:  <__main__.Person object at 0x000001FD739C2D08>
<__main__.Person object at 0x000001FD739C2C08>
<__main__.Person object at 0x000001FD739C2D08>
self: <__main__.Person object at 0x000001FD739C2C08>
Aaron is sleeping.
self: <__main__.Person object at 0x000001FD739C2D08>
Bob is sleeping.
"""

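Defining methods

Also called member functions; they can be called only on objects of the class
A method is invoked at the object level and operates on that object's attributes
It is called in the form {obj}.{method}()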

# 1. Increment the number by one
# 2. Reset the number to 0
class Counter:
    def __init__(self):
        self.num = 0
        
    def increment(self):
        self.num += 1
    
    def reset(self):
        self.num = 0
        
    def print_current_value(self):
        print('Current value:', self.num)
        
    
c1 = Counter()
c1.print_current_value() # Current value: 0
c1.increment() 
c1.increment()
c1.increment()
c1.print_current_value() # Current value: 3

c1.reset()

c1.print_current_value() # Current value: 0

c2 = Counter()
c2.increment()
c2.print_current_value() # Current value: 1

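Method types

instance method - called on an object
- Because it is called at the object level, it affects only the object it was called on
class method / static method - called on the class
- A class method (@classmethod) receives the class as its first argument (cls) and works at the class level, so it can change only class-level variables
- A static method (@staticmethod) receives neither self nor cls and behaves like a plain function placed in the class namespace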

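class Math:
    @staticmethod
    def add(a, b):
        return a + b
    
    @staticmethod
    def multiply(a, b):
        return a * b

# Called on the class itself; no instance is needed
print(Math.add(10, 20)) # 30
print(Math.multiply(10, 20)) # 200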
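
The difference between a class method and a static method can be sketched as follows. The Registry class and its members are hypothetical names chosen for illustration; only the @classmethod and @staticmethod decorators come from Python itself.

```python
# Hypothetical example: Registry, count, register, and is_positive are illustrative names.
class Registry:
    count = 0  # class-level variable shared by every instance

    @classmethod
    def register(cls):
        # cls is the class itself, so this changes class-level state
        cls.count += 1
        return cls.count

    @staticmethod
    def is_positive(n):
        # No self or cls: a plain function that lives in the class namespace
        return n > 0


Registry.register()
Registry.register()
print(Registry.count)            # 2
print(Registry.is_positive(-1))  # False
```

Both kinds are called on the class, but only the class method has access to the class through cls.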

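class inheritance

A new class can inherit the functionality of an existing class as-is
A new class is defined by adding to or changing part of the existing class's functionality
This makes code reusable
The existing class being inherited from is called the parent, super, or base class
Semantically, the two classes have an is-a relationship (the child is a kind of the parent)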

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def eat(self, food):
        print('{} eats {}.'.format(self.name, food))
    
    def sleep(self, minute):
        print('{} sleeps for {} minutes.'.format(self.name, minute))
    
    def work(self, minute):
        print('{} works for {} minutes.'.format(self.name, minute))

class Student(Person):
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
class Employee(Person):
    def __init__(self, name, age):
        self.name = name
        self.age = age

bob = Employee('Bob', 25)
bob.eat('BBQ') # Bob eats BBQ.
bob.sleep(30) # Bob sleeps for 30 minutes.
bob.work(60) # Bob works for 60 minutes.
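
The is-a relationship can be checked directly with the built-ins isinstance and issubclass. This is a minimal sketch with simplified versions of Person and Employee (trimmed to a single attribute for brevity, which is an assumption of this example).

```python
class Person:
    def __init__(self, name):
        self.name = name


class Employee(Person):
    pass  # inherits __init__ from Person unchanged


bob = Employee('Bob')
ann = Person('Ann')

print(isinstance(bob, Employee))     # True
print(isinstance(bob, Person))       # True: an Employee is a Person
print(isinstance(ann, Employee))     # False: not every Person is an Employee
print(issubclass(Employee, Person))  # True
```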

method override

Redefining (overriding) a method of the parent class
When the method is called on an instance of the subclass (child class), the overridden version runs

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def eat(self, food):
        print('{} eats {}.'.format(self.name, food))
    
    def sleep(self, minute):
        print('{} sleeps for {} minutes.'.format(self.name, minute))
    
    def work(self, minute):
        print('{} works for {} minutes.'.format(self.name, minute))

class Student(Person):
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def work(self, minute):
        print('{} studies for {} minutes.'.format(self.name, minute))
        
class Employee(Person):
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def work(self, minute):
        print('{} does office work for {} minutes.'.format(self.name, minute))
        
bob = Employee('Bob', 25)
bob.eat('BBQ') # Bob eats BBQ.
bob.sleep(30) # Bob sleeps for 30 minutes.
bob.work(60) # Bob does office work for 60 minutes.

super

Used to call a parent class's method from a subclass (child class)

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def eat(self, food):
        print('{} eats {}.'.format(self.name, food))
    
    def sleep(self, minute):
        print('{} sleeps for {} minutes.'.format(self.name, minute))
    
    def work(self, minute):
        print('{} prepares for {} minutes.'.format(self.name, minute))

class Student(Person):
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def work(self, minute):
        super().work(minute)
        print('{} studies for {} minutes.'.format(self.name, minute))
        
class Employee(Person):
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def work(self, minute):
        super().work(minute)
        print('{} does office work for {} minutes.'.format(self.name, minute))
        
bob = Employee('Bob', 25)
bob.eat('BBQ') # Bob eats BBQ.
bob.sleep(30) # Bob sleeps for 30 minutes.
bob.work(60)
# Bob prepares for 60 minutes.
# Bob does office work for 60 minutes.
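
super() is also commonly used inside __init__ so that a subclass does not have to repeat the parent's attribute setup. The sketch below assumes a hypothetical extra attribute, school, that is not in the lecture code.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class Student(Person):
    def __init__(self, name, age, school):
        # Reuse the parent's initialization instead of copying it
        super().__init__(name, age)
        # school is a hypothetical attribute added only for this example
        self.school = school


kate = Student('Kate', 20, 'Example High')
print(kate.name, kate.age, kate.school)  # Kate 20 Example High
```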

special method

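Special functions whose names start and end with __ (double underscore)
Implementing them lets Python's built-in functions and operators work on custom objects
The full list of overridable methods is in the "Special method names" section of the Python data model reference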

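# Point
# A point (x, y) on the 2D coordinate plane
# Operations
# Addition and subtraction of two points: (1, 2) + (3, 4) = (4, 6)
# Multiplication of a point by a number: (1, 2) * 3 = (3, 6)
# Length of the point: its distance from (0, 0)
# Getting the x and y values
# Printing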

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y 
        
    def __add__(self, pt):
        new_x = self.x + pt.x
        new_y = self.y + pt.y
        return Point(new_x, new_y)
    
    def __sub__(self, pt):
        new_x = self.x - pt.x
        new_y = self.y - pt.y
        return Point(new_x, new_y)
    
    def __mul__(self, factor):
        return Point(self.x * factor, self.y * factor)
    
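    def __getitem__(self, index):
        if index == 0:
            return self.x
        elif index == 1:
            return self.y
        else:
            # An out-of-range index should raise, not return a sentinel value
            raise IndexError('Point index out of range')
    
    def __len__(self):
        # Note: this is the squared distance from the origin. len() requires
        # a non-negative int, so a true distance would fit abs() better.
        return self.x ** 2 + self.y ** 2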
        
    def __str__(self):
        return '({}, {})'.format(self.x, self.y)
        
p1 = Point(3, 4)
p2 = Point(2, 7)

a = 1 + 2
p3 = p1 + p2
p4 = p1 - p2

# p5 = p1.multiply(3)
p5 = p1 * 3

# p1[0] -> x
# p1[1] -> y
print(p1[0]) # 3
print(p1[1]) # 4
print(p1) # (3, 4)
print(p2) # (2, 7)
print(p3) # (5, 11)
print(p4) # (1, -3)
print(p5) # (9, 12)
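
A few more special methods follow the same idea. This sketch is a separate, reduced Point (not the lecture's full class) that assumes __abs__ is an acceptable place for the distance from the origin, and adds __eq__ and __repr__ for comparison and debugging output.

```python
import math


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __abs__(self):
        # Called by abs(point): Euclidean distance from the origin
        return math.hypot(self.x, self.y)

    def __eq__(self, other):
        # Called by ==: compare coordinates instead of object identity
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        # Called by repr() and by the interactive prompt
        return 'Point({}, {})'.format(self.x, self.y)


p = Point(3, 4)
print(abs(p))            # 5.0
print(p == Point(3, 4))  # True
print(repr(p))           # Point(3, 4)
```

Unlike __len__, __abs__ may return a float, so it can carry the actual distance.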

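Regular Expressions

Regular expression (regex)

Supports searching for, replacing, and removing strings that match a specific pattern
Finding patterns without regular expressions (hand-written rules) tends to be incomplete or costly
e.g.) validating an email format, validating a phone number format, checking that a string consists only of digits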
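
As a small illustration of the digits-only case, here is a sketch using re.fullmatch. The helper name is_digits is hypothetical, and the pattern uses [0-9] on the assumption that only ASCII digits should count.

```python
import re


def is_digits(text):
    # fullmatch succeeds only if the whole string matches the pattern
    return re.fullmatch(r'[0-9]+', text) is not None


print(is_digits('20210227'))  # True
print(is_digits('2021-0227'))  # False
print(is_digits(''))           # False
```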

raw string

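Prefixing a string literal with r keeps it exactly as written: backslashes are not treated as escape characters

a = 'abcdef\n' # \n is interpreted as an escape sequence (newline)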
print(a)

b = r'abcdef\n'
print(b)
"""
abcdef

abcdef\n
"""

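Basic patterns

Ordinary characters such as a, X, and 9 match exactly themselves
- e.g.) the pattern test matches the string test
- Matching is case-sensitive by default, but it can be made case-insensitive
Some characters are exceptions and carry a special meaning
- . ^ $ * + ? { } [ ] \ | ( )
. (period) - matches any single character (except newline)
\w - matches a word character [a-zA-Z0-9_]
\s - matches a whitespace character
\t, \n, \r - tab, newline, carriage return
\d - matches a digit [0-9]
^ = start, $ = end: they denote the start and the end of the string, respectively
A preceding \ removes the special meaning. For example, \. means a literal . and \\ means a literal \
See the documentation for details: https://docs.python.org/3/library/re.html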
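
Two of the points above, case-insensitive matching and escaping a metacharacter, can be sketched with the standard re module. The sample strings are made up for illustration.

```python
import re

# Case-sensitive by default
print(re.search(r'test', 'This is a TEST'))  # None

# re.IGNORECASE turns off case sensitivity
m = re.search(r'test', 'This is a TEST', re.IGNORECASE)
print(m.group())  # TEST

# An unescaped . matches any character; \. matches only a literal dot
print(re.search(r'a.c', 'abc').group())   # abc
print(re.search(r'a\.c', 'abc'))          # None
print(re.search(r'a\.c', 'a.c').group())  # a.c
```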

search method

Returns a match object for the first place the pattern is found
Returns None if the pattern is not found

import re
m = re.search(r'abc', '123abcdef')
print(m) # <re.Match object; span=(3, 6), match='abc'>

m = re.search(r'\d\d\d\w', '112abcdef119')
print(m) # <re.Match object; span=(0, 4), match='112a'>

m = re.search(r'..\w\w', '@#$%ABCDabcd')
print(m) # <re.Match object; span=(2, 6), match='$%AB'>
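
Because search can return None, a typical usage checks the result before reading from it. A minimal sketch of the match object's accessors, with a made-up input string:

```python
import re

m = re.search(r'\d\d\d', 'abc123def')

if m is not None:
    print(m.group())           # 123 (the matched text)
    print(m.start(), m.end())  # 3 6 (start index, end index)
    print(m.span())            # (3, 6)
```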

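metacharacters

[ ] is used to express a set or range of characters
[abck] : a or b or c or k
[abc.^] : a or b or c or . or ^ (most metacharacters lose their special meaning inside [ ])
[a-d] : with -, any one character in the range between the two characters
[0-9] : any digit
[a-z] : any lowercase letter
[A-Z] : any uppercase letter
[a-zA-Z0-9] : any letter or digit
[^0-9] : when ^ comes first, matches any character not in the set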

import re
print(re.search(r'[cbm]at', 'aat')) # None
print(re.search(r'[0-4]haha', '7hahah')) # None
print(re.search(r'[abc.^]aron', 'daron')) # None
print(re.search(r'[^abc]aron', '0aron'))
# <re.Match object; span=(0, 5), match='0aron'>

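\

Combined with another character, it takes on a special meaning (the bracket equivalents below hold for ASCII text)
- \d : same as [0-9]
- \D : a non-digit, same as [^0-9]
- \s : a whitespace character (space, tab, newline, etc.)
- \S : a non-whitespace character
- \w : a letter, digit, or underscore, same as [a-zA-Z0-9_]
- \W : a non-word character, same as [^a-zA-Z0-9_]
Also used to make a metacharacter stand for the character itself
- \. , \\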

re.search(r'\Sand', 'apple land banana')
# <re.Match object; span=(6, 10), match='land'>

re.search(r'\.and', '.and')
# <re.Match object; span=(0, 4), match='.and'>

.

Matches any single character (except newline)

re.search(r'p.g', 'pig') # <re.Match object; span=(0, 3), match='pig'>

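Repetition patterns

*, +, and ? placed after a pattern specify how many times that pattern repeats
- '+' → one or more occurrences
- '*' → zero or more occurrences
- '?' → zero or one occurrence
Repetition is greedy: it matches as much of the string as possible
- e.g.) searching for a[bcd]*b in abcbdccb
  - ab, abcb, and abcbdccb are all possible, but the longest match, abcbdccb, is the one returned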

import re
re.search(r'a[bcd]*b', 'abcbdccb')
# <re.Match object; span=(0, 8), match='abcbdccb'>

re.search(r'b\w+a', 'banana')
# <re.Match object; span=(0, 6), match='banana'>

re.search(r'i+', 'piigiii')
# <re.Match object; span=(1, 3), match='ii'>

re.search(r'pi+g', 'pg')
# None

re.search(r'pi*g', 'pg')
# <re.Match object; span=(0, 2), match='pg'>

re.search(r'https?', 'http://www.naver.com')
# <re.Match object; span=(0, 4), match='http'>

^, $

^ matches only at the very start of the string
$ matches only at the very end of the string

import re

re.search(r'b\w+a', 'cabana')
# <re.Match object; span=(2, 6), match='bana'>

re.search(r'^b\w+a', 'cabana')
# None

re.search(r'^b\w+a', 'babana')
# <re.Match object; span=(0, 6), match='babana'>

re.search(r'b\w+a$', 'cabana')
# <re.Match object; span=(2, 6), match='bana'>

re.search(r'b\w+a$', 'cabanap')
# None

grouping

Use () to group parts of a pattern
The match result can then be split by group
When writing the pattern, put each group inside parentheses ()

import re

m = re.search(r'(\w+)@(.+)', 'test@gmail.com')
print(m.group(1)) # test
print(m.group(2)) # gmail.com
print(m.group(0)) # test@gmail.com
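
All groups can also be read at once with groups(), and start() and end() accept a group number. A short sketch reusing the same pattern and input:

```python
import re

m = re.search(r'(\w+)@(.+)', 'test@gmail.com')

# groups() returns every captured group as a tuple
user, domain = m.groups()
print(m.groups())            # ('test', 'gmail.com')

# Position of the second group within the original string
print(m.start(2), m.end(2))  # 5 14
```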

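{}

*, +, and ? find repeated patterns, but they cannot limit the number of repetitions
A number in braces {} after a pattern matches only when the pattern repeats that many times
{4} - exactly 4 repetitions
{3,4} - 3 to 4 repetitions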

import re

re.search('pi{3,5}g', 'piiiiig')
# <re.Match object; span=(0, 7), match='piiiiig'>
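
The {4} and {3,4} forms can be compared on inputs just inside and just outside the allowed counts. The test strings here are made up for illustration.

```python
import re

# {4}: exactly four i's
exact_ok = re.search(r'pi{4}g', 'piiiig')
exact_short = re.search(r'pi{4}g', 'piiig')
print(exact_ok.group())  # piiiig
print(exact_short)       # None (only three i's)

# {3,4}: three or four i's
range_ok = re.search(r'pi{3,4}g', 'piiig')
range_long = re.search(r'pi{3,4}g', 'piiiiig')
print(range_ok.group())  # piiig
print(range_long)        # None (five i's is too many)
```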

Minimal matching (non-greedy way)

By default, *, +, and ? are greedy (maximal matching)
Appending ? (*?, +?, ??) makes them match as little as possible

import re

re.search(r'<.+>', '<html>haha</html>')
# <re.Match object; span=(0, 17), match='<html>haha</html>'>

re.search(r'<.+?>', '<html>haha</html>')
# <re.Match object; span=(0, 6), match='<html>'>

{}?

{m,n} repeats m to n times and is greedy
{m,n}? is non-greedy: it is satisfied as soon as the minimum of m repetitions has matched

import re

re.search(r'a{3,5}', 'aaaaa')
# <re.Match object; span=(0, 5), match='aaaaa'>

re.search(r'a{3,5}?', 'aaaaa')
# <re.Match object; span=(0, 3), match='aaa'>

match

Similar to search, but checks for the pattern only at the start of the given string
Returns None if the pattern is not found at the start

import re

re.match(r'\d\d\d', 'my number is 123')
# None

re.match(r'\d\d\d', '123 is my number')
# <re.Match object; span=(0, 3), match='123'>

re.search(r'^\d\d\d', '123 is my number')
# <re.Match object; span=(0, 3), match='123'>

findall

Whereas search returns only the first match, findall returns every match
All matches are returned as a list

import re

re.findall(r'[\w-]+@[\w.]+', 'test@gmail.com haha test2@gmail.com nice test test')
# ['test@gmail.com', 'test2@gmail.com']
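
One detail worth knowing: when the pattern contains groups, findall returns the groups rather than the whole match. A small sketch with made-up addresses:

```python
import re

text = 'test@gmail.com haha test2@naver.com'

# No groups: each item is the full matched string
whole = re.findall(r'[\w-]+@[\w.]+', text)
print(whole)  # ['test@gmail.com', 'test2@naver.com']

# Two groups: each item is a tuple of the captured groups
parts = re.findall(r'([\w-]+)@([\w.]+)', text)
print(parts)  # [('test', 'gmail.com'), ('test2', 'naver.com')]
```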

sub

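Replaces every occurrence of the pattern in the given string
Returns the result as a new string
The second argument (the replacement) can be a string or a function
With count=0 (the default) every match is replaced; with a count of 1 or more, at most that many are replaced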

import re

re.sub(r'[\w-]+@[\w.]+', 'great', 'test@gmail.com haha test2@gmail.com nice test test', count=1)
# 'great haha test2@gmail.com nice test test'
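
When the second argument is a function, it is called with each match object and its return value becomes the replacement. The sketch below uses a hypothetical helper, mask_user, that hides most of the user name in an email address.

```python
import re


def mask_user(match):
    # match is the Match object for one occurrence of the pattern
    user, domain = match.group(1), match.group(2)
    return user[0] + '***@' + domain


text = 'test@gmail.com haha test2@gmail.com'
result = re.sub(r'([\w-]+)@([\w.]+)', mask_user, text)
print(result)  # t***@gmail.com haha t***@gmail.com
```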

compile

Avoids having to rewrite the same regular expression every time
compile stores the expression as a re.Pattern object that can be reused

import re

email_reg = re.compile(r'[\w-]+@[\w.]+')
email_reg.search('test@gmail.com haha good')
# <re.Match object; span=(0, 14), match='test@gmail.com'>
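
A compiled pattern offers the same operations as the module-level functions (search, match, findall, sub), without passing the pattern each time. A brief sketch on a made-up input:

```python
import re

email_reg = re.compile(r'[\w-]+@[\w.]+')
text = 'test@gmail.com haha test2@gmail.com'

found = email_reg.findall(text)
print(found)     # ['test@gmail.com', 'test2@gmail.com']

replaced = email_reg.sub('great', text, count=1)
print(replaced)  # great haha test2@gmail.com
```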

Machine Learning and Data Analysis A-Z All-in-One Package Online. 👉 https://bit.ly/3cB3C8y
