Python Clean Code presentation: descriptors in practice

HHHHH · August 8, 2021

In the end, the key is abstracting duplicated code into a descriptor.
Abstracting the duplication can shrink the client code dramatically.

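Goal of this code

We want to track every change to the value of an attribute on an ordinary class.

Specifically,
the class represents a traveller and has an attribute holding the city they are currently in.
The program keeps track of every city the user has visited.

Implementation 1

Check for changes in the attribute's setter method and store each new value in an internal variable such as a list.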

import time


class Traveller:
    def __init__(self, name, current_city):
        self.name = name
        self._current_city = current_city
        self._cities_visited = [current_city]

    @property
    def current_city(self):
        return self._current_city

    @current_city.setter
    def current_city(self, new_city):
        if new_city != self._current_city:
            self._cities_visited.append(new_city)
        self._current_city = new_city

    @property
    def cities_visited(self):
        return self._cities_visited
 
    """
    >>> alice = Traveller("Alice", "Barcelona")
    >>> alice.current_city = "Paris"
    >>> alice.current_city = "Brussels"
    >>> alice.current_city = "Amsterdam"
    >>> alice.cities_visited
    ['Barcelona', 'Paris', 'Brussels', 'Amsterdam']
    >>> alice.current_city
    'Amsterdam'
    >>> alice.current_city = "Amsterdam"
    >>> alice.cities_visited
    ['Barcelona', 'Paris', 'Brussels', 'Amsterdam']
    >>> bob = Traveller("Bob", "Rotterdam")
    >>> bob.current_city = "Amsterdam"
    >>> bob.current_city
    'Amsterdam'
    >>> bob.cities_visited
    ['Rotterdam', 'Amsterdam']
    """

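If the same logic is needed in several places, it has to be repeated each time, which is tedious.
A decorator or a property builder could be used instead, but a property builder is a particular, more convoluted case of a descriptor, so the book does not cover it.

(Question: searching for "python property builder" turns up almost nothing; only the Builder design pattern comes up.
https://refactoring.guru/design-patterns/builder/python/example)

The ideal implementation, using a descriptor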


class HistoryTracedAttribute:
    """Trace the values of this attribute into another one given by the name at
    ``trace_attribute_name``.
    """

    def __init__(self, trace_attribute_name: str) -> None:
        self.trace_attribute_name = trace_attribute_name # [1]
        self._name = None

    def __set_name__(self, owner, name):
        self._name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self._name]

    def __set__(self, instance, value):
        self._track_change_in_value_for_instance(instance, value)
        instance.__dict__[self._name] = value

    def _track_change_in_value_for_instance(self, instance, value):
        self._set_default(instance) # [2]
        if self._needs_to_track_change(instance, value):
            instance.__dict__[self.trace_attribute_name].append(value)

    def _needs_to_track_change(self, instance, value) -> bool:
        """Determine if the value change needs to be traced or not.
        Rules for adding a value to the trace:
            * If the value is not previously set (it's the first one).
            * If the new value is != than the current one.
        """
        try:
            current_value = instance.__dict__[self._name]
        except KeyError: # [3]
            return True
        return value != current_value # [4]

    def _set_default(self, instance):
        instance.__dict__.setdefault(self.trace_attribute_name, []) # [6]


class Traveller:
    """
    A person visiting several cities.
    We wish to track the path of the traveller, as he or she is visiting each
    new city.
    """

    current_city = HistoryTracedAttribute("cities_visited") # [1]

    def __init__(self, name, current_city):
        self.name = name
        self.current_city = current_city # [5]

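[1] The name of the attribute is the name of the variable the descriptor is assigned to, here current_city. The descriptor is also given the name of the attribute that will store the trace; in this example it is told to record every value of current_city in an attribute named cities_visited.
[2] The first time the descriptor is called, the trace does not exist yet, so it is initialized to an empty list that values can be appended to later.
[3] When a Traveller is first created, no city has been set yet, so the current_city key does not exist in the instance dictionary either. This also counts as a new destination and is traced, for a reason similar to initializing the list in the previous step.
[4] A change is traced only when the new value differs from the one currently set.
[5] By the time Traveller's __init__ runs, the descriptor has already been created. The assignment triggers step 2 (creating the empty list that holds the trace) and then step 3, which appends the value to the list and sets the key in the instance dictionary for later retrieval.
[6] The dictionary's setdefault method is used to avoid a KeyError. If the key given as the first argument exists, setdefault returns its value; otherwise it stores the second argument under that key and returns it.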
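
The setdefault behaviour described in [6] can be checked in isolation. This is a minimal sketch; instance_dict is a hypothetical stand-in for an instance's __dict__ and is not part of the original code.

```python
# Hypothetical stand-in for an instance's __dict__.
instance_dict = {}

# Key is missing: the default list is stored under the key and returned.
trace = instance_dict.setdefault("cities_visited", [])
trace.append("Barcelona")

# Key is present: the existing list is returned and the new default is ignored.
same_trace = instance_dict.setdefault("cities_visited", [])

print(same_trace is trace)
print(instance_dict)
```

Because the returned list is the stored one, appending to it updates the trace in place, which is what lets the descriptor call _set_default on every assignment without losing history.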

>>> alice = Traveller("Alice", "Barcelona")
>>> alice.current_city = "Paris"
>>> alice.current_city = "Brussels"
>>> alice.current_city = "Amsterdam"
>>> alice.cities_visited
['Barcelona', 'Paris', 'Brussels', 'Amsterdam']
>>> alice.current_city
'Amsterdam'
>>> alice.current_city = "Amsterdam"
>>> alice.cities_visited
['Barcelona', 'Paris', 'Brussels', 'Amsterdam']
>>> bob = Traveller("Bob", "Rotterdam")
>>> bob.current_city = "Amsterdam"
>>> bob.current_city
'Amsterdam'
>>> bob.cities_visited
['Rotterdam', 'Amsterdam']

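The descriptor code is fairly complex, but the client class is now considerably simpler.
Because the descriptor is completely independent of the client class (it contains no business logic, and its method names are generic),
it would have the same effect when applied to any other class.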
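
To illustrate the reuse claim, here is a rough sketch that applies the same idea to an unrelated class. The descriptor is a condensed rewrite of the one above, and Ticket with its status and assignee fields is a hypothetical client invented for this example.

```python
class HistoryTracedAttribute:
    """Condensed version of the descriptor above, intended to behave the same."""

    def __init__(self, trace_attribute_name):
        self.trace_attribute_name = trace_attribute_name
        self._name = None

    def __set_name__(self, owner, name):
        self._name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self._name]

    def __set__(self, instance, value):
        trace = instance.__dict__.setdefault(self.trace_attribute_name, [])
        # Trace the first value and every value that differs from the current one.
        is_first = self._name not in instance.__dict__
        if is_first or instance.__dict__[self._name] != value:
            trace.append(value)
        instance.__dict__[self._name] = value


class Ticket:  # hypothetical client class, not from the original post
    status = HistoryTracedAttribute("status_history")
    assignee = HistoryTracedAttribute("assignee_history")

    def __init__(self, status, assignee):
        self.status = status
        self.assignee = assignee


ticket = Ticket("open", "alice")
ticket.status = "in progress"
ticket.status = "in progress"  # same value, so it is not traced again
ticket.assignee = "bob"
ticket.status = "closed"

print(ticket.status_history)
print(ticket.assignee_history)
```

Each attribute gets its own trace with one line of client code, and the two descriptors do not interfere because each stores its data under its own keys in the instance dictionary.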

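Descriptors are better suited to defining libraries, frameworks, or internal APIs than to implementing business logic.

Different forms of descriptors

The shared global state issue

Descriptors are set as class attributes, so there are a few things to keep in mind.

Because a descriptor is a class attribute, any data kept in the descriptor object itself is shared: every instance of the class accesses the same value.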

class SharedDataDescriptor:
    def __init__(self, initial_value):
        self.value = initial_value

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.value 

    def __set__(self, instance, value):
        self.value = value  # stored on self, i.e. on ClientClass.descriptor


class ClientClass:
    descriptor = SharedDataDescriptor("first value")

    
>>> client1 = ClientClass()
>>> client1.descriptor
'first value'
>>> client2 = ClientClass()
>>> client2.descriptor
'first value'
>>> client2.descriptor = "value for client 2"
>>> client2.descriptor
'value for client 2'
>>> client1.descriptor
'value for client 2'

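Changing the value on one object changed it for all of them, because there is only one ClientClass.descriptor object. It is the same attribute for every instance.

This can be put to use deliberately, for example to implement a kind of Borg pattern in which all objects of a class share state, but in general we want each object to keep its own value (Chapter 9, "Common Design Patterns", explains this in more detail).

(Question)
With the change below, the value is no longer shared. I'm not sure about this, though. How does the code below actually work?

How does it differ from instance.__dict__[self.name] = value?

I expected this to overwrite the attribute, but nothing is deleted just because there is no __del__..

Maybe try it again with __set_name__?!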

class SharedDataDescriptor:
    def __init__(self, initial_value):
        self.value = initial_value

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.value

    def __set__(self, instance, value):
        instance.value = value


class ClientClass:
    descriptor = SharedDataDescriptor("first value")
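
One way to look into the question above is simply to run the modified class and inspect the instance. The sketch below reuses the code exactly as written; only the driver lines at the bottom are added for this check.

```python
class SharedDataDescriptor:
    def __init__(self, initial_value):
        self.value = initial_value

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.value  # always reads the descriptor's own value

    def __set__(self, instance, value):
        instance.value = value  # ordinary attribute named "value" on the instance


class ClientClass:
    descriptor = SharedDataDescriptor("first value")


client1 = ClientClass()
client2 = ClientClass()
client2.descriptor = "value for client 2"

print(client2.descriptor)  # the assignment is not visible through the descriptor
print(vars(client2))       # it went to a separate "value" attribute instead
print(client1.descriptor)
```

So the value is no longer shared, but it is not really stored for the descriptor either: __set__ writes to an unrelated instance attribute called value, while __get__ keeps returning the initial value held by the descriptor. Writing to instance.__dict__ under the descriptor's own name, and reading from the same key in __get__, is what makes the per-instance value retrievable.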

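Solution 1. Accessing the object's dictionary

A descriptor has to store and return a value for each instance, which is why the values are set in and read from the instance's __dict__ dictionary. getattr() and setattr() cannot be used on the managed attribute because they would invoke the descriptor again and recurse without end, so modifying __dict__ directly is the last option left.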
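
The recursion problem can be reproduced with a deliberately broken descriptor. RecursiveDescriptor and Client are hypothetical names used only for this sketch.

```python
class RecursiveDescriptor:
    """Deliberately broken: calls setattr() on the attribute it manages."""

    def __set_name__(self, owner, name):
        self._name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self._name]

    def __set__(self, instance, value):
        # setattr() goes through the descriptor protocol again, so this is
        # __set__ -> setattr -> __set__ -> ... until the recursion limit is hit.
        setattr(instance, self._name, value)


class Client:
    attribute = RecursiveDescriptor()


try:
    Client().attribute = "value"
except RecursionError:
    outcome = "RecursionError"
else:
    outcome = "no error"

print(outcome)
```

Replacing the setattr() call with instance.__dict__[self._name] = value bypasses the descriptor lookup and ends the loop.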

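Solution 2. Using weak references

Using a weak-key dictionary from the weakref module

To avoid using __dict__, the descriptor object can hold an internal mapping itself and use it to store and return the value for each instance.

A plain dictionary must not be used for this internal mapping, though. The client class holds a reference to the descriptor, and the descriptor would hold references to the instances that use it; this circular dependency means the instances are never garbage collected. That is why weakref is used.

weakref documentation (English)
weakref documentation (Korean)

About weak references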

from weakref import WeakKeyDictionary


class DescriptorClass:
    def __init__(self, initial_value):
        self.value = initial_value
        self.mapping = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.mapping.get(instance, self.value)

    def __set__(self, instance, value):
        self.mapping[instance] = value


class ClientClass:
    descriptor = DescriptorClass("default value")


>>> client1 = ClientClass()
>>> client2 = ClientClass()
>>> client1.descriptor = "new value"
client1 must have the new value, whilst client2 has to still be with the
default one:
>>> client1.descriptor
'new value'
>>> client2.descriptor
'default value'
Changing the value for client2 doesn't affect client1
>>> client2.descriptor = "value for client2"
>>> client2.descriptor
'value for client2'
>>> client2.descriptor != client1.descriptor
True
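
The garbage-collection benefit can be observed by counting the entries in the descriptor's mapping before and after an instance goes away. This sketch repeats the classes above; the driver code and the two entries_* variable names are additions for illustration.

```python
import gc
from weakref import WeakKeyDictionary


class DescriptorClass:
    def __init__(self, initial_value):
        self.value = initial_value
        self.mapping = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.mapping.get(instance, self.value)

    def __set__(self, instance, value):
        self.mapping[instance] = value


class ClientClass:
    descriptor = DescriptorClass("default value")


client = ClientClass()
client.descriptor = "new value"
# Accessing the attribute on the class returns the descriptor object itself.
entries_while_alive = len(ClientClass.descriptor.mapping)

del client    # drop the only strong reference to the instance
gc.collect()  # CPython frees it by reference counting; this covers other runtimes
entries_after_delete = len(ClientClass.descriptor.mapping)

print(entries_while_alive, entries_after_delete)
```

With a plain dict in place of WeakKeyDictionary, the entry would keep the instance alive for as long as the class exists.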

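Things to consider with weak references

The descriptor holds the attributes.
The instance object no longer holds its own attributes, which is debatable and arguably inaccurate from a conceptual point of view. If this detail is forgotten, inspecting the object's dictionary (e.g. vars(client)) returns incomplete data, because the object does not hold the attribute.
The objects must be hashable, i.e. implement __hash__. An object that is not hashable cannot be used as a key in a WeakKeyDictionary. For some applications this may be too strict a requirement.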
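
Both caveats can be seen in a short experiment. The descriptor is the same as above; UnhashableClient is a hypothetical class that defines __eq__ without __hash__, which makes its instances unhashable.

```python
from weakref import WeakKeyDictionary


class DescriptorClass:
    def __init__(self, initial_value):
        self.value = initial_value
        self.mapping = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.mapping.get(instance, self.value)

    def __set__(self, instance, value):
        self.mapping[instance] = value


class ClientClass:
    descriptor = DescriptorClass("default value")


class UnhashableClient:  # hypothetical class for illustration
    descriptor = DescriptorClass("default value")

    def __eq__(self, other):  # defining __eq__ alone sets __hash__ to None
        return isinstance(other, UnhashableClient)


client = ClientClass()
client.descriptor = "new value"
instance_dict = vars(client)  # empty: the value lives inside the descriptor

try:
    UnhashableClient().descriptor = "new value"
except TypeError:
    unhashable_outcome = "TypeError"
else:
    unhashable_outcome = "no error"

print(instance_dict, unhashable_outcome)
```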

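For these reasons, using the instance's __dict__ dictionary is generally the better choice.

More considerations about descriptors

Where can descriptors be used, and how can something be implemented with them?
What are the pros and cons of an implementation based on descriptors?

Code reuse

Properties

A property is a particular case of a descriptor, so
when a structure that needs a property keeps repeating, abstracting it into a descriptor reduces code duplication.

The @property decorator is a descriptor that implements the full descriptor protocol,
defining get, set, and delete.
Descriptors are the more general mechanism, so they can be used for far more complex tasks than properties.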
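
That a property is itself a descriptor can be verified directly. Circle is a hypothetical example class; the check looks up the property object on the class and calls its protocol method by hand.

```python
class Circle:  # hypothetical example class
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value


prop = vars(Circle)["radius"]  # the property object stored on the class
has_protocol = all(
    hasattr(prop, name) for name in ("__get__", "__set__", "__delete__")
)

circle = Circle(1)
# Normal attribute access is routed through the same protocol method.
via_protocol = prop.__get__(circle, Circle)

print(has_protocol, via_protocol == circle.radius)
```

The difference is that a property's logic is tied to the methods of one class, whereas a custom descriptor class can be instantiated on any number of attributes and classes.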

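Comparison with decorators

The rule of three applies here just as it does for decorators.
A descriptor generally contains only implementation code, with no business logic.

In other words, descriptors are best kept to internal APIs. They pay off when extending the design of a library or framework rather than in one-off solutions.

Comparing with the class decorator implementation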

from datetime import datetime
from functools import partial
from typing import Any, Callable


class BaseFieldTransformation:
    """Base class to define descriptors that convert values."""

    def __init__(self, transformation: Callable[[Any], Any]) -> None:
        self._name = None
        self.transformation = transformation

    def __get__(self, instance, owner):
        if instance is None:
            return self
        raw_value = instance.__dict__[self._name]
        return self.transformation(raw_value)

    def __set_name__(self, owner, name):
        self._name = name

    def __set__(self, instance, value):
        instance.__dict__[self._name] = value


ShowOriginal = partial(BaseFieldTransformation, transformation=lambda x: x)
HideField = partial(
    BaseFieldTransformation, transformation=lambda x: "**redacted**"
)
FormatTime = partial(
    BaseFieldTransformation,
    transformation=lambda ft: ft.strftime("%Y-%m-%d %H:%M"),
)

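functools.partial is used here as an alternative to creating more subclasses. Each partial binds a transformation callable to the base class in advance, leaving a new callable that can be instantiated directly.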
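
What partial produces here can be seen in a stripped-down sketch. Field and Upper are hypothetical names standing in for BaseFieldTransformation and its variants.

```python
from functools import partial


class Field:  # hypothetical stand-in for BaseFieldTransformation
    def __init__(self, transformation):
        self.transformation = transformation


# Pre-bind the keyword argument; the result is a callable, not a new class.
Upper = partial(Field, transformation=str.upper)

field = Upper()  # equivalent to Field(transformation=str.upper)

print(isinstance(field, Field))
print(isinstance(Upper, type))
print(field.transformation("usr"))
```

Since the result is not a real class, isinstance checks have to target the base class, which is what BaseEvent._fields_to_serialize does later in the post.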


class LoginEvent:
    username = ShowOriginal()
    password = HideField()
    ip = ShowOriginal()
    timestamp = FormatTime()

    def __init__(self, username, password, ip, timestamp):
        self.username = username
        self.password = password
        self.ip = ip
        self.timestamp = timestamp

    def serialize(self):
        return {
            "username": self.username,
            "password": self.password,
            "ip": self.ip,
            "timestamp": self.timestamp,
        }

   >>> le = LoginEvent(
   ...     "usr", "secret password", "127.0.0.1", datetime(2016, 7, 20, 15, 45)
   ... )
   >>> vars(le)
   {'username': 'usr', 'password': 'secret password', 'ip': '127.0.0.1', 
   'timestamp': datetime.datetime(2016, 7, 20, 15, 45)}
   >>> le.serialize()
   {'username': 'usr', 'password': '**redacted**', 'ip': '127.0.0.1', 
   'timestamp': '2016-07-20 15:45'}
   >>> le.password
   '**redacted**'

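A serialize() method is added, and the fields are hidden before they appear in the resulting dictionary. The original values can still be obtained, as vars(le) shows. It would also be possible to store the already-transformed value when it is set and return it unchanged when it is read.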
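
The alternative mentioned last, transforming on write instead of on read, might look like the sketch below. TransformOnSet is a hypothetical variant written for this note, not code from the book, and the LoginEvent here is cut down to a single field.

```python
class TransformOnSet:
    """Hypothetical variant: store the transformed value, return it as-is."""

    def __init__(self, transformation):
        self._name = None
        self.transformation = transformation

    def __set_name__(self, owner, name):
        self._name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self._name]

    def __set__(self, instance, value):
        # The raw value is discarded; only the transformed one is kept.
        instance.__dict__[self._name] = self.transformation(value)


class LoginEvent:
    password = TransformOnSet(lambda value: "**redacted**")

    def __init__(self, password):
        self.password = password


event = LoginEvent("secret password")

print(event.password)
print(vars(event))
```

The trade-off is that the original value is gone once it has been set, which is safer for secrets but rules out recovering the raw data later.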

A simpler LoginEvent implementation


class BaseEvent:
    """Abstract the serialization and the __init__"""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def serialize(self):
        return {
            attr: getattr(self, attr) for attr in self._fields_to_serialize()
        }

    def _fields_to_serialize(self):
        for attr_name, value in vars(self.__class__).items():
            if isinstance(value, BaseFieldTransformation):
                yield attr_name


class NewLoginEvent(BaseEvent):
    username = ShowOriginal()
    password = HideField()
    ip = ShowOriginal()
    timestamp = FormatTime()

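The code is much more concise. The base class abstracts only the common methods, and each event class becomes smaller and simpler.
The original approach using a class decorator is fine, but the descriptor-based approach is better.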
