weak vs unowned (Swift)

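Park Jong Ho·December 10, 2022

Lifetime and Performance of unowned and weak

When writing Swift code, you usually have to choose between unowned and weak to avoid reference cycles between objects.

I know how the two differ. Still, at first glance it seems better to always use weak, even though it requires optional unwrapping. Is there a reason to use unowned at the cost of giving up the null safety Swift provides?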

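Basics

Many languages have a concept called a weak reference. It solves the problem of instances never being deallocated because of reference cycles.

Swift has two kinds of these non-owning (weak) references: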

unowned
weak

Both serve the same purpose. They differ slightly in how they relate to the lifecycle of the referenced object, and in performance.

ARC keeps three reference counts:

strong ref count
unowned ref count
weak ref count

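When the strong ref count reaches 0 while the unowned ref count is not 0, the object is deinitialized and its deinit runs. Its memory, however, is not freed.

This is because unowned references point directly at the object. The object's header holds the reference counts, so it must stay valid until the last unowned reference is released. That way the runtime can still detect that the object is dead when one of those references is used.

The object's memory is deallocated only when both the strong ref count and the unowned ref count have reached 0.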
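The sketch below makes the deinit-versus-deallocation distinction concrete. The `Tracked` and `DeinitLog` types are made up for illustration. It shows that `deinit` runs as soon as the last strong reference disappears, even though an unowned reference to the same object still exists. It deliberately never reads the unowned reference afterwards, because that access would trap.

```swift
// Hypothetical helper types for this sketch; DeinitLog records deinit calls.
final class DeinitLog { var names: [String] = [] }

final class Tracked {
    let name: String
    let log: DeinitLog
    init(name: String, log: DeinitLog) {
        self.name = name
        self.log = log
    }
    deinit { log.names.append(name) }
}

let deinitLog = DeinitLog()

var strongRef: Tracked? = Tracked(name: "A", log: deinitLog)
unowned let unownedRef: Tracked = strongRef!  // unowned count +1, strong count unchanged

let nameBefore = unownedRef.name  // fine: the strong count is still 1
strongRef = nil                   // strong count drops to 0: deinit runs right here
print(deinitLog.names)            // ["A"], although unownedRef still exists
// Reading unownedRef.name from here on would trap: the memory is still
// allocated (the unowned count is not 0), but the object is deinitialized.
```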

In addition, when an object is referenced by a weak reference, Swift allocates a side table for that object.

If the object has no side table, the strong and unowned counts are stored directly in the object.
If the object has a side table, the object stores a pointer to the side table instead. The strong, unowned, and weak counts are then all stored in the side table, together with a pointer back to the object. (Because of this extra work through the side table, weak appears to be the slower of the two!)
The important point is that a weak reference does not point at the object directly; it points at the side table. This means the object can be deallocated even while weak references to it remain.
The side table is not freed as soon as the object is. It is freed when the last weak reference is released, because weak references hold a reference to the side table, not to the object.
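The sketch below shows the user-visible effect of this design. The `Node` class is hypothetical. Once the last strong reference goes away, a weak reference reads as nil instead of pointing at freed memory. The side table itself cannot be observed from Swift code, so this demonstrates the behavior only, not the mechanism.

```swift
// Hypothetical class for illustration.
final class Node {
    let id: Int
    init(id: Int) { self.id = id }
}

var owner: Node? = Node(id: 1)
weak var observer: Node? = owner   // weak: does not add to the strong count

let idWhileAlive = observer?.id    // Optional(1)
owner = nil                        // last strong reference gone: the object goes away
let idAfterRelease = observer?.id  // nil: the weak reference reads as nil, not garbage
```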

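Considering Retain Cycles in Closures

In Swift, you write closures with a capture list to avoid retain cycles. A cycle can form with the object that holds the closure, or with objects the closure references.

The capture list goes at the very top of the closure. In it, you specify how each variable from outside the closure is referenced inside it.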

var i1 = 1, i2 = 1

var fStrong = {
    i1 += 1
    i2 += 2
}

fStrong()
print(i1,i2) //Prints 2 and 3

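Let's look at the first example! The closure fStrong modifies the outer variables i1 and i2. It has no capture list, so it captures the variables i1 and i2 themselves by reference rather than copying their values.

So modifying these captured outer variables inside the closure also changes the originals! That is why 2 and 3 are printed.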

// Reset to the initial values so the printed results below hold
i1 = 1; i2 = 1

var fCopy = { [i1] in
    print(i1, i2)
}

fStrong()
print(i1, i2) // Prints 2 and 3

fCopy()  // Prints 1 and 3

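We declared a new closure, fCopy, with a capture list for the outer variable i1! Listing a variable in a capture list like this copies its value at the moment the closure is created. This works the same way for value types and reference types. For a reference type, though, it is the reference that gets copied: the closure receives a new strong reference to the same instance, not a copy of the object.

So fCopy gets a new constant holding 1, the value i1 had when the closure was created. It prints that copied 1 even after the original value changes to 2.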
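The sketch below covers the reference-type case. The `Counter` class and the variable names are made up. The capture list copies the reference, so mutations to the shared instance are visible inside the closure. Reassigning the outer variables afterwards is not.

```swift
// Hypothetical class for illustration.
final class Counter { var value = 0 }

var number = 1
var counter = Counter()

// The capture list copies the Int value of `number` and the *reference* in `counter`.
let snapshot = { [number, counter] in
    (number, counter.value)
}

number = 100         // not seen by the closure: it holds its own copy (1)
counter.value = 100  // seen by the closure: both references point to the same instance
counter = Counter()  // not seen by the closure: only the outer variable is rebound

let (capturedNumber, capturedValue) = snapshot()
print(capturedNumber, capturedValue)  // 1 100
```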

Now let's use a capture list with reference types.

class aClass{
    var value = 1
}

var c1 = aClass()
var c2 = aClass()

var fSpec = { [unowned c1, weak c2] in
    c1.value += 1
    if let c2 = c2 {
        c2.value += 1
    }
}

fSpec()
print(c1.value,c2.value) //Prints 2 and 2

For reference types, you can break a retain cycle by capturing with unowned or weak inside the closure, as above.
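The sketch below shows the retain cycle this prevents. The `Downloader` and `DeinitCounter` types are hypothetical. A closure stored on an object that captures `self` strongly keeps the object alive forever. A `[weak self]` capture lets the object be deallocated.

```swift
// Hypothetical helper that records deinit calls.
final class DeinitCounter { var count = 0 }

final class Downloader {
    var onFinish: (() -> Void)?
    var finished = false
    let counter: DeinitCounter
    init(counter: DeinitCounter) { self.counter = counter }
    deinit { counter.count += 1 }

    // Strong capture: self -> onFinish -> closure -> self, a retain cycle.
    func configureStrong() {
        onFinish = { self.finished = true }
    }

    // Weak capture: the closure no longer keeps self alive, so there is no cycle.
    func configureWeak() {
        onFinish = { [weak self] in self?.finished = true }
    }
}

let counter = DeinitCounter()

var leaky: Downloader? = Downloader(counter: counter)
leaky?.configureStrong()
leaky = nil                           // the cycle keeps the instance alive: no deinit
let countAfterStrong = counter.count  // 0

var safe: Downloader? = Downloader(counter: counter)
safe?.configureWeak()
safe = nil                            // nothing else owns it: deinit runs
let countAfterWeak = counter.count    // 1
```

`[unowned self]` would also break the cycle here, because the closure cannot outlive the object that stores it.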

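However, unowned and weak behave differently.

An unowned reference is normally non-optional. You must therefore guarantee that the original instance is still alive whenever the closure can access it. It behaves much like an implicitly unwrapped optional, so using it incorrectly can crash the app: if the original instance has already been deinitialized, accessing it terminates the app immediately.
Unlike unowned, weak is for cases where the referenced instance may become nil while the closure is still around. It can become nil at any time, even while the closure is running. Because it is an optional, you must always unwrap it and check that the value is valid before using it.

Summary: use unowned when the referenced instance and the closure have the same lifetime, i.e. when they are released together. Use weak when the closure can still be accessed after the referenced instance has become nil.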
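The two cases in the summary might look like the sketch below. `PrefixFormatter`, `EventBus`, and `Screen` are hypothetical types. The closure stored on `PrefixFormatter` can never outlive its owner, so `[unowned self]` is safe there. The handler registered on `EventBus` can outlive the `Screen` that registered it, so it captures `[weak self]` and tolerates nil.

```swift
// Case 1: the closure is stored on self, so it cannot outlive self -> unowned.
final class PrefixFormatter {
    let prefix: String
    lazy var format: (String) -> String = { [unowned self] text in
        self.prefix + text
    }
    init(prefix: String) { self.prefix = prefix }
}

// Case 2: the closure is handed to a longer-lived object -> weak.
final class EventBus {
    private var handlers: [() -> String?] = []
    func subscribe(_ handler: @escaping () -> String?) { handlers.append(handler) }
    func fire() -> [String?] { handlers.map { $0() } }
}

final class Screen {
    let title: String
    init(title: String, bus: EventBus) {
        self.title = title
        // The bus may outlive this screen, so capture weakly and tolerate nil.
        bus.subscribe { [weak self] in self?.title }
    }
}

let formatter = PrefixFormatter(prefix: "> ")
let formatted = formatter.format("hello")  // "> hello"

let bus = EventBus()
var screen: Screen? = Screen(title: "Home", bus: bus)
let whileAlive = bus.fire()                // ["Home"]
screen = nil
let afterRelease = bus.fire()              // [nil]: the closure outlived the screen safely
```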

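But looking only at the above, a question arises.

With weak, whatever the relationship between the two, you can always use the instance safely through optional unwrapping. Why bother with unowned at all? Can't we just use weak for everything?

The answer is no. Let's look at why!

The two differ in performance.

Let's look at a typical implementation of weak references! Every time a new weak reference to some instance A is created, it must be registered in a table that stores all the weak references to that instance.

When nothing references A anymore, A is deallocated. But there is work to be done before that: every weak reference stored in the table must be set to nil!

This is also called zeroing weak references. However, this approach reportedly has a lot of overhead when multiple threads access the object concurrently. Once the instance starts being deallocated, no thread may reach it through a weak reference under any circumstances. Weak loads and deallocation therefore have to be synchronized.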
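The "table of weak references" approach described above can be modeled roughly as follows. This is a hypothetical toy model, not Swift's runtime. `WeakSlot` stands in for a weak reference and holds an object ID instead of a raw pointer. The object zeroes every registered slot in its `deinit`. A real implementation would also need synchronization between zeroing and concurrent weak loads. This single-threaded sketch leaves that out, and that synchronization is exactly the overhead mentioned above.

```swift
// Hypothetical toy model of "zeroing" weak references via a per-object table.
final class WeakSlot {
    private(set) var targetID: Int?    // stands in for a pointer to the object
    init(targetID: Int) { self.targetID = targetID }
    func zero() { targetID = nil }
}

final class ZeroingObject {
    let id: Int
    private var weakTable: [WeakSlot] = []  // every weak reference made to this object

    init(id: Int) { self.id = id }

    // Creating a weak reference means registering it in the table.
    func makeWeakReference() -> WeakSlot {
        let slot = WeakSlot(targetID: id)
        weakTable.append(slot)
        return slot
    }

    // Before the object goes away, every registered weak reference is set to nil.
    // (A real implementation must also lock out concurrent weak loads here.)
    deinit {
        for slot in weakTable { slot.zero() }
    }
}

var object: ZeroingObject? = ZeroingObject(id: 7)
let slotA = object!.makeWeakReference()
let slotB = object!.makeWeakReference()
let beforeFree = [slotA.targetID, slotB.targetID]  // [Optional(7), Optional(7)]
object = nil                                       // deinit zeroes both slots
let afterFree = [slotA.targetID, slotB.targetID]   // [nil, nil]
```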

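Swift, however, uses a mechanism that is simpler and faster than the typical implementation above.

Every Swift object maintains two reference counters:

strong reference counter: lets ARC know when it can safely release the object
additional weak reference counter: counts how many unowned or weak references have been created. The object is deallocated only when this counter reaches 0. (This two-counter description comes from the older, Swift 3-era runtime whose code appears later in this post. Since Swift 4, weak references go through the side table described above, so this counter corresponds to the unowned ref count.)

So no object is actually freed until all of its unowned references have been released. Its memory is still there, but the object is in a deinitialized state: deinit has run, but the object has not yet been removed from the heap. Accessing it through an unowned reference traps.

Each time an unowned reference is created, the unowned reference counter is incremented atomically. Each time an unowned reference is used, the strong reference counter is checked first, and if it is 0 at that point, the app crashes and terminates.

As an additional optimization, when compiled with the -Ofast option, unowned references reportedly no longer check whether the object is still valid. They then behave like unsafe_unretained in Objective-C: if the object is no longer valid, the reference points to garbage memory. (In source code, this unchecked behavior can be requested explicitly with unowned(unsafe).)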
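For reference, the sketch below puts the three kinds of non-owning references side by side on a live object. The `Resource` class is hypothetical. `unowned(unsafe)` is the explicit, source-level way to get the unchecked, `unsafe_unretained`-like behavior. The sketch only reads the references while the object is alive, because reading an `unowned(unsafe)` reference after deallocation is undefined behavior.

```swift
// Hypothetical class for illustration.
final class Resource { let value = 42 }

let resource = Resource()  // the only strong reference; it lives for the whole program

weak var weakRef: Resource? = resource    // optional; reads as nil once the object is gone
unowned let safeRef = resource            // checked on every access; traps if deinitialized
unowned(unsafe) let unsafeRef = resource  // never checked, like Objective-C's unsafe_unretained

// All three are valid here because `resource` is still alive.
let values = [weakRef?.value ?? -1, safeRef.value, unsafeRef.value]
print(values)  // [42, 42, 42]
```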

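(I don't fully understand this part. My reading is that the strong reference counter is always checked before an unowned reference is created, and that this check decides whether the unowned reference count is incremented.

Once an unowned reference has been created this way, the instance is not freed from memory until that unowned reference is released. So if the closure checks the strong reference counter when it creates its unowned reference, and the object is usable, doesn't the unowned reference stay valid until its use ends, i.e. until the closure finishes? Can it really become unusable midway? In fact, only the memory is guaranteed to stay allocated; the object itself can still be deinitialized while the unowned reference exists. The check happens on every use. As the SIL table below shows, each access performs a strong_retain_unowned, which keeps the object alive only for that use, and an access after deinitialization traps.)

The source also says that a weak reference adds a level of indirection, wrapping an unowned reference inside an optional container. I'm not sure exactly what this means, but the swift_weakLoadStrong code below gives a hint. In that runtime, a weak reference holds an unowned-style reference to the object; note the swift_unownedRelease call when the object turns out to be deallocating. Every load also checks whether the object is still alive before returning it as an optional.

Anyway, the important point: whenever unowned can be used, it is better to use unowned! I should also look into the -Ofast option in more detail.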

Performance: A Look Under the Hood

First, let's see how Swift code is compiled into object code...

[Figure: swiftc block diagram]

What is LLVM

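LLVM originally stood for Low Level Virtual Machine. Despite the name, it has nothing to do with virtual machines.

Just think of LLVM as a name in its own right.

LLVM is a collection of libraries used to build, optimize, and generate intermediate and binary machine code.

As the figure above shows, there are two parts. The 'frontend' handles the languages we use to build apps, such as C, Objective-C, and Swift. The 'backend' compiles the app into machine code for the target instruction set architecture (ISA).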

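Frontend

Reads and parses high-level languages such as C, Objective-C, and Swift, and turns them into IR (Intermediate Representation).

Backend

The backend optimizes this IR and finally turns it into machine code for a specific ISA. Before IR is generated, Swift goes through two additional representations: the Swift AST and SIL (Swift Intermediate Language). The details are in the official documentation.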

https://www.swift.org/swift-compiler/#compiler-architecture

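At the heart of LLVM is this intermediate representation, a low-level programming language similar to assembly!

The IR can be represented in three different forms:

in-memory representation (used internally by the compiler)
serialized bitcode (a compact binary encoding, typically stored in .bc files)
human-readable form (textual assembly, typically stored in .ll files)

The last, human-readable form is the most useful one for analyzing the code before the final step. That step is the translation into machine code for a specific ISA, which is what the LLVM backend does.

As the LLVM figure above shows, what sets swiftc apart from other LLVM-based compilers is an additional process before IR generation.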

SILGen process

This stage comes right before IR generation. Here the source code we wrote is checked and optimized. The result is SIL (Swift Intermediate Language), an intermediate, high-level representation, which is then lowered to LLVM IR.

SILGen converts the source code, represented as an AST, into SIL. The compiler then runs Swift's diagnostic checks on that SIL.

SIL extends Swift with additional constructs, mostly for memory management and optimization. SIL relies on Swift's type system and declarations, so a .sil file is Swift source with SIL definitions added. When such a file is parsed, only the Swift declarations are read; Swift function bodies and top-level code are ignored.

All of this was just to see how code that uses unowned and weak is translated into SIL...

class aClass{
    var value = 1
}

var c1 = aClass()
var c2 = aClass()

var fSpec = { 
    [unowned c1, weak c2] in
    c1.value = 42
    if let c2o = c2 {
        c2o.value = 42
    }
}

fSpec()

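Let's generate the SIL!

Run the following command in the terminal to convert the Swift source file.

xcrun swiftc -emit-sil <file name>.swift

This produces SIL like the following. (The listing comes from an older, Swift 3-era compiler. Its mangled names and some functions, such as materializeForSet, differ from what a current compiler emits.)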

/*
  This file contains canonical SIL 
*/
sil_stage canonical             

/* 
  Some special import available only internally that can be used in SIL 
*/
import Builtin                  
import Swift
import SwiftShims

/* 
 Definitions for three global variables for c1, c2 and the fSpec closure.
 @_Tv4sample2c1CS_6aClass is the symbol name for the c1 variable and $aClass
 its type (types start with $). Variable names are mangled here but can
 be demangled into something more readable (e.g. with swift-demangle).
*/
// c1
sil_global hidden @_Tv4sample2c1CS_6aClass : $aClass

// c2
sil_global hidden @_Tv4sample2c2CS_6aClass : $aClass

// fSpec
sil_global hidden @_Tv4sample5fSpecFT_T_ : $@callee_owned () -> ()

...

/*
  A hierarchical scope definition that refers to positions in the original source.
  Each SIL instruction will point to the sil_scope it was generated from.
*/
sil_scope 1 {  parent @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 }
sil_scope 2 { loc "sample.swift":14:1 parent 1 }


/* 
  An autogenerated @main function that contains the code of our original global
  scope.
 
  It follows the familiar c main() structure accepting the number of
  arguments and an arguments array. The function conforms to the c calling convention.
  This function contains the instructions needed to invoke the closure above.
*/
// main
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
/*
  Registers start with a % followed by a numeric id.
  Every time a new register is defined (or at the beginning of a function for function
  parameters) the compiler adds a trailing comment with the list of registers or instructions
  that depend on its value (called users). 
  For other instructions, the id of the current instruction is provided.

  In this case, register 0 will be used to calculate the content of register 4 and register 1
  will be used to create the value of register 10.
*/
// %0                                             // user: %4
// %1                                             // user: %10
/*
  Every function is decomposed into a series of basic blocks of instructions and each block ends 
  with a terminating instruction (a branch or a return). 
  This graph of blocks represents all the possible execution paths of the function.
*/
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
  ...
  /*
    Each SIL instruction has a reference to the source location that contains the Swift 
    instruction from which it originated and a reference to the scope it's part of.
    We'll look to some of this below when analyzing this method.
  */
  unowned_retain %27 : $@sil_unowned aClass, loc "sample.swift":9:14, scope 2 // id: %28
  store %27 to %2 : $*@sil_unowned aClass, loc "sample.swift":9:14, scope 2 // id: %29
  %30 = alloc_box $@sil_weak Optional<aClass>, var, name "c2", loc "sample.swift":9:23, scope 2 // users: %46, %44, %43, %31
  %31 = project_box %30 : $@box @sil_weak Optional<aClass>, loc "sample.swift":9:23, scope 2 // user: %35
  %32 = load %19 : $*aClass, loc "sample.swift":9:23, scope 2 // users: %34, %33
  ...
}

...

/* 
  A series of autogenerated methods for aClass, init/deinit,  
  setter/getter and other utility methods. 
  
  The comments added by the compiler clarify what they do.
*/

/*
  Hidden functions are visible only inside their module.

  @convention(method) is the default Swift method calling convention, an additional 
  parameter is added at the end to contain a reference to self.
*/
// aClass.__deallocating_deinit
sil hidden @_TFC4clos6aClassD : $@convention(method) (@owned aClass) -> () {
    ...
}

/*
  @guaranteed parameters are guaranteed to be valid for the entire duration of the call.
*/
// aClass.deinit
sil hidden @_TFC4clos6aClassd : $@convention(method) (@guaranteed aClass) -> @owned Builtin.NativeObject {
    ...
}

/*
  Functions annotated with [transparent] are small functions that are always inlined (mandatory inlining).
*/
// aClass.value.getter
sil hidden [transparent] @_TFC4clos6aClassg5valueSi : $@convention(method) (@guaranteed aClass) -> Int {
    ...
}

// aClass.value.setter
sil hidden [transparent] @_TFC4clos6aClasss5valueSi : $@convention(method) (Int, @guaranteed aClass) -> () {
    ...
}

// aClass.value.materializeForSet
sil hidden [transparent] @_TFC4clos6aClassm5valueSi : $@convention(method) (Builtin.RawPointer, @inout Builtin.UnsafeValueBuffer, @guaranteed aClass) -> (Builtin.RawPointer, Optional<Builtin.RawPointer>) {
    ...
}

/*
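  @owned transfers ownership: an @owned parameter is consumed by the callee, and an @owned result is handed to the caller, which becomes responsible for releasing it.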
*/
// aClass.init() -> aClass
sil hidden @_TFC4clos6aClasscfT_S0_ : $@convention(method) (@owned aClass) -> @owned aClass {
    ...
}

// aClass.__allocating_init() -> aClass
sil hidden @_TFC4clos6aClassCfT_S0_ : $@convention(method) (@thick aClass.Type) -> @owned aClass {
    ...
}

/* 
  The closure.
*/
// (closure #1)
sil shared @_TF4closU_FT_T_ : $@convention(thin) (@owned @sil_unowned aClass, @owned @box @sil_weak Optional<aClass>) -> () {
    ...
    /* SIL for the closure, see below */
    ...
}

...

/* 
  sil_vtable defines the virtual function table for the aClass class.

  It contains as expected all the autogenerated methods.
*/
sil_vtable aClass {
  #aClass.deinit!deallocator: _TFC4clos6aClassD	// aClass.__deallocating_deinit
  #aClass.value!getter.1: _TFC4clos6aClassg5valueSi	// aClass.value.getter
  #aClass.value!setter.1: _TFC4clos6aClasss5valueSi	// aClass.value.setter
  #aClass.value!materializeForSet.1: _TFC4clos6aClassm5valueSi	// aClass.value.materializeForSet
  #aClass.init!initializer.1: _TFC4clos6aClasscfT_S0_	// aClass.init() -> aClass
}

Action	Unowned	Weak
Pre-call #1	unowned_retain the object	Create a @box, strong_retain the object, create an optional and store it in the @box, release the optional
Pre-call #2	strong_retain_unowned, unowned_retain and strong_release	strong_retain
Closure execution	strong_retain_unowned, unowned_release	load_weak, switch on Optional, strong_release
Post-call	unowned_release	strong_release

The table above lists the unowned- and weak-related operations that appear in the SIL. It shows that weak references involve more work than unowned references. The operations are described below.

unowned_retain: Increments the unowned reference count of the heap object.
strong_retain_unowned: Asserts that the strong reference count of the object is still positive, then increases it by one.
strong_retain: Increases the strong retain count of the object.
load_weak: Not really an ARC call but it increments the strong reference count of the object referenced by the optional.
strong_release: Decrements the strong reference count of the object. If the release operation brings the strong reference count of the object to zero, the object is destroyed and the weak references are cleared. When both its strong and unowned reference counts reach zero, the object’s memory is deallocated.
unowned_release: Decrements the unowned reference count of the object. When both its strong and unowned reference counts reach zero, the object’s memory is deallocated.
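The sketch below ties the counters and the SIL operations above together as a toy model in Swift. `ToyHeapObject` and its methods are made up and only mirror the names of the operations. The model assumes an extra +1 on the unowned count, held on behalf of all strong references. The real runtime performs these updates atomically, and it traps where the model returns false.

```swift
// Hypothetical toy model of strong/unowned counting; not the Swift runtime.
final class ToyHeapObject {
    private(set) var strongCount = 1   // the reference returned by the initializer
    private(set) var unownedCount = 1  // assumed +1 held on behalf of all strong refs
    private(set) var isDeinitialized = false
    private(set) var isDeallocated = false

    // strong_retain
    func strongRetain() { strongCount += 1 }

    // strong_release: at zero, "run deinit" and drop the +1 held for strong refs
    func strongRelease() {
        strongCount -= 1
        if strongCount == 0 {
            isDeinitialized = true
            unownedRelease()
        }
    }

    // unowned_retain
    func unownedRetain() { unownedCount += 1 }

    // unowned_release: memory is freed only when the unowned count reaches zero
    func unownedRelease() {
        unownedCount -= 1
        if unownedCount == 0 { isDeallocated = true }
    }

    // strong_retain_unowned: the real runtime traps instead of returning false
    func strongRetainUnowned() -> Bool {
        guard strongCount > 0 else { return false }
        strongCount += 1
        return true
    }
}

let object = ToyHeapObject()
object.unownedRetain()                          // capture as unowned
let usable = object.strongRetainUnowned()       // use inside the closure: true
object.strongRelease()                          // end of that use
object.strongRelease()                          // last strong reference: deinitialized
let usableLater = object.strongRetainUnowned()  // false: real code would trap here
let freedEarly = object.isDeallocated           // false: the unowned ref keeps the memory
object.unownedRelease()                         // drop the unowned reference
let freedNow = object.isDeallocated             // true
```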

To look more closely at the difference between weak and unowned, let's first look at the runtime implementations behind unowned_retain and unowned_release.

SWIFT_RT_ENTRY_VISIBILITY
void swift::swift_unownedRetain(HeapObject *object)
    SWIFT_CC(RegisterPreservingCC_IMPL) {
  if (!object)
    return;

  object->weakRefCount.increment();
}

SWIFT_RT_ENTRY_VISIBILITY
void swift::swift_unownedRelease(HeapObject *object)
    SWIFT_CC(RegisterPreservingCC_IMPL) {
  if (!object)
    return;

  if (object->weakRefCount.decrementShouldDeallocate()) {
    // Only class objects can be weak-retained and weak-released.
    auto metadata = object->metadata;
    assert(metadata->isClassObject());
    auto classMetadata = static_cast<const ClassMetadata*>(metadata);
    assert(classMetadata->isTypeMetadata());
    SWIFT_RT_ENTRY_CALL(swift_slowDealloc)
        (object, classMetadata->getInstanceSize(),
         classMetadata->getInstanceAlignMask());
  }
}

unownedRetain
- Simple: it only atomically increments the object's weak reference count.
unownedRelease
- More complex than unownedRetain. When decrementing the weak reference count brings it to zero, the instance's memory must be deallocated.

In other words, unowned references boil down to a few atomic count increments and decrements. Now let's look at weak.

HeapObject *swift::swift_weakLoadStrong(WeakReference *ref) {
  if (ref->Value == (uintptr_t)nullptr) {
    return nullptr;
  }

  // ref might be visible to other threads
  auto ptr = __atomic_fetch_or(&ref->Value, WR_READING, __ATOMIC_RELAXED);
  while (ptr & WR_READING) {
    short c = 0;
    while (__atomic_load_n(&ref->Value, __ATOMIC_RELAXED) & WR_READING) {
      if (++c == WR_SPINLIMIT) {
        std::this_thread::yield();
        c -= 1;
      }
    }
    ptr = __atomic_fetch_or(&ref->Value, WR_READING, __ATOMIC_RELAXED);
  }

  auto object = (HeapObject*)(ptr & ~WR_NATIVE);
  if (object == nullptr) {
    __atomic_store_n(&ref->Value, (uintptr_t)nullptr, __ATOMIC_RELAXED);
    return nullptr;
  }
  if (object->refCount.isDeallocating()) {
    __atomic_store_n(&ref->Value, (uintptr_t)nullptr, __ATOMIC_RELAXED);
    SWIFT_RT_ENTRY_CALL(swift_unownedRelease)(object);
    return nullptr;
  }
  auto result = swift_tryRetain(object);
  __atomic_store_n(&ref->Value, ptr, __ATOMIC_RELAXED);
  return result;
}

Compared with unowned retain and release, this involves far more operations. All of them must also behave safely in a multithreaded environment, so weak ends up with much more overhead than unowned.

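Conclusion

Rather than always using weak, use unowned wherever it is applicable; it brings a performance advantage!
Weak references point to a side table rather than directly to the object, so the object's memory can be deallocated even while weak references remain.
When the strong ref count reaches 0 while unowned references remain, the object is only deinitialized; its memory is not freed.
Accessing an instance through an unowned reference after it has been deinitialized therefore crashes the app with a runtime trap instead of reading freed memory. Only unowned(unsafe) (or, per the claim above, an -Ofast build) behaves like accessing a dangling pointer.
Weak references point to the side table, so they become nil rather than dangling when the original object is deallocated.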