[Paper Review] - Soft Contrastive Learning for Time Series (SoftCLT), ICLR 2024

이승규 · March 9, 2025

[Main Contribution]

Proposes a soft contrastive learning strategy tailored to time series data.
Outperforms existing contrastive learning methods on a variety of downstream tasks.
Plug-and-play, so it can be easily applied to other model frameworks (modular).

Existing contrastive learning methods have tried several ways of reflecting the characteristics of time series:

Instance-wise
Temporal
Hierarchical

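However, none of them replaced hard assignment with soft assignment while taking time series characteristics into account.

SoftCLT, proposed in this paper, considers instance-wise, temporal, and hierarchical contrasting together with soft assignment.

In addition, whereas most contrastive learning methods measure similarity in the embedding space, SoftCLT argues that for time series it is more effective to compare similarity in the data space.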

[SoftCLT overview]

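With conventional hard assignment, a positive pair is labeled 1 and a negative pair is labeled 0.

Problem: the data are forcibly split by an arbitrary threshold, so ambiguous cases are handled poorly.

With soft assignment, similarity is measured in a way that reflects time series characteristics, which lets the model learn richer information and reduces this ambiguity.

(Two methods are proposed to reflect time series characteristics.)

Instance-wise CL: learns relationships between instances based on the distance between time series.
Temporal CL: learns temporal relationships within a single time series based on the difference between timestamps.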

[Paper Definitions]

$\text{Non-linear embedding function} \Rightarrow f_{\theta}: x \to r$

The goal is to learn a non-linear embedding function $f_{\theta}$ that produces the embedding vector $r_i$.

$\text{Time series data} \Rightarrow X = \{x_1, ..., x_N\}; \quad N: \text{batch size}, \quad x_i \in \mathbb{R}^{T \times D}$

$\text{Embedding vector} \Rightarrow r_i = [r_{i,1}, ..., r_{i,T}]^\top \in \mathbb{R}^{T \times M}$

$T: \text{sequence length}$

$D: \text{input feature dimension}$

$M: \text{embedded feature dimension}$

[Soft Instance-wise Contrastive Learning]

$w_I(i, i') = 2\alpha \cdot \sigma(-\tau_I \cdot D(x_i, x_{i'}))$ → Soft assignment Definition

$\sigma(a)$ is the sigmoid function

$\tau_I$ is a hyperparameter that controls the sharpness of the assignment

$\alpha \in [0,1]$ is the upper bound of the soft assignment
- It caps the maximum value of the soft assignment: since $\sigma(0) = 1/2$, $w_I = \alpha$ when the distance is zero.
$D(x_i, x_{i'})$ is the distance between time series $x_i$ and $x_{i'}$
- A distance function that measures how similar two samples are (DTW, Euclidean distance, cosine distance, or TAM (time alignment measurement)).
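
To make the definition above concrete, here is a minimal sketch of $w_I$ in plain Python. It assumes univariate series and Euclidean distance for $D$; the function names and the toy `series` values are hypothetical, chosen only for illustration, and any of the distances listed above could be passed in as `dist`.

```python
import math


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def euclidean(x, y):
    """Euclidean distance between two equal-length univariate series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def soft_instance_assignment(x_i, x_j, tau_i=1.0, alpha=0.5, dist=euclidean):
    """w_I(i, i') = 2 * alpha * sigmoid(-tau_I * D(x_i, x_i'))."""
    return 2.0 * alpha * sigmoid(-tau_i * dist(x_i, x_j))


# Toy batch (hypothetical values): series 0 and 1 are close, series 2 is far.
series = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 2.0, 3.5],
    [3.0, 2.0, 1.0, 0.0],
]
weights = [[soft_instance_assignment(a, b) for b in series] for a in series]
```

The diagonal of `weights` equals `alpha`, and the weight shrinks toward 0 as the distance grows, at a rate set by `tau_i`.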

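(To sample pairs, whether positive or negative:)

The time series data are expanded through augmentation, and the following definitions are assumed.

Contrastive learning can be trained with a cross-entropy loss.

A softmax is therefore defined for learning similarity.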
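
For reference, a sketch of this softmax in the notation of the definitions above. It assumes dot-product similarity (written $\circ$) between embeddings at the same timestamp $t$, and that $r_{i+N,t}$ denotes the embedding of the augmented view of $x_i$; both are assumptions not spelled out in this review.

$p_I((i, i'), t) = \frac{\exp(r_{i,t} \circ r_{i',t})}{\sum_{j=1, j \neq i}^{2N} \exp(r_{i,t} \circ r_{j,t})}$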

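cf) InfoNCE loss
A loss function commonly used in contrastive learning.

💡 In SoftCLT, the soft assignment weights are computed from distances in the data space, so they do not require an embedding; this is the difference from the InfoNCE loss.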

loss function

$\ell_{I}^{(i,t)} = -\log p_I((i, i+N), t) - \sum_{j=1, j \neq \{i,i+N\}}^{2N} w_I(i, j \mod N) \cdot \log p_I((i, j), t)$

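The first term uses the softmax probability that represents the similarity between instance $i$ and its positive pair $i+N$.
- The first term is the loss for the positive pair; the second term is the loss over all remaining pairs, weighted by their soft assignment values.
- As a result, the positive pair is pulled together, while the remaining pairs are pulled together only as much as their soft assignment allows, so dissimilar pairs end up pushed apart.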
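
The loss above can be sketched for a single anchor $i$ at one fixed timestamp as follows. This is a rough illustration rather than the authors' implementation: it assumes $p_I$ is a dot-product softmax over all views except the anchor itself, uses 0-based indices, and the names `soft_instance_loss`, `reps`, and `w` as well as the toy values are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_instance_loss(reps, w, i):
    """Soft instance-wise loss for anchor i at one fixed timestamp.

    reps: 2N embedding vectors; reps[k + N] is the second view of reps[k].
    w:    N x N soft assignments computed in the data space.
    """
    n = len(reps) // 2
    pos = (i + n) % (2 * n)                      # index of the positive pair
    others = [j for j in range(2 * n) if j != i]  # softmax excludes the anchor
    logits = [dot(reps[i], reps[j]) for j in others]
    m = max(logits)                               # log-sum-exp for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    log_p = {j: l - log_z for j, l in zip(others, logits)}

    loss = -log_p[pos]                            # first term: positive pair
    for j in others:
        if j != pos:                              # second term: soft-weighted rest
            loss -= w[i % n][j % n] * log_p[j]
    return loss


# Toy example (hypothetical values): N = 2, embedding dimension 2.
reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
w = [[0.5, 0.1], [0.1, 0.5]]
hard = soft_instance_loss(reps, [[0.0, 0.0], [0.0, 0.0]], 0)
soft = soft_instance_loss(reps, w, 0)
```

With every weight set to zero the second term vanishes and only the positive-pair term remains, which is the hard-assignment case; nonzero weights add the soft-weighted terms on top of it.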

[Soft Temporal Contrastive Learning]

$w_T(t, t') = 2 \cdot \sigma(-\tau_T \cdot |t - t'|)$ → Soft Assignment Definition

$\sigma(a)$ is the sigmoid function

$\tau_T$ is a hyperparameter that controls the sharpness of the assignment

$|t - t'|$ is the difference between the two timestamps
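
A small sketch of the temporal soft assignment, assuming integer timestamps and a non-negative $\tau_T$; the function name, the default `tau_t`, and the sequence length `T = 5` are hypothetical choices for illustration.

```python
import math


def soft_temporal_assignment(t, t_prime, tau_t=0.5):
    """w_T(t, t') = 2 * sigmoid(-tau_T * |t - t'|), assuming tau_T >= 0."""
    # sigmoid(-a) = exp(-a) / (1 + exp(-a)); stable because a >= 0 here.
    e = math.exp(-tau_t * abs(t - t_prime))
    return 2.0 * e / (1.0 + e)


T = 5  # toy sequence length
W = [[soft_temporal_assignment(t, s) for s in range(T)] for t in range(T)]
```

Each row of `W` peaks at 1 on the diagonal and decays symmetrically with the timestamp gap; a larger `tau_t` makes the decay sharper.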

The time series data are expanded through augmentation, and the following definitions are assumed.

Contrastive learning can be trained with a cross-entropy loss.

A softmax is therefore defined for learning similarity.
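
For reference, a sketch of the temporal softmax in the same notation. It assumes dot-product similarity (written $\circ$) between embeddings of the same instance $i$ at different timestamps, and that $r_{i,t+T}$ denotes the embedding of the augmented view at timestamp $t$; both are assumptions not spelled out in this review.

$p_T(i, (t, t')) = \frac{\exp(r_{i,t} \circ r_{i,t'})}{\sum_{s=1, s \neq t}^{2T} \exp(r_{i,t} \circ r_{i,s})}$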

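[Hierarchical Contrastive Learning]

Adopts the hierarchical contrastive loss from the TS2Vec paper.
- The hierarchical contrastive loss and hierarchical representations let the model effectively learn the complex patterns and structure of time series data.
Timestamps are merged through max pooling.
As depth increases, the meaning of each pooled timestamp becomes more ambiguous, so dissimilarity increases.
Based on this property, $\tau_T$, which controls the soft assignment, is adjusted.
- The lower the sharpness, the smoother the assignment.
In this way, the loss function is constructed so that hierarchical representation features are learned well.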
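
The pooling hierarchy can be sketched as below. This is only an illustration under stated assumptions: the review does not give the exact rule for adjusting $\tau_T$, so the schedule used here (multiplying a base `tau_t` by `kernel ** depth`) is an assumption, and `max_pool_time`, `hierarchical_levels`, and the toy representation are hypothetical names and values.

```python
def max_pool_time(rep, kernel=2):
    """Max-pool a [T][M] representation along the time axis (stride = kernel)."""
    pooled = []
    for start in range(0, len(rep), kernel):
        window = rep[start:start + kernel]
        # Element-wise max over the timestamps in this window.
        pooled.append([max(col) for col in zip(*window)])
    return pooled


def hierarchical_levels(rep, tau_t=0.5, kernel=2):
    """Yield (depth, tau, representation) until one timestamp remains."""
    depth = 0
    while True:
        # ASSUMED schedule: sharpen w_T as pooling widens the gap that
        # one step between pooled timestamps represents.
        yield depth, tau_t * kernel ** depth, rep
        if len(rep) <= 1:
            break
        rep = max_pool_time(rep, kernel)
        depth += 1


# Toy representation (hypothetical values): T = 4 timestamps, M = 2 features.
rep = [[1, 0], [2, 5], [0, 3], [4, 1]]
levels = list(hierarchical_levels(rep))
```

At each level, the temporal loss would be computed on the pooled representation with that level's `tau`, and the per-level losses combined as in TS2Vec's hierarchical loss.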

loss function

$\ell_{T}^{(i,t)} = -\log p_T(i, (t, t+T)) - \sum_{s=1, s \neq \{t,t+T\}}^{2T} w_T(t, s \mod T) \cdot \log p_T(i, (t, s))$

Final loss function

$L = \frac{1}{4NT} \sum_{i=1}^{2N} \sum_{t=1}^{2T} (\lambda \ell_{I}^{(i,t)} + (1 - \lambda) \ell_{T}^{(i,t)})$

$\lambda$ and $1 - \lambda$ are the weights of the instance-wise loss and the temporal loss, respectively.
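
As a quick check of how the final loss combines the two terms, here is a minimal sketch that averages the weighted sum over all $2N \times 2T$ entries. The function name `total_loss` and the toy loss values are hypothetical; the per-entry losses are assumed to be precomputed.

```python
def total_loss(inst_losses, temp_losses, lam=0.5):
    """Average lam * l_I + (1 - lam) * l_T over all 2N x 2T entries."""
    rows = len(inst_losses)      # 2N (both views of every instance)
    cols = len(inst_losses[0])   # 2T (both views of every timestamp)
    total = 0.0
    for i in range(rows):
        for t in range(cols):
            total += lam * inst_losses[i][t] + (1.0 - lam) * temp_losses[i][t]
    return total / (rows * cols)  # rows * cols = 4NT


# Toy values (hypothetical): N = 2, T = 1, constant per-entry losses.
inst = [[2.0, 2.0] for _ in range(4)]
temp = [[4.0, 4.0] for _ in range(4)]
combined = total_loss(inst, temp, lam=0.25)
```

Setting `lam=1.0` keeps only the instance-wise term and `lam=0.0` keeps only the temporal term.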

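Experimental Results

Because soft contrastive learning (SCL) is itself plug-and-play, it can be dropped into existing CL models as a module to check how much performance improves.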

classification

UCR: benchmark datasets for univariate time series

UEA: benchmark datasets for multivariate time series classification

TS2Vec (2022) is the paper that introduced the hierarchical contrastive loss.

Semi & Self-supervised classification

Experiment using 1% labeled data
Experiment using 5% labeled data

In- & cross-domain transfer learning

Anomaly Detection

Conclusion

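This work applies soft contrastive learning in a way that accounts for the characteristics of time series data.
It is implemented in a plug-and-play manner, so it can be freely applied to other frameworks.
- It clearly outperforms existing contrastive learning methods on time series tasks, but whether it is still SOTA at this point needs to be verified.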
