Foundation model in Time-series

진서연 · February 28, 2024

Foundation models in the time-series domain can be grouped by the kind of data the model was pre-trained on:
1. LLM pre-trained models: models pre-trained on non-time-series data (language) and then adapted/fine-tuned to time-series data (Time-LLM, LLM4TS, GPT2, UniTime, TEMPO).
2. TS pre-trained models: models trained directly on time-series data.

This post in the series briefly compares three models of the second type: TimeGPT, TimesFM, and Lag-Llama.

Motivation and Contribution

1. TimeGPT
  • The first foundation model for time-series forecasting.
  • Reduces uncertainty by providing prediction intervals (via conformal prediction).
  • TimeGPT's zero-shot inference excels in performance, efficiency, and simplicity.
2. TimesFM
  • Trained on a mix of real-world and synthetic datasets.
  • Pre-trains a decoder-style attention model with input patching.
  • Produces zero-shot forecasts across different domains and forecasting horizons.
3. Lag-Llama
  • Foundation model for univariate probabilistic time-series forecasting.
  • Strong zero-shot generalization capabilities.
  • Strong few-shot adaptation performance.

Architecture

1. TimeGPT
  • Transformer-based time-series model with a self-attention mechanism.
  • Input = target variables + exogenous variables.
  • Relies on conformal predictions based on historic errors to estimate prediction intervals.
2. TimesFM
  • Input patch → MLP → token + positional encoding → decoder-only model → MLP (a sketch of this pipeline follows this list).
  • Input = patched data.
  • Decoder-only model (similar to an LLM).
3. Lag-Llama
  • MLP → M decoder layers → distribution head.
    - Lag-Llama incorporates pre-normalization via RMSNorm and Rotary Positional Encoding (as in the LLaMA architecture).
  • Input = lag-featured inputs: tokens of a univariate time series plus additional covariates encoding the lag information.
  • Distribution head: Student's t-distribution (other distribution heads are left as future work, to keep the architecture simple).
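
To make the TimesFM pipeline above concrete, here is a minimal PyTorch sketch of a patched decoder-only forecaster. The module names, sizes, and the use of `nn.TransformerEncoder` with a causal mask are my own simplifications for illustration, not the actual TimesFM implementation.

```python
import torch
import torch.nn as nn

class PatchedDecoderForecaster(nn.Module):
    """Sketch of a TimesFM-style pipeline:
    patch -> MLP -> token + positional encoding -> decoder-only transformer -> MLP head.
    Layer choices and sizes are illustrative, not the paper's."""

    def __init__(self, input_patch_len=32, output_patch_len=128,
                 d_model=256, n_layers=4, n_heads=4, max_patches=512):
        super().__init__()
        self.input_patch_len = input_patch_len
        # MLP that turns each input patch into a token embedding
        self.embed = nn.Sequential(
            nn.Linear(input_patch_len, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        # learned positional encoding (one vector per patch position)
        self.pos_emb = nn.Parameter(torch.randn(1, max_patches, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        # an encoder stack run with a causal mask behaves like a decoder-only model
        self.decoder = nn.TransformerEncoder(layer, n_layers)
        # MLP head that maps each token to an output patch of forecasts
        self.head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, output_patch_len))

    def forward(self, y):
        # y: (batch, context_len), with context_len divisible by input_patch_len
        b, L = y.shape
        patches = y.reshape(b, L // self.input_patch_len, self.input_patch_len)
        tokens = self.embed(patches) + self.pos_emb[:, :patches.shape[1]]
        causal = nn.Transformer.generate_square_subsequent_mask(patches.shape[1]).to(y.device)
        hidden = self.decoder(tokens, mask=causal)   # each patch attends only to earlier patches
        return self.head(hidden)                     # (batch, n_patches, output_patch_len)

model = PatchedDecoderForecaster()
context = torch.randn(8, 256)             # 8 series, 256 past timesteps (8 patches of 32)
forecast = model(context)[:, -1]          # last token's output patch = next 128 steps
```

The key point is that each input patch becomes one token, and the causal mask makes the stack behave like a decoder-only LLM that predicts the next output patch.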

Foundation model for time-series

1. TimeGPT
  • $P(y_{[t+1:t+h]} \mid y_{[0:t]}, x_{[0:t+h]}) = f_{\theta}(y_{[0:t]}, x_{[0:t+h]})$
  • $h$ = forecast horizon, $y$ = target time series, $x$ = exogenous covariates.
2. TimesFM
  • $f : y_{[1:L]} \rightarrow \hat{y}_{[L+1:L+h]}$
  • $h$ = forecast horizon, $y$ = target time series, $L$ = input window size.
3. Lag-Llama
  • $P_{\phi}(y^i_{[C+1:C+h]} \mid y^i_{[1:C]}, \mathbf{c}^i_{[1:C+h]}; \theta)$
  • $h$ = forecast horizon, $y$ = target time series, $\mathbf{c}$ = lag covariates, $C$ = context window length.
    - The prediction starts at $C+1$ because the model conditions on a fixed context window rather than the entire input series (a sketch of the lag-feature construction follows this list).
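
Below is a minimal sketch of how lag features over a fixed context window $C$ could be assembled into Lag-Llama-style input tokens. The specific lag set, context length, and tensor layout are illustrative assumptions, not the released implementation.

```python
import torch

def make_lag_tokens(y, lags=(1, 2, 3, 7, 14, 28), context_len=64):
    """Build Lag-Llama-style input tokens for one univariate series.
    Each token at position t is [y_t, y_{t-1}, y_{t-2}, ..., y_{t-28}]:
    the current value plus lagged values as covariates.
    The lag set and context length are illustrative, not the paper's."""
    T = y.shape[0]
    assert T >= context_len + max(lags), "need enough history to compute all lags"
    # Only the last `context_len` positions form the fixed context window C;
    # forecasts are then made for positions C+1 onward, conditioned on this window.
    positions = range(T - context_len, T)
    tokens = []
    for t in positions:
        feats = [y[t]] + [y[t - lag] for lag in lags]   # value + lag covariates
        tokens.append(torch.stack(feats))
    return torch.stack(tokens)                          # (context_len, 1 + len(lags))

y = torch.randn(500)               # a univariate series
tokens = make_lag_tokens(y)        # (64, 7): fed through the MLP and M decoder layers
```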

Training method

1. TimeGPT
  • Trained to minimize the forecasting error.
  • Conformal prediction is used only to quantify uncertainty (prediction intervals), not as part of training (see the split-conformal sketch after this list).
2. TimesFM
  • Training loss = $\frac{1}{N}\sum^{N}_{j=1} \mathrm{MSE}(\hat{y}_{[pj+1:pj+h]}, y_{[pj+1:pj+h]})$
  • For probabilistic forecasting, each output patch can have multiple output heads, with each head minimizing a separate quantile loss.
3. Lag-Llama
  • Trained to minimize the negative log-likelihood of the predicted distribution over all predicted timesteps.
  • Self-supervised learning.
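
TimeGPT's exact procedure is not public, so the following is a generic split-conformal sketch of how prediction intervals can be derived from the historic errors of a black-box forecaster; it illustrates the idea rather than TimeGPT's actual implementation.

```python
import numpy as np

def conformal_interval(point_forecast, calib_forecasts, calib_targets, alpha=0.1):
    """Generic split-conformal sketch: widen a point forecast into a
    (1 - alpha) prediction interval using historic absolute errors on a
    held-out calibration set. Not TimeGPT's actual implementation."""
    residuals = np.abs(calib_targets - calib_forecasts)          # historic forecast errors
    n = len(residuals)
    # finite-sample corrected quantile of the absolute residuals
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return point_forecast - q, point_forecast + q

# Hypothetical usage with any black-box forecaster's historic outputs:
calib_forecasts = np.random.randn(200)
calib_targets = calib_forecasts + 0.5 * np.random.randn(200)
lo, hi = conformal_interval(np.array([1.2]), calib_forecasts, calib_targets, alpha=0.1)
```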

Training dataset

1. TimeGPT
  • 100 billion data points.
  • Finance, economics, demographics, healthcare, weather, IoT sensors, energy, web traffic, sales, transport, and banking.
2. TimesFM
  • Real-world data (40%) + synthetic data (60%).
  • Each real-world dataset consists of roughly 1 billion to 100 billion data points.
3. Lag-Llama
  • 27 time-series datasets from 6 domains.
  • 352 million data windows (tokens).