[Paper Review] Approximation and Optimization Theory for Linear Continuous-Time Recurrent Neural Networks


Intro

This is a summary of selected parts from the paper, focused on understanding the "curse of memory," rather than a review of the entire content.

The paper addresses whether continuous-time RNNs can approximate time-dependent functionals $H_t(x)$, which map input signals $x(t)$ to outputs $y(t)$. Unlike prior studies, this work emphasizes the following two points (a schematic setup is sketched after the list):

  • The "absence" of underlying dynamical systems for HtH_t.
  • The necessity of memory decay for approximation.
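
As a schematic of the setup (my paraphrase; the paper's precise function spaces and norms differ in detail): the target is a family of functionals indexed by time, each acting on the entire input path,

$$
y_t = H_t(x), \qquad x = \{x_s : s \in \mathbb{R}\} \in C(\mathbb{R}, \mathbb{R}^d), \qquad t \in \mathbb{R}.
$$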

Problem Formulation

  • The problem is first described in a discrete-time setting with equations (1) and (2), and the discussion then transitions to the continuous-time setting.
  • The continuous-time RNN dynamics are expressed in equation (17), leading to the representation in equation (18); a sketch of this standard linear form is given below.
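
For reference, the linear continuous-time RNN that equations (17)–(18) describe has (up to notation, which may differ slightly from the paper) the standard form

$$
\frac{dh_t}{dt} = W h_t + U x_t, \qquad h_{-\infty} = 0, \qquad \hat{y}_t = c^\top h_t,
$$

and solving this linear ODE gives the RNN functional as a convolution with an exponential kernel:

$$
\hat{H}_t(x) = \hat{y}_t = \int_0^\infty c^\top e^{W s} U \, x_{t-s} \, ds = \int_0^\infty x_{t-s}^\top \hat{\rho}(s) \, ds, \qquad \hat{\rho}(s) = U^\top e^{W^\top s} c.
$$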

Universal Approximation Theorem (Theorem 7)

  • Using the Riesz–Markov–Kakutani representation theorem, the functional $H_t$ is shown to be uniquely associated with a measure $\mu_t$.

  • The kernel representation $H_t(x) = \int_0^\infty x_{t-s}^\top \rho(s)\, ds$ (equation (23)) is central; $\rho$ dictates the smoothness and decay properties of the input-output relationship.

  • Equation (23) makes explicit that $H_t$ acts on the input as a convolution with the kernel $\rho(t)$.

  • The quality of approximation depends on how well $\rho(t)$ can be represented by sums of decaying exponentials (see the numerical sketch after this list).

  • Requiring the eigenvalues of $W$ to satisfy $\mathrm{Re}(\lambda) < 0$ ensures stability of the hidden dynamics.
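
A minimal numerical sketch of this point (my own illustration, not code from the paper): a width-$m$ linear RNN realizes a kernel built from decaying exponentials, so approximating $H_t$ amounts to fitting $\rho(t)$ by $\sum_i c_i e^{\lambda_i t}$ with $\mathrm{Re}(\lambda_i) < 0$.

```python
import numpy as np

# Illustration only (not from the paper): approximate a scalar kernel rho(t)
# by a sum of decaying exponentials sum_i c_i * exp(lambda_i * t) with lambda_i < 0,
# which is the kind of kernel a width-m linear RNN realizes.

def fit_exponential_sum(rho_vals, t, lambdas):
    """Least-squares fit of rho(t) ~= sum_i c_i * exp(lambdas[i] * t)."""
    basis = np.exp(np.outer(t, lambdas))          # shape (len(t), m)
    coef, *_ = np.linalg.lstsq(basis, rho_vals, rcond=None)
    return basis @ coef

t = np.linspace(0.0, 20.0, 2000)
rho = np.exp(-0.5 * t) * np.cos(t)                # a smooth, exponentially decaying kernel

for m in (2, 4, 8):
    lambdas = -np.linspace(0.1, 2.0, m)           # fixed stable decay rates (all negative)
    approx = fit_exponential_sum(rho, t, lambdas)
    print(f"width m={m}: sup-norm error = {np.max(np.abs(rho - approx)):.3e}")
```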

Approximation Rates (Theorem 10) and Inverse Approximation Theorem (Theorem 11)

  • I couldn't follow the full proof process...
  • Still, Theorem 10 provides bounds for approximating functionals under smoothness ($\alpha$) and decay ($\beta$) conditions, leading to approximations by width-$m$ RNNs with an explicit error rate (equation (27)).
  • In other words, although it may look complicated, the kernel is $\alpha$-smooth (continuous and differentiable), with derivatives controlled by a decay rate $\beta$ as in (25). When the decay rate and the smoothness are related as in (26), width-$m$ RNN functionals can approximate the target $H_t$ within the stated error bound (a rough sketch of conditions of this shape is given after the list).

  • The inverse approximation theorem (Theorem 11) demonstrates that without memory decay, approximation is infeasible: any target that can be uniformly approximated by stable linear RNNs must have decaying memory. (amazing)
  • Again, I gave up on following the proof in detail...
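
Only as a rough paraphrase of the shape of conditions (25)–(26), not the paper's exact statement: the kernel is required to be $\alpha$-times differentiable with derivatives decaying at an exponential rate $\beta$, something of the form

$$
\sup_{s \ge 0} \; e^{\beta s} \left| \rho^{(k)}(s) \right| < \infty, \qquad k = 0, 1, \dots, \alpha,
$$

under which a width-$m$ linear RNN achieves an approximation error that improves with the smoothness $\alpha$ and the decay rate $\beta$ (equation (27)).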

Curse of Memory

  • When $\rho(t)$ decays slowly (e.g., $\sim t^{-1/\omega}$), RNNs face exponential growth in model size to maintain accuracy; a small numerical illustration follows.
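
A small numerical illustration of this effect (again my own sketch under assumed rates and kernels, not an experiment from the paper): fitting a power-law kernel with sums of decaying exponentials needs far more terms than fitting an exponentially decaying kernel to reach comparable accuracy.

```python
import numpy as np

# Illustration only: compare how many decaying exponentials (roughly, how much linear-RNN
# width) are needed to fit an exponentially decaying kernel vs. a power-law decaying one.

def exp_sum_error(rho_vals, t, m):
    """Sup-norm error of the least-squares fit with m decaying exponentials."""
    lambdas = -np.logspace(-2, 1, m)              # fixed stable rates spread over time scales
    basis = np.exp(np.outer(t, lambdas))
    coef, *_ = np.linalg.lstsq(basis, rho_vals, rcond=None)
    return np.max(np.abs(rho_vals - basis @ coef))

t = np.linspace(0.0, 100.0, 5000)
fast = np.exp(-t)                                  # exponentially decaying memory
slow = (1.0 + t) ** (-0.5)                         # power-law memory, ~ t^(-1/omega) with omega = 2

for m in (2, 4, 8, 16, 32):
    print(f"m={m:2d}  exp-decay err={exp_sum_error(fast, t, m):.2e}"
          f"  power-law err={exp_sum_error(slow, t, m):.2e}")
```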