-
They consider three settings:
- gated setting: only one task at a time; no output is required from the other tasks
- in code, they ignore the loss from the other tasks
- orthogonal setting: only one task at a time; the outputs of the other tasks must be zero
- parallel setting: multiple tasks at the same time; no constraint on the other tasks
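A minimal sketch of how the three settings could differ at the loss level. This is my own illustration, not the authors' code: the function name `task_loss`, the one-output-channel-per-task layout, and the reading of "parallel" as "every channel is trained on its own target" are all assumptions.

```python
import numpy as np

def task_loss(outputs, targets, active, setting):
    """Mean squared error under one of the three settings (hypothetical helper).

    outputs, targets: arrays of shape (n_tasks, T), one output channel per task.
    active: index of the task currently presented (unused in the parallel setting).
    """
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if setting == "gated":
        # Only the active channel contributes; the loss of other tasks is ignored.
        err = outputs[active] - targets[active]
    elif setting == "orthogonal":
        # Inactive channels are pushed toward zero output.
        zeroed = np.zeros_like(targets)
        zeroed[active] = targets[active]
        err = outputs - zeroed
    elif setting == "parallel":
        # Assumption: all tasks run at once, each channel has its own target.
        err = outputs - targets
    else:
        raise ValueError(f"unknown setting: {setting}")
    return float(np.mean(err ** 2))

outputs = [[1.0, 1.0], [5.0, 5.0]]
targets = [[1.0, 1.0], [2.0, 2.0]]
print(task_loss(outputs, targets, active=0, setting="gated"))       # 0.0
print(task_loss(outputs, targets, active=0, setting="orthogonal"))  # 12.5
print(task_loss(outputs, targets, active=0, setting="parallel"))    # 4.5
```

In the gated case the large output on channel 1 costs nothing, which is why the network is free to reuse the same state for both tasks; in the orthogonal case the same output is penalized.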
-
Compared to the gated setting, the orthogonal and parallel settings allow ‘interaction between tasks’
-
Task examples
- from the top: fixed point, limit cycle, line attractor and plane attractor tasks (left) and the corresponding attractors (right)
-
They trained two tasks in gated mode (sky blue) and orthogonal mode (blue)
-
Especially in gated mode, the tasks tend to share attractors: a simplicity bias (why? see below)
- A: gated mode resulted in a complete overlap between the tasks (even for different tasks); the orthogonal setting gave the opposite
- B: a linear classifier trained to separate the neural trajectories (2) failed in gated, succeeded in orthogonal
- C: ratio of variances between and within tasks (F-factor): shared in gated, separated in orthogonal
- D: looking at the eigenvalue spectrum (lambda) of the recurrent connectivity matrix (W_recurrent):
- the number of unstable eigenvalues was larger in the orthogonal setting
- so in the orthogonal setting there are multiple fixed points! Then how do they emerge?
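A rough sketch of the between/within variance ratio from panel C, on synthetic data. This is not the paper's exact F-factor definition (I don't have it in these notes); `f_ratio` and the toy clusters are assumptions meant only to show why shared attractors give a small ratio and separated ones a large ratio.

```python
import numpy as np

def f_ratio(states, labels):
    """Between-task scatter divided by within-task scatter (hypothetical helper).

    states: (n_samples, n_units) network states; labels: (n_samples,) task ids.
    """
    states = np.asarray(states, dtype=float)
    labels = np.asarray(labels)
    grand_mean = states.mean(axis=0)
    between, within = 0.0, 0.0
    for lab in np.unique(labels):
        group = states[labels == lab]
        mean = group.mean(axis=0)
        between += len(group) * np.sum((mean - grand_mean) ** 2)
        within += np.sum((group - mean) ** 2)
    return between / within

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)
noise = rng.normal(size=(400, 10))

shared = noise                   # both tasks sit around the same point
offset = np.zeros((400, 10))
offset[labels == 1, 0] = 5.0     # task 1 shifted along one axis
separated = noise + offset

print(f_ratio(shared, labels), f_ratio(separated, labels))
```

The first value should be close to zero (gated-like, shared states) and the second clearly larger (orthogonal-like, separated states).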
-
The next question: what is the origin of this simplicity bias?
-
Consider two tasks that each need two fixed points (4 points in total)
- In gated mode: the 2 fixed points of one task are shared with the 2 of the other
- the recurrent dynamics make the state depend mostly on the “task-agnostic attractors” and less on the task-specific inputs
- In orthogonal mode: all 4 are different
- the attractors of the two tasks have to be orthogonal to each other, forcing the network to separate them
- because the network architecture forces the other output to zero while one task is being trained
-
OK, let’s test this hypothesis by following the attractor landscape of (low-rank) networks [13]
-
With this projection (approximation), they followed the evolution of the dynamics in the parallel setting, for networks trained on two fixed-point tasks
-
In C and D:
- at epoch 0 only the origin is stable (both eigenvalues are inside the unit circle)
- the origin destabilizes as a single unstable eigenvalue emerges
- with more training, the second eigenvalue leaves the unit circle → a pair of stable fixed points emerges
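The sequence above can be mimicked with a toy rank-2 connectivity matrix whose two nonzero eigenvalues are set by hand. This is only an illustration of "counting eigenvalues outside the unit circle", not the paper's low-rank reduction from [13]; `rank_two`, `count_outliers`, and the chosen eigenvalues are assumptions.

```python
import numpy as np

def rank_two(n_units, s1, s2, seed=0):
    """Symmetric rank-2 matrix with nonzero eigenvalues s1 and s2 (toy construction)."""
    rng = np.random.default_rng(seed)
    # QR gives two orthonormal directions, so s1 and s2 are exactly the eigenvalues.
    q, _ = np.linalg.qr(rng.normal(size=(n_units, 2)))
    u, v = q[:, 0], q[:, 1]
    return s1 * np.outer(u, u) + s2 * np.outer(v, v)

def count_outliers(w, radius=1.0):
    """Number of eigenvalues of w with modulus larger than radius."""
    return int(np.sum(np.abs(np.linalg.eigvals(w)) > radius))

# Three hand-picked "training stages": both inside, one outside, both outside.
stages = {"epoch 0": (0.5, 0.3), "mid training": (1.5, 0.3), "late training": (1.5, 1.2)}
for name, (s1, s2) in stages.items():
    print(name, count_outliers(rank_two(50, s1, s2)))
```

Each eigenvalue that crosses the unit circle counts as one outlier, so the printed counts go 0, 1, 2 across the three stages.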
-
In B, they repeated this analysis (sky blue, blue, dark blue: gated, orthogonal, parallel)
- in the orthogonal and parallel modes, we can see the clear sequential emergence of two outliers
- the gated setting is solved with a single outlier, as all tasks share a common attractor