- They consider three training settings:
 - gated setting: only one task at a time, with no output required from the other tasks (in code, the loss from the other tasks is ignored)
 - orthogonal setting: only one task at a time, with zero output required from the other tasks
 - parallel setting: multiple tasks at a time, with no constraint on the other tasks
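A minimal sketch of how the three settings could differ in the loss, assuming one scalar readout per task and a mean squared error. The function name `multitask_loss` and the `active` mask are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def multitask_loss(outputs, targets, active, setting):
    """Mean squared error under one of the three settings (illustrative only).

    outputs, targets: one value per task readout, shape (n_tasks,).
    active: boolean mask, True for the task(s) currently being performed.
    """
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    active = np.asarray(active, dtype=bool)

    if setting == "gated":
        # Only the active task contributes; the other readouts are ignored.
        err = (outputs - targets)[active]
    elif setting == "orthogonal":
        # The readouts of inactive tasks are pushed toward zero.
        err = outputs - np.where(active, targets, 0.0)
    elif setting == "parallel":
        # All tasks run together, so every readout has its own target.
        err = outputs - targets
    else:
        raise ValueError(f"unknown setting: {setting}")
    return float(np.mean(err ** 2))
```

With `outputs = [1.0, 0.5]`, `targets = [1.0, 9.9]` and only the first task active, the gated loss is zero because the second readout is never looked at, while the orthogonal loss penalizes the second readout for being nonzero.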
 
- Compared to the gated setting, the orthogonal and parallel settings allow ‘interaction between tasks’.
- Task examples
 - from the top: fixed-point, limit-cycle, line-attractor, and plane-attractor tasks (left) and their attractors (right)
  
 
- They trained two tasks in gated mode (sky blue) and orthogonal mode (blue).
  
 
- Especially in gated mode, the tasks tend to share attractors: a simplicity bias (the reason is discussed below).
 - A: gated mode resulted in a complete overlap between the tasks (even for different tasks); the orthogonal setting gave the opposite
 - B: a linear classifier to separate the neural trajectories (2) failed in gated mode and succeeded in orthogonal mode
 - C: ratio of the variance between tasks to the variance within tasks (F-factor): shared in gated, separated in orthogonal
 - D: the spectrum (eigenvalues lambda) of the recurrent connectivity matrix (W_recurrent)
  - the number of unstable eigenvalues was larger in the orthogonal setting
  - so in the orthogonal setting there are multiple fixed points! How do they emerge?
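The F-factor in panel C can be sketched as a ratio of between-task to within-task variance of the neural states. This is one common way to define such a ratio; the paper's exact normalization may differ, and the function name `f_factor` is hypothetical.

```python
import numpy as np

def f_factor(states_a, states_b):
    """Between-task variance divided by within-task variance.

    states_a, states_b: neural states for each task, shape (n_samples, n_units).
    Assumed definition for illustration, not necessarily the paper's.
    """
    states_a = np.asarray(states_a, dtype=float)
    states_b = np.asarray(states_b, dtype=float)
    n_total = len(states_a) + len(states_b)

    mean_a = states_a.mean(axis=0)
    mean_b = states_b.mean(axis=0)
    grand_mean = np.concatenate([states_a, states_b]).mean(axis=0)

    # Spread of the task means around the grand mean.
    between = (len(states_a) * np.sum((mean_a - grand_mean) ** 2)
               + len(states_b) * np.sum((mean_b - grand_mean) ** 2)) / n_total
    # Spread of the samples around their own task mean.
    within = (np.sum((states_a - mean_a) ** 2)
              + np.sum((states_b - mean_b) ** 2)) / n_total
    return float(between / within)
```

A value near zero means the two tasks occupy the same region of state space (the "shared" case seen in gated mode); a large value means the task means are far apart relative to the spread inside each task (the "separated" case seen in orthogonal mode).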
 
 
- Next question: what is the origin of this simplicity bias?
- Consider two tasks that each need two fixed points (4 points in total).
 - In gated mode: the two tasks share the same 2 fixed points
  - the recurrent dynamics cause the states to depend mostly on the “task-agnostic attractors” and less on the task-specific inputs
 - In orthogonal mode: all 4 are different
  - the attractors of the two tasks have to be orthogonal to each other, forcing the network to separate them
  - because the network architecture forces the other task's output to zero while one task is being trained
 
 
- OK, let’s test this hypothesis by following the attractor landscape of low-rank networks [13].
  
 
- With this projection (approximation), they followed the evolution of the dynamics in the parallel setting, trained on two fixed-point tasks.
  
 
- In C and D:
 - at epoch 0, only the origin is stable (both eigenvalues are inside the unit circle)
 - the origin then destabilizes as a single unstable eigenvalue emerges
 - with more training, a second eigenvalue leaves the unit circle → a pair of stable fixed points emerges
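The three stages above can be illustrated by counting how many eigenvalues of a rank-2 connectivity matrix lie outside the unit circle. This is a toy sketch: the matrix construction, the function names, and the strength values are made up for illustration and are not taken from the paper.

```python
import numpy as np

def count_outside_unit_circle(W):
    """Number of eigenvalues of W with magnitude greater than 1."""
    return int(np.sum(np.abs(np.linalg.eigvals(W)) > 1.0))

def rank_two_connectivity(s1, s2, n_units=50, seed=0):
    """Symmetric rank-2 matrix whose nonzero eigenvalues are exactly s1 and s2."""
    rng = np.random.default_rng(seed)
    # Two orthonormal directions from a QR decomposition of random vectors.
    q, _ = np.linalg.qr(rng.standard_normal((n_units, 2)))
    return s1 * np.outer(q[:, 0], q[:, 0]) + s2 * np.outer(q[:, 1], q[:, 1])

# Hypothetical eigenvalue strengths at three points during training.
stages = {
    "epoch 0": (0.5, 0.3),              # both inside: only the origin is stable
    "origin destabilized": (1.4, 0.6),  # first eigenvalue has left
    "later in training": (1.6, 1.3),    # second eigenvalue has left too
}
counts = {name: count_outside_unit_circle(rank_two_connectivity(*s))
          for name, s in stages.items()}
```

Here `counts` goes from 0 to 1 to 2 across the stages, mirroring the sequential emergence of outliers described in the notes.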
 
- In B, they repeated this analysis (sky blue, blue, dark blue: gated, orthogonal, parallel).
 - in the orthogonal and parallel modes, we can see a clear sequential emergence of two outliers
 - the gated setting is solved with a single outlier, as all tasks share a common attractor