The mapping $S \to c_S$ is defined as $P(\hat{y} \mid \hat{x}, S)$.
The objective is to return the label $\arg\max_y P(y \mid \hat{x}, S)$.
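This results in $\hat{y} = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i$, which is non-parametric (memory depends on the support size $k$),
where $a(\hat{x}, x_i) = e^{c(f(\hat{x}), g(x_i))} / \sum_{j=1}^{k} e^{c(f(\hat{x}), g(x_j))}$ and $c(x, y)$ is the cosine similarity (sometimes loosely called the cosine distance) between $x$ and $y$.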
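
The prediction rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the embeddings f and g are taken as already applied (the vectors below stand in for their outputs), the labels y_i are one-hot, c is treated as cosine similarity, and the names `cosine_similarity`, `attention`, and `predict` are hypothetical helpers, not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attention(query_emb, support_embs):
    """Softmax over c(f(x_hat), g(x_i)) for i = 1..k."""
    scores = [cosine_similarity(query_emb, s) for s in support_embs]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(query_emb, support_embs, support_labels):
    """y_hat = sum_i a(x_hat, x_i) * y_i, with one-hot labels y_i."""
    weights = attention(query_emb, support_embs)
    num_classes = len(support_labels[0])
    y_hat = [0.0] * num_classes
    for w, y in zip(weights, support_labels):
        for c in range(num_classes):
            y_hat[c] += w * y[c]
    return y_hat


# Toy support set (assumed data): k = 3 embedded examples, 2 classes.
support_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
support_labels = [[1, 0], [1, 0], [0, 1]]
query_emb = [1.0, 0.05]

weights = attention(query_emb, support_embs)
y_hat = predict(query_emb, support_embs, support_labels)
predicted_class = max(range(len(y_hat)), key=y_hat.__getitem__)
```

Because the weights come from a softmax, `y_hat` is a probability distribution over classes, and taking its largest entry corresponds to the arg max objective. Every call compares the query against all k support embeddings, which is where the dependence of memory on the support size comes from.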