https://research.fb.com/wp-content/uploads/2017/06/imagenet1kin1h5.pdfOur goal is to use large minibatches in place of smallminibatches while mai
small-batch methods consistently converge to flat minimizers, and our experiments support a commonly held view that this is due to theinherent noise i
논문 링크그래프 형태의 데이터에 attention 메커니즘을 적용하는 모델이다attention mechanism 트랜스포머처럼 내적으로 계산하지는 않고, feed-forward neural network 이라고 볼 수 있다. weight matrix 를 노드의 임베딩