If you aren't careful, training can bottleneck on reading data from disk and transferring it to the GPU!
Solutions:
Read all data into RAM
Use SSD instead of HDD
Use multiple CPU threads to prefetch data
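The prefetching idea can be sketched with the Python standard library alone. This is a minimal illustration, not how any particular framework implements its data loader: `load_batch` is a hypothetical stand-in for slow disk reads, and `num_workers` and `lookahead` are assumed parameter names. Worker threads load a few batches ahead while the consumer (the training loop) works on the current one.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def load_batch(index):
    """Hypothetical stand-in for a slow disk read plus preprocessing."""
    time.sleep(0.01)  # simulate I/O latency
    return [index] * 4


def prefetching_loader(num_batches, num_workers=2, lookahead=4):
    """Yield batches in order while worker threads load upcoming ones."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        next_index = 0
        # Start loading the first few batches before any are consumed.
        while next_index < min(lookahead, num_batches):
            pending.append(pool.submit(load_batch, next_index))
            next_index += 1
        while pending:
            # Block only if the oldest requested batch is not ready yet.
            batch = pending.popleft().result()
            # Keep the lookahead window full.
            if next_index < num_batches:
                pending.append(pool.submit(load_batch, next_index))
                next_index += 1
            yield batch
```

A training loop would iterate `for batch in prefetching_loader(100): ...`, so that loading of later batches overlaps with computation on the current one. Threads help here because the work is I/O-bound; CPU-heavy preprocessing in Python would likely need processes instead.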
The point of deep learning frameworks
(1) Easily build big computational graphs
(2) Easily compute gradients in computational graphs
(3) Run it all efficiently on GPU (wrap cuDNN, cuBLAS, etc)
Problems with using numpy
(1) Can't run on GPU
(2) Have to compute our own gradients
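To make the second problem concrete, here is a small sketch of a computational graph written in plain numpy. The graph itself (`a = x * y`, `b = a + z`, `c = sum(b)`) and the array shapes are chosen for illustration only. The forward pass is short, but every gradient has to be derived and written out by hand, and everything runs on the CPU.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 3, 4  # illustrative shapes
x = rng.standard_normal((N, D))
y = rng.standard_normal((N, D))
z = rng.standard_normal((N, D))

# Forward pass: c = sum(x * y + z)
a = x * y
b = a + z
c = np.sum(b)

# Backward pass, derived by hand with the chain rule.
grad_c = 1.0
grad_b = grad_c * np.ones((N, D))  # dc/db is 1 for every element
grad_a = grad_b.copy()             # b = a + z passes the gradient through
grad_z = grad_b.copy()
grad_x = grad_a * y                # a = x * y, so da/dx = y
grad_y = grad_a * x                # and da/dy = x
```

For a graph this small the manual backward pass is manageable, but it grows with every layer added, which is the bookkeeping that frameworks automate.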