Potentially data augmentation or learning rate decay (lowering the learning rate over time) can help to prevent overfitting.