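Since a (and thus a^2) doesn't change..., the losses do not decrease by overlapping.

torch.argmax(input): returns the index of the maximum value among all elements of the input tensor. Without a dim argument the tensor is treated as flattened, so the result is a single index into the flattened tensor. Passing dim instead returns the indices of the maxima along that dimension.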
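A small sketch of both call forms of torch.argmax follows. It assumes a working PyTorch install, and the tensor values are made up purely for illustration. It also shows one way to turn the flat index back into a (row, col) position with plain divmod.

```python
import torch

# Illustrative 2x3 tensor (values chosen arbitrarily).
x = torch.tensor([[1.0, 5.0, 3.0],
                  [7.0, 2.0, 6.0]])

# No dim: x is treated as flattened to [1, 5, 3, 7, 2, 6],
# so the result is a single index into that flattened view.
flat_idx = torch.argmax(x)
print(flat_idx)  # tensor(3), since x.flatten()[3] == 7.0

# Convert the flat index back to a (row, col) position.
row, col = divmod(flat_idx.item(), x.shape[1])
print(row, col)  # 1 0

# With dim=1: one index per row, pointing at that row's maximum.
per_row = torch.argmax(x, dim=1)
print(per_row)  # tensor([1, 0])
```

The divmod step works here because the tensor is 2-D and contiguous in row-major order. For higher-dimensional tensors the same idea applies, dividing by the trailing dimension sizes in turn.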