Softmax is used for multinomial (multi-class) classification.
The number of classes is determined by the number of neurons in the last layer.
This is also the number of outputs produced as probabilities.
In TensorFlow, softmax is one kind of activation function.
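As a minimal sketch, softmax can be set as the activation of the last Dense layer in Keras. The layer sizes here (20 input features, 10 classes) are placeholders chosen just for illustration:

```python
import tensorflow as tf

# The last Dense layer uses softmax as its activation,
# so a 10-class model outputs a 10-element probability vector.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                      # placeholder feature count
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),  # 10 classes -> 10 probabilities
])
```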
Each output of a neuron in the last layer can be regarded as a logit.
Softmax converts the logit vector into a probability vector, and the elements of the probability vector sum to 1.
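A small sketch of this conversion, with an arbitrary example logit vector:

```python
import tensorflow as tf

logits = tf.constant([2.0, 1.0, 0.1])   # logit vector z
probs = tf.nn.softmax(logits)           # p_i = exp(z_i) / sum_j exp(z_j)

print(probs.numpy())                    # approx. [0.659 0.242 0.099]
print(tf.reduce_sum(probs).numpy())     # 1.0 -- the probabilities sum to 1
```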
For binary classification, there are two approaches. One is to use a sigmoid with a single neuron in the last layer; the other is to use softmax with two neurons in the last layer.
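A sketch of the two output heads (standalone layers, shown only for comparison). With the sigmoid head you would typically pair a binary cross-entropy loss; with the softmax head, a categorical cross-entropy loss:

```python
import tensorflow as tf

# Option 1: one output neuron + sigmoid -> a single P(class 1)
head_sigmoid = tf.keras.layers.Dense(1, activation='sigmoid')

# Option 2: two output neurons + softmax -> [P(class 0), P(class 1)]
head_softmax = tf.keras.layers.Dense(2, activation='softmax')
```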
In fact, the output layer itself can consist of only an affine function, with no activation function; softmax then converts the logit vector z into the probability vector p.
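One common way this shows up in TensorFlow, sketched here with placeholder layer sizes: the last layer has no activation (affine only, emitting raw logits z), and the softmax conversion to p happens inside the loss via from_logits=True.

```python
import tensorflow as tf

# The output layer is purely affine (z = Wx + b): no activation is set,
# so the model emits raw logits.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),      # placeholder feature count
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10),        # affine only -> logits z
])

# from_logits=True tells the loss to apply softmax internally,
# converting z into the probability vector p.
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```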
In conclusion, softmax is a converter from logits to probabilities, while sigmoid is a converter from a single logit to a single probability.