How do we choose which feature to use at which node?
We use entropy to measure the impurity of the data at a node, and pick the feature whose split reduces that impurity the most.
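As a rough illustration of entropy as an impurity measure, here is a minimal sketch for a two-class problem (cat vs. not cat). The function names `entropy` and `information_gain` and the 0/1 label encoding are assumptions made for this example, not something fixed by these notes. The reduction in entropy produced by a split is commonly called information gain.

```python
import math

def entropy(p1: float) -> float:
    """Binary entropy in bits; p1 is the fraction of positive examples (e.g. cats)."""
    if p1 == 0.0 or p1 == 1.0:
        return 0.0  # a pure node has zero impurity
    return -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)

def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`.

    Each argument is a list of 0/1 labels (1 = cat, an assumed encoding).
    """
    def frac(labels):
        return sum(labels) / len(labels) if labels else 0.0

    n = len(parent)
    # Weight each child's entropy by the share of examples it receives.
    weighted = (len(left) / n) * entropy(frac(left)) \
             + (len(right) / n) * entropy(frac(right))
    return entropy(frac(parent)) - weighted
```

To choose a feature at a node, one would compute `information_gain` for each candidate split and keep the split with the largest value.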
When do we stop splitting?
When a node is 100% one class.
When splitting a node would cause the tree to exceed the maximum depth.
When the improvement in purity from a split is below a threshold.
When the number of examples in the node is below a threshold.
For example, if 2/3 of the examples in the right node are now dogs and the number of examples is small, we stop splitting and label that node "not cat".
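The stopping rules above can be sketched as a single check, followed by a majority vote to label the resulting leaf. This is only an illustration: the names `should_stop` and `leaf_label`, and the default values for `max_depth`, `min_gain`, and `min_samples`, are hypothetical choices, not values given in these notes.

```python
from collections import Counter

def should_stop(labels, depth, gain, max_depth=3, min_gain=0.01, min_samples=4):
    """Return True if the node holding `labels` should become a leaf.

    depth: depth of this node (root = 0); gain: purity improvement of its best split.
    All thresholds are illustrative defaults.
    """
    if len(set(labels)) <= 1:
        return True  # node is 100% one class
    if depth + 1 > max_depth:
        return True  # children would exceed the maximum depth
    if gain < min_gain:
        return True  # improvement is below the threshold
    if len(labels) < min_samples:
        return True  # too few examples in the node
    return False

def leaf_label(labels):
    """Label a leaf with its most common class."""
    return Counter(labels).most_common(1)[0][0]

# A small node with 2 of 3 examples being dogs (encoded here as "not cat").
right = ["not cat", "not cat", "cat"]
if should_stop(right, depth=1, gain=0.2):
    print(leaf_label(right))
```

With these assumed thresholds the node is too small to split, so it becomes a leaf that predicts its majority class.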