Choice of target function
Last modified by Nikita Kapchenko on 2019/10/16 14:26
The target function (also called the cost or loss function) expresses what we want from the network: what it should maximize, minimize, keep constant, etc., and therefore how it learns. Even an unusual requirement can be encoded in the target function, and training will then push the network toward it.
Quadratic (minimize the distance between outputs and real answers)
The classic target function is the sum of squared errors.
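$$C = \frac{1}{2n} \sum_x \sum_j \left( y_j(x) - a_j(x) \right)^2$$
where the sum runs over the n training inputs x, y_j(x) is the real answer and a_j(x) is the network output for output neuron j.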
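A minimal Python sketch of the quadratic cost, assuming a single output neuron, so that a(x) and y(x) are plain numbers and the inner sum over j has one term. The function name `quadratic_cost` is hypothetical.

```python
def quadratic_cost(outputs, targets):
    """C = 1/(2n) * sum_x (y(x) - a(x))^2 for one output neuron."""
    if not outputs or len(outputs) != len(targets):
        raise ValueError("outputs and targets must be non-empty and of equal length")
    n = len(outputs)
    # squared distance between each output and its real answer, averaged and halved
    return sum((y - a) ** 2 for a, y in zip(outputs, targets)) / (2 * n)


print(quadratic_cost([0.8, 0.2], [1.0, 0.0]))
```

The factor 1/2 is a convention that cancels the 2 from differentiating the square, so the per-example gradient with respect to the output is simply (a - y).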
Cross-entropy (maximize the likelihood of the outputs)
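$$C = -\frac{1}{n} \sum_x \sum_j \left[ y_j \ln a_j + (1 - y_j) \ln(1 - a_j) \right]$$
where a_j = σ(z_j) is the sigmoid output of neuron j and y_j ∈ [0, 1] is the real answer.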
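A minimal Python sketch of the cross-entropy cost, again assuming a single sigmoid output neuron per example. The function name `cross_entropy_cost` and the clipping constant `eps` are hypothetical; clipping is only a numerical safeguard so that `math.log` never receives 0, not part of the formula itself.

```python
import math


def cross_entropy_cost(outputs, targets, eps=1e-12):
    """C = -1/n * sum_x [y ln a + (1 - y) ln(1 - a)] for outputs a in (0, 1)."""
    if not outputs or len(outputs) != len(targets):
        raise ValueError("outputs and targets must be non-empty and of equal length")
    total = 0.0
    for a, y in zip(outputs, targets):
        # keep a strictly inside (0, 1) so both logarithms are finite
        a = min(max(a, eps), 1.0 - eps)
        total += y * math.log(a) + (1.0 - y) * math.log(1.0 - a)
    return -total / len(outputs)


print(cross_entropy_cost([0.8, 0.2], [1.0, 0.0]))
```

A confident wrong answer (for example a = 0.01 when y = 1) is penalized much more heavily than under the quadratic cost, because -ln(a) grows without bound as a -> 0.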
Compare
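| | Quadratic | Cross-entropy |
| Extreme cases (saturated sigmoid, z -> ±inf) | Every weight gradient contains the factor σ'(z) = σ(z)(1 - σ(z)) -> 0, so the gradient -> 0 for all layers, even when the output is completely wrong. No learning. | For output weights (or for a network with only 1 layer): $\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x x_j (\sigma(z) - y)$. The σ'(z) factor cancels, so the gradient stays large while the output is wrong and the network keeps learning. For weights in hidden layers, the gradient -> 0 when a hidden neuron saturates, because backpropagation still multiplies by its σ(z)(1 - σ(z)) -> 0 as z -> ±inf. |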
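The comparison can be checked numerically with a minimal sketch: one training example (n = 1), a sigmoid output a = σ(w·x + b), and, for the hidden-layer case, a hypothetical chain x -> hidden sigmoid h -> output sigmoid a. All function names and numbers below are hypothetical, chosen only to drive neurons into saturation.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z: float) -> float:
    # sigma'(z) = sigma(z) * (1 - sigma(z)); tends to 0 as z -> +-inf
    s = sigmoid(z)
    return s * (1.0 - s)


def output_weight_grads(w: float, b: float, x: float, y: float):
    """dC/dw for a single sigmoid neuron a = sigma(w*x + b), one example."""
    z = w * x + b
    a = sigmoid(z)
    quad = (a - y) * sigmoid_prime(z) * x  # quadratic cost keeps the sigma'(z) factor
    ce = (a - y) * x                       # cross-entropy: sigma'(z) cancels out
    return quad, ce


def hidden_weight_grad_ce(w1, b1, w2, b2, x, y):
    """Cross-entropy dC/dw1 for x -> h = sigma(w1*x + b1) -> a = sigma(w2*h + b2)."""
    z1 = w1 * x + b1
    h = sigmoid(z1)
    a = sigmoid(w2 * h + b2)
    # output error (a - y), backpropagated through w2 and the hidden sigma'(z1)
    return (a - y) * w2 * sigmoid_prime(z1) * x


# output neuron saturated at z = 10 while the real answer is y = 0
quad, ce = output_weight_grads(w=5.0, b=5.0, x=1.0, y=0.0)
# hidden neuron saturated at z1 = 20
hidden = hidden_weight_grad_ce(10.0, 10.0, 1.0, 0.0, 1.0, 0.0)

print(f"output weight, quadratic:     {quad:.2e}")
print(f"output weight, cross-entropy: {ce:.2e}")
print(f"hidden weight, cross-entropy: {hidden:.2e}")
```

With the output saturated and wrong, the quadratic gradient is on the order of 1e-5 while the cross-entropy gradient stays close to 1. The saturated hidden neuron, however, still yields a vanishing gradient under cross-entropy.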