Choice of target function

Last modified by Nikita Kapchenko on 2019/10/16 14:26

The target function is what we want from the network: the way it should learn, what it should maximize, minimize, keep constant, etc. Even if we have an unusual requirement, we can encode it in the target function and the network will learn what we want.

Quadratic (minimize the distance between outputs and real answers)

The classic target function is the sum of squared errors.

C = 1/(2n) · Σ_x ||y(x) − a||²

where y(x) is the desired output for input x, a is the network output, and n is the number of training examples.
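As a minimal NumPy sketch (the single sigmoid neuron and the variable names here are illustrative, not part of the original page):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_cost(a, y):
    # C = 1/(2n) * sum_x ||y - a||^2, averaged over the n training examples
    return 0.5 * np.mean((y - a) ** 2)

# toy data: outputs of a sigmoid neuron vs desired answers
a = sigmoid(np.array([0.5, 2.0, -1.0]))
y = np.array([1.0, 1.0, 0.0])
print(quadratic_cost(a, y))
```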

Cross-entropy (maximize the likelihood of the outputs)

C = −1/n · Σ_x [ y ln(a) + (1 − y) ln(1 − a) ]

where y is the desired output, a is the network output, and n is the number of training examples.
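A matching NumPy sketch for binary targets (again an illustrative single-output setup, not the page's own code):

```python
import numpy as np

def cross_entropy_cost(a, y):
    # C = -1/n * sum_x [ y ln(a) + (1-y) ln(1-a) ]
    # a must lie strictly in (0, 1), which a sigmoid output guarantees
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

p = np.array([0.9, 0.2])   # network outputs
t = np.array([1.0, 0.0])   # desired answers
print(cross_entropy_cost(p, t))
```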

Comparison: Quadratic vs Cross-entropy

Consider the extreme cases where the sigmoid saturates (z is large):

  1. big inputs
  2. big weights

Quadratic: gradient -> 0 in both cases, for all layers. No learning.
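This saturation can be seen numerically. For the quadratic cost of a single sigmoid neuron (an assumed toy setup, not the page's own example), the weight gradient carries a factor σ'(z) that vanishes for large |z|:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

def quadratic_grad_w(x, w, y):
    # single neuron, one example: dC/dw = (a - y) * sigma'(z) * x
    z = w * x
    return (sigmoid(z) - y) * sigmoid_prime(z) * x

print(quadratic_grad_w(x=1.0, w=1.0, y=0.0))   # moderate gradient
print(quadratic_grad_w(x=20.0, w=1.0, y=0.0))  # ~0 despite being badly wrong
```

Even though the second neuron is confidently wrong (output ≈ 1, target 0), its gradient is almost zero, so learning stalls.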

Cross-entropy: for output weights (or for a network with only 1 layer):

  • if output == data, gradient -> 0. No learning, but output == data already.
  • if output != data (misclassification), the gradient does NOT -> 0; instead it is large, so we quickly jump to good weights.

For the other weights, in the hidden layers, gradient -> 0, as we still have to multiply by σ(z)(1 − σ(z)) -> 0 as z -> inf.
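The output-layer claim can be checked directly: for a sigmoid output with cross-entropy, the σ'(z) factor cancels and the gradient reduces to (a − y)·x. A sketch under the same assumed single-neuron setup:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_grad_w(x, w, y):
    # sigmoid output + cross-entropy: dC/dw = (a - y) * x
    # the sigma'(z) factor cancels, so the output layer never saturates
    z = w * x
    return (sigmoid(z) - y) * x

# saturated AND misclassified: gradient stays large -> fast learning
print(cross_entropy_grad_w(x=20.0, w=1.0, y=0.0))  # ~20
# saturated and correct: gradient -> 0, as it should
print(cross_entropy_grad_w(x=20.0, w=1.0, y=1.0))  # ~0
```

Compare this with the quadratic gradient (a − y)·σ'(z)·x, which is tiny in the same misclassified case.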