Choice of target function

Last modified by Nikita Kapchenko on 2019/10/16 14:26

The target function defines what we want from the network: how it should learn and which quantity it should maximize, minimize, keep constant, and so on. Even an unusual requirement can be encoded in the target function, and the network is then trained toward it.

Quadratic (minimize the distance between outputs and real answers)

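The classic target function is the sum of squared differences between the outputs and the real answers. With outputs a_j and real answers y_j (notation assumed here; the factor 1/2 is a common convention that simplifies the derivative):

$$C = \frac{1}{2} \sum_j (a_j - y_j)^2$$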
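As a minimal sketch of the sum-of-squares target, the following computes the value and its derivative with respect to the outputs. The function names and the factor 1/2 are assumptions made for this illustration, not taken from the original page.

```python
import numpy as np

def quadratic_loss(outputs, targets):
    """Half the sum of squared differences (the 1/2 factor is an assumed convention)."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return 0.5 * np.sum((outputs - targets) ** 2)

def quadratic_loss_grad(outputs, targets):
    """Derivative of quadratic_loss with respect to each output."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return outputs - targets

# Two outputs compared with their real answers.
print(quadratic_loss([0.8, 0.1], [1.0, 0.0]))
print(quadratic_loss_grad([0.8, 0.1], [1.0, 0.0]))
```

The derivative is simply the difference between output and answer, which is why this target is described as minimizing a distance.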

Cross-entropy (maximize the likelihood of the outputs)

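For sigmoid outputs a_j in (0, 1) and real answers y_j (notation assumed here), the cross-entropy is:

$$C = -\sum_j \left[ y_j \ln a_j + (1 - y_j) \ln (1 - a_j) \right]$$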
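A minimal sketch of the cross-entropy target for sigmoid-style outputs in (0, 1) follows. The binary (per-output) form, the function name, and the clipping constant `eps` are assumptions made for this illustration.

```python
import numpy as np

def cross_entropy_loss(outputs, targets, eps=1e-12):
    """Binary cross-entropy summed over outputs.

    Outputs are clipped to [eps, 1 - eps] so that log(0) never occurs;
    eps is an assumed numerical safeguard, not part of the definition.
    """
    outputs = np.clip(np.asarray(outputs, dtype=float), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=float)
    return -np.sum(targets * np.log(outputs)
                   + (1.0 - targets) * np.log(1.0 - outputs))

# A confident correct output costs little, a confident wrong one costs a lot.
print(cross_entropy_loss([0.9], [1.0]))
print(cross_entropy_loss([0.1], [1.0]))
```

Minimizing this quantity is the same as maximizing the likelihood of the real answers under the network's outputs, which is the sense of "maximize the likelihood" in the heading.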


Compare

	Quadratic	Cross-entropy
Extreme case: large inputs and large weights (|z| -> inf)	Gradient -> 0 for all weights in all layers, whether or not output == data, because every gradient contains the factor σ(z) (1 - σ(z)) -> 0. No learning.	Output-layer weights (or a network with only 1 layer): if output == data, gradient -> 0; there is no learning, but none is needed. If output != data (misclassification), the gradient does NOT -> 0; it can be >> 1 (it is proportional to the error times the large input), so the weights move quickly toward good values. Hidden-layer weights: gradient -> 0, because we still have to multiply by σ(z) (1 - σ(z)) -> 0 as |z| -> inf.
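The saturation effect in the comparison can be checked numerically on a single sigmoid output neuron. This is a sketch under stated assumptions: one neuron, the 1/2-scaled quadratic target, and hypothetical helper names; the gradients are taken with respect to the pre-activation z (the weight gradient multiplies each of them by the input).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_z_quadratic(z, y):
    """dC/dz for C = 0.5 * (a - y)^2 with a = sigmoid(z)."""
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

def grad_z_cross_entropy(z, y):
    """dC/dz for binary cross-entropy with a = sigmoid(z); the a(1-a) factor cancels."""
    return sigmoid(z) - y

# Saturated neuron (large z) that is wrong: output near 1, real answer 0.
z, y = 10.0, 0.0
print("quadratic:    ", grad_z_quadratic(z, y))
print("cross-entropy:", grad_z_cross_entropy(z, y))

# Saturated neuron that is right: output near 1, real answer 1.
print("cross-entropy, correct answer:", grad_z_cross_entropy(z, 1.0))
```

With the quadratic target the gradient is tiny even though the answer is wrong, while with cross-entropy it stays close to the full error and only vanishes when the output already matches the data.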

Backpropagation formulas
target	cross-entropy
activation	sigmoid
summatory	sum
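For the configuration listed above (cross-entropy target, sigmoid activation, weighted-sum input), the standard backpropagation equations can be sketched as follows. The notation is an assumption made here: layer index l with output layer L, weights w, biases b, pre-activations z, activations a, real answers y, and error terms δ.

```latex
\begin{aligned}
z^l_j &= \sum_k w^l_{jk}\, a^{l-1}_k + b^l_j, \qquad a^l_j = \sigma(z^l_j) \\
\delta^L_j &= a^L_j - y_j \\
\delta^l_j &= \Big( \sum_k w^{l+1}_{kj}\, \delta^{l+1}_k \Big)\, \sigma(z^l_j)\,\big(1 - \sigma(z^l_j)\big) \\
\frac{\partial C}{\partial w^l_{jk}} &= \delta^l_j\, a^{l-1}_k, \qquad
\frac{\partial C}{\partial b^l_j} = \delta^l_j
\end{aligned}
```

The output-layer error has no σ(z) (1 - σ(z)) factor because it cancels against the derivative of the cross-entropy, whereas the hidden-layer errors keep that factor.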
