Regularization


Big weights are bad.

Why? Recall the sigmoid function: with huge weights, the weighted sum (the neuron's pre-activation) is also huge => the sigmoid saturates and its derivative is near 0 => no learning at all.
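A quick numeric illustration (a minimal NumPy sketch, not from the original page): as the pre-activation grows, the sigmoid derivative collapses toward zero, which is exactly the vanishing learning signal described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # derivative of the sigmoid

# As the pre-activation grows, the derivative collapses toward 0.
for z in [0.0, 2.0, 10.0, 50.0]:
    print(f"z = {z:5.1f}   sigmoid'(z) = {sigmoid_prime(z):.2e}")
# z = 50.0 gives ~1.9e-22: essentially no gradient, so no learning.
```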

So we want to penalize the network for learning huge weights. How can we do it?

L2 Penalization

We can ask the network to minimize the sum of squared weights in addition to the target function:

C = C_0 + \frac{\lambda}{2} \sum_w w^2

So now the "direction" to learn (the gradient) becomes:

\frac{\partial C}{\partial w} = \frac{\partial C_0}{\partial w} + \lambda w

We asked the network to penalize big weights, and here we are: the learning rule is now

w \to w - \eta \frac{\partial C_0}{\partial w} - \eta \lambda w = (1 - \eta\lambda)\, w - \eta \frac{\partial C_0}{\partial w}

So regardless of the direction of the gradient, every weight is shrunk by the factor (1 - \eta\lambda) at each step.

So for successful learning, the gradient must outweigh this constant weight decay.
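As a sketch of the update rule above (the function name and hyperparameter values are illustrative, not from the page):

```python
import numpy as np

def sgd_step_l2(w, grad_c0, eta=0.1, lam=1e-3):
    """One SGD step with an L2 penalty: w -> (1 - eta*lam)*w - eta*dC0/dw.

    Every weight is shrunk by the factor (1 - eta*lam) on every step;
    only weights whose gradient outweighs this decay can keep growing.
    """
    return (1.0 - eta * lam) * w - eta * grad_c0

# Toy numbers: with no task gradient the weight decays; with a strong
# gradient it still grows despite the decay.
w = np.array([2.0, 2.0])
grad_c0 = np.array([0.0, -1.0])
print(sgd_step_l2(w, grad_c0))  # [1.9998  2.0998]
```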

L1 Penalization

We can also ask the network to minimize the absolute values of the weights themselves. In this case:

C = C_0 + \lambda \sum_w |w|, \qquad w \to w - \eta\lambda\, \mathrm{sgn}(w) - \eta \frac{\partial C_0}{\partial w}
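The corresponding update as a sketch (same illustrative conventions as the L2 snippet): the penalty step has the same magnitude \eta\lambda for every nonzero weight, whatever its size.

```python
import numpy as np

def sgd_step_l1(w, grad_c0, eta=0.1, lam=1e-3):
    """One SGD step with an L1 penalty: w -> w - eta*lam*sgn(w) - eta*dC0/dw.

    The penalty term has constant magnitude eta*lam for every nonzero
    weight, so small unimportant weights are dragged to zero while large
    important weights barely feel it.
    """
    return w - eta * lam * np.sign(w) - eta * grad_c0
```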

|  | L1 Penalization | L2 Penalization |
|---|---|---|
| Learning | Constantly decreases each weight by λ, regardless of its value | Decreases each weight in proportion to its value |
| Choice of weights | The weights are chosen by their importance | All weights can almost freely improve the results (see figure) |
| Late training | All weight adjustments are penalized equally: less important weights remain at zero while more important weights keep growing | Big weights are penalized quadratically: the network would rather play with less important weights than keep increasing an already big, important weight |

[Figure: penalty as a function of weight value; red: L2 penalty (λ/2)w², flat near zero and steep for large weights; green: L1 penalty λ|w|, constant slope everywhere]

L2 regularization (red) allows all weights to push off from zero in their gradient directions and thus improve the target function.

But it then punishes the large weights hard.

L1 regularization (green) resists the change of small and large weights equally: its penalty grows at a constant rate everywhere.

L1 & L2 Mix

A mix of the two can train the important weights while still allowing the less important weights to improve slightly.

You now have to think about the choice of two hyperparameters, λ_L1 and λ_L2.

Now the nearly free growth of small weights that L2 alone would allow is blocked by the constant L1 penalty.
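A combined update as a sketch (elastic-net style; the hyperparameter names lam_l1 and lam_l2 are illustrative):

```python
import numpy as np

def sgd_step_l1_l2(w, grad_c0, eta=0.1, lam_l1=1e-4, lam_l2=1e-3):
    """One SGD step with both penalties:
    w -> w - eta*(lam_l1*sgn(w) + lam_l2*w + dC0/dw)

    L2 lets weights grow away from zero in proportion to their value;
    the constant L1 term adds a fixed barrier, so the nearly free growth
    of small weights under L2 alone is no longer free.
    """
    return w - eta * (lam_l1 * np.sign(w) + lam_l2 * w + grad_c0)
```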