1 min readOct 3, 2019
Maximizing the width between the samples was our idea for finding the optimization, so, width is maximum when w is lowest, thus our idea is to minimize 1/2 * (W)**2. Now in ML problems we use square loss and not the residual. The reason needs another blog post but, if you are interested, you can read a fairly good answer in here.
Regards!