Apr 15, 2019
I believe I replied to this earlier, but to make it clear again: the starting point can be random, and, in addition to the maximization condition I mentioned in the post, a penalty term is added for misclassification. This paper by Professor Ben-Hur (http://pyml.sourceforge.net/doc/howto.pdf) should answer your question comprehensively.
This is quite comparable to the way centroids are chosen and then refined with each iteration in K-Means clustering (see a tutorial such as https://databricks.com/tensorflow/clustering-and-k-means).
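
To make the K-Means comparison concrete, here is a minimal sketch of how a random starting point gets refined iteration by iteration. It only illustrates the analogy and is not the optimization from the post; the function name `kmeans`, the toy `points`, and the `seed` and `iterations` parameters are my own assumptions.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means on a list of (x, y) tuples; returns the final centroids."""
    rng = random.Random(seed)
    # Random starting point: pick k distinct data points as initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(old)  # empty cluster: leave its centroid in place
        if new_centroids == centroids:
            break  # nothing moved, so the iteration has converged
        centroids = new_centroids
    return centroids

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = sorted(kmeans(points, k=2))
print(centroids)
```

On data this cleanly separated, the centroids should settle on the mean of each group whichever points are drawn first, which is the sense in which the random start does not matter much.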