The contents of this post are based on the free Coursera Machine Learning course taught by Andrew Ng.
1. Feature Scaling
When the features have similar scales (similar ranges of values), Gradient Descent converges much faster, meaning it reaches the minimum in fewer iterations.
∴ That is why we use Feature Scaling.
x1 = size (0-2000)             →  x1 = size / 2000
x2 = number of bedrooms (1-5)  →  x2 = number of bedrooms / 5
The goal is to get every feature into approximately a '-1 <= xi <= 1' range.
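A minimal NumPy sketch of this idea, scaling each feature by its maximum value. The housing numbers here are made up for illustration and are not from the course:

```python
import numpy as np

# Hypothetical data: column 0 = size in square feet, column 1 = number of bedrooms.
X = np.array([
    [2104.0, 3.0],
    [1416.0, 2.0],
    [1534.0, 3.0],
    [ 852.0, 2.0],
])

# Divide each feature (column) by its maximum value,
# so every column falls roughly into the [0, 1] range.
X_scaled = X / X.max(axis=0)
print(X_scaled)
```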
2. Mean normalization
In Feature Scaling we divided by the maximum value, but we can also apply 'Mean Normalization'. With Mean Normalization we subtract the average from each feature, xi := (xi - μi) / si (where μi is the average of feature i and si is its range or standard deviation), so that the features have approximately zero mean.
* But we have to exclude x0 (= 1). Since x0 is always 1, subtracting its mean would turn it into 0 for every example, so it must be left out of the normalization.
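A rough NumPy sketch of Mean Normalization, again with made-up numbers; note that the intercept column x0 is left untouched:

```python
import numpy as np

# Design matrix with the intercept column x0 = 1 already added.
X = np.array([
    [1.0, 2104.0, 3.0],
    [1.0, 1416.0, 2.0],
    [1.0, 1534.0, 3.0],
    [1.0,  852.0, 2.0],
])

# Mean-normalize every column except x0: xi := (xi - mean_i) / range_i.
mu  = X[:, 1:].mean(axis=0)
rng = X[:, 1:].max(axis=0) - X[:, 1:].min(axis=0)

X_norm = X.copy()
X_norm[:, 1:] = (X[:, 1:] - mu) / rng
print(X_norm)  # x0 stays 1; the other columns now have approximately zero mean
```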