SGD and GD on LR
We provide you with a dataset of handwritten digits that contains a training set of 60000 examples and a test set of 2018 examples (“hw4 lr.zip”). Each image in this dataset has 28 × 28 pixels, and the associated label is the handwritten digit in the image, i.e., an integer from the set {0, 1, …, 9}. In this exercise, you need to build a logistic regression classifier to predict whether a given image contains the handwritten digit 9. You may use your favorite programming language to finish this exercise.
1. (a) Choose a proper normalization method to process the data matrix. Please report the normalization method you use.
(b) Find a Lipschitz constant of ∇L(w), where L(w) is the objective function of the logistic regression after normalization and w is the model parameter to be estimated. Please report your result.
2. (a) Use GD and SGD, respectively, to train the logistic regression classifier on the training set. Evaluate the classification accuracy on the training set after each iteration. Stop the iteration when Accuracy ≥ 90% or the total number of steps exceeds 5000. Please plot the accuracy of the two classifiers (the one trained by GD and the other trained by SGD) versus the iteration step on one graph.
(b) Compare the total iteration counts and the total time cost of the two methods (GD and SGD), respectively. Please report your result.
(c) Compare the confusion matrix, precision, recall and F1 score of the two classifiers (the one trained by GD and the other trained by SGD). Please report your result.
3. (a) The training set is imbalanced: the majority class has roughly ten times more images than the minority class. Imbalanced data can badly hurt the performance of a classifier. Thus, please undersample the majority class such that the numbers of images in the two classes are roughly the same.
(b) Use GD to train the logistic regression classifier on the new training set after undersampling. Stop the iteration when Accuracy ≥ 90% or the total number of steps exceeds 5000.
(c) Evaluate the two classifiers (the one trained with GD on the original training set and the other trained on the new training set after undersampling) on the test set. Compare the confusion matrix, precision, recall and F1 score of the two classifiers. Please report your result.
Solution
1.
(a)
First, each data point is a 28 × 28 image, so I flatten every image into a vector of length 784. After this transformation, the training data becomes a matrix of size (60000, 784).
Since all the pixel values lie in the range [0, 255], I simply divide the data matrix X by 255 and then insert a column of ones, (1, 1, …, 1)ᵀ, into the matrix for the bias term, as sketched below.
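For concreteness, here is a minimal sketch of this preprocessing step; the array name `train_images` and its shape (60000, 28, 28) are my assumptions about how the raw data is loaded:

```python
import numpy as np

def normalize(images):
    # Flatten each 28x28 image into a 784-dimensional row vector.
    X = images.reshape(images.shape[0], -1).astype(np.float64)
    # Scale the pixel values from [0, 255] into [0, 1].
    X /= 255.0
    # Prepend a column of ones so the bias is absorbed into w.
    return np.hstack([np.ones((X.shape[0], 1)), X])
```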
(b)
Since the sigmoid outputs satisfy $h_i(1 - h_i) \le \frac{1}{4}$, the Hessian of the total logistic loss can be bounded as
$$\nabla^2 \Big( \sum_{i=1}^{n} \ell_i(w) \Big) = \sum_{i=1}^{n} h_i (1 - h_i)\, x_i x_i^{\top} \preceq \frac{1}{4} X^{\top} X .$$
Since the total function we actually use is the averaged loss $L(w)/n$, a Lipschitz constant of its gradient is
$$L_{\nabla} = \frac{\lambda_{\max}(X^{\top} X)}{4n} .$$
The computed value is $L_{\nabla} \approx 9.77436284224097$.
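A minimal sketch of how this constant can be computed, assuming `X` is the normalized (60000, 785) data matrix from part (a):

```python
import numpy as np

def lipschitz_constant(X):
    # lambda_max(X^T X) / (4n); eigvalsh returns eigenvalues in
    # ascending order, so the last entry is the largest.
    n = X.shape[0]
    lam_max = np.linalg.eigvalsh(X.T @ X)[-1]
    return lam_max / (4 * n)
```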
2.
Since SGD easily jumps to an extreme point (it quickly reaches a high training accuracy such as 0.9 but then performs rather badly on the test set), I set the stopping threshold to accuracy p > 0.925.
The output is as follows:
(Output listing truncated in the source; it begins with `the call GD_of_LR():` and reported the iteration counts, time costs, and training-set metrics compared below.)
We can see that SGD runs much faster than GD, while the two classifiers perform about equally well on the training set (accuracy, precision, recall, and F1 score are all close).
The iteration plot is as follows. The accuracy of GD increases almost monotonically, whereas the accuracy of SGD fluctuates a lot. A sketch of the two training loops is given below.
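A hedged sketch of the two training loops; the labels y ∈ {0, 1}, the step size, and the helper names are my assumptions, not the exact code used:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_gd(X, y, lr=0.1, max_steps=5000, target=0.90):
    # lr ~ 1/L with L from part 1(b); an illustrative choice.
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(max_steps):
        h = sigmoid(X @ w)
        w -= lr * X.T @ (h - y) / len(y)        # full-batch gradient step
        acc = np.mean((sigmoid(X @ w) >= 0.5) == y)
        history.append(acc)
        if acc >= target:                       # stop once accuracy is reached
            break
    return w, history

def train_sgd(X, y, lr=0.1, max_steps=5000, target=0.925):
    # target 0.925 is the stricter stopping threshold discussed above.
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(max_steps):
        i = np.random.randint(len(y))           # one random sample per step
        hi = sigmoid(X[i] @ w)
        w -= lr * (hi - y[i]) * X[i]            # stochastic gradient step
        acc = np.mean((sigmoid(X @ w) >= 0.5) == y)
        history.append(acc)
        if acc >= target:
            break
    return w, history
```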
3.
In this part, I keep all 5949 data points whose label is 9 and randomly select another 5949 samples from the majority class, so the logistic regression model is trained on roughly 12000 balanced data points (a sketch of this undersampling step is given after the discussion below). The results I get are as follows:
(Output listing truncated in the source; it begins with `the call GD_of_LR():`.)
In this run the step size is small, so training takes many steps. In terms of time cost, SGD is still much faster than GD, even though it needs about five times as many steps. We can see that SGD really is faster than GD in practice, but it is not stable: it does not converge monotonically to the best solution (and may not converge at all). Nevertheless, its results are usually very good, and we do not want an overfitted model either.
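A minimal sketch of the undersampling step, assuming binary labels y ∈ {0, 1} with 1 marking the digit-9 (minority) class:

```python
import numpy as np

def undersample(X, y, seed=0):
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)                 # the 5949 digit-9 samples
    neg = np.flatnonzero(y == 0)                 # the majority class
    keep = rng.choice(neg, size=len(pos), replace=False)
    idx = rng.permutation(np.concatenate([pos, keep]))
    return X[idx], y[idx]                        # ~12000 balanced samples
```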
The plot of the accuracy p versus iteration is as follows:
Code
(The full code listing is truncated in the source; only its first line, `import numpy as np`, survives.)
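Since the listing is lost, here is a hedged sketch of an evaluation helper matching the metrics reported above (confusion matrix, precision, recall, F1 score); the function name and label convention are my assumptions:

```python
import numpy as np

def evaluate(y_true, y_pred):
    # Counts for the positive class (digit 9 -> label 1).
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    confusion = np.array([[tp, fn],
                          [fp, tn]])   # rows: actual, cols: predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return confusion, precision, recall, f1
```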