Coursera: Machine Learning (Week 4) [Assignment Solution] - Andrew NG

1.One-vs-all logistic regression and neural networks to recognize hand-written digits.

                                         Don't just copy paste the code for the sake of completion. 

Make sure you understand the code first.

In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits. Before starting the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.


It consist of the following files:
  • ex3.m - Octave/MATLAB script that steps you through part 1
  • ex3 nn.m - Octave/MATLAB script that steps you through part 2
  • ex3data1.mat - Training set of hand-written digits
  • ex3weights.mat - Initial weights for the neural network exercise
  • submit.m - Submission script that sends your solutions to our servers
  • displayData.m - Function to help visualize the dataset
  • fmincg.m - Function minimization routine (similar to fminunc)
  • sigmoid.m - Sigmoid function
  • [*] lrCostFunction.m - Logistic regression cost function
  • [*] oneVsAll.m - Train a one-vs-all multi-class classifier
  • [*] predictOneVsAll.m - Predict using a one-vs-all multi-class classifier
  • [*] predict.m - Neural network prediction function
* indicates files you will need to complete

lrCostFunction.m :

function [J, grad] = lrCostFunction(theta, X, y, lambda)
  %LRCOSTFUNCTION Compute cost and gradient for logistic regression with 
  %regularization
  %   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
  %   theta as the parameter for regularized logistic regression and the
  %   gradient of the cost w.r.t. to the parameters. 
  
  % Initialize some useful values
  m = length(y); % number of training examples
  
  % You need to return the following variables correctly 
  J = 0;
  grad = zeros(size(theta));
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the cost of a particular choice of theta.
  %               You should set J to the cost.
  %               Compute the partial derivatives and set grad to the partial
  %               derivatives of the cost w.r.t. each parameter in theta
  %
  % Hint: The computation of the cost function and gradients can be
  %       efficiently vectorized. For example, consider the computation
  %
  %           sigmoid(X * theta)
  %
  %       Each row of the resulting matrix will contain the value of the
  %       prediction for that example. You can make use of this to vectorize
  %       the cost function and gradient computations. 
  %
  % Hint: When computing the gradient of the regularized cost function, 
  %       there're many possible vectorized solutions, but one solution
  %       looks like:
  %           grad = (unregularized gradient for logistic regression)
  %           temp = theta; 
  %           temp(1) = 0;   % because we don't add anything for j = 0  
  %           grad = grad + YOUR_CODE_HERE (using the temp variable)
  %
  
  %DIMENSIONS: 
  %   theta = (n+1) x 1
  %   X     = m x (n+1)
  %   y     = m x 1
  %   grad  = (n+1) x 1
  %   J     = Scalar
  
  z   = X * theta;   % m x 1
  h_x = sigmoid(z);  % m x 1 
  
  reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);
  
  J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar
  
  grad(1) = (1/m) * (X(:,1)'*(h_x-y));                                    % 1 x 1
  grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);  % n x 1
  
  % =============================================================
  
  grad = grad(:);
end

oneVsAll.m :


function [all_theta] = oneVsAll(X, y, num_labels, lambda)
  %ONEVSALL trains multiple logistic regression classifiers and returns all
  %the classifiers in a matrix all_theta, where the i-th row of all_theta 
  %corresponds to the classifier for label i
  %   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
  %   logistic regression classifiers and returns each of these classifiers
  %   in a matrix all_theta, where the i-th row of all_theta corresponds 
  %   to the classifier for label i
  
  % num_labels = No. of output classifier (Here, it is 10)
  
  % Some useful variables
  m = size(X, 1);        % No. of Training Samples == No. of Images : (Here, 5000) 
  n = size(X, 2);        % No. of features == No. of pixels in each Image : (Here, 400)
  
  % You need to return the following variables correctly 
  all_theta = zeros(num_labels, n + 1);  
  %DIMENSIONS: num_labels x (input_layer_size+1) == num_labels x (no_of_features+1) == 10 x 401
  
  %DIMENSIONS: X = m x input_layer_size
  %Here, 1 row in X represents 1 training Image of pixel 20x20
  
  % Add ones to the X data matrix
  X = [ones(m, 1) X];   %DIMENSIONS: X = m x (input_layer_size+1) = m x (no_of_features+1)
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: You should complete the following code to train num_labels
  %               logistic regression classifiers with regularization
  %               parameter lambda. 
  %
  % Hint: theta(:) will return a column vector.
  %
  % Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
  %       whether the ground truth is true/false for this class.
  %
  % Note: For this assignment, we recommend using fmincg to optimize the cost
  %       function. It is okay to use a for-loop (for c = 1:num_labels) to
  %       loop over the different classes.
  %
  %       fmincg works similarly to fminunc, but is more efficient when we
  %       are dealing with large number of parameters.
  %
  % Example Code for fmincg:
  %
  %     % Set Initial theta
  %     initial_theta = zeros(n + 1, 1);
  %     
  %     % Set options for fminunc
  %     options = optimset('GradObj', 'on', 'MaxIter', 50);
  % 
  %     % Run fmincg to obtain the optimal theta
  %     % This function will return theta and the cost 
  %     [theta] = ...
  %         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
  %                 initial_theta, options);
  %
  
  initial_theta = zeros(n+1, 1);
  options = optimset('GradObj', 'on', 'MaxIter', 50);
  
  for c=1:num_labels
  all_theta(c,:) = ...
           fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                   initial_theta, options);
  end
  
  % =========================================================================
end

predictOneVsAll.m :

function p = predictOneVsAll(all_theta, X)
  %PREDICT Predict the label for a trained one-vs-all classifier. The labels
  %are in the range 1..K, where K = size(all_theta, 1).
  %  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
  %  for each example in the matrix X. Note that X contains the examples in
  %  rows. all_theta is a matrix where the i-th row is a trained logistic
  %  regression theta vector for the i-th class. You should set p to a vector
  %  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
  %  for 4 examples)
  
  m = size(X, 1);     % No. of Input Examples to Predict (Each row = 1 Example)
  num_labels = size(all_theta, 1); %No. of Ouput Classifier
  
  % You need to return the following variables correctly
  p = zeros(size(X, 1), 1);    % No_of_Input_Examples x 1 == m x 1
  
  % Add ones to the X data matrix
  X = [ones(m, 1) X];
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Complete the following code to make predictions using
  %               your learned logistic regression parameters (one-vs-all).
  %               You should set p to a vector of predictions (from 1 to
  %               num_labels).
  %
  % Hint: This code can be done all vectorized using the max function.
  %       In particular, the max function can also return the index of the
  %       max element, for more information see 'help max'. If your examples
  %       are in rows, then, you can use max(A, [], 2) to obtain the max
  %       for each row.
  %
  % num_labels = No. of output classifier (Here, it is 10)
  % DIMENSIONS:
  % all_theta = 10 x 401 = num_labels x (input_layer_size+1) == num_labels x (no_of_features+1)
  
  prob_mat = X * all_theta';     % 5000 x 10 == no_of_input_image x num_labels
  [prob, p] = max(prob_mat,[],2); % m  x 1 
  %returns maximum element in each row  == max. probability and its index for each input image
  %p: predicted output (index)
  %prob: probability of predicted output
  
  %%%%%%%% WORKING: Computation per input image %%%%%%%%%
  % for i = 1:m                               % To iterate through each input sample
  %     one_image = X(i,:);                   % 1 x 401 == 1 x no_of_features
  %     prob_mat = one_image * all_theta';    % 1 x 10  == 1 x num_labels
  %     [prob, out] = max(prob_mat);
  %     %out: predicted output
  %     %prob: probability of predicted output
  %     p(i) = out;
  % end
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
  %%%%%%%% WORKING %%%%%%%%%
  % for i = 1:m
  %     RX = repmat(X(i,:),num_labels,1);
  %     RX = RX .* all_theta;
  %     SX = sum(RX,2);
  %     [val, index] = max(SX);
  %     p(i) = index;
  % end
  %%%%%%%%%%%%%%%%%%%%%%%%%%
  % =========================================================================
end

predict.m :


function p = predict(Theta1, Theta2, X)
  %PREDICT Predict the label of an input given a trained neural network
  %   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
  %   trained weights of a neural network (Theta1, Theta2)
  
  % Useful values
  m = size(X, 1);
  num_labels = size(Theta2, 1);
  
  % You need to return the following variables correctly 
  p = zeros(size(X, 1), 1);  % m x 1
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Complete the following code to make predictions using
  %               your learned neural network. You should set p to a 
  %               vector containing labels between 1 to num_labels.
  %
  % Hint: The max function might come in useful. In particular, the max
  %       function can also return the index of the max element, for more
  %       information see 'help max'. If your examples are in rows, then, you
  %       can use max(A, [], 2) to obtain the max for each row.
  %
  %DIMENSIONS:
  % theta1 = 25 x 401
  % theta2 = 10 x 26
  
  % layer1 (input)  = 400 nodes + 1bias
  % layer2 (hidden) = 25 nodes + 1bias 
  % layer3 (output) = 10 nodes
  % 
  % theta dimensions = S_(j+1) x ((S_j)+1)
  % theta1 = 25 x 401
  % theta2 = 10 x 26
  
  % theta1:
  %     1st row indicates: theta corresponding to all nodes from layer1 connecting to for 1st node of layer2
  %     2nd row indicates: theta corresponding to all nodes from layer1 connecting to for 2nd node of layer2
  %     and
  %     1st Column indicates: theta corresponding to node1 from layer1 to all nodes in layer2
  %     2nd Column indicates: theta corresponding to node2 from layer1 to all nodes in layer2
  %     
  % theta2:
  %     1st row indicates: theta corresponding to all nodes from layer2 connecting to for 1st node of layer3
  %     2nd row indicates: theta corresponding to all nodes from layer2 connecting to for 2nd node of layer3
  %     and
  %     1st Column indicates: theta corresponding to node1 from layer2 to all nodes in layer3
  %     2nd Column indicates: theta corresponding to node2 from layer2 to all nodes in layer3
      
  a1 = [ones(m,1) X]; % 5000 x 401 == no_of_input_images x no_of_features % Adding 1 in X 
  %No. of rows = no. of input images
  %No. of Column = No. of features in each image
  
  z2 = a1 * Theta1';  % 5000 x 25
  a2 = sigmoid(z2);   % 5000 x 25
 
  a2 =  [ones(size(a2,1),1) a2];  % 5000 x 26
  
  z3 = a2 * Theta2';  % 5000 x 10
  a3 = sigmoid(z3);  % 5000 x 10
  
  [prob, p] = max(a3,[],2); 
  %returns maximum element in each row  == max. probability and its index for each input image
  %p: predicted output (index)
  %prob: probability of predicted output
  
  % =========================================================================
end
  • 一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一一
                         Machine Learning Coursera-All weeks solutions [Assignment + Quiz]   click here
                                                                                &
                         Coursera Google Data Analytics Professional Quiz Answers   click here


Have no concerns to ask doubts in the comment section. I will give my best to answer it.
If you find this helpful kindly comment and share the post.
This is the simplest way to encourage me to keep doing such work.


Thanks & Regards,
- Wolf

Comments

Popular posts from this blog

Coursera Google Data Analytics Professional Quiz Answers of all 8 courses.Data Analytics Professional Certificate

Coursera Google Data Analytics Professional Data Analysis with R Programming (Week 5) Quiz Answer-Documentation and reports.