Coursera: Machine Learning (Week 6) Quiz - Machine Learning System Design

2. Machine Learning System Design

Don't just copy and paste for the sake of completion. The solutions uploaded here are for reference only. They are meant to unblock you if you get stuck somewhere. Make sure you understand the material first.

You are working on a spam classification system using regularized logistic regression. “Spam” is a positive class (y = 1) and “not spam” is the negative class (y = 0). You have trained your classifier and there are m = 1000 examples in the cross-validation set. The chart of predicted class vs. actual class is:

For reference:
Accuracy = (true positives + true negatives) / (total examples)
Precision = (true positives) / (true positives + false positives)
Recall = (true positives) / (true positives + false negatives)
F1 score = (2 * precision * recall) / (precision + recall)

What is the classifier’s F1 score (as a value from 0 to 1)?
Enter your answer in the box below. If necessary, provide at least two values after the decimal point.
0.16
Precision is 0.087 and recall is 0.85, so the F1 score is (2 * precision * recall) / (precision + recall) = (2 * 0.087 * 0.85) / (0.087 + 0.85) = 0.158, which rounds to 0.16.
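
As a quick check of the arithmetic above, here is a minimal Python sketch of the reference formulas. The precision and recall values come from the answer above. The confusion-matrix counts are an assumption: the chart is not reproduced in this post, so the counts below are only one hypothetical set that adds up to m = 1000 and matches the stated precision and recall.

```python
def precision(tp, fp):
    """Precision = true positives / (true positives + false positives)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Recall = true positives / (true positives + false negatives)."""
    return tp / (tp + fn)


def f1_score(p, r):
    """F1 = (2 * precision * recall) / (precision + recall)."""
    return 2 * p * r / (p + r)


# Values stated in the answer above.
f1_from_text = f1_score(0.087, 0.85)
print(round(f1_from_text, 3))

# HYPOTHETICAL counts (assumption, not taken from the quiz chart):
# chosen only so that they sum to 1000 and reproduce the stated metrics.
tp, fp, fn, tn = 85, 890, 15, 10
p = precision(tp, fp)
r = recall(tp, fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3), accuracy)
```

Under these assumed counts the classifier catches most of the spam (high recall) but flags a lot of legitimate mail (low precision), which is why the F1 score ends up so low.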

Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true.
Which are the two?
- We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).
- The features x contain sufficient information to predict y accurately. (For example, one way to verify this is if a human expert on the domain can confidently predict y when given only x).
- When we are willing to include high order polynomial features of x (such as $x_1^2, x_2^2, x_1x_2$, etc.).
- We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).
- We train a model that does not use regularization.
- The classes are not too skewed.
- Our learning algorithm is able to represent fairly complex functions (for example, if we train a neural network or other model with a large number of parameters).
- A human expert on the application domain can confidently predict y when given only the features x (or more generally we have some way to be confident that x contains sufficient information to predict y accurately)

Suppose you have trained a logistic regression classifier which is outputting $h_\theta(x)$.

Currently, you predict 1 if $h_\theta(x) \geq \text{threshold}$, and predict 0 if $h_\theta(x) < \text{threshold}$, where currently the threshold is set to 0.5.

Suppose you increase the threshold to 0.9. Which of the following are true? Check all that apply.
- The classifier is likely to have unchanged precision and recall, but higher accuracy.
- The classifier is likely to now have higher recall.
- The classifier is likely to now have higher precision.
- The classifier is likely to have unchanged precision and recall, and thus the same F1 score.
- The classifier is likely to now have lower recall.
- The classifier is likely to now have lower precision.
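
To see how raising the threshold moves precision and recall, here is a small Python sketch. The scores and labels are made up for illustration (they are not from the quiz), and `precision_recall` is a hypothetical helper, not a library function.

```python
def precision_recall(scores, labels, threshold):
    """Predict 1 when score >= threshold, then compute precision and recall."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    # Precision is undefined when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Made-up classifier outputs h(x) and true labels (assumption for illustration).
scores = [0.95, 0.92, 0.85, 0.70, 0.65, 0.60, 0.55, 0.40, 0.35, 0.20]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]

p_05, r_05 = precision_recall(scores, labels, 0.5)
p_09, r_09 = precision_recall(scores, labels, 0.9)
print("threshold 0.5:", p_05, r_05)
print("threshold 0.9:", p_09, r_09)
```

On this toy data the stricter threshold keeps only the most confident predictions, so fewer false positives slip through (precision goes up) while more true positives are missed (recall goes down). Real data will usually, but not always, behave the same way for precision.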

Suppose you have trained a logistic regression classifier which is outputting $h_\theta(x)$.

Currently, you predict 1 if $h_\theta(x) \geq \text{threshold}$, and predict 0 if $h_\theta(x) < \text{threshold}$, where currently the threshold is set to 0.5.

Suppose you decrease the threshold to 0.3. Which of the following are true? Check all that apply.
- The classifier is likely to have unchanged precision and recall, but higher accuracy.
- The classifier is likely to have unchanged precision and recall, but lower accuracy.
- The classifier is likely to now have higher recall.
- The classifier is likely to now have higher precision.
- The classifier is likely to have unchanged precision and recall, and thus the same F1 score.
- The classifier is likely to now have lower recall.
- The classifier is likely to now have lower precision.
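
A short derivation may help explain why lowering the threshold behaves this way. The notation $TP(t)$, $FP(t)$ and $N_+$ is introduced here for illustration and is not part of the quiz. Write $TP(t)$ and $FP(t)$ for the true and false positives when predicting 1 if $h_\theta(x) \geq t$, and $N_+$ for the number of actual positives, which does not depend on $t$. Every example predicted positive at threshold 0.5 is still predicted positive at 0.3, so:

$$
TP(0.3) \geq TP(0.5), \qquad FP(0.3) \geq FP(0.5)
$$

$$
\text{Recall}(t) = \frac{TP(t)}{N_+} \;\Rightarrow\; \text{Recall}(0.3) \geq \text{Recall}(0.5)
$$

$$
\text{Precision}(t) = \frac{TP(t)}{TP(t) + FP(t)}
$$

Recall can never decrease when the threshold is lowered. Precision has no such guarantee because both its numerator and denominator grow, but the newly added predictions are the ones the classifier is less confident about, so precision is likely to drop.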

Suppose you are working on a spam classifier, where spam emails are positive examples (y = 1) and non-spam emails are negative examples (y = 0). You have a training set of emails in which 99% of the emails are non-spam and the other 1% is spam.

Which of the following statements are true? Check all that apply.

- A good classifier should have both a high precision and high recall on the cross validation set.
- If you always predict non-spam (output y=0), your classifier will have an accuracy of 99%.
- If you always predict non-spam (output y=0), your classifier will have 99% accuracy on the training set, but it will do much worse on the cross validation set because it has overfit the training data.
- If you always predict non-spam (output y=0), your classifier will have 99% accuracy on the training set, and it will likely perform similarly on the cross validation set.
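
The "always predict non-spam" case is easy to check numerically. The sketch below assumes a hypothetical set of 1000 emails with the 99% / 1% split described in the question; the set size is an assumption chosen for round numbers.

```python
# HYPOTHETICAL data set: 1000 emails, 1% spam (y = 1), 99% non-spam (y = 0).
labels = [1] * 10 + [0] * 990

# A "classifier" that ignores its input and always outputs y = 0.
predictions = [0] * len(labels)

correct = sum(1 for p, y in zip(predictions, labels) if p == y)
accuracy = correct / len(labels)

tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
recall = tp / (tp + fn)

print("accuracy:", accuracy)
print("recall:", recall)
# Precision is tp / (tp + fp) = 0 / 0 here, so it is undefined.
```

Accuracy looks excellent even though the classifier never catches a single spam email, which is why precision and recall are the more informative metrics on skewed classes. Since nothing was learned from the training data, the same behaviour carries over to any other set with a similar class balance.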

Which of the following statements are true? Check all that apply.

- Using a very large training set makes it unlikely for the model to overfit the training data.
- After training a logistic regression classifier, you must use 0.5 as your threshold for predicting whether an example is positive or negative.
- If your model is underfitting the training set, then obtaining more data is likely to help.
- It is a good idea to spend a lot of time collecting a large amount of data before building your first version of a learning algorithm.
- On skewed datasets (e.g., when there are more positive examples than negative examples), accuracy is not a good measure of performance and you should instead use F1 score based on the precision and recall.
- The “error analysis” process of manually examining the examples which your algorithm got wrong can help suggest what are good steps to take (e.g., developing new features) to improve your algorithm’s performance.

Feel free to ask questions in the comment section. I will do my best to answer them.
If you find this helpful, kindly comment and share the post.
This is the simplest way to encourage me to keep doing such work.

Thanks & Regards,
- Wolf
