The three categories to consider when evaluating algorithms are ease of understanding, accuracy, and

Confusion Matrix:

When to minimize what?

  1. Every model we use to predict the true class of the target variable has some error associated with it. This results in False Positives and False Negatives (i.e., the model classifying things incorrectly compared to the actual class).
  2. There’s no hard rule that says what should be minimized in all situations. It depends purely on the business needs and the context of the problem you are trying to solve. Based on that, we might want to minimize either False Positives or False Negatives.
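
As a minimal sketch of the four outcomes discussed above, the snippet below counts true positives, false positives, false negatives and true negatives for a binary problem. The function name `confusion_counts` and the toy labels are illustrative assumptions, not part of the original text.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly flagged as positive
        elif predicted == positive:
            fp += 1  # flagged as positive, actually negative
        elif actual == positive:
            fn += 1  # missed an actual positive
        else:
            tn += 1  # correctly left as negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical toy labels, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
```

Which of the two error counts (`FP` or `FN`) matters more is the business decision described above; the code only makes the counts visible.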

Precision:

Recall:

When to use Precision and when to use Recall?

  1. Recall tells us about a classifier’s performance with respect to false negatives (how many actual positives we missed), while precision tells us about its performance with respect to false positives (how many of the cases we flagged were wrong).
  2. Precision is about being precise. So even if we managed to capture only one cancer case, and we captured it correctly, then we are 100% precise.
  3. Recall is not so much about capturing cases correctly but more about capturing all cases that have “cancer” with the answer “cancer”. So if we simply label every case as “cancer”, we have 100% recall.
  4. So if we want to focus on minimizing False Negatives, we want recall to be as close to 100% as possible without precision becoming too bad. If we want to focus on minimizing False Positives, we should aim to make precision as close to 100% as possible.
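
The two extreme cases described above can be checked with a short sketch. It assumes the usual definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN); the helper `precision_recall` and the ten-patient dataset are hypothetical and exist only for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)
    # Guard against division by zero when nothing is flagged or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 3 cancer cases (1) among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

always_cancer = [1] * 10          # label every case as "cancer"
one_flag = [1] + [0] * 9          # flag a single case, and get it right

print(precision_recall(y_true, always_cancer))  # perfect recall, poor precision
print(precision_recall(y_true, one_flag))       # perfect precision, poor recall
```

Labelling everything as "cancer" reaches 100% recall at the cost of precision, while flagging a single correct case reaches 100% precision while missing most cases.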

Accuracy:

Disadvantage of accuracy:


What is AUC-ROC Curve?

  1. The AUC-ROC curve is a performance measurement for classification problems at various threshold settings.
  2. ROC is a probability curve, and AUC represents the degree or measure of separability.
  3. The ROC (Receiver Operating Characteristic) curve tells us how well the model can distinguish between two things (e.g., whether a patient has a disease or not).
  4. Better models can accurately distinguish between the two, whereas a poor model will have difficulty distinguishing between them.
  5. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. By analogy, the higher the AUC, the better the model is at distinguishing between patients with the disease and patients without it.
[Figures: AUC-ROC curves]
  1. As we see, the first model does quite a good job of distinguishing the positive and negative values. Therefore, its AUC score is 0.9, as the area under the ROC curve is large.
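
One way to make "separability" concrete: the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counted as half). The sketch below computes it by comparing all positive/negative pairs. The function `auc_score` and the scores are illustrative assumptions; they are not the models shown in the figures.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six patients (1 = disease, 0 = no disease).
y_true = [0, 0, 0, 1, 1, 1]
good_scores = [0.1, 0.2, 0.6, 0.5, 0.8, 0.9]   # mostly separates the classes
uninformative = [0.5] * 6                       # same score for everyone

auc_good = auc_score(y_true, good_scores)
auc_flat = auc_score(y_true, uninformative)
print(auc_good, auc_flat)
```

A model that gives every case the same score lands at 0.5, which is why 0.5 is usually read as "no better than chance". The pairwise loop is quadratic in the number of examples, so it is meant for explanation rather than for large datasets.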

How to use the AUC-ROC curve for a multi-class model?

  1. Loss functions are a way to evaluate how well your algorithm models your dataset. If your predictions are totally off, your loss function will output a higher number. If they’re pretty good, it’ll output a lower number. As you change pieces of your algorithm to try to improve your model, your loss function will tell you whether you are getting anywhere.
  2. Gradually, with the help of an optimization method such as Gradient Descent, the model learns to reduce the prediction error measured by the loss function.
  3. There’s no one-size-fits-all loss function for algorithms in machine learning. Various factors are involved in choosing a loss function for a specific problem, such as the type of machine learning algorithm chosen, the ease of calculating the derivatives and, to some degree, the percentage of outliers in the data set.
  4. We can classify loss functions into two major categories depending on the type of learning task we are dealing with: regression losses and classification losses.
  5. In classification, we are trying to predict the output from a finite set of categorical values.
  6. Regression, on the other hand, deals with predicting a continuous value.
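
To illustrate how an optimizer uses a loss function to reduce prediction error, here is a minimal sketch of gradient descent on a one-parameter model `y = w * x` with a mean squared error loss. The data, the learning rate and the function names are assumptions chosen for illustration.

```python
def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated by y = 2x, so the best w is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0                # deliberately bad starting guess
learning_rate = 0.05   # assumed step size; too large a value would diverge
losses = []
for step in range(20):
    losses.append(mse(w, xs, ys))                   # record loss before the update
    w -= learning_rate * mse_gradient(w, xs, ys)    # step against the gradient

print(w, losses[0], losses[-1])
```

The recorded losses fall as `w` approaches 2: the loss tells the optimizer whether a change to the model is an improvement.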

Classification losses:

MSE / Quadratic loss / L2 loss:

Mean Absolute Error / L1 Loss:
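
As a hedged sketch using the standard definitions, mean squared error averages the squared differences between targets and predictions, while mean absolute error averages their absolute differences. The example compares how the two react to a single outlier; the numbers are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences (L2 loss)."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences (L1 loss)."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions and two sets of targets.
y_pred = [2.0, 5.0, 8.0, 9.0]
y_clean = [3.0, 5.0, 7.0, 9.0]      # small errors only
y_outlier = [3.0, 5.0, 7.0, 29.0]   # last target is an outlier

print(mean_squared_error(y_clean, y_pred), mean_absolute_error(y_clean, y_pred))
print(mean_squared_error(y_outlier, y_pred), mean_absolute_error(y_outlier, y_pred))
```

Because the error is squared, one outlier inflates MSE far more than MAE, which is one reason the share of outliers in the data matters when choosing between the two.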


Cross-Entropy Loss (Binary Classification):
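
A minimal sketch of binary cross-entropy under its standard definition: for a label `y` in {0, 1} and a predicted probability `p` of class 1, the per-example loss is `-(y * log(p) + (1 - y) * log(1 - p))`. The clipping constant `eps` and the sample probabilities are assumptions added for numerical safety and illustration.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; labels are 0 or 1, p_pred is P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)   # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.6, 0.4, 0.6, 0.4]
confident_wrong = [0.1, 0.9, 0.2, 0.8]

bce_right = binary_cross_entropy(y_true, confident_right)
bce_hesitant = binary_cross_entropy(y_true, hesitant)
bce_wrong = binary_cross_entropy(y_true, confident_wrong)
print(bce_right, bce_hesitant, bce_wrong)
```

The loss grows as the predicted probability moves away from the true label, and confident wrong predictions are penalized most heavily.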

Hinge Loss (Binary Classification):
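
A minimal sketch of hinge loss under its standard definition, assuming labels encoded as -1 and +1 and a raw model score `s` (not a probability): the per-example loss is `max(0, 1 - y * s)`. The function name and sample scores are illustrative.

```python
def hinge_loss(y_true, scores):
    """Mean hinge loss; labels must be -1 or +1, scores are raw model outputs."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores)) / len(y_true)


# Hypothetical labels and scores.
y_true = [1, -1]
wide_margin = [2.0, -3.0]    # correct side, margin of at least 1
narrow_margin = [0.5, -0.5]  # correct side, but inside the margin
wrong_side = [-1.0, 1.0]     # both predictions on the wrong side

print(hinge_loss(y_true, wide_margin))
print(hinge_loss(y_true, narrow_margin))
print(hinge_loss(y_true, wrong_side))
```

Predictions on the correct side with a margin of at least 1 incur no loss, predictions inside the margin incur a small loss, and predictions on the wrong side are penalized linearly.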