After implementing a machine learning algorithm, the next step is to find out how effective the model is, based on a metric and a data set. Different performance metrics are used to evaluate different machine learning algorithms. In this post, we will cover the evaluation metrics commonly used for classification and regression:

Confusion matrix, Precision, Recall, F1 score, Accuracy, ROC curve - AUC, Log Loss, MSE, MAE, Cross-Entropy Loss, Hinge Loss

Confusion Matrix: The confusion matrix, as the name suggests, gives us a matrix as output: an N x N matrix, where N is the number of classes being predicted. It is also known as an error matrix. It is used for finding the correctness of the model, and it works well even on imbalanced data sets. A confusion matrix is a table with two dimensions ("Actual" and "Predicted") and the set of classes along both dimensions. The confusion matrix is not a performance measure in itself, but almost all performance metrics are computed from the numbers inside it.

Confusion matrix for binary classification:
True Positives (TP): the cases in which we predicted YES and the actual output was also YES.
True Negatives (TN): the cases in which we predicted NO and the actual output was NO.
False Positives (FP): the cases in which we predicted YES and the actual output was NO.
False Negatives (FN): the cases in which we predicted NO and the actual output was YES.

When to minimize what?
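The four counts above can be tallied directly from label/prediction pairs. Here is a minimal sketch with made-up labels and predictions (not from any real model), just to show how each cell of the binary confusion matrix is counted:

```python
# Toy labels and predictions (illustrative only): 1 = YES, 0 = NO
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each cell of the binary confusion matrix
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # predicted YES, actual YES
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # predicted NO, actual NO
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # predicted YES, actual NO
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # predicted NO, actual YES

print(tp, tn, fp, fn)  # 3 3 1 1
```

Libraries such as scikit-learn provide the same thing ready-made (`sklearn.metrics.confusion_matrix`), but the counting logic is exactly this.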
Minimizing false positives: when the cost of a false alarm is high (for example, flagging a legitimate email as spam), we care most about keeping false positives low. Minimizing false negatives: when missing an actual positive is the expensive mistake (for example, failing to detect a disease), we care most about keeping false negatives low.
Precision: Precision is defined as the number of true positives divided by the number of true positives plus the number of false positives. It tells us what proportion of the examples predicted as positive are actually positive.

Recall: Recall is defined as the number of true positives divided by the number of true positives plus the number of false negatives. It tells us what proportion of the actual positives in the data set the model correctly identified.

High recall, low precision: most of the positive examples are correctly recognized (low FN), but there are a lot of false positives. Low recall, high precision: we miss a lot of positive examples (high FN), but those we predict as positive are indeed positive (low FP).

When to use precision and when to use recall?
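Both definitions reduce to two one-line formulas over the confusion-matrix counts. A minimal sketch, using hypothetical counts (TP=3, FP=1, FN=1) for illustration:

```python
# Hypothetical confusion-matrix counts (illustrative only)
tp, fp, fn = 3, 1, 1

# Precision: of everything predicted positive, how much was truly positive?
precision = tp / (tp + fp)

# Recall: of everything truly positive, how much did we find?
recall = tp / (tp + fn)

print(precision, recall)  # 0.75 0.75
```

Note the denominators: precision divides by the *predicted* positives (TP + FP), recall by the *actual* positives (TP + FN).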
F1 Score: The F1 score combines precision and recall relative to a specific positive class. It conveys the balance between precision and recall and is especially useful when there is an uneven class distribution. The F1 score reaches its best value at 1 and its worst at 0.

Accuracy: Accuracy is the most common evaluation metric for classification problems. It is defined as the ratio of the number of correct predictions made by the model to the total number of predictions made.

Disadvantage of accuracy: The biggest disadvantage of accuracy is that it does not work well on an imbalanced data set; it works well only when there are roughly equal numbers of samples in each class. Example: consider a training set with 98% samples of class A and 2% of class B. Our model can easily reach 98% training accuracy by simply predicting class A for every training sample. When the same model is tested on a test set with 60% class A and 40% class B samples, the test accuracy drops to 60%. Classification accuracy can therefore give us a false sense of achievement. The real problem arises when the cost of misclassifying the minority class samples is very high. If we are dealing with a rare but fatal disease, the cost of failing to diagnose a sick person is much higher than the cost of sending a healthy person for more tests; the same logic applies to fraud detection.

When we need to check or visualize the performance of a classification model, we use the AUC (Area Under the Curve) - ROC (Receiver Operating Characteristic) curve. It is one of the most important evaluation metrics for checking a classification model's performance, and it is also written as AUROC (Area Under the Receiver Operating Characteristic). The AUC-ROC curve is plotted with TPR against FPR, where TPR (true positive rate) is on the y-axis and FPR (false positive rate) is on the x-axis.
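The 98%/2% example above and the F1 formula are easy to reproduce. A minimal sketch with synthetic labels: an always-predict-A "model" on the imbalanced set from the text, plus the F1 harmonic mean computed from the same hypothetical counts (TP=3, FP=1, FN=1) used earlier:

```python
# The imbalance pitfall from the text: 98% class A, 2% class B,
# and a degenerate "model" that always predicts the majority class.
y_true = ["A"] * 98 + ["B"] * 2
y_pred = ["A"] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.98 despite the model learning nothing

# F1 is the harmonic mean of precision and recall
# (hypothetical counts, illustrative only)
tp, fp, fn = 3, 1, 1
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f1)
```

The harmonic mean punishes imbalance between the two: if either precision or recall is near zero, F1 is near zero regardless of the other.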
What is AUC-ROC Curve?
At the other extreme, if the score distributions of the two classes completely overlap each other, we get an AUC score of 0.5. This means the model is performing poorly and its predictions are essentially random.

How to use the AUC-ROC curve for a multi-class model? For a multi-class model, we can plot N AUC-ROC curves for N classes using the one-vs-all methodology. For example, if you have three classes named X, Y and Z, you will have one ROC for X classified against Y and Z, another ROC for Y classified against X and Z, and a third for Z classified against X and Y.
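The one-vs-all idea can be sketched without any plotting library, using the rank interpretation of AUC: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The class names X/Y/Z and the per-class scores below are made up for illustration; in practice you would use something like scikit-learn's `roc_auc_score` with `multi_class="ovr"`:

```python
def auc_binary(y_true, scores):
    # AUC as the probability that a random positive outranks a random negative
    # (ties count as half a win)
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

classes = ["X", "Y", "Z"]
y_true = ["X", "Y", "Z", "X", "Z", "Y"]
# Hypothetical per-class scores, one row per sample (illustrative only)
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
    [0.3, 0.1, 0.6],
    [0.2, 0.5, 0.3],
]

# One-vs-all: binarize the labels for each class and score its column
aucs = {}
for k, c in enumerate(classes):
    binary = [1 if t == c else 0 for t in y_true]
    column = [row[k] for row in scores]
    aucs[c] = auc_binary(binary, column)

print(aucs)
```

Each class gets its own AUC; averaging them (macro-averaging) gives one summary number for the multi-class model.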
Regression losses: MSE / quadratic loss / L2 loss; MAE / L1 loss; mean bias error. Classification losses: hinge loss; cross-entropy loss / log loss; likelihood loss.

MSE / Quadratic Loss / L2 Loss: Mean Squared Error, or MSE, is the default loss to use for regression problems. Mathematically, it is the preferred loss function under the inference framework of maximum likelihood when the distribution of the target variable is Gaussian. It is the loss function to evaluate first and change only if you have a good reason. MSE is calculated as the average of the squared differences between the predicted and actual values. The result is always positive regardless of the sign of the predicted and actual values, and a perfect value is 0.0. The squaring means that larger mistakes contribute disproportionately more error than smaller ones, so the model is punished for making larger mistakes.

Mean Absolute Error / L1 Loss: On some regression problems, the distribution of the target variable may be mostly Gaussian but have many outliers, e.g. large or small values far from the mean. MAE is measured as the average of the absolute differences between predictions and actual observations. Like MSE, it measures the magnitude of the error without considering its direction. Unlike MSE, MAE is not differentiable at zero, and some formulations require more complicated tools such as linear programming to optimize. MAE is more robust to outliers since it does not square the errors.

Cross-Entropy Loss (Binary Classification): Cross-entropy is the default loss function to use for binary classification problems. It is intended for use where the target values are in the set {0, 1}. Mathematically, it is the preferred loss function under the inference framework of maximum likelihood. It is the loss function to evaluate first and change only if you have a good reason.
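The MSE and MAE definitions above are one-liners over the residuals. A minimal sketch with made-up target/prediction values (chosen so the arithmetic is easy to check by hand):

```python
# Illustrative regression targets and predictions (not real model output)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
n = len(y_true)

# MSE: mean of squared residuals - large errors dominate
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# MAE: mean of absolute residuals - every unit of error counts equally
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

print(mse, mae)  # 0.375 0.5
```

Note how the single error of 1.0 contributes 1.0 to the MSE sum but only 1.0 to the MAE sum as well, while doubling it to 2.0 would quadruple its MSE contribution: that is the outlier sensitivity the text describes.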
Notice that when the actual label is 1 (y(i) = 1), the second half of the function disappears, whereas when the actual label is 0 (y(i) = 0), the first half is dropped. In short, we are taking the log of the predicted probability assigned to the ground-truth class. An important aspect of this is that cross-entropy loss heavily penalizes predictions that are confident but wrong.

Hinge Loss (Binary Classification): An alternative to cross-entropy for binary classification problems is the hinge loss function, primarily developed for use with support vector machine (SVM) models. It is intended for use with binary classification where the target values are in the set {-1, 1}. The hinge loss function encourages examples to have the correct sign, assigning more error when the actual and predicted class values differ in sign. Reports of performance with the hinge loss are mixed; it sometimes yields better performance than cross-entropy on binary classification problems.
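Both losses described above fit in a few lines, which also makes the "confident but wrong" penalty easy to see. A minimal sketch with made-up probabilities and margins (illustrative only):

```python
import math

def log_loss(y_true, y_prob):
    # Average binary cross-entropy; labels in {0, 1}, probabilities in (0, 1).
    # Exactly one of the two log terms survives per sample, as noted in the text.
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob)) / len(y_true)

def hinge_loss(y_true, y_score):
    # Labels in {-1, +1}; scores are raw margins, not probabilities.
    # Zero loss once y * score >= 1, linear penalty otherwise.
    return sum(max(0.0, 1 - y * s) for y, s in zip(y_true, y_score)) / len(y_true)

confident_right = log_loss([1, 0], [0.9, 0.1])
confident_wrong = log_loss([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)  # the wrong-and-confident case is far larger

h = hinge_loss([1, -1, 1], [0.8, -1.2, -0.3])
print(h)
```

In the hinge example, the correctly signed score of -1.2 with margin beyond 1 contributes nothing, 0.8 pays a small penalty for falling inside the margin, and the wrong-signed -0.3 pays the most.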