What is a Confusion Matrix?
A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. It allows visualization of the performance of an algorithm, and helps in understanding whether the system is confusing two classes (i.e., commonly mislabeling one as another).
The confusion matrix shows the ways in which your classification model is confused when it makes predictions. It gives insight not only into the errors being made by your classifier but, more importantly, the types of errors that are being made.
Confusion Matrix Structure
A confusion matrix is a table with two dimensions ("Actual" and "Predicted"), and identical sets of "classes" in both dimensions. It compares the actual target values with those predicted by the machine learning model.
The general structure of a confusion matrix for a binary classifier is as follows:
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |
The four quadrants of the confusion matrix correspond to the following (a small code sketch follows the list):
- True Positive (TP): An actual positive correctly predicted as positive.
- True Negative (TN): An actual negative correctly predicted as negative.
- False Positive (FP): An actual negative incorrectly predicted as positive (Type I error).
- False Negative (FN): An actual positive incorrectly predicted as negative (Type II error).
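To make the four counts concrete, here is a minimal sketch in Python that tallies them directly from paired actual and predicted labels (the "positive"/"negative" label names and the tiny sample are illustrative assumptions, not taken from the text):

```python
# Tally the four quadrants from paired actual/predicted labels.
actual    = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]

tp = sum(a == "positive" and p == "positive" for a, p in zip(actual, predicted))
tn = sum(a == "negative" and p == "negative" for a, p in zip(actual, predicted))
fp = sum(a == "negative" and p == "positive" for a, p in zip(actual, predicted))
fn = sum(a == "positive" and p == "negative" for a, p in zip(actual, predicted))

print(tp, fn, fp, tn)  # 2 1 1 1
```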
Confusion Matrix Example
Suppose we have a binary classification problem where we are predicting whether emails are "Spam" or "Not Spam". We test our classifier on a set of 100 emails:
- 50 emails are actually Spam, and 50 are Not Spam.
- Our classifier predicts 45 emails as Spam, and 55 as Not Spam.
Let's say that:
- Out of the 50 actual Spams, the classifier correctly predicted 40 as Spam (True Positives), and incorrectly predicted 10 as Not Spam (False Negatives).
- Out of the 50 actual Not Spams, the classifier incorrectly predicted 5 as Spam (False Positives), and correctly predicted 45 as Not Spam (True Negatives).
The confusion matrix would be:
| Actual \ Predicted | Spam | Not Spam |
|---|---|---|
| Spam | TP = 40 | FN = 10 |
| Not Spam | FP = 5 | TN = 45 |
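This matrix can be reproduced with scikit-learn's `confusion_matrix`; in the rough sketch below, the label lists are constructed purely to match the counts of the example (1 = Spam, 0 = Not Spam), and `labels=[1, 0]` orders the rows and columns Spam-first so the output matches the table above:

```python
from sklearn.metrics import confusion_matrix

# Synthetic labels built to match the example: 40 TP, 10 FN, 5 FP, 45 TN.
y_true = [1] * 50 + [0] * 50                      # 50 actual Spam, 50 actual Not Spam
y_pred = [1] * 40 + [0] * 10 + [1] * 5 + [0] * 45

# Rows = actual, columns = predicted; layout is [[TP, FN], [FP, TN]].
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[40 10]
#  [ 5 45]]
```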
Metrics Derived from the Confusion Matrix
The confusion matrix provides the foundation for calculating a variety of performance metrics. Some of the most commonly used metrics include:
Accuracy
Accuracy is the proportion of the total number of predictions that were correct.
Formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Example Calculation:
Accuracy = (40 + 45) / (40 + 45 + 5 + 10) = 85 / 100 = 0.85
Precision
Precision is the proportion of positive identifications that were actually correct. It answers: "What proportion of predicted positives is actually positive?"
Formula:
Precision = TP / (TP + FP)
Example Calculation:
Precision = 40 / (40 + 5) = 40 / 45 ≈ 0.8889
Recall (Sensitivity or True Positive Rate)
Recall is the proportion of actual positives that were correctly identified. It answers: "What proportion of actual positives was correctly classified?"
Formula:
Recall = TP / (TP + FN)
Example Calculation:
Recall = 40 / (40 + 10) = 40 / 50 = 0.80
Specificity (True Negative Rate)
Specificity is the proportion of actual negatives that were correctly identified.
Formula:
Specificity = TN / (TN + FP)
Example Calculation:
Specificity = 45 / (45 + 5) = 45 / 50 = 0.90
F1 Score
The F1 Score is the harmonic mean of Precision and Recall, giving a single score that balances the two.
Formula:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
Example Calculation:
F1 Score = 2 * (0.8889 * 0.80) / (0.8889 + 0.80) ≈ 0.8421
False Positive Rate (Fall-out)
The False Positive Rate is the proportion of actual negatives that were incorrectly classified as positive.
Formula:
False Positive Rate = FP / (FP + TN)
Example Calculation:
False Positive Rate = 5 / (5 + 45) = 5 / 50 = 0.10
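As a quick check on the worked calculations above, all six metrics can be computed directly from the four counts of the spam example; this is a minimal sketch, not library code:

```python
tp, fn, fp, tn = 40, 10, 5, 45  # counts from the spam example above

accuracy    = (tp + tn) / (tp + tn + fp + fn)                 # 0.85
precision   = tp / (tp + fp)                                  # ~0.8889
recall      = tp / (tp + fn)                                  # 0.80
specificity = tn / (tn + fp)                                  # 0.90
f1          = 2 * precision * recall / (precision + recall)   # ~0.8421
fpr         = fp / (fp + tn)                                  # 0.10

print(accuracy, precision, recall, specificity, f1, fpr)
```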
Understanding Class Imbalance
In datasets where one class significantly outnumbers another (class imbalance), accuracy can be a misleading metric. For example, in a dataset where 99% of the instances belong to one class, a classifier that always predicts that class will have 99% accuracy but will be ineffective.
The confusion matrix allows us to see the breakdown of correct and incorrect classifications for each class, which gives a more comprehensive view of a model's performance, especially in the presence of imbalanced classes.
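To make the 99% example concrete, the sketch below (with made-up labels) shows a majority-class predictor that reaches 99% accuracy while identifying none of the minority class, which the recall immediately exposes:

```python
from sklearn.metrics import accuracy_score, recall_score

# 99 negatives and 1 positive; the "model" always predicts the majority class.
y_true = [0] * 99 + [1]
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.99
print(recall_score(y_true, y_pred))    # 0.0 -- the single positive is missed
```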
Confusion Matrix in Multi-Class Classification
For multi-class classification problems, the confusion matrix becomes a larger square matrix, with dimensions equal to the number of classes. Cell (i, j) holds the number of instances whose actual class is i and whose predicted class is j.
Suppose we have a classifier for three classes: A, B, and C. The confusion matrix would look like:
| Actual \ Predicted | A | B | C |
|---|---|---|---|
| A | 50 | 2 | 1 |
| B | 5 | 45 | 5 |
| C | 0 | 3 | 47 |
This matrix shows how many instances of each actual class were classified into each predicted class.
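As an illustrative sketch, per-class recall and precision can be read directly off this 3x3 matrix (values hard-coded from the table above; rows are actual classes, columns are predicted classes):

```python
import numpy as np

# Rows = actual A, B, C; columns = predicted A, B, C.
cm = np.array([[50,  2,  1],
               [ 5, 45,  5],
               [ 0,  3, 47]])

recall_per_class    = np.diag(cm) / cm.sum(axis=1)  # [~0.943, ~0.818, 0.94]
precision_per_class = np.diag(cm) / cm.sum(axis=0)  # [~0.909, 0.90, ~0.887]
print(recall_per_class, precision_per_class)
```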
Calculating Metrics for Multi-Class Classification
In multi-class classification, metrics like Precision, Recall, and F1 Score can be calculated for each class individually, and then averaged across classes using methods like micro-averaging or macro-averaging.
Micro-Averaging
Micro-averaging pools the true positives, false positives, and false negatives of all classes and computes the metric once from the pooled totals.
Macro-Averaging
Macro-averaging computes the metric independently for each class and then takes the unweighted mean of the per-class values.
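In scikit-learn, both schemes are available through the `average` parameter of metrics such as `precision_score`; the labels in the sketch below are arbitrary and chosen only to illustrate the difference:

```python
from sklearn.metrics import precision_score

y_true = ["A", "A", "B", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "C", "C"]

# Micro: pool all classes' counts first, then compute precision once.
print(precision_score(y_true, y_pred, average="micro"))  # ~0.667
# Macro: compute precision per class, then take the unweighted mean.
print(precision_score(y_true, y_pred, average="macro"))  # ~0.722
```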
Applications of the Confusion Matrix
Model Evaluation and Selection
The confusion matrix is a valuable tool for evaluating classification models, especially in determining which model performs better on specific types of errors. It helps in selecting models based on the trade-offs between different types of errors (e.g., deciding between higher precision or higher recall based on application requirements).
Medical Diagnosis
In medical testing, the confusion matrix helps in understanding how often tests correctly identify a condition (True Positives), incorrectly indicate a condition in a healthy person (False Positives), incorrectly miss a condition (False Negatives), and correctly identify the absence of a condition (True Negatives). This is crucial for assessing the effectiveness of diagnostic tests.
Fraud Detection
In fraud detection, the confusion matrix can help in evaluating how effectively a model identifies fraudulent transactions (True Positives), misses fraudulent transactions (False Negatives), or incorrectly flags legitimate transactions as fraudulent (False Positives). This aids in balancing customer satisfaction and fraud prevention.
Confusion Matrix vs ROC Curve
While a confusion matrix provides detailed insights into the performance of a classification model by showing the exact number of True Positives, False Positives, True Negatives, and False Negatives, the ROC (Receiver Operating Characteristic) curve is a graphical representation that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
The ROC curve plots the True Positive Rate (Sensitivity) against the False Positive Rate (1 - Specificity) at various threshold settings. The area under the ROC curve (AUC) is a measure of how well the model can distinguish between the two classes.
Both tools are useful, but they serve different purposes. The confusion matrix gives a detailed breakdown at a particular threshold, while the ROC curve shows how model performance varies across thresholds.
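The sketch below (with made-up scores) uses scikit-learn's `roc_curve` and `roc_auc_score` to obtain the TPR/FPR pairs across thresholds and the corresponding AUC:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# True binary labels and illustrative predicted probabilities for the positive class.
y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
print(roc_auc_score(y_true, y_score))              # 0.875
```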
Limitations of the Confusion Matrix
- It is only applicable to supervised learning, where the true labels are known.
- In multi-class scenarios, the confusion matrix can become large and harder to interpret.
- It does not account for the severity or cost associated with different types of errors.
- It only provides information at a fixed threshold and cannot show how performance changes across thresholds.
Confusion Matrix in Machine Learning Libraries
Many machine learning libraries and tools provide built-in functions to compute and display confusion matrices. Examples include:
- In Python, scikit-learn's `confusion_matrix(y_true, y_pred)` computes the confusion matrix.
- In R, the `caret` package provides the `confusionMatrix()` function.
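For visualization, recent scikit-learn releases also offer `ConfusionMatrixDisplay`, which plots the matrix directly from label arrays; a minimal sketch, assuming scikit-learn 1.0+ and matplotlib are installed and using illustrative labels:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Computes the confusion matrix internally and renders it as a heatmap.
ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
plt.show()
```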
Confusion Matrix History
The confusion matrix has its origins in the field of classification and was first introduced by the British biologist and statistician Karl Pearson in the early 20th century. He used a matrix to describe the errors made in mathematical tables.
Over time, the confusion matrix became a standard tool in machine learning and statistical classification, particularly for analyzing the performance of classification algorithms.
The term "confusion matrix" itself was popularized by the American statistician William H. Greene in his textbook on econometrics. Today, confusion matrices are a fundamental concept taught in data science and machine learning courses.