7.2 Model Evaluation#
To evaluate our model, we can examine how many values of \(y\) (understanding display rules) were predicted correctly by the model:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("data/data.dat", delimiter='\t')
X = np.asarray(df['age']).reshape(-1, 1)  # predictor: age, reshaped to a 2-D array for sklearn
y = np.asarray(df['display'])  # binary outcome: understanding of display rules
model = LogisticRegression()
results = model.fit(X, y)
predictions = model.predict(X)  # predicted class (0 or 1) for each child
accuracy = model.score(X, y)  # proportion of correct predictions
print("Model predictions:", predictions)
print("\nAccuracy:", accuracy)
Model predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 1 0 0
1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
Accuracy: 0.7714285714285715
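Because accuracy is simply the share of predictions that match the observed outcomes, the score can also be reproduced by hand. The following is a minimal sketch using the predictions and y arrays defined above:
manual_accuracy = np.mean(predictions == y)  # proportion of matching predictions
print("Manual accuracy:", manual_accuracy)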
An accuracy of 77% indicates that the model correctly predicts the outcome for about 77% of the children in our data. This suggests that the model performs reasonably well, although it still misclassifies some cases. For a more detailed investigation, a confusion matrix is a useful way to visualize the prediction accuracy:
from sklearn.metrics import confusion_matrix, classification_report
print(f"Confusion matrix:\n {confusion_matrix(y, model.predict(X))}")
Confusion matrix:
[[24 7]
[ 9 30]]
The output of the confusion matrix provides the following values:
|  | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | True Negative (TN) | False Positive (FP) |
| Actual Positive | False Negative (FN) | True Positive (TP) |
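Beyond the text output, the matrix can also be rendered as an annotated heatmap using the seaborn and matplotlib imports from above, which makes the four cells easier to read at a glance. This is a minimal sketch; the tick labels are only illustrative:
cm = confusion_matrix(y, model.predict(X))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted 0', 'Predicted 1'],
            yticklabels=['Actual 0', 'Actual 1'])
plt.title("Confusion matrix")
plt.show()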
For an even deeper inspection of the model’s accuracy, we can print the classification report:
report = classification_report(y, model.predict(X))
print(report)
precision recall f1-score support
0 0.73 0.77 0.75 31
1 0.81 0.77 0.79 39
accuracy 0.77 70
macro avg 0.77 0.77 0.77 70
weighted avg 0.77 0.77 0.77 70
The output can be interpreted as follows:
- Precision: The proportion of true positive predictions among all positive predictions made by the model (the sketch after this list recomputes these values from the confusion matrix counts).
  - Class 0: When the model predicts that a sample does not understand the display rules (Class 0), it is correct 73% of the time.
  - Class 1: When the model predicts that a sample does understand the display rules (Class 1), it is correct 81% of the time.
- Recall: The proportion of actual positives that are correctly identified by the model.
  - Class 0: 77% of the samples that do not understand the display rules (Class 0) are correctly identified by the model.
  - Class 1: 77% of the samples that do understand the display rules (Class 1) are correctly identified by the model.
- F1-score: The harmonic mean of precision and recall, providing a balance between the two and a good overall measure of model performance. For class 0 it is 0.75 and for class 1 it is 0.79, which suggests the model is slightly more effective at correctly predicting class 1.
- Support: The number of actual occurrences of each class in the dataset.
- Accuracy: The overall proportion of correctly predicted observations. The model correctly predicts the outcome 77% of the time, which is fairly good.
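To see how these numbers follow from the confusion matrix, the class 1 precision and recall can be recomputed directly from the four counts. This is a minimal sketch reusing the fitted single-predictor model:
tn, fp, fn, tp = confusion_matrix(y, model.predict(X)).ravel()
precision_1 = tp / (tp + fp)  # 30 / (30 + 7), about 0.81
recall_1 = tp / (tp + fn)     # 30 / (30 + 9), about 0.77
print("Precision (class 1):", round(precision_1, 2))
print("Recall (class 1):", round(recall_1, 2))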
Multiple Logistic Regression#
You may want to use two or more variables as inputs for the regression. In our example, we will use `age` and `TOM` as predictors for `display` by simply adding them to \(X\).
X = df[['age', 'TOM']]
y = df['display']
model = LogisticRegression()
results = model.fit(X, y)
report = classification_report(y, model.predict(X))
print(report)
precision recall f1-score support
0 0.79 0.74 0.77 31
1 0.80 0.85 0.82 39
accuracy 0.80 70
macro avg 0.80 0.79 0.80 70
weighted avg 0.80 0.80 0.80 70
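With both predictors, the overall accuracy rises to 0.80. As with the single-predictor model, the confusion matrix of the extended model can be inspected to see where the remaining misclassifications occur; a short sketch reusing the fitted model:
print("Accuracy:", model.score(X, y))
print(f"Confusion matrix:\n {confusion_matrix(y, model.predict(X))}")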