Logistic Regression#

Exercise 1: Loading the Data#

For today’s exercise we will use the Breast Cancer Wisconsin (Diagnostic) dataset. It is a collection of data used for predicting whether a breast tumor is malignant (cancerous) or benign (non-cancerous), containing information derived from images of breast mass samples obtained through fine needle aspirates.

The dataset consists of 569 samples with 30 features that measure various characteristics of cell nuclei, such as radius, texture, perimeter, and area. Each sample is labeled as either malignant (M) or benign (B).

  1. Please visit the dataset’s documentation on the UCI Machine Learning Repository and familiarize yourself with the dataset.

  2. Take an initial look at the features (predictors) and targets (outcomes), e.g. through the .head() method.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from ucimlrepo import fetch_ucirepo

# Fetch dataset 
breast_cancer_wisconsin_diagnostic = fetch_ucirepo(id=17) 
  
# Data (as pandas dataframes) 
X = breast_cancer_wisconsin_diagnostic.data.features 
y = breast_cancer_wisconsin_diagnostic.data.targets

# Convert y to a 1D array (this is the required input for the logistic regression model)
y = np.ravel(y)

# Print information 
print(breast_cancer_wisconsin_diagnostic.variables)
print(X.head())
print(y)
                  name     role         type demographic description units  \
0                   ID       ID  Categorical        None        None  None   
1            Diagnosis   Target  Categorical        None        None  None   
2              radius1  Feature   Continuous        None        None  None   
3             texture1  Feature   Continuous        None        None  None   
4           perimeter1  Feature   Continuous        None        None  None   
5                area1  Feature   Continuous        None        None  None   
6          smoothness1  Feature   Continuous        None        None  None   
7         compactness1  Feature   Continuous        None        None  None   
8           concavity1  Feature   Continuous        None        None  None   
9      concave_points1  Feature   Continuous        None        None  None   
10           symmetry1  Feature   Continuous        None        None  None   
11  fractal_dimension1  Feature   Continuous        None        None  None   
12             radius2  Feature   Continuous        None        None  None   
13            texture2  Feature   Continuous        None        None  None   
14          perimeter2  Feature   Continuous        None        None  None   
15               area2  Feature   Continuous        None        None  None   
16         smoothness2  Feature   Continuous        None        None  None   
17        compactness2  Feature   Continuous        None        None  None   
18          concavity2  Feature   Continuous        None        None  None   
19     concave_points2  Feature   Continuous        None        None  None   
20           symmetry2  Feature   Continuous        None        None  None   
21  fractal_dimension2  Feature   Continuous        None        None  None   
22             radius3  Feature   Continuous        None        None  None   
23            texture3  Feature   Continuous        None        None  None   
24          perimeter3  Feature   Continuous        None        None  None   
25               area3  Feature   Continuous        None        None  None   
26         smoothness3  Feature   Continuous        None        None  None   
27        compactness3  Feature   Continuous        None        None  None   
28          concavity3  Feature   Continuous        None        None  None   
29     concave_points3  Feature   Continuous        None        None  None   
30           symmetry3  Feature   Continuous        None        None  None   
31  fractal_dimension3  Feature   Continuous        None        None  None   

   missing_values  
0              no  
1              no  
2              no  
3              no  
4              no  
5              no  
6              no  
7              no  
8              no  
9              no  
10             no  
11             no  
12             no  
13             no  
14             no  
15             no  
16             no  
17             no  
18             no  
19             no  
20             no  
21             no  
22             no  
23             no  
24             no  
25             no  
26             no  
27             no  
28             no  
29             no  
30             no  
31             no  
   radius1  texture1  perimeter1   area1  smoothness1  compactness1  \
0    17.99     10.38      122.80  1001.0      0.11840       0.27760   
1    20.57     17.77      132.90  1326.0      0.08474       0.07864   
2    19.69     21.25      130.00  1203.0      0.10960       0.15990   
3    11.42     20.38       77.58   386.1      0.14250       0.28390   
4    20.29     14.34      135.10  1297.0      0.10030       0.13280   

   concavity1  concave_points1  symmetry1  fractal_dimension1  ...  radius3  \
0      0.3001          0.14710     0.2419             0.07871  ...    25.38   
1      0.0869          0.07017     0.1812             0.05667  ...    24.99   
2      0.1974          0.12790     0.2069             0.05999  ...    23.57   
3      0.2414          0.10520     0.2597             0.09744  ...    14.91   
4      0.1980          0.10430     0.1809             0.05883  ...    22.54   

   texture3  perimeter3   area3  smoothness3  compactness3  concavity3  \
0     17.33      184.60  2019.0       0.1622        0.6656      0.7119   
1     23.41      158.80  1956.0       0.1238        0.1866      0.2416   
2     25.53      152.50  1709.0       0.1444        0.4245      0.4504   
3     26.50       98.87   567.7       0.2098        0.8663      0.6869   
4     16.67      152.20  1575.0       0.1374        0.2050      0.4000   

   concave_points3  symmetry3  fractal_dimension3  
0           0.2654     0.4601             0.11890  
1           0.1860     0.2750             0.08902  
2           0.2430     0.3613             0.08758  
3           0.2575     0.6638             0.17300  
4           0.1625     0.2364             0.07678  

[5 rows x 30 columns]
['M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M'
 'M' 'B' 'B' 'B' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M'
 'M' 'B' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'M'
 'M' 'B' 'M' 'M' 'B' 'B' 'B' 'B' 'M' 'B' 'M' 'M' 'B' 'B' 'B' 'B' 'M' 'B'
 'M' 'M' 'B' 'M' 'B' 'M' 'M' 'B' 'B' 'B' 'M' 'M' 'B' 'M' 'M' 'M' 'B' 'B'
 'B' 'M' 'B' 'B' 'M' 'M' 'B' 'B' 'B' 'M' 'M' 'B' 'B' 'B' 'B' 'M' 'B' 'B'
 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'M' 'M' 'B' 'M' 'M' 'B' 'B' 'B'
 'M' 'M' 'B' 'M' 'B' 'M' 'M' 'B' 'M' 'M' 'B' 'B' 'M' 'B' 'B' 'M' 'B' 'B'
 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'M'
 'M' 'B' 'M' 'B' 'B' 'M' 'M' 'B' 'B' 'M' 'M' 'B' 'B' 'B' 'B' 'M' 'B' 'B'
 'M' 'M' 'M' 'B' 'M' 'B' 'M' 'B' 'B' 'B' 'M' 'B' 'B' 'M' 'M' 'B' 'M' 'M'
 'M' 'M' 'B' 'M' 'M' 'M' 'B' 'M' 'B' 'M' 'B' 'B' 'M' 'B' 'M' 'M' 'M' 'M'
 'B' 'B' 'M' 'M' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'M' 'M' 'B' 'B' 'M'
 'B' 'B' 'M' 'M' 'B' 'M' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'M' 'B'
 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'B' 'B' 'B' 'B'
 'B' 'B' 'M' 'B' 'M' 'B' 'B' 'M' 'B' 'B' 'M' 'B' 'M' 'M' 'B' 'B' 'B' 'B'
 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'M' 'B' 'M' 'B' 'B' 'B'
 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'M' 'B' 'M'
 'B' 'B' 'B' 'B' 'M' 'M' 'M' 'B' 'B' 'B' 'B' 'M' 'B' 'M' 'B' 'M' 'B' 'B'
 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'M' 'M' 'B' 'B' 'B' 'B' 'B' 'B'
 'B' 'B' 'B' 'B' 'B' 'M' 'M' 'B' 'M' 'M' 'M' 'B' 'M' 'M' 'B' 'B' 'B' 'B'
 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'M' 'B' 'B' 'M' 'M' 'B' 'B'
 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B'
 'M' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B'
 'M' 'M' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'M' 'B' 'M' 'B' 'B' 'M'
 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'M' 'B' 'B' 'B' 'B' 'B' 'B'
 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'B'
 'B' 'M' 'B' 'M' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'M' 'M' 'B' 'M' 'B' 'M'
 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'M' 'B' 'M' 'B' 'M' 'M' 'B' 'B' 'B' 'M'
 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'M' 'M' 'B' 'B' 'B'
 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B'
 'B' 'B' 'B' 'B' 'M' 'M' 'M' 'M' 'M' 'M' 'B']
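
As a quick sanity check (a minimal sketch, assuming the cell above has been run), we can verify the sample count, feature count, and class balance described above:

# Verify the dataset dimensions and the label distribution
print("Shape of X:", X.shape)               # expected: (569, 30)
print(pd.Series(y).value_counts())          # expected: B 357, M 212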

Exercise 2: Fitting the prediction model#

  1. Fit a logistic regression model using all predictors for predicting the diagnosis.

  2. Get and print the accuracy of the model.

  3. Get and print the confusion matrix for the target variable.

  4. Review the classification report and interpret the results.

Hint: If you get a warning about convergence, try setting max_iter=10000 in the logistic regression class.
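
An alternative to raising max_iter (a common trick, not part of the exercise solution) is to standardize the features before fitting; the solver usually converges much faster on scaled data. A minimal sketch using scikit-learn’s pipeline utilities:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scaling the predictors typically avoids the convergence warning entirely
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X, y)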

model = LogisticRegression(max_iter=10000)
model.fit(X, y)

# Get the intercept and coefficients (fit() stores them on the model itself)
intercept = model.intercept_
coef = model.coef_

print("Intercept:", intercept)
print("Coefficients:", coef)

# Evaluate the model using predictions
print("Model accuracy:", model.score(X, y))

# Confusion Matrix
conf = confusion_matrix(y, model.predict(X))
print("Confusion matrix:\n", conf)

# Classification report (classification_report is already imported above)
report = classification_report(y, model.predict(X))
print(report)
Intercept: [-27.95056684]
Coefficients: [[-1.01649102 -0.1801684   0.27565655 -0.0226662   0.1788688   0.22074883
   0.53685545  0.29603555  0.26639959  0.03046789  0.07816283 -1.25532588
  -0.11582457  0.10854792  0.02539334 -0.06767259  0.03651147  0.03822884
   0.03627027 -0.0140333  -0.15086497  0.43624862  0.10562901  0.01380557
   0.35772193  0.68749004  1.4271106   0.60399598  0.72848145  0.09506939]]
Model accuracy: 0.9578207381370826
Confusion matrix:
 [[348   9]
 [ 15 197]]
              precision    recall  f1-score   support

           B       0.96      0.97      0.97       357
           M       0.96      0.93      0.94       212

    accuracy                           0.96       569
   macro avg       0.96      0.95      0.95       569
weighted avg       0.96      0.96      0.96       569
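
To connect the report to the confusion matrix, here is a minimal sketch (assuming conf from the cell above, with the classes in alphabetical order, so B is the first row) that recomputes precision and recall for the malignant class by hand:

# Rows of conf are actual classes (B, M); columns are predicted classes.
# Treating 'M' as the positive class:
tn, fp, fn, tp = conf.ravel()
print("Recall (M):   ", tp / (tp + fn))    # 197 / (197 + 15) ≈ 0.93
print("Precision (M):", tp / (tp + fp))    # 197 / (197 + 9)  ≈ 0.96

In other words, the model misses 15 of the 212 malignant tumors (false negatives), which would be the most important number to reduce in a medical screening setting.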

Voluntary exercise#

  1. Try to create a custom plot which visualizes the confusion matrix. It should contain:

    • The four squares of the matrix (color coded)

    • The counts displayed in the middle of each square

    • Axis labels for all squares (predicted vs. actual class)

    • A colorbar

    • A title

  2. Use ConfusionMatrixDisplay() from scikit-learn to achieve the same goal (and see that sometimes it makes sense to not re-invent the wheel :))

# 1. Plot a custom confusion matrix
fig, ax = plt.subplots(figsize=(8, 8))
cax = ax.imshow(conf, cmap='Blues')

# labels
ax.xaxis.set(ticks=(0, 1), ticklabels=('Predicted Benign', 'Predicted Malignant'))
ax.yaxis.set(ticks=(0, 1), ticklabels=('Actual Benign', 'Actual Malignant'))

# Annotate the confusion matrix
for i in range(2):
    for j in range(2):
        ax.text(j, i, conf[i, j], ha='center', va='center', color='red', fontsize=16)

ax.set_title('Confusion Matrix', fontsize=15)

plt.colorbar(cax)
plt.show()

# 2. Using scikit-learn
from sklearn.metrics import ConfusionMatrixDisplay

# Display the confusion matrix using scikit-learn
fig, ax = plt.subplots(figsize=(6, 6))
disp = ConfusionMatrixDisplay(confusion_matrix=conf,
                              display_labels=["Benign", "Malignant"])

disp.plot(cmap='Blues', ax=ax, colorbar=True)
plt.title("Confusion Matrix")
plt.show()
[Output: the custom confusion matrix plot, followed by the scikit-learn ConfusionMatrixDisplay version]
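
As a side note, recent scikit-learn versions (1.0 and later) can build the same plot in a single call directly from the fitted model, skipping the explicit confusion_matrix step:

# Equivalent shortcut: compute and plot the confusion matrix from the model
ConfusionMatrixDisplay.from_estimator(model, X, y,
                                      display_labels=["Benign", "Malignant"],
                                      cmap='Blues')
plt.show()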