7.3 Exercises#
Exercise 1: Loading the Data#
For today’s exercise we will use the Breast Cancer Wisconsin (Diagnostic) dataset. It is a collection of data used for predicting whether a breast tumor is malignant (cancerous) or benign (non-cancerous), containing information derived from images of breast mass samples obtained through fine needle aspirates.
The dataset consists of 569 samples with 30 features that measure various characteristics of cell nuclei, such as radius, texture, perimeter, and area. Each sample is labeled as either malignant (1) or benign (0).
Please visit the documentation and familiarize yourself with the dataset.
Take an initial look at the features (predictors) and targets (outcomes) through the `.head()` method.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from ucimlrepo import fetch_ucirepo

# Fetch dataset
breast_cancer_wisconsin_diagnostic = fetch_ucirepo(id=17)

# Data (as pandas dataframes)
X = breast_cancer_wisconsin_diagnostic.data.features
y = breast_cancer_wisconsin_diagnostic.data.targets

# Convert y to a 1D array (the required input format for the logistic regression model)
y = np.ravel(y)

# Print information
print(breast_cancer_wisconsin_diagnostic.variables)
print(X.head())
print(y)
```
Exercise 2: Fitting the prediction model#
- Fit a logistic regression model using all predictors to predict the diagnosis (malignant vs. benign).
- Get and print the accuracy of the model.
- Get and print the confusion matrix for the target variable.
- Review the classification report and interpret the results.

Hint: If you get a warning about convergence, try setting `max_iter=10000` in the logistic regression class.
```python
# TODO
```
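One possible solution sketch for the steps above. To keep the snippet runnable offline, it loads the same dataset through scikit-learn's built-in copy instead of `ucimlrepo`; in the exercise, use the `X` and `y` you prepared in Exercise 1 (note that in scikit-learn's copy the 0/1 label encoding of malignant and benign is reversed relative to the description above).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Stand-in for the X and y from Exercise 1 (same 569 samples, 30 features)
X, y = load_breast_cancer(return_X_y=True)

# max_iter raised to avoid the convergence warning mentioned in the hint
model = LogisticRegression(max_iter=10000)
model.fit(X, y)

# Accuracy of the model
print("Accuracy:", model.score(X, y))

# Confusion matrix: rows are actual classes, columns are predicted classes
y_pred = model.predict(X)
print(confusion_matrix(y, y_pred))

# Precision, recall, and F1-score per class
print(classification_report(y, y_pred))
```

Note that this evaluates the model on the same data it was trained on, which overstates real-world performance; a train/test split would give a more honest estimate.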
Voluntary exercise#
Try to create a custom plot which visualizes the confusion matrix. It should contain:

- The four squares of the matrix (color coded)
- Labels of the actual values in the middle of each square
- Labels for all squares
- A colorbar
- A title

Use `ConfusionMatrixDisplay()` from scikit-learn to achieve the same goal (and see that sometimes it makes sense not to re-invent the wheel :)).
```python
# TODO
```