9.3 Exercises

In this exercise, we will revisit the Cleveland Heart Disease dataset, which we already explored in the categorical regression session. This dataset is widely used in medical research and machine learning for predicting heart disease. It includes data from patients with suspected heart conditions and features a variety of clinical and demographic attributes.

As before, we first load the dataset, combine the features (predictors) and targets into a single DataFrame, and then inspect it:

# Uncomment the following lines if you are using Google Colab
#!pip install semopy
#!pip install ucimlrepo
#!pip install jupyterquiz

import pandas as pd
import semopy
from semopy import calc_stats
from ucimlrepo import fetch_ucirepo
  
# Fetch dataset 
heart_disease = fetch_ucirepo(id=45) 
  
# Get the data (features and targets are already pandas DataFrames)
X = heart_disease.data.features 
y = heart_disease.data.targets 

# Create a combined DataFrame
df = pd.concat([X, y], axis=1)

print(df.describe())
print(df.head())
              age         sex          cp    trestbps        chol         fbs  \
count  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000   
mean    54.438944    0.679868    3.158416  131.689769  246.693069    0.148515   
std      9.038662    0.467299    0.960126   17.599748   51.776918    0.356198   
min     29.000000    0.000000    1.000000   94.000000  126.000000    0.000000   
25%     48.000000    0.000000    3.000000  120.000000  211.000000    0.000000   
50%     56.000000    1.000000    3.000000  130.000000  241.000000    0.000000   
75%     61.000000    1.000000    4.000000  140.000000  275.000000    0.000000   
max     77.000000    1.000000    4.000000  200.000000  564.000000    1.000000   

          restecg     thalach       exang     oldpeak       slope          ca  \
count  303.000000  303.000000  303.000000  303.000000  303.000000  299.000000   
mean     0.990099  149.607261    0.326733    1.039604    1.600660    0.672241   
std      0.994971   22.875003    0.469794    1.161075    0.616226    0.937438   
min      0.000000   71.000000    0.000000    0.000000    1.000000    0.000000   
25%      0.000000  133.500000    0.000000    0.000000    1.000000    0.000000   
50%      1.000000  153.000000    0.000000    0.800000    2.000000    0.000000   
75%      2.000000  166.000000    1.000000    1.600000    2.000000    1.000000   
max      2.000000  202.000000    1.000000    6.200000    3.000000    3.000000   

             thal         num  
count  301.000000  303.000000  
mean     4.734219    0.937294  
std      1.939706    1.228536  
min      3.000000    0.000000  
25%      3.000000    0.000000  
50%      3.000000    0.000000  
75%      7.000000    2.000000  
max      7.000000    4.000000  
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   63    1   1       145   233    1        2      150      0      2.3      3   
1   67    1   4       160   286    0        2      108      1      1.5      2   
2   67    1   4       120   229    0        2      129      1      2.6      2   
3   37    1   3       130   250    0        0      187      0      3.5      3   
4   41    0   2       130   204    0        2      172      0      1.4      1   

    ca  thal  num  
0  0.0   6.0    0  
1  3.0   3.0    2  
2  2.0   7.0    1  
3  0.0   3.0    0  
4  0.0   3.0    0  
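
The count row in the summary above reports 299 and 301 values for ca and thal rather than 303, so a few entries in these two columns are missing. The following minimal sketch shows one way to locate and, if needed, drop such rows. It uses a small made-up DataFrame (the name toy and its values are purely illustrative, not taken from the dataset); the same calls should work on df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame `df`
toy = pd.DataFrame({
    "age": [63, 67, 37, 41],
    "ca": [0.0, np.nan, 2.0, 0.0],
    "thal": [6.0, 3.0, np.nan, 3.0],
})

# Count missing values per column and show only the affected columns
missing = toy.isna().sum()
print(missing[missing > 0])

# Keep only the rows that are complete in the columns of interest
complete = toy.dropna(subset=["ca", "thal"])
print(len(complete), "complete rows")
```

Whether dropping rows is necessary depends on which variables enter the model; columns that are not used do not need to be complete.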

Exercise 1: Path Modelling

For this exercise, you will investigate the following hypotheses:

  1. Age (age) directly affects heart disease presence (num).

  2. Cholesterol (chol) and resting blood pressure (trestbps) mediate the relationship between age and num.

  3. Max heart rate achieved (thalach) and exercise-induced angina (exang) have direct effects on heart disease (num).

Additional information: The num variable is an ordinal categorical variable with values from 0 to 4. A value of 0 indicates no heart disease, and values 1 to 4 indicate the presence of heart disease with increasing severity.
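
Since 0 marks the absence and 1 to 4 the presence of heart disease, one option (an assumption here, not a requirement of the exercise) is to collapse num into a binary indicator. The sketch below illustrates the recoding on made-up values; the column name disease is hypothetical.

```python
import pandas as pd

# Hypothetical toy values standing in for the real `num` column
toy = pd.DataFrame({"num": [0, 2, 1, 0, 4, 3]})

# 0 stays 0 (no disease); any value from 1 to 4 becomes 1 (disease present)
toy["disease"] = (toy["num"] > 0).astype(int)
print(toy)
```

Collapsing the categories discards the severity information, so whether to keep num as it is or to binarise it is a modelling decision that should be stated when interpreting the results.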

Your tasks are therefore:

  1. Create and fit a path model for the stated hypotheses.

  2. Print and interpret the relevant results.

  3. Create the path diagram and check if you did everything correctly.
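
As a possible starting point, the hypotheses can be written as regression equations in semopy's lavaan-style syntax. The sketch below is one reading of the three hypotheses, not the reference solution. The names MODEL_DESC and fit_path_model are hypothetical helpers, and num is treated as a plain numeric outcome, which is a simplification given its ordinal nature.

```python
# Each hypothesis becomes one or more regression lines ("outcome ~ predictors")
regressions = [
    "chol ~ age",      # Hypothesis 2: age predicts the mediator chol
    "trestbps ~ age",  # Hypothesis 2: age predicts the mediator trestbps
    # Hypotheses 1-3: direct effect of age, mediator effects, thalach and exang
    "num ~ age + chol + trestbps + thalach + exang",
]
MODEL_DESC = "\n".join(regressions)


def fit_path_model(df, desc=MODEL_DESC):
    """Fit the path model and return the model, estimates, and fit statistics."""
    import semopy  # imported here so the description can be inspected without semopy

    model = semopy.Model(desc)
    model.fit(df)
    estimates = model.inspect()        # table of parameter estimates
    stats = semopy.calc_stats(model)   # fit indices such as CFI and RMSEA
    return model, estimates, stats


print(MODEL_DESC)

# Usage on the combined DataFrame from above (requires semopy):
# model, estimates, stats = fit_path_model(df)
# print(estimates)
# print(stats.T)
# semopy.semplot(model, "path_model.png")  # the diagram additionally needs graphviz
```

Comparing the arrows in the resulting diagram against the three hypotheses is a quick way to check that no path is missing or superfluous.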

# TODO: Exercise 1

Exercise 2: Quiz

from jupyterquiz import display_quiz

display_quiz('https://raw.githubusercontent.com/mibur1/psy111/main/book/solutions/quiz/question3.json')
display_quiz('https://raw.githubusercontent.com/mibur1/psy111/main/book/solutions/quiz/question4.json')