14.2 Exercises

The Dataset

The Oxboys dataset contains longitudinal height measurements of 26 boys from Oxford, UK, each measured on nine occasions. Our goal today is to predict height from age. The dataset contains the following variables:

  • Subject - Unique identifier for each child in the experiment

  • age - The standardized age

  • height - The height of the child in centimeters

  • Occasion - The result of converting age from a continuous variable to a categorical one (can be ignored)

# Load packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt

data = sm.datasets.get_rdataset("Oxboys", "nlme").data
data
     Subject     age  height  Occasion
0          1 -1.0000   140.5         1
1          1 -0.7479   143.4         2
2          1 -0.4630   144.8         3
3          1 -0.1643   147.1         4
4          1 -0.0027   147.7         5
...      ...     ...     ...       ...
229       26 -0.0027   138.4         5
230       26  0.2466   138.9         6
231       26  0.5562   141.8         7
232       26  0.7781   142.6         8
233       26  1.0055   143.1         9

234 rows × 4 columns

Exercise 1: Data visualization

To get a better feeling for the data, create two plots:

  • Plot 1: Scatterplot of height against age with a single regression line for height predicted by age

  • Plot 2: Scatterplot of height against age with a separate regression line for each subject

Inspect the plots. Do you think multilevel modeling is needed for this data?
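As a numeric complement to the two plots, the sketch below fits ordinary least-squares lines with `np.polyfit`, once pooled over all points and once per subject. It uses hypothetical simulated data, not the Oxboys data; the number of subjects, the ages, and the effect sizes are made-up assumptions. A large spread of the per-subject intercepts or slopes is the kind of pattern that suggests a multilevel model.

```python
import numpy as np
import pandas as pd

# Hypothetical simulated data (NOT Oxboys): 10 subjects, each measured at 9 ages.
rng = np.random.default_rng(0)
ages = np.linspace(-1, 1, 9)
rows = []
for subject in range(1, 11):
    intercept = 150 + rng.normal(0, 8)  # large between-subject spread in baseline height
    slope = 6 + rng.normal(0, 1.5)      # smaller spread in growth rate
    for a in ages:
        height = intercept + slope * a + rng.normal(0, 0.5)
        rows.append({"Subject": subject, "age": a, "height": height})
sim = pd.DataFrame(rows)

# Pooled fit: one line through all points (the idea behind Plot 1).
# np.polyfit returns coefficients highest degree first: [slope, intercept].
pooled_slope, pooled_intercept = np.polyfit(sim["age"], sim["height"], deg=1)

# Per-subject fits: one line per subject (the idea behind Plot 2).
per_subject = sim.groupby("Subject")[["age", "height"]].apply(
    lambda g: pd.Series(np.polyfit(g["age"], g["height"], deg=1),
                        index=["slope", "intercept"])
)

print(f"pooled: intercept={pooled_intercept:.1f}, slope={pooled_slope:.2f}")
print(per_subject.agg(["mean", "std"]))
```

Because every simulated subject is measured at the same ages, the pooled slope equals the mean of the per-subject slopes; what the single line hides is the spread around it, especially in the intercepts.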

# TODO: Exercise 1

# Plot 1: Plot with single regression line for height predicted by age
sns.lmplot(x="...", y="...", data=data)
plt.title("Combined regression")

# Plot 2: Plot with a regression line for each subject
sns.lmplot(x="...", y="...", hue="...", data=data)
plt.title("Subject-specific regressions");

Exercise 2: Fitting a null model

To find out how much of the variance in height is explained by Subject, we begin by fitting a null model without any predictors. Set up the model and inspect its output. Then calculate the intraclass correlation coefficient (ICC) and interpret it. Which model parameter indicates the amount of variance in the intercept?
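For a null model, the ICC is the share of the total variance that lies between groups: ICC = τ₀² / (τ₀² + σ²), where τ₀² is the random-intercept variance and σ² the residual variance. The sketch below shows one way to compute it with `smf.mixedlm` on hypothetical simulated data whose true ICC is 4 / (4 + 1) = 0.8 (group count, group size, and variances are made-up assumptions). It relies on the fitted `MixedLMResults` exposing the random-intercept variance in `cov_re` (labelled "Group Var" in the summary) and the residual variance in `scale`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data (NOT Oxboys): 40 groups with 9 observations each.
# True random-intercept variance = 2**2 = 4, true residual variance = 1,
# so the true ICC is 4 / (4 + 1) = 0.8.
rng = np.random.default_rng(1)
n_groups, n_per_group = 40, 9
group = np.repeat(np.arange(n_groups), n_per_group)
u0 = rng.normal(0, 2.0, n_groups)  # one random intercept per group
y = 10 + u0[group] + rng.normal(0, 1.0, group.size)
sim = pd.DataFrame({"y": y, "g": group})

# Null model: intercept only, with a random intercept for each group.
null_fit = smf.mixedlm("y ~ 1", data=sim, groups=sim["g"]).fit()
print(null_fit.summary())

tau2 = null_fit.cov_re.iloc[0, 0]  # random-intercept variance ("Group Var")
sigma2 = null_fit.scale            # residual variance
icc = tau2 / (tau2 + sigma2)
print(f"ICC = {icc:.2f}")
```

An ICC close to 1 means most of the variation sits between groups rather than within them.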

# TODO: Exercise 2

Exercise 3: Fitting a random intercept model

As the null model shows, a large share of the variance is explained by the grouping variable Subject. Fitting a single regression over all data points may therefore lead to wrong interpretations, so the model needs to account for inter-individual differences. To do so, fit a random intercept model predicting height from age. What is the average relationship between age and height?
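A random intercept model keeps one common (fixed) slope for age but lets each subject's intercept deviate from the overall intercept. The sketch below fits such a model with `smf.mixedlm` on hypothetical simulated growth data (30 subjects and a true average slope of 6.5 cm per unit of standardized age are made-up assumptions). The average relationship is read from `fe_params`, and the per-subject deviations from `random_effects`, a dict mapping each group to its predicted random effects.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data (NOT Oxboys): 30 subjects, 9 standardized ages each.
rng = np.random.default_rng(2)
n_subjects = 30
ages = np.linspace(-1, 1, 9)
subject = np.repeat(np.arange(n_subjects), ages.size)
age = np.tile(ages, n_subjects)
u0 = rng.normal(0, 8, n_subjects)  # subject-specific intercept deviations
height = 150 + u0[subject] + 6.5 * age + rng.normal(0, 1, age.size)
sim = pd.DataFrame({"height": height, "age": age, "Subject": subject})

# Random intercept model: fixed effects for Intercept and age, plus a random
# intercept per Subject (the default when re_formula is omitted).
ri_fit = smf.mixedlm("height ~ age", data=sim, groups=sim["Subject"]).fit()
print(ri_fit.summary())

# Average relationship between age and height (the fixed slope).
avg_slope = ri_fit.fe_params["age"]

# Subject-specific intercepts = fixed intercept + each subject's predicted deviation.
subject_intercepts = {
    s: ri_fit.fe_params["Intercept"] + re.iloc[0]
    for s, re in ri_fit.random_effects.items()
}
print(f"average slope: {avg_slope:.2f}")
```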

# TODO: Exercise 3

Exercise 4: Fitting a random intercept & random slope model

To increase the flexibility of our model, we now add random slopes as well, meaning that a random intercept and a random slope are fitted for every subject. Specify the model and interpret the output.
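In statsmodels, random slopes are requested through the `re_formula` argument: `re_formula="~age"` gives each subject both a random intercept and a random slope for age, and `cov_re` then becomes a 2 × 2 covariance matrix (intercept variance, slope variance, and their covariance). The sketch below uses hypothetical simulated data in which both intercepts and slopes vary; all numbers are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data (NOT Oxboys): intercepts AND slopes vary by subject.
rng = np.random.default_rng(3)
n_subjects = 30
ages = np.linspace(-1, 1, 9)
subject = np.repeat(np.arange(n_subjects), ages.size)
age = np.tile(ages, n_subjects)
u0 = rng.normal(0, 8, n_subjects)    # random intercept deviations
u1 = rng.normal(0, 1.5, n_subjects)  # random slope deviations
height = 150 + u0[subject] + (6.5 + u1[subject]) * age + rng.normal(0, 0.5, age.size)
sim = pd.DataFrame({"height": height, "age": age, "Subject": subject})

# re_formula="~age": random intercept and random slope for age per Subject.
rs_fit = smf.mixedlm("height ~ age", data=sim, groups=sim["Subject"],
                     re_formula="~age").fit()
print(rs_fit.summary())
print(rs_fit.cov_re)  # 2 x 2 random-effects covariance matrix

# Subject-specific slopes = fixed slope + each subject's slope deviation
# (second entry of each random-effects vector, after the intercept).
subject_slopes = np.array([
    rs_fit.fe_params["age"] + re.iloc[1]
    for re in rs_fit.random_effects.values()
])
print(f"spread of subject slopes (sd): {subject_slopes.std():.2f}")
```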

# TODO: Exercise 4

Voluntary exercise 1: Fitting a random slope & fixed intercept model

Until now, we have looked either at random intercept & fixed slope models (only the intercepts vary across subjects) or at random intercept & random slope models (both intercepts and slopes vary across subjects). However, it is also possible to fit fixed intercept & random slope models. Find out how to do this and specify such a model. What is different compared to the random intercept & random slope model?
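One way to get a fixed intercept with random slopes in statsmodels, assuming the patsy formula syntax that `re_formula` is parsed with, is to drop the intercept from the random-effects formula with `0 +`, e.g. `re_formula="~0 + age"`. The sketch below uses hypothetical simulated data in which only the slopes vary. `cov_re` then contains a single variance (the slope variance), with no intercept variance and no intercept-slope covariance.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data (NOT Oxboys): common intercept, subject-specific slopes.
rng = np.random.default_rng(4)
n_subjects = 30
ages = np.linspace(-1, 1, 9)
subject = np.repeat(np.arange(n_subjects), ages.size)
age = np.tile(ages, n_subjects)
u1 = rng.normal(0, 1.5, n_subjects)  # random slope deviations only
height = 150 + (6.5 + u1[subject]) * age + rng.normal(0, 0.5, age.size)
sim = pd.DataFrame({"height": height, "age": age, "Subject": subject})

# "~0 + age" removes the random intercept: only the age slope varies by Subject.
# The intercept is still estimated, but only as a fixed effect.
fs_fit = smf.mixedlm("height ~ age", data=sim, groups=sim["Subject"],
                     re_formula="~0 + age").fit()
print(fs_fit.summary())
print(fs_fit.cov_re)  # 1 x 1: the random-slope variance only
```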

Important information: You may see warnings stating that the model did not converge. These are probably caused by the low variance of the slopes (see above). A random effect with very small or zero variance may be unnecessary or poorly estimated. For now, you can ignore these warnings. However, be extremely cautious when interpreting model estimates whenever convergence problems occur.
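To avoid overlooking convergence problems, you can record the warnings raised during fitting and check the `converged` attribute of the fitted result. The sketch below does this on hypothetical simulated data in which the slopes do not vary at all, so the random-slope variance should be estimated at or near zero, the boundary situation described above. Whether a warning is actually raised depends on the data and on the statsmodels version.

```python
import warnings

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.tools.sm_exceptions import ConvergenceWarning

# Hypothetical simulated data (NOT Oxboys): intercepts vary, slopes do not.
rng = np.random.default_rng(5)
n_subjects = 30
ages = np.linspace(-1, 1, 9)
subject = np.repeat(np.arange(n_subjects), ages.size)
age = np.tile(ages, n_subjects)
u0 = rng.normal(0, 8, n_subjects)
height = 150 + u0[subject] + 6.5 * age + rng.normal(0, 1, age.size)
sim = pd.DataFrame({"height": height, "age": age, "Subject": subject})

# Record every warning raised while fitting a random intercept & random slope model.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fit = smf.mixedlm("height ~ age", data=sim, groups=sim["Subject"],
                      re_formula="~age").fit()

# Keep only the convergence-related warnings.
convergence_msgs = [str(w.message) for w in caught
                    if issubclass(w.category, ConvergenceWarning)]
print("converged:", fit.converged)
print("convergence warnings:", convergence_msgs)
print("estimated slope variance:", fit.cov_re.iloc[1, 1])
```

If warnings appear and the estimated slope variance is close to zero, the simpler random intercept model is usually worth comparing against.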

# TODO: Voluntary exercise 1