14.2 Exercises
The Dataset
The Oxboys dataset consists of longitudinal height measurements for boys from Oxford, UK. Our goal for today is to predict height by age. The dataset contains the following variables:
Subject - Unique identifier for each child in the experiment
age - The standardized age
height - The height of the child in centimeters
Occasion - The result of converting age from a continuous variable to a categorical one (can be ignored)
# Load packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt
data = sm.datasets.get_rdataset("Oxboys", "nlme").data
data
 | Subject | age | height | Occasion |
---|---|---|---|---|
0 | 1 | -1.0000 | 140.5 | 1 |
1 | 1 | -0.7479 | 143.4 | 2 |
2 | 1 | -0.4630 | 144.8 | 3 |
3 | 1 | -0.1643 | 147.1 | 4 |
4 | 1 | -0.0027 | 147.7 | 5 |
... | ... | ... | ... | ... |
229 | 26 | -0.0027 | 138.4 | 5 |
230 | 26 | 0.2466 | 138.9 | 6 |
231 | 26 | 0.5562 | 141.8 | 7 |
232 | 26 | 0.7781 | 142.6 | 8 |
233 | 26 | 1.0055 | 143.1 | 9 |
234 rows × 4 columns
Exercise 1: Data visualization
To get a better feeling for the data, create two plots:
Plot 1: Scatterplot with a single regression line for height predicted by age
Plot 2: Scatterplot with a regression line for each subject for height predicted by age
Inspect the plots. Do you think multilevel modeling is needed for this data?
# TODO: Exercise 1
# Plot 1: Plot with single regression line for height predicted by age
sns.lmplot(x="...", y="...", data=data)
plt.title("Combined regression")
# Plot 2: Plot with a regression line for each subject
sns.lmplot(x="...", y="...", hue="...", data=data)
plt.title("Subject-specific regressions");
Exercise 2: Fitting a null model
To find out how much variance in height is explained by Subject, we begin by fitting a null model without any predictors. Set up the model and inspect the model output. Then calculate the ICC (intraclass correlation coefficient) and interpret it. Which model parameter indicates the amount of variance in the intercept?
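As a reminder of how the ICC is obtained from a null model, here is a minimal sketch on simulated stand-in data (not the Oxboys data). The column names `group` and `y`, the sample sizes, and the standard deviations are assumptions made up for illustration. The ICC is the between-group variance divided by the sum of the between-group and residual variances.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (hypothetical, NOT the Oxboys data):
# 20 groups with 9 observations each; group means differ strongly.
rng = np.random.default_rng(0)
n_groups, n_obs = 20, 9
group = np.repeat(np.arange(n_groups), n_obs)
group_effect = rng.normal(0, 8, n_groups)[group]             # between-group SD = 8
y = 150 + group_effect + rng.normal(0, 2, n_groups * n_obs)  # within-group SD = 2
sim = pd.DataFrame({"group": group, "y": y})

# Null model: fixed intercept only, plus a random intercept per group
null_fit = smf.mixedlm("y ~ 1", sim, groups=sim["group"]).fit()

var_between = float(null_fit.cov_re.iloc[0, 0])  # random-intercept variance
var_within = float(null_fit.scale)               # residual variance
icc = var_between / (var_between + var_within)
print(f"ICC = {icc:.3f}")
```

An ICC close to 1 means that most of the variance lies between groups; a value close to 0 means that grouping explains very little.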
# TODO: Exercise 2
Exercise 3: Fitting a random intercept model
As seen in the null model, the grouping variable Subject explains a lot of the variance. Therefore, fitting a single regression over all data points may lead to wrong interpretations, and the model needs to account for inter-individual differences. To do so, fit a random intercept model predicting height from age. What is the average relationship between age and height?
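The general pattern for a random intercept model can be sketched on simulated stand-in data (again not the Oxboys data). The column names `group`, `x`, and `y` and the true slope of 6.0 are assumptions chosen for illustration. The fixed-effect coefficient of the predictor is the average relationship across groups, while each group gets its own intercept offset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (hypothetical): every group shares the same slope
# (6.0) but has its own intercept.
rng = np.random.default_rng(1)
n_groups, n_obs = 20, 9
group = np.repeat(np.arange(n_groups), n_obs)
x = np.tile(np.linspace(-1, 1, n_obs), n_groups)
intercepts = rng.normal(150, 8, n_groups)[group]
y = intercepts + 6.0 * x + rng.normal(0, 1, n_groups * n_obs)
sim = pd.DataFrame({"group": group, "x": x, "y": y})

# Random intercept model: fixed slope for x, random intercept per group
ri_fit = smf.mixedlm("y ~ x", sim, groups=sim["group"]).fit()

print(ri_fit.fe_params)  # fixed effects: average intercept and average slope
# Estimated intercept offsets for the first three groups
print(list(ri_fit.random_effects.items())[:3])
```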
# TODO: Exercise 3
Exercise 4: Fitting a random intercept & random slope model
To make the model more flexible, we now add random slopes as well, meaning that a random intercept and a random slope are fitted for every subject. Specify the model and interpret the output.
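In statsmodels, random slopes are requested through the `re_formula` argument of `mixedlm`. The sketch below uses simulated stand-in data (not the Oxboys data); the column names and the simulated intercept and slope distributions are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (hypothetical): each group has its own intercept
# AND its own slope.
rng = np.random.default_rng(2)
n_groups, n_obs = 20, 9
group = np.repeat(np.arange(n_groups), n_obs)
x = np.tile(np.linspace(-1, 1, n_obs), n_groups)
intercepts = rng.normal(150, 8, n_groups)[group]
slopes = rng.normal(6.0, 1.5, n_groups)[group]
y = intercepts + slopes * x + rng.normal(0, 1, n_groups * n_obs)
sim = pd.DataFrame({"group": group, "x": x, "y": y})

# re_formula="~x" adds a random slope for x on top of the random intercept
rs_fit = smf.mixedlm("y ~ x", sim, groups=sim["group"], re_formula="~x").fit()

print(rs_fit.fe_params)  # average intercept and average slope
print(rs_fit.cov_re)     # 2x2 covariance matrix of random intercept and slope
```

The diagonal of `cov_re` holds the intercept and slope variances; the off-diagonal entry is their covariance.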
# TODO: Exercise 4
Voluntary exercise 1: Fitting a random slope & fixed intercept model
Until now, we have looked either at random intercept & fixed slope models (only the intercepts vary across subjects) or at random intercept & random slope models (both intercept and slope vary across subjects). However, it is also possible to fit fixed intercept & random slope models. Find out how to do it and specify such a model. What is different compared to the random intercept & random slope model?
Important information: There might be some warnings which refer to the model not converging. This is probably caused by the low variance in the slope (see above). The random effects in the model have very small or zero variance, indicating that the random effect might not be necessary or is poorly estimated. For now, you can ignore this warning. However, you should be extremely cautious when interpreting model estimates in case of convergence problems.
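One way to check for convergence problems programmatically, rather than only reading the printed warnings, is to record warnings during fitting and inspect the `converged` flag of the result. This is a minimal sketch on simulated stand-in data (not the Oxboys data, and deliberately not the model asked for in this exercise); the column names and simulation settings are assumptions. On well-behaved data like this, the list of recorded convergence warnings may simply be empty.

```python
import warnings

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.tools.sm_exceptions import ConvergenceWarning

# Simulated stand-in data (hypothetical) with group-specific intercepts and slopes
rng = np.random.default_rng(3)
n_groups, n_obs = 20, 9
group = np.repeat(np.arange(n_groups), n_obs)
x = np.tile(np.linspace(-1, 1, n_obs), n_groups)
intercepts = rng.normal(150, 8, n_groups)[group]
slopes = rng.normal(6.0, 1.5, n_groups)[group]
y = intercepts + slopes * x + rng.normal(0, 1, n_groups * n_obs)
sim = pd.DataFrame({"group": group, "x": x, "y": y})

# Record all warnings raised while fitting instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fit = smf.mixedlm("y ~ x", sim, groups=sim["group"], re_formula="~x").fit()

# Keep only the convergence-related messages
conv_msgs = [str(w.message) for w in caught
             if issubclass(w.category, ConvergenceWarning)]

print("converged flag:", fit.converged)
print("number of convergence warnings:", len(conv_msgs))
print(fit.cov_re)  # variances near zero hint at an unnecessary random effect
```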
# TODO: Voluntary exercise 1