14.2 Exercises

The Dataset

The Oxboys dataset consists of longitudinal height measurements for boys from Oxford, UK. Our goal for today is to predict height by age. The dataset contains the following variables:

Subject - Unique identifier for each child in the experiment
age - The standardized age
height - The height of the child in centimeters
Occasion - The result of converting age from a continuous variable to a categorical one (can be ignored)

# Load packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt

data = sm.datasets.get_rdataset("Oxboys", "nlme").data
data

	Subject	age	height	Occasion
0	1	-1.0000	140.5	1
1	1	-0.7479	143.4	2
2	1	-0.4630	144.8	3
3	1	-0.1643	147.1	4
4	1	-0.0027	147.7	5
...	...	...	...	...
229	26	-0.0027	138.4	5
230	26	0.2466	138.9	6
231	26	0.5562	141.8	7
232	26	0.7781	142.6	8
233	26	1.0055	143.1	9

234 rows × 4 columns

Exercise 1: Data visualization

To get a better feeling for the data, create two plots:

Plot 1: Scatterplot with a single regression line for height predicted by age
Plot 2: Scatterplot with a regression line for each subject for height predicted by age.

Inspect the plots. Do you think multilevel modeling is needed for this data?

# TODO: Exercise 1

# Plot 1: Plot with single regression line for height predicted by age
sns.lmplot(x="...", y="...", data=data)
plt.title("Combined regression")

# Plot 2: Plot with a regression line for each subject
sns.lmplot(x="...", y="...", hue="...", data=data)
plt.title("Subject-specific regressions");
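
As a numeric complement to the second plot, one could fit a separate least-squares line per group and compare how much the intercepts and slopes vary. The following is a minimal sketch on simulated data. The data frame `toy` and its columns `group`, `x`, and `y` are made up for illustration and are not part of the Oxboys dataset.

```python
import numpy as np
import pandas as pd

# Simulated data (hypothetical names and values, NOT the Oxboys data):
# every group gets its own intercept and its own slope.
rng = np.random.default_rng(0)
n_groups, n_obs = 20, 9
group = np.repeat(np.arange(n_groups), n_obs)
x = np.tile(np.linspace(-1, 1, n_obs), n_groups)
intercepts = rng.normal(150, 8, n_groups)
slopes = rng.normal(6, 1.5, n_groups)
y = intercepts[group] + slopes[group] * x + rng.normal(0, 1, n_groups * n_obs)
toy = pd.DataFrame({"group": group, "x": x, "y": y})

# Fit one straight line per group; np.polyfit returns [slope, intercept]
rows = []
for name, g in toy.groupby("group"):
    slope, intercept = np.polyfit(g["x"], g["y"], 1)
    rows.append({"group": name, "intercept": intercept, "slope": slope})
per_group = pd.DataFrame(rows)

# Spread of the group-specific estimates
print(per_group[["intercept", "slope"]].std())
```

If the group-specific intercepts spread widely compared to the residual noise, a single pooled regression line hides systematic differences between groups.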

Exercise 2: Fitting a null model

To find out how much of the variance in height is attributable to differences between subjects, we begin by fitting a null model without any predictors. Set up the model and inspect its output. Then calculate the intraclass correlation coefficient (ICC) and interpret it. Which model parameter indicates the amount of variance in the intercept?

# TODO: Exercise 2
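
The ICC is the share of the total variance that lies between groups: the random intercept variance divided by the sum of the random intercept variance and the residual variance. Below is a minimal sketch of that calculation with `statsmodels` on simulated data. The data frame `toy` and its columns are hypothetical, and the sketch assumes that `cov_re` holds the random intercept variance and `scale` the residual variance of the fitted model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical, NOT the Oxboys data):
# large differences between groups, small noise within groups.
rng = np.random.default_rng(1)
n_groups, n_obs = 30, 10
group = np.repeat(np.arange(n_groups), n_obs)
group_effect = rng.normal(0, 8, n_groups)
y = 150 + group_effect[group] + rng.normal(0, 2, n_groups * n_obs)
toy = pd.DataFrame({"group": group, "y": y})

# Null model: only a fixed intercept plus a random intercept per group
null_model = smf.mixedlm("y ~ 1", data=toy, groups=toy["group"]).fit()
print(null_model.summary())

# Variance components of the fitted model
var_between = float(null_model.cov_re.iloc[0, 0])  # random intercept variance
var_within = float(null_model.scale)               # residual variance

icc = var_between / (var_between + var_within)
print(f"ICC = {icc:.3f}")
```

An ICC close to 1 means that most of the variance lies between groups, while an ICC close to 0 means that grouping explains little.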

Exercise 3: Fitting a random intercept model

As seen in the null model, a large share of the variance is explained by the grouping variable Subject. Fitting a single regression over all data points may therefore lead to wrong interpretations, so the model needs to account for inter-individual differences. To do so, fit a random intercept model that predicts height from age. What is the average relationship between age and height?

# TODO: Exercise 3
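
For orientation, a random intercept model can be sketched as follows on simulated data. The names `toy`, `group`, `x`, and `y` are hypothetical stand-ins, and the simulated slope of 6 is an arbitrary choice, not a result from the Oxboys data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical): group-specific intercepts, one common slope
rng = np.random.default_rng(2)
n_groups, n_obs = 20, 9
group = np.repeat(np.arange(n_groups), n_obs)
x = np.tile(np.linspace(-1, 1, n_obs), n_groups)
intercepts = rng.normal(150, 8, n_groups)
y = intercepts[group] + 6 * x + rng.normal(0, 1, n_groups * n_obs)
toy = pd.DataFrame({"group": group, "x": x, "y": y})

# Without re_formula, mixedlm fits a random intercept per group only
model = smf.mixedlm("y ~ x", data=toy, groups=toy["group"]).fit()
print(model.summary())

# Fixed effects: the average intercept and the average slope of x
print(model.fe_params)
```

In this sketch, the coefficient of `x` in `fe_params` is the average slope shared by all groups, and `cov_re` holds the variance of the group-specific intercepts around the average intercept.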

Exercise 4: Fitting a random intercept & random slope model

To make the model more flexible, we will now add random slopes as well, meaning that a random intercept and a random slope are fitted for every subject. Specify the model and interpret the output.

# TODO: Exercise 4
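
One way to request random slopes in `statsmodels` is the `re_formula` argument of `mixedlm`. The sketch below uses simulated data in which both intercepts and slopes differ between groups. All names and values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical): intercepts AND slopes differ between groups
rng = np.random.default_rng(3)
n_groups, n_obs = 25, 9
group = np.repeat(np.arange(n_groups), n_obs)
x = np.tile(np.linspace(-1, 1, n_obs), n_groups)
intercepts = rng.normal(150, 8, n_groups)
slopes = rng.normal(6, 1.5, n_groups)
y = intercepts[group] + slopes[group] * x + rng.normal(0, 1, n_groups * n_obs)
toy = pd.DataFrame({"group": group, "x": x, "y": y})

# re_formula="~x" requests a random intercept and a random slope for x
model = smf.mixedlm(
    "y ~ x", data=toy, groups=toy["group"], re_formula="~x"
).fit()
print(model.summary())

# 2x2 covariance matrix of the random effects (intercept and slope)
print(model.cov_re)

# Group-specific deviations from the fixed effects, keyed by group label
first_group = next(iter(model.random_effects))
print(model.random_effects[first_group])
```

The diagonal of `cov_re` gives the variances of the random intercepts and random slopes, and the off-diagonal entry gives their covariance.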

Voluntary exercise 1: Fitting a random slope & fixed intercept model

Until now we have looked either at random intercept & fixed slope models (only the intercepts vary across subjects) or at random intercept & random slope models (intercepts and slopes vary across subjects). However, it is also possible to fit fixed intercept & random slope models. Find out how to do it and specify such a model. What is different compared to the random intercept & random slope model?

Important information: There might be some warnings which refer to the model not converging. This is probably caused by the low variance in the slope (see above). The random effects in the model have very small or zero variance, indicating that the random effect might not be necessary or is poorly estimated. For now, you can ignore this warning. However, you should be extremely cautious when interpreting model estimates in case of convergence problems.

# TODO: Voluntary exercise 1
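
One possible approach, shown here only as a sketch on simulated data, is to drop the intercept from the random effects formula so that only the slope varies by group. The names `toy`, `group`, `x`, and `y` are hypothetical, and the data are simulated with one common intercept so that the model matches the data-generating process.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical): one common intercept, group-specific slopes
rng = np.random.default_rng(4)
n_groups, n_obs = 25, 9
group = np.repeat(np.arange(n_groups), n_obs)
x = np.tile(np.linspace(-1, 1, n_obs), n_groups)
slopes = rng.normal(6, 2, n_groups)
y = 150 + slopes[group] * x + rng.normal(0, 1, n_groups * n_obs)
toy = pd.DataFrame({"group": group, "x": x, "y": y})

# "~ 0 + x" removes the intercept from the random effects design,
# leaving a random slope for x and a fixed (shared) intercept
model = smf.mixedlm(
    "y ~ x", data=toy, groups=toy["group"], re_formula="~ 0 + x"
).fit()
print(model.summary())

# Only one variance component remains: the variance of the random slopes
print(model.cov_re)
```

Compared with the random intercept & random slope sketch, `cov_re` shrinks from a 2x2 matrix to a single slope variance, because neither an intercept variance nor an intercept-slope covariance is estimated.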