11.2 SEM
As in the CFA example, we again use the HolzingerSwineford1939 dataset, which contains mental ability test scores from seventh- and eighth-grade pupils in two schools.
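```python
import semopy

# Load the Holzinger & Swineford (1939) data bundled with semopy
data = semopy.examples.holzinger39.get_data()
data  # in a notebook, this displays the DataFrame
```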
|  | id | sex | ageyr | agemo | school | grade | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 13 | 1 | Pasteur | 7.0 | 3.333333 | 7.75 | 0.375 | 2.333333 | 5.75 | 1.285714 | 3.391304 | 5.75 | 6.361111 |
| 2 | 2 | 2 | 13 | 7 | Pasteur | 7.0 | 5.333333 | 5.25 | 2.125 | 1.666667 | 3.00 | 1.285714 | 3.782609 | 6.25 | 7.916667 |
| 3 | 3 | 2 | 13 | 1 | Pasteur | 7.0 | 4.500000 | 5.25 | 1.875 | 1.000000 | 1.75 | 0.428571 | 3.260870 | 3.90 | 4.416667 |
| 4 | 4 | 1 | 13 | 2 | Pasteur | 7.0 | 5.333333 | 7.75 | 3.000 | 2.666667 | 4.50 | 2.428571 | 3.000000 | 5.30 | 4.861111 |
| 5 | 5 | 2 | 12 | 2 | Pasteur | 7.0 | 4.833333 | 4.75 | 0.875 | 2.666667 | 4.00 | 2.571429 | 3.695652 | 6.30 | 5.916667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 297 | 346 | 1 | 13 | 5 | Grant-White | 8.0 | 4.000000 | 7.00 | 1.375 | 2.666667 | 4.25 | 1.000000 | 5.086957 | 5.60 | 5.250000 |
| 298 | 347 | 2 | 14 | 10 | Grant-White | 8.0 | 3.000000 | 6.00 | 1.625 | 2.333333 | 4.00 | 1.000000 | 4.608696 | 6.05 | 6.083333 |
| 299 | 348 | 2 | 14 | 3 | Grant-White | 8.0 | 4.666667 | 5.50 | 1.875 | 3.666667 | 5.75 | 4.285714 | 4.000000 | 6.00 | 7.611111 |
| 300 | 349 | 1 | 14 | 2 | Grant-White | 8.0 | 4.333333 | 6.75 | 0.500 | 3.666667 | 4.50 | 2.000000 | 5.086957 | 6.20 | 4.388889 |
| 301 | 351 | 1 | 13 | 5 | Grant-White | NaN | 4.333333 | 6.00 | 3.375 | 3.666667 | 5.75 | 3.142857 | 4.086957 | 6.95 | 5.166667 |
301 rows × 15 columns
In Confirmatory Factor Analysis (CFA), we specify how observed variables measure latent constructs and allow latent variables to correlate. CFA focuses on the measurement model and does not include directional (regression) relationships between latent variables.
Structural Equation Modelling (SEM) extends CFA by allowing directional relationships among latent variables. An SEM therefore consists of two components:
- a measurement model, which specifies how observed variables relate to latent variables, and
- a structural model, which specifies regressions among latent variables.
Whenever at least one latent variable is used to predict another latent variable, the model goes beyond a CFA and is treated here as an SEM. As an example, we now specify a model in which visual ability predicts speed ability. Textual ability (measured by x4, x5 and x6) is not included in this model.
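Written out as equations, the model specified below has the following form. This is a sketch in conventional SEM notation: the symbols λ, β, ε and ζ are our own labels, the first loading of each factor is fixed to 1 (as semopy does by default, see the x1 and x7 rows in the estimates), and intercepts are omitted because the model has no mean structure.

$$
\begin{aligned}
x_1 &= 1 \cdot \text{visual} + \varepsilon_1, & x_7 &= 1 \cdot \text{speed} + \varepsilon_7,\\
x_2 &= \lambda_2\,\text{visual} + \varepsilon_2, & x_8 &= \lambda_8\,\text{speed} + \varepsilon_8,\\
x_3 &= \lambda_3\,\text{visual} + \varepsilon_3, & x_9 &= \lambda_9\,\text{speed} + \varepsilon_9,\\
\text{speed} &= \beta\,\text{visual} + \zeta.
\end{aligned}
$$

The first three rows form the measurement model and the last row is the structural model; β corresponds to the `speed ~ visual` row in semopy's output, and the variance of ζ to the `speed ~~ speed` row.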
```python
# Specify the model in lavaan-style syntax
desc = '''# Measurement model
visual =~ x1 + x2 + x3
speed =~ x7 + x8 + x9
# Structural model
speed ~ visual'''

# Fit the model
model = semopy.Model(desc)
results = model.fit(data)

# Visualize the model (requires Graphviz; the data/ directory must already exist)
semopy.semplot(model, filename='data/sem_plot.pdf', plot_covs=True, std_ests=True)
```
Model Estimates
```python
# Parameter estimates, including standardised estimates (Est. Std)
estimates = model.inspect(std_est=True)
print(estimates)
```
```
      lval  op    rval  Estimate  Est. Std  Std. Err    z-value   p-value
0    speed   ~  visual  0.368338  0.460510   0.08295   4.440492  0.000009
1       x1   ~  visual  1.000000  0.666995         -          -         -
2       x2   ~  visual  0.689313  0.456019  0.123415   5.585336       0.0
3       x3   ~  visual  0.984819  0.678165  0.159891   6.159304       0.0
4       x7   ~   speed  1.000000  0.571792         -          -         -
5       x8   ~   speed  1.203822  0.740606  0.169823   7.088685       0.0
6       x9   ~   speed  1.051845  0.649322  0.147314   7.140136       0.0
7    speed  ~~   speed  0.304732  0.787930  0.071634   4.254029  0.000021
8   visual  ~~  visual  0.604528  1.000000   0.12996   4.651641  0.000003
9       x1  ~~      x1  0.754319  0.555117  0.110373   6.834275       0.0
10      x2  ~~      x2  1.094042  0.792047  0.102616  10.661487       0.0
11      x3  ~~      x3  0.688536  0.540092  0.104947   6.560781       0.0
12      x7  ~~      x7  0.796166  0.673054  0.081598   9.757123       0.0
13      x8  ~~      x8  0.461362  0.451503  0.076857    6.00289       0.0
14      x9  ~~      x9  0.586986  0.578381  0.070959   8.272212       0.0
```
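In semopy's output, factor loadings and structural regressions both use the `~` operator. One way to isolate the latent-on-latent paths is to keep only `~` rows whose left- and right-hand sides are both latent variables. The sketch below rebuilds a few rows of the printed table by hand so it runs on its own; in practice you would filter the `estimates` DataFrame returned by `model.inspect(std_est=True)` directly. The set `latents` is a hypothetical helper written out by hand, not something semopy provides here.

```python
import pandas as pd

# A few rows copied from the printed estimates table above
estimates = pd.DataFrame(
    {
        "lval": ["speed", "x1", "x2", "x7", "speed"],
        "op": ["~", "~", "~", "~", "~~"],
        "rval": ["visual", "visual", "visual", "speed", "speed"],
        "Estimate": [0.368338, 1.000000, 0.689313, 1.000000, 0.304732],
        "Est. Std": [0.460510, 0.666995, 0.456019, 0.571792, 0.787930],
    }
)

# Hypothetical helper: latent variable names as declared with =~ in the model
latents = {"visual", "speed"}

# Structural regressions: "~" rows with a latent variable on both sides
structural = estimates[
    (estimates["op"] == "~")
    & estimates["lval"].isin(latents)
    & estimates["rval"].isin(latents)
]
print(structural)
```

Hard-coding the latent names keeps the sketch simple; they must match the names declared with `=~` in `desc`.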
For guidance on interpreting factor loadings, (co)variances, and residual variances, please refer to the previous chapter. Here, we focus on the newly introduced structural regression, `speed ~ visual` (row 0). Note that semopy also lists factor loadings with `~` (e.g. `x1 ~ visual`); the structural regression is the row with a latent variable on both sides.

- The `Estimate` column contains the unstandardised regression coefficient: the expected change in speed ability for a one-unit increase in visual ability, on the latent scales set by the reference indicators x1 and x7 (whose loadings are fixed to 1).
- The `Est. Std` column contains the standardised regression coefficient: the expected change in speed ability, in standard deviations, for a one-standard-deviation increase in visual ability.
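Because speed is now predicted by visual, the `speed ~~ speed` row is a residual (disturbance) variance rather than the total variance of speed. The sketch below shows how the standardised coefficient and the explained variance (R²) follow from the unstandardised estimates alone. It uses the standard rescaling by standard deviations; that semopy computes `Est. Std` exactly this way is an assumption, although the result matches the printed value to four decimals.

```python
import math

# Unstandardised estimates copied from the estimates table above
b = 0.368338            # speed ~ visual (regression coefficient)
var_visual = 0.604528   # visual ~~ visual (variance of the exogenous factor)
resid_speed = 0.304732  # speed ~~ speed (residual/disturbance variance)

# Model-implied total variance of speed: explained part + residual part
var_speed = b**2 * var_visual + resid_speed

# Standardised coefficient: rescale b by the ratio of standard deviations
beta_std = b * math.sqrt(var_visual) / math.sqrt(var_speed)

# With a single predictor, R^2 equals the squared standardised coefficient
r_squared = beta_std**2

print(f"standardised coefficient: {beta_std:.4f}")  # close to Est. Std 0.460510
print(f"R^2 for speed: {r_squared:.3f}")
print(f"standardised residual variance: {1 - r_squared:.4f}")  # close to 0.787930
```

Here R² is about 0.21, so visual ability accounts for roughly a fifth of the variance in speed ability. The same identity (standardised residual variance = 1 - squared standardised coefficient) also links the x1 loading (0.666995) to the standardised `x1 ~~ x1` value (0.555117).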
The regression coefficient is significantly different from zero (z ≈ 4.44, p < .001), indicating that visual ability significantly predicts speed ability within this model.
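The `z-value` and `p-value` columns follow from the estimate and its standard error. A minimal sketch, assuming a Wald z test with a two-sided p-value from the standard normal distribution (this reproduces the printed values for the structural path; the standard error below is the rounded printed value):

```python
import math

estimate = 0.368338  # Estimate for speed ~ visual
std_err = 0.08295    # Std. Err for speed ~ visual (as printed)

# Wald z statistic: estimate divided by its standard error
z = estimate / std_err

# Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)) for standard normal Z
p = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, p = {p:.1e}")
```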
Model Fit
```python
# Compute fit statistics; .T turns the one-row table into a single column
stats = semopy.calc_stats(model)
print(stats.T)
```
```
                      Value
DoF            8.000000e+00
DoF Baseline   1.500000e+01
chi2           4.741342e+01
chi2 p-value   1.278751e-07
chi2 Baseline  3.417214e+02
CFI            8.793669e-01
GFI            8.612512e-01
AGFI           7.398460e-01
NFI            8.612512e-01
TLI            7.738129e-01
RMSEA          1.281494e-01
AIC            2.568496e+01
BIC            7.387739e+01
LogLik         1.575197e-01
```
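The two DoF rows can be checked by hand: the degrees of freedom are the number of unique variances and covariances among the observed indicators minus the number of free parameters. The sketch below counts both for this model. The parameter count is read off the estimates table, and the assumption that the baseline (independence) model frees only the six indicator variances is ours, though it reproduces the printed value.

```python
# Observed indicators in the model: x1-x3 (visual) and x7-x9 (speed)
n_indicators = 6

# Unique elements of the observed covariance matrix: p(p + 1) / 2
n_moments = n_indicators * (n_indicators + 1) // 2

# Free parameters counted from the estimates table:
# 4 loadings (x1 and x7 are fixed to 1), 1 structural regression,
# 1 factor variance (visual), 1 residual factor variance (speed),
# 6 residual variances of the indicators
n_free = 4 + 1 + 1 + 1 + 6

dof = n_moments - n_free
# Baseline model: only the indicator variances are estimated
dof_baseline = n_moments - n_indicators

print(dof, dof_baseline)  # 8 15
```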
To assess how well the model reproduces the observed data, we examine the model fit indices (see the previous chapter for details).
- The χ² test is significant (χ²(8) = 47.41, p < .001), indicating that the model-implied covariance matrix differs from the observed covariance matrix.
- CFI (≈ 0.88) and TLI (≈ 0.77) fall below commonly used thresholds for acceptable fit (e.g. 0.90 or, more strictly, 0.95).
- RMSEA (≈ 0.13) exceeds typical cut-offs for adequate fit (e.g. 0.06 to 0.08).
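To see where these three indices come from, they can be recomputed from the χ² values, the degrees of freedom, and the sample size in the output above. The sketch uses the textbook formulas; semopy's internal implementation may differ in details, but these definitions reproduce the printed CFI, TLI and RMSEA.

```python
import math

# Values from the calc_stats output above
chi2, dof = 47.41342, 8
chi2_base, dof_base = 341.7214, 15
n = 301  # number of pupils (rows in the dataset)

# CFI: reduction in non-centrality (chi2 - df) relative to the baseline model
cfi = 1 - max(chi2 - dof, 0) / max(chi2_base - dof_base, chi2 - dof, 0)

# TLI: compares the chi2/df ratios of the target and baseline models
tli = (chi2_base / dof_base - chi2 / dof) / (chi2_base / dof_base - 1)

# RMSEA: misfit per degree of freedom, scaled by N - 1
rmsea = math.sqrt(max(chi2 - dof, 0) / (dof * (n - 1)))

print(f"CFI = {cfi:.3f}, TLI = {tli:.3f}, RMSEA = {rmsea:.3f}")
```

RMSEA grows with χ² - df, so the large χ² relative to its 8 degrees of freedom is what pushes it well above the usual cut-offs.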
Taken together, these indices suggest that the model provides a poor overall fit to the data. Although the regression from visual ability to speed ability is statistically significant, the model does not adequately capture the covariance structure of the observed variables.
Summary
Structural Equation Modelling allows us to test directional hypotheses between latent variables, but a statistically significant regression path does not guarantee good overall model fit. Both parameter estimates and global fit measures must be considered when evaluating an SEM.