11.2 SEM

As before, we will use the HolzingerSwineford1939 dataset:

import semopy

data = semopy.examples.holzinger39.get_data()
data
id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6 x7 x8 x9
1 1 1 13 1 Pasteur 7.0 3.333333 7.75 0.375 2.333333 5.75 1.285714 3.391304 5.75 6.361111
2 2 2 13 7 Pasteur 7.0 5.333333 5.25 2.125 1.666667 3.00 1.285714 3.782609 6.25 7.916667
3 3 2 13 1 Pasteur 7.0 4.500000 5.25 1.875 1.000000 1.75 0.428571 3.260870 3.90 4.416667
4 4 1 13 2 Pasteur 7.0 5.333333 7.75 3.000 2.666667 4.50 2.428571 3.000000 5.30 4.861111
5 5 2 12 2 Pasteur 7.0 4.833333 4.75 0.875 2.666667 4.00 2.571429 3.695652 6.30 5.916667
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
297 346 1 13 5 Grant-White 8.0 4.000000 7.00 1.375 2.666667 4.25 1.000000 5.086957 5.60 5.250000
298 347 2 14 10 Grant-White 8.0 3.000000 6.00 1.625 2.333333 4.00 1.000000 4.608696 6.05 6.083333
299 348 2 14 3 Grant-White 8.0 4.666667 5.50 1.875 3.666667 5.75 4.285714 4.000000 6.00 7.611111
300 349 1 14 2 Grant-White 8.0 4.333333 6.75 0.500 3.666667 4.50 2.000000 5.086957 6.20 4.388889
301 351 1 13 5 Grant-White NaN 4.333333 6.00 3.375 3.666667 5.75 3.142857 4.086957 6.95 5.166667

301 rows × 15 columns

Performing SEM

As you know, CFA is a special case of SEM in which no latent variable is used to predict another latent variable: the latent variables are connected only by correlations (bidirectional paths), never by unidirectional paths. But what if we suspect that one latent factor actually predicts another? Such a model is considered a full SEM.
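To make the distinction concrete, the following minimal sketch contrasts a CFA description with an SEM description in semopy's lavaan-style syntax. The helper `count_operators` is hypothetical (it is not part of semopy) and only counts which operators appear in each description; no model is fitted here.

```python
# CFA: the two factors are only allowed to covary (bidirectional path, "~~").
cfa_desc = """
visual =~ x1 + x2 + x3
speed =~ x7 + x8 + x9
visual ~~ speed
"""

# SEM: one factor is regressed on the other (unidirectional path, "~").
sem_desc = """
visual =~ x1 + x2 + x3
speed =~ x7 + x8 + x9
speed ~ visual
"""


def count_operators(desc):
    """Hypothetical helper: count loadings (=~), covariances (~~) and regressions (~)."""
    counts = {"=~": 0, "~~": 0, "~": 0}
    for line in desc.strip().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Check the two-character operators first so that "~" does not match them.
        for op in ("=~", "~~", "~"):
            if op in line:
                counts[op] += 1
                break
    return counts


print("CFA:", count_operators(cfa_desc))
print("SEM:", count_operators(sem_desc))
```

The two descriptions share the same measurement part; they differ only in whether the factors are linked by a covariance or by a regression.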

Note that SEM models contain a measurement model and a structural model. The measurement model describes relationships between measured variables and latent factors. The structural model describes relationships between latent variables.
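As a sketch, the two parts can be written down for the simplest case of one predictor factor and one outcome factor. The symbols below are assumptions chosen for illustration and do not come from this chapter: \(\eta_1\) and \(\eta_2\) are the latent factors, \(x_i\) a measured variable, \(\lambda_i\) its loading, \(\beta\) the structural slope, and \(\varepsilon_i\) and \(\zeta\) the residuals.

```latex
% Measurement model: each measured variable loads on exactly one latent factor
x_i = \lambda_i \, \eta_{k(i)} + \varepsilon_i , \qquad k(i) \in \{1, 2\}

% Structural model: one latent factor is regressed on the other
\eta_2 = \beta \, \eta_1 + \zeta
```

In a CFA the second equation is absent and the factors are only allowed to covary.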

Let’s specify and fit an SEM that predicts speed ability from visual ability and ignores the text-processing abilities.

# Specify the model
desc = '''# Measurement model
          visual =~ x1 + x2 + x3
          speed =~ x7 + x8 + x9

          # Structural model
          speed ~ visual'''

# Fit the model
model = semopy.Model(desc)
results = model.fit(data)

# Print the estimates and fit measures
estimates = model.inspect()
print(estimates)

stats = semopy.calc_stats(model)
print(stats.T)

# Visualize the model
semopy.semplot(model, plot_covs=True, filename='data/sem_plot.pdf')
      lval  op    rval  Estimate  Std. Err    z-value   p-value
0    speed   ~  visual  0.368338   0.08295   4.440492  0.000009
1       x1   ~  visual  1.000000         -          -         -
2       x2   ~  visual  0.689313  0.123415   5.585336       0.0
3       x3   ~  visual  0.984819  0.159891   6.159304       0.0
4       x7   ~   speed  1.000000         -          -         -
5       x8   ~   speed  1.203822  0.169823   7.088685       0.0
6       x9   ~   speed  1.051845  0.147314   7.140136       0.0
7    speed  ~~   speed  0.304732  0.071634   4.254029  0.000021
8   visual  ~~  visual  0.604528   0.12996   4.651641  0.000003
9       x1  ~~      x1  0.754319  0.110373   6.834275       0.0
10      x2  ~~      x2  1.094042  0.102616  10.661487       0.0
11      x3  ~~      x3  0.688536  0.104947   6.560781       0.0
12      x7  ~~      x7  0.796166  0.081598   9.757123       0.0
13      x8  ~~      x8  0.461362  0.076857    6.00289       0.0
14      x9  ~~      x9  0.586986  0.070959   8.272212       0.0
                      Value
DoF            8.000000e+00
DoF Baseline   1.500000e+01
chi2           4.741342e+01
chi2 p-value   1.278751e-07
chi2 Baseline  3.417214e+02
CFI            8.793669e-01
GFI            8.612512e-01
AGFI           7.398460e-01
NFI            8.612512e-01
TLI            7.738129e-01
RMSEA          1.281494e-01
AIC            2.568496e+01
BIC            7.387739e+01
LogLik         1.575197e-01
[Figure: path diagram of the fitted model, as produced by semopy.semplot]

Model estimates

For a guide on how to interpret loadings, (co)variances and residuals, please refer to the previous chapter.

However, notice the newly added regression: speed ~ visual. The Estimate column can be read as the slope of this regression: a one-unit increase in visual is associated, on average, with a 0.37-unit increase in speed. As the p-value indicates, this coefficient differs significantly from zero, so we can infer that visual significantly predicts speed.
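The z-value and p-value in the speed ~ visual row can be reproduced by hand from the estimate and its standard error. The sketch below assumes a two-sided Wald test against a standard normal distribution and adds an approximate 95% confidence interval, which is not part of the printed output. The numbers are copied from the table, so small rounding differences are expected.

```python
import math

# Values copied from the "speed ~ visual" row of the estimates table.
estimate = 0.368338
std_err = 0.08295

# Wald z statistic: estimate divided by its standard error (about 4.44).
z = estimate / std_err

# Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))

# Approximate 95% confidence interval (assumption: normal approximation).
half_width = 1.959964 * std_err
ci = (estimate - half_width, estimate + half_width)

print(f"z = {z:.3f}")
print(f"p = {p_two_sided:.2e}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the interval excludes zero, it leads to the same conclusion as the p-value.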

Fit measures

To assess model fit, let’s look at the fit measures (please refer to the previous chapter for details). The significant \(\chi^2\) test indicates that the model-implied covariance matrix differs significantly from the empirical one. Furthermore, TLI (0.77), RMSEA (0.13) and CFI (0.88) all fall outside their desired ranges, indicating poor fit.
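To see where these indices come from, the sketch below recomputes CFI, TLI and RMSEA from the \(\chi^2\) values printed above, using the standard textbook formulas. The cutoffs in the code are commonly cited rules of thumb and are an assumption here; the previous chapter may use different ones.

```python
import math

# Values copied from the fit statistics printed above.
chi2, dof = 47.41342, 8
chi2_base, dof_base = 341.7214, 15
n_obs = 301  # number of rows in the dataset

# Excess of each chi-square over its degrees of freedom, floored at zero.
excess = max(chi2 - dof, 0.0)
excess_base = max(chi2_base - dof_base, 0.0)

# Comparative fit index, Tucker-Lewis index and RMSEA.
cfi = 1.0 - excess / max(excess_base, excess)
tli = (chi2_base / dof_base - chi2 / dof) / (chi2_base / dof_base - 1.0)
rmsea = math.sqrt(excess / (dof * (n_obs - 1)))

# Assumed rule-of-thumb cutoffs (not taken from this chapter).
CFI_MIN, TLI_MIN, RMSEA_MAX = 0.95, 0.95, 0.06

acceptable = cfi >= CFI_MIN and tli >= TLI_MIN and rmsea <= RMSEA_MAX

print(f"CFI   = {cfi:.3f}")
print(f"TLI   = {tli:.3f}")
print(f"RMSEA = {rmsea:.3f}")
print("Acceptable fit under the assumed cutoffs:", acceptable)
```

The recomputed values agree with the output of semopy.calc_stats up to rounding, and all three miss the assumed cutoffs.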