Chapter 12. Path Modeling

Multiple Regression and Beyond (3e) by Timothy Z. Keith

Author

Sungkyun Cho

Published

November 8, 2024

Load libraries

library(haven)
library(psych)
library(tidyverse)
library(lavaan)
library(semTools)
library(manymome)

공분산 행렬

covariance matrix

공분산 covariance of \(x, ~ y\): \(Cov(x, y)\) = \(\displaystyle\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\)
분산 variance of \(x\): covarance of \(x, ~ x\): \(V(x) = Cov(x, x)\) = \(\displaystyle\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2\)
상관 correlation of \(x, ~ y\): \(Cor(x, y)\) = \(\displaystyle\frac{Cov(x, y)}{\sigma_x \sigma_y}\)
where \(\sigma_x\) and \(\sigma_y\) are the standard deviations of x and y respectively.

변수들 간의 상관계수/공분산 행렬을 알면 회귀계수를 구할 수 있음. 반대로, 회귀계수들을 알면 상관계수/공분산 행렬을 구할 수 있음.

Source: pp. 258-259, Multiple Regression and Beyond (3e) by Timothy Z. Keith

\(a = r_{12}\)
\(\displaystyle b = \frac{r_{13} - r_{12} r_{23}}{1 - r_{12}^2}\)
\(\displaystyle c = \frac{r_{23} - r_{12} r_{13}}{1 - r_{12}^2}\)

데이터 준비

# variable names
vars <- c("fam_back", "ability", "motivate", "courses", "achieve")

# Correlation matrix
lower <- "
1.000
.417 1.000
.190 .205 1.000
.372 .498 .375 1.000
.417 .737 .255 .615 1.000"

# standard deviations
sd <- c(1.000, 15.000, 10.000, 2.000, 10.000)

achi_cov <- getCov(lower, names = vars, sd = sd)
achi_cov |> print()

         fam_back ability motivate courses achieve
fam_back    1.000   6.255     1.90   0.744    4.17
ability     6.255 225.000    30.75  14.940  110.55
motivate    1.900  30.750   100.00   7.500   25.50
courses     0.744  14.940     7.50   4.000   12.30
achieve     4.170 110.550    25.50  12.300  100.00

공분산 행렬을 이용해서 회귀계수를 구하면,

paths <- "
  achieve ~ b*ability + c*motivate
  motivate ~ a*ability
"

achi_model <- sem(paths, sample.cov = achi_cov, sample.nobs = 1000)
parameterEstimates(achi_model, standardized = "std.all", ci = FALSE) |> print()

       lhs op      rhs label     est    se      z pvalue std.all
1  achieve  ~  ability     b   0.477 0.014 33.143      0   0.715
2  achieve  ~ motivate     c   0.108 0.022  5.030      0   0.108
3 motivate  ~  ability     a   0.137 0.021  6.623      0   0.205
4  achieve ~~  achieve        44.511 1.991 22.361      0   0.446
5 motivate ~~ motivate        95.702 4.280 22.361      0   0.958
6  ability ~~  ability       224.775 0.000     NA     NA   1.000

한편, 제한된 다음과 같은 모형에 대한 회귀계수를 구하면,

paths <- "
  achieve ~ 0*ability + c*motivate  # ability의 계수를 0으로 고정; 효과 크기 제로!
  motivate ~ a*ability
"

achi_model <- sem(paths, sample.cov = achi_cov, sample.nobs = 100)
parameterEstimates(achi_model, standardized = "std.all", ci = FALSE) |> print()

       lhs op      rhs label     est     se     z pvalue std.all
1  achieve  ~  ability         0.000  0.000    NA     NA   0.000
2  achieve  ~ motivate     c   0.255  0.097 2.637  0.008   0.255
3 motivate  ~  ability     a   0.137  0.065 2.094  0.036   0.205
4  achieve ~~  achieve        92.563 13.090 7.071  0.000   0.935
5 motivate ~~ motivate        94.840 13.412 7.071  0.000   0.958
6  ability ~~  ability       222.750  0.000    NA     NA   1.000

이 때, 이 회귀계수들을 이용해 상관계수를 역으로 계산하면; implied/predicted correlation matrix

# implied/predicted correlation matrix
lavInspect(achi_model, "cor.ov") |>  print()

         achiev motivt abilty
achieve   1.000              
motivate  0.255  1.000       
ability   0.052  0.205  1.000

실제 상관계수와 비교해보면,

# observed correlation matrix
achi_cov[c(5, 3, 2), c(5, 3, 2)] |> cov2cor() |> print()

         achieve motivate ability
achieve    1.000    0.255   0.737
motivate   0.255    1.000   0.205
ability    0.737    0.205   1.000

이 때, 원래의 상관계수를 복구하지 못한다면 데이터와 모형이 맞지 않는다고 볼 수 있음!

파라미터 추정 방식

lavaan website: estimators and more

기본적으로 ML (Maximum Likelihood) 방법을 사용

공분산 행렬로부터 데이터 생성

# generate a dataset with the same covariance matrix
set.seed(123)
achi <- semTools::kd(achi_cov, n = 1000, type = "exact") |> as_tibble()
achi |> print()

# A tibble: 1,000 x 5
   fam_back ability motivate courses achieve
      <dbl>   <dbl>    <dbl>   <dbl>   <dbl>
 1    0.220  -6.13     -2.48 -2.95    -8.33 
 2   -1.28   -0.980    -6.90  1.99     0.105
 3   -1.19  -15.8      -6.33 -1.19    -9.33 
 4   -1.32   -5.75      2.82  3.14     6.53 
 5    1.18   12.1       3.58  2.58     3.64 
 6    1.72   21.7      -9.78  0.0788  11.5  
 7   -0.629  -4.56      2.07 -2.26    -0.556
 8   -0.559  -6.81     -7.23 -0.0846  -3.99 
 9    0.706  10.4      18.6   2.09    21.5  
10    1.02    4.77     11.2   1.17     5.92 
# i 990 more rows

# check the covariance matrix
cov(achi) |> print()

          fam_back    ability   motivate    courses    achieve
fam_back 1.0010010   6.261261   1.901902  0.7447447   4.174174
ability  6.2612613 225.225225  30.780781 14.9549550 110.660661
motivate 1.9019019  30.780781 100.100100  7.5075075  25.525526
courses  0.7447447  14.954955   7.507508  4.0040040  12.312312
achieve  4.1741742 110.660661  25.525526 12.3123123 100.100100

구조 모형

Source: p. 270, Multiple Regression and Beyond (3e) by Timothy Z. Keith

achi_paths <- "
  achieve ~ fam_back + ability + motivate + courses
  courses ~ fam_back + ability + motivate
  motivate ~ fam_back + ability
  ability ~ fam_back
"

achi_model <- sem(achi_paths, data = achi)
summary(achi_model, standardized = TRUE, header = FALSE) |> print()


Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  achieve ~                                                             
    fam_back          0.695    0.217    3.202    0.001    0.695    0.069
    ability           0.367    0.015   23.758    0.000    0.367    0.551
    motivate          0.013    0.021    0.604    0.546    0.013    0.013
    courses           1.550    0.119   12.996    0.000    1.550    0.310
  courses ~                                                             
    fam_back          0.330    0.057    5.838    0.000    0.330    0.165
    ability           0.050    0.004   13.194    0.000    0.050    0.374
    motivate          0.053    0.005   10.158    0.000    0.053    0.267
  motivate ~                                                            
    fam_back          1.265    0.338    3.741    0.000    1.265    0.127
    ability           0.101    0.023    4.502    0.000    0.101    0.152
  ability ~                                                             
    fam_back          6.255    0.431   14.508    0.000    6.255    0.417

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .achieve          37.103    1.659   22.361    0.000   37.103    0.371
   .courses           2.608    0.117   22.361    0.000    2.608    0.652
   .motivate         94.475    4.225   22.361    0.000   94.475    0.945
   .ability         185.875    8.313   22.361    0.000  185.875    0.826

\(R^2\) 파악

1 - 잔차의 표준화된 분산(variance): .xxxx 표시
직접 출력하는 옵션: rsquare = TRUE

summary(achi_model, standardized = TRUE, rsquare = TRUE, header = FALSE) |> print()


Parameter Estimates:

  Standard errors                            Bootstrap
  Number of requested bootstrap draws             1000
  Number of successful bootstrap draws             965

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  achieve ~                                                             
    fam_back          0.695    0.221    3.141    0.002    0.695    0.069
    ability           0.367    0.016   23.074    0.000    0.367    0.551
    motivate   (a)    0.013    0.021    0.610    0.542    0.013    0.013
    courses   (b1)    1.550    0.119   12.974    0.000    1.550    0.310
  courses ~                                                             
    fam_back          0.330    0.059    5.604    0.000    0.330    0.165
    ability           0.050    0.004   13.685    0.000    0.050    0.374
    motivate  (b2)    0.053    0.005    9.984    0.000    0.053    0.267
  motivate ~                                                            
    fam_back          1.265    0.331    3.826    0.000    1.265    0.127
    ability           0.101    0.024    4.270    0.000    0.101    0.152
  ability ~                                                             
    fam_back          6.255    0.433   14.455    0.000    6.255    0.417

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .achieve          37.103    1.730   21.450    0.000   37.103    0.371
   .courses           2.608    0.121   21.579    0.000    2.608    0.652
   .motivate         94.475    4.205   22.470    0.000   94.475    0.945
   .ability         185.875    8.120   22.891    0.000  185.875    0.826

R-Square:
                   Estimate
    achieve           0.629
    courses           0.348
    motivate          0.055
    ability           0.174

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    indirect          0.083    0.011    7.739    0.000    0.083    0.083
    total             0.095    0.021    4.549    0.000    0.095    0.095

summary() options

header: TRUE
fit.measures: FALSE
estimates: TRUE
ci: FALSE
fmi: FALSE
standardized: FALSE / “std.all”, “std.lv”
rsquare: FALSE
modindices: FALSE
std.nox: FALSE
remove.step1: TRUE
remove.unused: TRUE
cov.std: TRUE
nd: 3L

lavaan doc 참고;

If header = TRUE, the header section (including fit measures) is printed.
If fit.measures = TRUE, additional fit measures are added to the header section. The related fm.args list allows to set options related to the fit measures. See fitMeasures for more details.
If estimates = TRUE, print the parameter estimates section.
If ci = TRUE, add confidence intervals to the parameter estimates section.
If fmi = TRUE, add the fmi (fraction of missing information) column, if it is available.
If standardized=TRUE or a character vector, the standardized solution is also printed (see parameterEstimates). Note that SEs and tests are still based on unstandardized estimates. Use standardizedSolution to obtain SEs and test statistics for standardized estimates.
The std.nox argument is deprecated; the standardized argument allows “std.nox” solution to be specifically requested.
If remove.step1, the parameters of the measurement part are not shown (only used when using sam().) If remove.unused, automatically added parameters that are fixed to their default (0 or 1) values are removed.
If rsquare=TRUE, the R-Square values for the dependent variables in the model are printed.
If efa = TRUE, EFA related information is printed. The related efa.args list allows to set options related to the EFA output. See summary.efaList for more details.
If modindices=TRUE, modification indices are printed for all fixed parameters.
The argument nd determines the number of digits after the decimal point to be printed (currently only in the parameter estimates section.)

직접, 간접, 총효과

Direct, Indirect, and Total Effects

Source: p. 272, Multiple Regression and Beyond (3e) by Timothy Z. Keith

movivation -> achievement

achi_paths <- "
  achieve ~ fam_back + ability + a*motivate + b1*courses
  courses ~ fam_back + ability + b2*motivate
  motivate ~ fam_back + ability
  ability ~ fam_back

  indirect := b1*b2
  total := a + b1*b2
"

achi_model <- sem(achi_paths, data = achi)
summary(achi_model, standardized = TRUE, header = FALSE) |> print()


Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  achieve ~                                                             
    fam_back          0.695    0.217    3.202    0.001    0.695    0.069
    ability           0.367    0.015   23.758    0.000    0.367    0.551
    motivate   (a)    0.013    0.021    0.604    0.546    0.013    0.013
    courses   (b1)    1.550    0.119   12.996    0.000    1.550    0.310
  courses ~                                                             
    fam_back          0.330    0.057    5.838    0.000    0.330    0.165
    ability           0.050    0.004   13.194    0.000    0.050    0.374
    motivate  (b2)    0.053    0.005   10.158    0.000    0.053    0.267
  motivate ~                                                            
    fam_back          1.265    0.338    3.741    0.000    1.265    0.127
    ability           0.101    0.023    4.502    0.000    0.101    0.152
  ability ~                                                             
    fam_back          6.255    0.431   14.508    0.000    6.255    0.417

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .achieve          37.103    1.659   22.361    0.000   37.103    0.371
   .courses           2.608    0.117   22.361    0.000    2.608    0.652
   .motivate         94.475    4.225   22.361    0.000   94.475    0.945
   .ability         185.875    8.313   22.361    0.000  185.875    0.826

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    indirect          0.083    0.010    8.003    0.000    0.083    0.083
    total             0.095    0.021    4.448    0.000    0.095    0.095

Bootstrapping으로 추정

achi_model <- sem(
  achi_paths, 
  data = achi, 
  se = "bootstrap",  # standard errors
  bootstrap = 1000,  # number of bootstrap samples
)
summary(achi_model, standardized = TRUE, header = FALSE, ci = TRUE) |> print()


Parameter Estimates:

  Standard errors                            Bootstrap
  Number of requested bootstrap draws             1000
  Number of successful bootstrap draws             962

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
  achieve ~                                                             
    fam_back          0.695    0.214    3.253    0.001    0.272    1.150
    ability           0.367    0.016   23.685    0.000    0.334    0.396
    motivate   (a)    0.013    0.020    0.625    0.532   -0.027    0.052
    courses   (b1)    1.550    0.121   12.767    0.000    1.313    1.771
  courses ~                                                             
    fam_back          0.330    0.057    5.814    0.000    0.211    0.437
    ability           0.050    0.004   13.570    0.000    0.043    0.057
    motivate  (b2)    0.053    0.005   10.221    0.000    0.043    0.063
  motivate ~                                                            
    fam_back          1.265    0.351    3.607    0.000    0.568    1.973
    ability           0.101    0.025    4.112    0.000    0.054    0.147
  ability ~                                                             
    fam_back          6.255    0.436   14.350    0.000    5.396    7.099
   Std.lv  Std.all
                  
    0.695    0.069
    0.367    0.551
    0.013    0.013
    1.550    0.310
                  
    0.330    0.165
    0.050    0.374
    0.053    0.267
                  
    1.265    0.127
    0.101    0.152
                  
    6.255    0.417

Variances:
                   Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
   .achieve          37.103    1.718   21.601    0.000   33.508   40.317
   .courses           2.608    0.116   22.453    0.000    2.363    2.842
   .motivate         94.475    4.170   22.656    0.000   85.326  102.484
   .ability         185.875    8.044   23.108    0.000  171.185  202.593
   Std.lv  Std.all
   37.103    0.371
    2.608    0.652
   94.475    0.945
  185.875    0.826

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
    indirect          0.083    0.011    7.870    0.000    0.063    0.104
    total             0.095    0.021    4.617    0.000    0.055    0.138
   Std.lv  Std.all
    0.083    0.083
    0.095    0.095

parameterEstimates(
  achi_model, 
  boot.ci.type = "bca.simple", # bias-corrected
  standardized = "std.all" # standardized estimates
) |> filter(label != "") |> print()

       lhs op      rhs    label   est    se      z pvalue ci.lower ci.upper
1  achieve  ~ motivate        a 0.013 0.020  0.625  0.532   -0.027    0.053
2  achieve  ~  courses       b1 1.550 0.121 12.767  0.000    1.313    1.771
3  courses  ~ motivate       b2 0.053 0.005 10.221  0.000    0.044    0.065
4 indirect :=    b1*b2 indirect 0.083 0.011  7.870  0.000    0.064    0.106
5    total :=  a+b1*b2    total 0.095 0.021  4.617  0.000    0.057    0.140
  std.all
1   0.013
2   0.310
3   0.267
4   0.083
5   0.095

패키지를 이용해 경로를 찾아 추정: manymome 참조

library(manymome)

# All specific indirect paths from ability to achieve
paths <- all_indirect_paths(achi_model,
  x = "ability",
  y = "achieve"
)
paths |> print()

Call: 
all_indirect_paths(fit = achi_model, x = "ability", y = "achieve")
Path(s): 
  path                                     
1 ability -> courses -> achieve            
2 ability -> motivate -> achieve           
3 ability -> motivate -> courses -> achieve

ind_est_std <- many_indirect_effects(paths,
  fit = achi_model, R = 1000,
  boot_ci = TRUE, boot_type = "bc",
  standardized_x = TRUE,
  standardized_y = TRUE
)

ind_est_std


==  Indirect Effect(s) (Both x-variable(s) and y-variable(s) Standardized)  ==
                                            std  CI.lo CI.hi Sig
ability -> courses -> achieve             0.116  0.094 0.140 Sig
ability -> motivate -> achieve            0.002 -0.004 0.010    
ability -> motivate -> courses -> achieve 0.013  0.007 0.020 Sig

 - [CI.lo to CI.hi] are 95.0% bias-corrected confidence intervals by
   nonparametric bootstrapping with 1000 samples.
 - std: The standardized indirect effects.

# total indirect effect
ind_est_std[[1]] + ind_est_std[[2]] + ind_est_std[[3]]


== Indirect Effect (Both 'ability' and 'achieve' Standardized) ==
                                                               
 Path:                ability -> courses -> achieve            
 Path:                ability -> motivate -> achieve           
 Path:                ability -> motivate -> courses -> achieve
 Function of Effects: 0.131                                    
 95.0% Bootstrap CI:  [0.107 to 0.154]                         

Computation of the Function of Effects:
 ((ability->courses->achieve)
+(ability->motivate->achieve))
+(ability->motivate->courses->achieve) 


Bias-corrected confidence interval formed by nonparametric
bootstrapping with 1000 bootstrap samples.

Total effect = direct effect + total indirect effect

# direct effect
direct_std <- indirect_effect(x = "ability",
                              y = "achieve",
                              fit = achi_model,
                              boot_ci = TRUE, boot_type = "bc",
                              standardized_x = TRUE,
                              standardized_y = TRUE)
direct_std


==  Effect (Both 'ability' and 'achieve' Standardized) ==
                                       
 Path:               ability -> achieve
 Effect:             0.551             
 95.0% Bootstrap CI: [0.506 to 0.597]  

Computation Formula:
  (b.achieve~ability)*sd_ability/sd_achieve
Computation:
  (0.36737)*(15.00000)/(10.00000)

Bias-corrected confidence interval formed by nonparametric
bootstrapping with 1000 bootstrap samples.

direct_std + ind_est_std[[1]] + ind_est_std[[2]] + ind_est_std[[3]]


== Indirect Effect (Both 'ability' and 'achieve' Standardized) ==
                                                               
 Path:                ability -> achieve                       
 Path:                ability -> courses -> achieve            
 Path:                ability -> motivate -> achieve           
 Path:                ability -> motivate -> courses -> achieve
 Function of Effects: 0.682                                    
 95.0% Bootstrap CI:  [0.643 to 0.717]                         

Computation of the Function of Effects:
 (((ability->achieve)
+(ability->courses->achieve))
+(ability->motivate->achieve))
+(ability->motivate->courses->achieve) 


Bias-corrected confidence interval formed by nonparametric
bootstrapping with 1000 bootstrap samples.

위의 모든 간접효과의 합은 중간의 걸친 변수들이 다 제거되었을 때의 효과와 동일

Source: p. 274, Multiple Regression and Beyond (3e) by Timothy Z. Keith

lm(achieve ~ ability + fam_back, data = achi) |> lm.beta::lm.beta() |> print()


Call:
lm(formula = achieve ~ ability + fam_back, data = achi)

Standardized Coefficients::
(Intercept)     ability    fam_back 
         NA   0.6816408   0.1327558

Spurious effect

단순 효과(상관계수) - 총 효과의 크기

능력(ability)이 성취(achievement)에 미치는 영향에 confounding으로 작용하는 것은 무엇인가?
집안배경(family background)의 요소로 인해 능력(ability)과 성취(achievement) 사이의 실질적인 관계의 크기는 얼마나 변하는가?

lm(achieve ~ ability, data = achi) |> lm.beta::lm.beta() |> print()


Call:
lm(formula = achieve ~ ability, data = achi)

Standardized Coefficients::
(Intercept)     ability 
         NA       0.737

achi |> select(achieve, ability) |> lowerCor(digits=3)

        achiv ablty
achieve 1.000      
ability 0.737 1.000

# simple effect - total effect
0.737 - 0.681

0.0559999999999999

연습문제

Table 12.1에서 familiy background에 대해서도 동일한 분석을 수행해보세요.
예전 salary 데이터에 대해서도 동일한 분석을 수행해보세요; 링크