The Institute of Mixture Modeling for Equity-Oriented Researchers, Scholars, and Educators (IMMERSE) is an IES-funded training grant (R305B220021) to support education scholars in integrating mixture modeling into their research.
Please visit our website to learn more and apply for the year-long fellowship.
Follow us on Twitter!
How to reference this walkthrough: This work was supported by the IMMERSE Project (IES - R305B220021).
Visit our GitHub account to download the materials needed for this walkthrough.
Example: PISA Student Data

This example uses PISA data collected in the U.S. in 2015. To learn more about this data, see here. To access the PISA data and documentation in R, use the following code:

Variables:

- broad_interest: composite measure of students' self-reported broad interest
- enjoyment: composite measure of students' self-reported enjoyment
- instrumental_mot: composite measure of students' self-reported instrumental motivation
- self_efficacy: composite measure of students' self-reported self-efficacy

# Install the pisaUSA15 data package (run once), then load it
# devtools::install_github("jrosen48/pisaUSA15")
# library(pisaUSA15)
Latent Profile Analysis (LPA) is a statistical modeling approach for estimating distinct profiles of variables. In the social sciences and in educational research, these profiles could represent, for example, how different youth experience dimensions of being engaged (i.e., cognitively, behaviorally, and affectively) at the same time. Note that LPA works best with continuous variables (and, in some cases, ordinal variables), but is not appropriate for dichotomous (binary) variables.
Many analysts have carried out LPA using a latent variable modeling approach. In this approach, different parameters - means, variances, and covariances - are freely estimated across profiles, fixed to be the same across profiles, or constrained to be zero. The Mplus software is commonly used to estimate these models (see here) using the expectation-maximization (EM) algorithm to obtain maximum likelihood estimates of the parameters.
Different models (i.e., how or whether parameters are estimated) can be specified and estimated. While Mplus is widely used (and powerful), it is costly, closed-source, and can be difficult to use, particularly with respect to interpreting or using the output of specified models as part of a reproducible workflow.
The code used to estimate LPA models in this walkthrough is from the tidyLPA package (source). tidyLPA is an R package designed to estimate latent profile models using a tidy framework. It can interface with Mplus via the MplusAutomation package, enabling the estimation of latent profile models with different variance-covariance structures.
tidyLPA supports six model specifications:

- Model 1: Profile-invariant / Diagonal: equal variances, covariances fixed to 0
- Model 2: Profile-varying / Diagonal: free variances, covariances fixed to 0
- Model 3: Profile-invariant / Non-diagonal: equal variances, equal covariances
- Model 4: free variances, equal covariances
- Model 5: equal variances, free covariances
- Model 6: Profile-varying / Non-diagonal: free variances, free covariances

Model 1 (Profile-invariant / Diagonal):
Equal Variances: Variances are fixed to equality across the profiles (i.e., variances are constrained to be equal for each profile).
Covariances fixed to zero (i.e., the off-diagonal cells of the matrix are zero).
This is the most parsimonious and most restricted model.
\[ \begin{pmatrix} \sigma^2_1 & 0 & 0 \\ 0 & \sigma^2_2 & 0 \\ 0 & 0 & \sigma^2_3 \\ \end{pmatrix} \]
Model 2 (Profile-varying / Diagonal):
Free variances: Variance parameters are freely estimated across the profiles (i.e., variances vary by profile).
Covariances fixed to zero (i.e., the off-diagonal cells of the matrix are zero).
This model is more flexible and less parsimonious than Model 1.
\[ \begin{pmatrix} \sigma^2_{1p} & 0 & 0 \\ 0 & \sigma^2_{2p} & 0 \\ 0 & 0 & \sigma^2_{3p} \\ \end{pmatrix} \]
Model 3 (Profile-invariant / Non-diagonal):
Equal variances: Variances are fixed to equality across profiles (i.e., variances are constrained to be the same for each profile).
Equal Covariances: The covariances are now estimated and constrained to be equal.
\[ \begin{pmatrix} \sigma^2_1 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma^2_2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma^2_3 \\ \end{pmatrix} \]
Model 4 (Varying means, varying variances, and equal covariances):
Free variances: Variance parameters are freely estimated across profiles (i.e., variances vary by profile).
Equal Covariances: Covariances are constrained to be equal.
This model is also considered to be an extension of Model 3.
\[ \begin{pmatrix} \sigma^2_{1p} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma^2_{2p} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma^2_{3p} \\ \end{pmatrix} \]
Model 5 (Varying means, equal variances, and varying covariances):
Equal variances: Variances are fixed to equality across the profiles (i.e., variances are constrained to be the same for each profile).
Free Covariances: Covariances are now freely estimated across the profiles.
This model is also considered to be an extension of Model 3.
\[ \begin{pmatrix} \sigma^2_{1} & \sigma_{12p} & \sigma_{13p} \\ \sigma_{12p} & \sigma^2_{2} & \sigma_{23p} \\ \sigma_{13p} & \sigma_{23p} & \sigma^2_{3} \\ \end{pmatrix} \]
Model 6 (Profile-varying / Non-diagonal):
Free variances: Variance parameters are freely estimated across profiles (i.e., variances vary by profile).
Free Covariances: Covariances are now freely estimated across the profiles.
This is the most complex, least restricted, and least parsimonious model.
Note: The unrestricted model is also sometimes known as Model 4.
\[ \begin{pmatrix} \sigma^2_{1p} & \sigma_{12p} & \sigma_{13p} \\ \sigma_{12p} & \sigma^2_{2p} & \sigma_{23p} \\ \sigma_{13p} & \sigma_{23p} & \sigma^2_{3p} \\ \end{pmatrix} \]
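To preview how these specifications translate into code: in tidyLPA, each model corresponds to a combination of the variances and covariances arguments of estimate_profiles(). A minimal sketch, assuming the pisa data frame created below; note that the default open-source mclust backend supports Models 1, 2, 3, and 6, while Models 4 and 5 require the MplusAutomation backend:

# Sketch: Model 1 (equal variances, covariances fixed to zero) for 1-5 profiles,
# estimated with the default mclust backend (no Mplus license required)
library(tidyLPA)

m1_fits <- pisa %>%
  estimate_profiles(1:5,
                    variances = "equal",
                    covariances = "zero")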
library(naniar)
library(tidyverse)
library(haven)
library(glue)
library(MplusAutomation)
library(here)
library(janitor)
library(gt)
library(tidyLPA)
library(pisaUSA15)
library(cowplot)
library(filesstrings)
here::i_am("lpa_enum.Rmd")
pisa <- pisaUSA15[1:500, ] %>%
  dplyr::select(broad_interest, enjoyment, instrumental_mot, self_efficacy)

ds <- pisa %>%
  pivot_longer(broad_interest:self_efficacy, names_to = "variable") %>%
  group_by(variable) %>%
  summarise(mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE))
ds %>%
  gt() %>%
  tab_header(title = md("**Descriptive Summary**")) %>%
  cols_label(
    variable = "Variable",
    mean = md("M"),
    sd = md("SD")
  ) %>%
  fmt_number(c(2:3),
             decimals = 2) %>%
  cols_align(
    align = "center",
    columns = mean
  )
Descriptive Summary

| Variable | M | SD |
|---|---|---|
| broad_interest | 2.67 | 0.77 |
| enjoyment | 2.82 | 0.72 |
| instrumental_mot | 2.13 | 0.75 |
| self_efficacy | 2.12 | 0.64 |
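The na.rm = TRUE calls above hint that the indicators contain missing values. Before enumeration, it can help to quantify missingness directly; a quick sketch using the naniar package loaded earlier:

# Number and percentage of missing responses per indicator
pisa %>%
  naniar::miss_var_summary()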
tidyLPA

Enumerate using estimate_profiles(): use the variances and covariances arguments to indicate the model you want to specify. In this example, we are estimating all six models.

# Run LPA models
lpa_fit <- pisa %>%
  estimate_profiles(1:5,
                    package = "MplusAutomation",
                    ANALYSIS = "starts = 500 100;",
                    OUTPUT = "sampstat residual tech11 tech14",
                    variances = c("equal", "varying", "equal", "varying", "equal", "varying"),
                    covariances = c("zero", "zero", "equal", "equal", "varying", "varying"),
                    keepfiles = TRUE)
# Compare fit statistics
get_fit(lpa_fit)
# Move files to folder
files <- list.files(here(), pattern = "^model")
move_files(files, here("tidyLPA"), overwrite = TRUE)
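Beyond get_fit(), tidyLPA offers helpers for inspecting the fitted models; a short sketch using two functions from the package:

# Class-specific means, variances, and covariances for each fitted model
get_estimates(lpa_fit)

# Compare fit across model specifications and profile numbers
compare_solutions(lpa_fit, statistics = "BIC")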
CHECK EVERY SINGLE OUTPUT TO ENSURE MODELS CONVERGED AND MAKE NOTE OF MODELS THAT DID NOT!
…please :)
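One way to make that check systematic is to scan the outputs programmatically; a sketch using readModels(), which stores each output's errors and warnings:

# Read every Mplus output in the folder and flag models needing review
all_out <- readModels(here("tidyLPA"), quiet = TRUE)
sapply(all_out, function(m) length(m$errors))   # non-zero = model did not run cleanly
sapply(all_out, function(m) length(m$warnings)) # e.g., best loglikelihood not replicated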
Mplus

Alternative method to estimate_profiles(): run the enumeration using the mplusObject() method.
You can change the model specification for LPA using the syntax provided in lecture.
When estimating an LPA in Mplus, the default variance/covariance specification is the most restricted model (Model 1), so we don’t have to specify anything here.
lpa_k14 <- lapply(1:5, function(k) {
lpa_enum <- mplusObject(
TITLE = glue("Profile {k}"),
VARIABLE = glue(
"usevar = broad_interest-self_efficacy;
classes = c({k}); "),
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 500 100;",
OUTPUT = "sampstat svalues residual tech11 tech14;",
usevariables = colnames(pisa),
rdata = pisa)
lpa_enum_fit <- mplusModeler(lpa_enum,
dataout=glue(here("enum_lpa", "lpa_pisa")),
modelout=glue(here("enum_lpa", "c{k}_lpa_m1.inp")) ,
check=TRUE, run = TRUE, hashfilename = FALSE)
})
Here, an additional loop adds the variance/covariance specification to each class-specific statement. For the profile-varying/diagonal specification (Model 2), you must specify the variances to be freely estimated:
broad_interest-self_efficacy;
lpa_m2_k14 <- lapply(1:5, function(k){
# This MODEL section changes the model specification
MODEL <- paste(sapply(1:k, function(i) {
glue("
%c#{i}%
broad_interest-self_efficacy; ! variances are freely estimated
")
}), collapse = "\n")
lpa_enum_m2 <- mplusObject(
TITLE = glue("Profile {k} - Model 2"),
VARIABLE = glue(
"usevar = broad_interest-self_efficacy;
classes = c({k});"),
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 500 100;",
MODEL = MODEL,
OUTPUT = "sampstat svalues residual tech11 tech14;",
usevariables = colnames(pisa),
rdata = pisa)
lpa_m2_fit <- mplusModeler(lpa_enum_m2,
dataout = here("enum_lpa", "lpa_pisa"),
modelout = glue(here("enum_lpa","c{k}_lpa_m2.inp")),
check = TRUE, run = TRUE, hashfilename = FALSE)
})
For reference, here is the Mplus syntax for different specifications:
Fixed covariance to zero (DEFAULT):
broad_interest WITH enjoyment@0;
Free covariance:
broad_interest WITH enjoyment;
Equal covariances:
%c#1%
broad_interest WITH enjoyment (1);
%c#2%
broad_interest WITH enjoyment (1);
Equal variance (DEFAULT):
%c#1%
broad_interest (1);
%c#2%
broad_interest (1);
Free variance:
broad_interest-self_efficacy;
You can also open the tidyLPA-generated .inp files to see the specifications.
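Since keepfiles = TRUE was set and the files were moved into the tidyLPA folder, a quick way to locate them:

# List the Mplus input files generated by tidyLPA
list.files(here("tidyLPA"), pattern = "\\.inp$")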
Evaluate each model specification separately using fit indices.
source("enum_table")
# Read in model
output_pisa <- readModels(here("tidyLPA"), quiet = TRUE)
enum_fit(output_pisa)
## # A tibble: 29 × 11
## Title Parameters LL BIC aBIC CAIC AWE BLRT_PValue T11_VLMR_PValue
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 " Mode… 8 -2089. 4227. 4201. 4235. 4300. NA NA
## 2 " Mode… 13 -1997. 4074. 4032. 4087. 4193. 0 0
## 3 " Mode… 18 -1953. 4017. 3960. 4035. 4183. 0 0.0087
## 4 " Mode… 23 -1889. 3921. 3848. 3944. 4133. 0 0
## 5 " Mode… 28 -1871. 3915. 3826. 3943. 4172. 0 0.0177
## 6 " Mode… 8 -2089. 4227. 4201. 4235. 4300. NA NA
## 7 " Mode… 17 -1989. 4083. 4029. 4100. 4239. 0 0
## 8 " Mode… 26 -1878. 3917. 3834. 3943. 4156. 0 0.0002
## 9 " Mode… 35 -1851. 3919. 3808. 3954. 4241. 0 0.454
## 10 " Mode… 44 -1825. 3923. 3783. 3967. 4327. 0 0.0113
## # ℹ 19 more rows
## # ℹ 2 more variables: BF <dbl>, cmPk <dbl>
enum_table(output_pisa, 1:5, 6:10, 11:15, 16:20, 21:25, 26:29)
Model Fit Summary Table

| Classes | Par | LL | BIC | aBIC | CAIC | AWE | BLRT | VLMR | BF | cmPk |
|---|---|---|---|---|---|---|---|---|---|---|
| Model 1 |  |  |  |  |  |  |  |  |  |  |
| Model 1 With 1 Classes | 8 | −2,088.66 | 4,226.84 | 4,201.45 | 4,234.84 | 4,300.36 | – | – | 0.0 | <.001 |
| Model 1 With 2 Classes | 13 | −1,996.51 | 4,073.50 | 4,032.24 | 4,086.50 | 4,192.97 | <.001 | <.001 | 0.0 | <.001 |
| Model 1 With 3 Classes | 18 | −1,952.98 | 4,017.38 | 3,960.25 | 4,035.38 | 4,182.81 | <.001 | 0.009 | 0.0 | <.001 |
| Model 1 With 4 Classes | 23 | −1,889.43 | 3,921.23 | 3,848.23 | 3,944.23 | 4,132.61 | <.001 | <.001 | 0.0 | 0.046 |
| Model 1 With 5 Classes | 28 | −1,870.91 | 3,915.16 | 3,826.28 | 3,943.15 | 4,172.48 | <.001 | 0.018 | – | 0.954 |
| Model 2 |  |  |  |  |  |  |  |  |  |  |
| Model 2 With 1 Classes | 8 | −2,088.66 | 4,226.84 | 4,201.45 | 4,234.84 | 4,300.36 | – | – | 0.0 | <.001 |
| Model 2 With 2 Classes | 17 | −1,988.99 | 4,083.22 | 4,029.26 | 4,100.22 | 4,239.45 | <.001 | <.001 | 0.0 | <.001 |
| Model 2 With 3 Classes | 26 | −1,877.96 | 3,916.88 | 3,834.35 | 3,942.88 | 4,155.83 | <.001 | <.001 | 3.4 | 0.748 |
| Model 2 With 4 Classes | 35 | −1,851.35 | 3,919.35 | 3,808.26 | 3,954.35 | 4,241.01 | <.001 | 0.454 | 6.3 | 0.217 |
| Model 2 With 5 Classes | 44 | −1,825.33 | 3,923.04 | 3,783.38 | 3,967.04 | 4,327.41 | <.001 | 0.011 | – | 0.034 |
| Model 3 |  |  |  |  |  |  |  |  |  |  |
| Model 3 With 1 Classes | 14 | −1,968.35 | 4,023.36 | 3,978.93 | 4,037.36 | 4,152.02 | – | – | 0.1 | <.001 |
| Model 3 With 2 Classes | 19 | −1,950.11 | 4,017.84 | 3,957.53 | 4,036.84 | 4,192.45 | <.001 | 0.002 | 0.0 | <.001 |
| Model 3 With 3 Classes | 24 | −1,930.36 | 4,009.29 | 3,933.12 | 4,033.29 | 4,229.86 | <.001 | 0.171 | 0.0 | <.001 |
| Model 3 With 4 Classes | 29 | −1,840.85 | 3,861.23 | 3,769.18 | 3,890.23 | 4,127.75 | <.001 | <.001 | >100 | 0.995 |
| Model 3 With 5 Classes | 34 | −1,830.68 | 3,871.83 | 3,763.91 | 3,905.83 | 4,184.30 | <.001 | 0.118 | – | 0.005 |
| Model 4 |  |  |  |  |  |  |  |  |  |  |
| Model 4 With 1 Classes | 14 | −1,968.35 | 4,023.36 | 3,978.93 | 4,037.36 | 4,152.02 | – | – | 0.0 | <.001 |
| Model 4 With 2 Classes | 23 | −1,930.96 | 4,004.30 | 3,931.30 | 4,027.30 | 4,215.67 | <.001 | <.001 | 0.0 | <.001 |
| Model 4 With 3 Classes | 32 | −1,859.47 | 3,917.02 | 3,815.45 | 3,949.02 | 4,211.11 | <.001 | 0.033 | >100 | 1.000 |
| Model 4 With 4 Classes | 41 | −1,844.84 | 3,943.49 | 3,813.35 | 3,984.48 | 4,320.29 | 0.286 | 0.637 | >100 | <.001 |
| Model 4 With 5 Classes | 50 | −1,829.41 | 3,968.34 | 3,809.64 | 4,018.34 | 4,427.85 | 1.000 | 0.566 | – | <.001 |
| Model 5 |  |  |  |  |  |  |  |  |  |  |
| Model 5 With 1 Classes | 14 | −1,968.35 | 4,023.36 | 3,978.93 | 4,037.36 | 4,152.02 | – | – | 0.0 | <.001 |
| Model 5 With 2 Classes | 25 | −1,927.10 | 4,008.95 | 3,929.60 | 4,033.95 | 4,238.71 | <.001 | <.001 | 20.7 | 0.953 |
| Model 5 With 3 Classes | 36 | −1,896.08 | 4,015.01 | 3,900.75 | 4,051.01 | 4,345.86 | <.001 | 0.003 | >100 | 0.046 |
| Model 5 With 4 Classes | 47 | −1,878.20 | 4,047.34 | 3,898.17 | 4,094.34 | 4,479.29 | <.001 | 0.288 | >100 | <.001 |
| Model 5 With 5 Classes | 58 | −1,858.18 | 4,075.39 | 3,891.30 | 4,133.39 | 4,608.43 | 0.028 | 0.094 | – | <.001 |
| Model 6 |  |  |  |  |  |  |  |  |  |  |
| Model 6 With 1 Classes | 14 | −1,968.35 | 4,023.36 | 3,978.93 | 4,037.36 | 4,152.02 | – | – | 0.0 | 0.044 |
| Model 6 With 2 Classes | 29 | −1,918.84 | 4,017.19 | 3,925.15 | 4,046.19 | 4,283.71 | <.001 | 0.004 | >100 | 0.956 |
| Model 6 With 3 Classes | 44 | −1,885.73 | 4,043.84 | 3,904.18 | 4,087.83 | 4,448.21 | <.001 | 0.009 | >100 | <.001 |
| Model 6 With 4 Classes | 59 | −1,854.21 | 4,073.66 | 3,886.39 | 4,132.66 | 4,615.89 | 0.078 | 0.043 | – | <.001 |

Note. Par = parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample-size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; BF = Bayes factor; cmPk = approximate correct model probability.
Look for an “elbow” to help with profile selection.
source("ic_plot")
ic_plot(output_pisa)
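If the ic_plot() helper is unavailable, a comparable elbow plot can be drawn directly from the enum_fit() tibble; a sketch using stringr and ggplot2 (loaded with the tidyverse), assuming the Mplus titles follow the "Model X with k classes" pattern shown above:

# Minimal BIC elbow plot: one line per model specification, classes on the x-axis
enum_fit(output_pisa) %>%
  mutate(Model = str_extract(Title, "(?i)model \\d"),
         Classes = as.integer(str_extract(Title, "\\d+(?= [Cc]lass)"))) %>%
  ggplot(aes(x = Classes, y = BIC, color = Model, group = Model)) +
  geom_point() +
  geom_line() +
  theme_minimal()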
Based on fit indices, I am choosing the following candidate models: Model 1 with 2 classes, Model 2 with 3 classes, Model 3 with 4 classes, Model 4 with 3 classes, Model 5 with 2 classes, and Model 6 with 2 classes.
Take the candidate models and recalculate the approximate correct model probabilities (Masyn, 2013):
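Concretely, the mutate() call below implements (following Masyn, 2013):

\[ SIC_k = -\frac{1}{2} BIC_k, \qquad cmP_k = \frac{\exp(SIC_k - SIC_{\max})}{\sum_{j} \exp(SIC_j - SIC_{\max})}, \qquad BF_{k,k'} = \exp(SIC_k - SIC_{k'}) \]

where the sum runs over the candidate models and \(k'\) denotes the next candidate in the table.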
# cmPk recalculation:
enum_fit1 <- enum_fit(output_pisa)

stage2_cmpk <- enum_fit1 %>%
  slice(2, 8, 14, 18, 22, 27) %>%
  mutate(SIC = -.5 * BIC,
         expSIC = exp(SIC - max(SIC)),
         cmPk = expSIC / sum(expSIC),
         BF = exp(SIC - lead(SIC))) %>%
  select(Title, Parameters, BIC:AWE, cmPk, BF)
# Format Fit Table
stage2_cmpk %>%
gt() %>%
tab_options(column_labels.font.weight = "bold") %>%
fmt_number(
7,
decimals = 2,
drop_trailing_zeros = TRUE,
suffixing = TRUE
) %>%
fmt_number(c(3:6),
decimals = 2) %>%
fmt_number(8,decimals = 2,
drop_trailing_zeros=TRUE,
suffixing = TRUE) %>%
fmt(8, fns = function(x)
ifelse(x>100, ">100",
scales::number(x, accuracy = .1))) %>%
tab_style(
style = list(
cell_text(weight = "bold")
),
locations = list(cells_body(
columns = BIC,
rows = BIC == min(BIC[1:nrow(stage2_cmpk)])
),
cells_body(
columns = aBIC,
rows = aBIC == min(aBIC[1:nrow(stage2_cmpk)])
),
cells_body(
columns = CAIC,
rows = CAIC == min(CAIC[1:nrow(stage2_cmpk)])
),
cells_body(
columns = AWE,
rows = AWE == min(AWE[1:nrow(stage2_cmpk)])
),
cells_body(
columns = cmPk,
rows = cmPk == max(cmPk[1:nrow(stage2_cmpk)])
),
cells_body(
columns = BF,
rows = BF > 10)
)
)
| Title | Parameters | BIC | aBIC | CAIC | AWE | cmPk | BF |
|---|---|---|---|---|---|---|---|
| Model 1 With 2 Classes | 13 | 4,073.50 | 4,032.24 | 4,086.50 | 4,192.97 | 0 | 0.0 |
| Model 2 With 3 Classes | 26 | 3,916.88 | 3,834.35 | 3,942.88 | 4,155.83 | 0 | 0.0 |
| Model 3 With 4 Classes | 29 | 3,861.23 | 3,769.18 | 3,890.23 | 4,127.75 | 1 | >100 |
| Model 4 With 3 Classes | 32 | 3,917.02 | 3,815.45 | 3,949.02 | 4,211.11 | 0 | >100 |
| Model 5 With 2 Classes | 25 | 4,008.95 | 3,929.60 | 4,033.95 | 4,238.71 | 0 | 61.7 |
| Model 6 With 2 Classes | 29 | 4,017.19 | 3,925.15 | 4,046.19 | 4,283.71 | 0 | NA |
You can also compare models using nested model testing directly with MplusAutomation. Note that you can only compare across model specifications; the number of profiles must stay the same.
# MplusAutomation Method using `compareModels`
compareModels(output_pisa[["model_2_class_3.out"]],
output_pisa[["model_4_class_3.out"]], diffTest = TRUE)
##
## ==============
##
## Mplus model comparison
## ----------------------
##
## ------
## Model 1: C:/Users/dnajiarch/Box/IES_IMMERSE/youtube_materials/lpa_enum/tidyLPA/model_2_class_3.out
## Model 2: C:/Users/dnajiarch/Box/IES_IMMERSE/youtube_materials/lpa_enum/tidyLPA/model_4_class_3.out
## ------
##
## Model Summary Comparison
## ------------------------
##
## m1 m2
## Title model 2 with 3 classes model 4 with 3 classes
## Observations 488 488
## Estimator MLR MLR
## Parameters 26 32
## LL -1877.965 -1859.466
## AIC 3807.929 3782.932
## BIC 3916.877 3917.022
##
## MLR Chi-Square Difference Test for Nested Models Based on Loglikelihood
## -----------------------------------------------------------------------
##
## Difference Test Scaling Correction: 1.177167
## Chi-square difference: 31.4297
## Diff degrees of freedom: 6
## P-value: 0
##
## Note: The chi-square difference test assumes that these models are nested.
## It is up to you to verify this assumption.
##
## MLR Chi-Square Difference test for nested models
## --------------------------------------------
##
## Difference Test Scaling Correction:
## Chi-square difference:
## Diff degrees of freedom:
## P-value:
##
## Note: The chi-square difference test assumes that these models are nested.
## It is up to you to verify this assumption.
##
## =========
##
## Model parameter comparison
## --------------------------
## Parameters present in both models
## =========
##
## Approximately equal in both models (param. est. diff <= 1e-04)
## ----------------------------------------------
## paramHeader param LatentClass m1_est m2_est . m1_se m2_se . m1_est_se
## Means ENJOYMENT 1 2.968 2.968 | 0.012 0.021 | 238.242
## m2_est_se . m1_pval m2_pval
## 143.432 | 0 0
##
##
## Parameter estimates that differ between models (param. est. diff > 1e-04)
## ----------------------------------------------
## paramHeader param LatentClass m1_est m2_est . m1_se
## BROAD_IN.WITH ENJOYMENT 1 0.000 0.010 | 0.000
## BROAD_IN.WITH ENJOYMENT 2 0.000 0.010 | 0.000
## BROAD_IN.WITH ENJOYMENT 3 0.000 0.010 | 0.000
## BROAD_IN.WITH INSTRUMENT 1 0.000 -0.013 | 0.000
## BROAD_IN.WITH INSTRUMENT 2 0.000 -0.013 | 0.000
## BROAD_IN.WITH INSTRUMENT 3 0.000 -0.013 | 0.000
## BROAD_IN.WITH SELF_EFFIC 1 0.000 -0.029 | 0.000
## BROAD_IN.WITH SELF_EFFIC 2 0.000 -0.029 | 0.000
## BROAD_IN.WITH SELF_EFFIC 3 0.000 -0.029 | 0.000
## ENJOYMEN.WITH INSTRUMENT 1 0.000 -0.034 | 0.000
## ENJOYMEN.WITH INSTRUMENT 2 0.000 -0.034 | 0.000
## ENJOYMEN.WITH INSTRUMENT 3 0.000 -0.034 | 0.000
## ENJOYMEN.WITH SELF_EFFIC 1 0.000 -0.029 | 0.000
## ENJOYMEN.WITH SELF_EFFIC 2 0.000 -0.029 | 0.000
## ENJOYMEN.WITH SELF_EFFIC 3 0.000 -0.029 | 0.000
## INSTRUME.WITH SELF_EFFIC 1 0.000 0.056 | 0.000
## INSTRUME.WITH SELF_EFFIC 2 0.000 0.056 | 0.000
## INSTRUME.WITH SELF_EFFIC 3 0.000 0.056 | 0.000
## Means BROAD_INTE 1 2.912 2.926 | 0.049
## Means BROAD_INTE 2 3.205 3.264 | 0.086
## Means BROAD_INTE 3 2.257 2.159 | 0.082
## Means C1#1 Categorical.Latent.Variables -0.285 0.143 | 0.188
## Means C1#2 Categorical.Latent.Variables -0.891 -1.085 | 0.183
## Means ENJOYMENT 2 3.808 3.948 | 0.048
## Means ENJOYMENT 3 2.305 2.271 | 0.072
## Means INSTRUMENT 1 2.042 1.991 | 0.082
## Means INSTRUMENT 2 1.735 1.801 | 0.092
## Means INSTRUMENT 3 2.357 2.400 | 0.068
## Means SELF_EFFIC 1 2.072 2.086 | 0.057
## Means SELF_EFFIC 2 1.724 1.758 | 0.067
## Means SELF_EFFIC 3 2.330 2.294 | 0.049
## Variances BROAD_INTE 1 0.262 0.279 | 0.056
## Variances BROAD_INTE 2 0.405 0.384 | 0.104
## Variances BROAD_INTE 3 0.594 0.578 | 0.083
## Variances ENJOYMENT 1 0.010 0.051 | 0.003
## Variances ENJOYMENT 2 0.060 0.011 | 0.016
## Variances ENJOYMENT 3 0.397 0.448 | 0.044
## Variances INSTRUMENT 1 0.358 0.312 | 0.119
## Variances INSTRUMENT 2 0.636 0.797 | 0.134
## Variances INSTRUMENT 3 0.560 0.654 | 0.069
## Variances SELF_EFFIC 1 0.298 0.328 | 0.038
## Variances SELF_EFFIC 2 0.309 0.377 | 0.048
## Variances SELF_EFFIC 3 0.435 0.449 | 0.045
## m2_se . m1_est_se m2_est_se . m1_pval m2_pval
## 0.008 | 999.000 1.300 | 999.000 0.194
## 0.008 | 999.000 1.300 | 999.000 0.194
## 0.008 | 999.000 1.300 | 999.000 0.194
## 0.024 | 999.000 -0.530 | 999.000 0.596
## 0.024 | 999.000 -0.530 | 999.000 0.596
## 0.024 | 999.000 -0.530 | 999.000 0.596
## 0.023 | 999.000 -1.290 | 999.000 0.197
## 0.023 | 999.000 -1.290 | 999.000 0.197
## 0.023 | 999.000 -1.290 | 999.000 0.197
## 0.015 | 999.000 -2.252 | 999.000 0.024
## 0.015 | 999.000 -2.252 | 999.000 0.024
## 0.015 | 999.000 -2.252 | 999.000 0.024
## 0.008 | 999.000 -3.702 | 999.000 0.000
## 0.008 | 999.000 -3.702 | 999.000 0.000
## 0.008 | 999.000 -3.702 | 999.000 0.000
## 0.020 | 999.000 2.777 | 999.000 0.005
## 0.020 | 999.000 2.777 | 999.000 0.005
## 0.020 | 999.000 2.777 | 999.000 0.005
## 0.060 | 59.366 49.094 | 0.000 0.000
## 0.126 | 37.165 25.960 | 0.000 0.000
## 0.114 | 27.478 18.918 | 0.000 0.000
## 0.341 | -1.517 0.418 | 0.129 0.676
## 0.271 | -4.877 -4.013 | 0.000 0.000
## 0.016 | 80.133 251.321 | 0.000 0.000
## 0.149 | 32.038 15.214 | 0.000 0.000
## 0.051 | 24.923 39.148 | 0.000 0.000
## 0.110 | 18.769 16.401 | 0.000 0.000
## 0.093 | 34.505 25.743 | 0.000 0.000
## 0.044 | 36.269 47.523 | 0.000 0.000
## 0.076 | 25.718 23.035 | 0.000 0.000
## 0.074 | 47.308 30.997 | 0.000 0.000
## 0.053 | 4.671 5.292 | 0.000 0.000
## 0.222 | 3.887 1.729 | 0.000 0.084
## 0.099 | 7.189 5.844 | 0.000 0.000
## 0.019 | 3.283 2.630 | 0.001 0.009
## 0.004 | 3.701 3.089 | 0.000 0.002
## 0.070 | 9.011 6.373 | 0.000 0.000
## 0.061 | 3.008 5.143 | 0.003 0.000
## 0.178 | 4.738 4.479 | 0.000 0.000
## 0.069 | 8.137 9.507 | 0.000 0.000
## 0.032 | 7.771 10.250 | 0.000 0.000
## 0.067 | 6.484 5.626 | 0.000 0.000
## 0.054 | 9.689 8.370 | 0.000 0.000
##
##
## P-values that differ between models (p-value diff > 1e-04)
## -----------------------------------
## paramHeader param LatentClass m1_est m2_est . m1_se
## BROAD_IN.WITH ENJOYMENT 1 0.000 0.010 | 0.000
## BROAD_IN.WITH ENJOYMENT 2 0.000 0.010 | 0.000
## BROAD_IN.WITH ENJOYMENT 3 0.000 0.010 | 0.000
## BROAD_IN.WITH INSTRUMENT 1 0.000 -0.013 | 0.000
## BROAD_IN.WITH INSTRUMENT 2 0.000 -0.013 | 0.000
## BROAD_IN.WITH INSTRUMENT 3 0.000 -0.013 | 0.000
## BROAD_IN.WITH SELF_EFFIC 1 0.000 -0.029 | 0.000
## BROAD_IN.WITH SELF_EFFIC 2 0.000 -0.029 | 0.000
## BROAD_IN.WITH SELF_EFFIC 3 0.000 -0.029 | 0.000
## ENJOYMEN.WITH INSTRUMENT 1 0.000 -0.034 | 0.000
## ENJOYMEN.WITH INSTRUMENT 2 0.000 -0.034 | 0.000
## ENJOYMEN.WITH INSTRUMENT 3 0.000 -0.034 | 0.000
## ENJOYMEN.WITH SELF_EFFIC 1 0.000 -0.029 | 0.000
## ENJOYMEN.WITH SELF_EFFIC 2 0.000 -0.029 | 0.000
## ENJOYMEN.WITH SELF_EFFIC 3 0.000 -0.029 | 0.000
## INSTRUME.WITH SELF_EFFIC 1 0.000 0.056 | 0.000
## INSTRUME.WITH SELF_EFFIC 2 0.000 0.056 | 0.000
## INSTRUME.WITH SELF_EFFIC 3 0.000 0.056 | 0.000
## Means C1#1 Categorical.Latent.Variables -0.285 0.143 | 0.188
## Variances BROAD_INTE 2 0.405 0.384 | 0.104
## Variances ENJOYMENT 1 0.010 0.051 | 0.003
## Variances ENJOYMENT 2 0.060 0.011 | 0.016
## Variances INSTRUMENT 1 0.358 0.312 | 0.119
## m2_se . m1_est_se m2_est_se . m1_pval m2_pval
## 0.008 | 999.000 1.300 | 999.000 0.194
## 0.008 | 999.000 1.300 | 999.000 0.194
## 0.008 | 999.000 1.300 | 999.000 0.194
## 0.024 | 999.000 -0.530 | 999.000 0.596
## 0.024 | 999.000 -0.530 | 999.000 0.596
## 0.024 | 999.000 -0.530 | 999.000 0.596
## 0.023 | 999.000 -1.290 | 999.000 0.197
## 0.023 | 999.000 -1.290 | 999.000 0.197
## 0.023 | 999.000 -1.290 | 999.000 0.197
## 0.015 | 999.000 -2.252 | 999.000 0.024
## 0.015 | 999.000 -2.252 | 999.000 0.024
## 0.015 | 999.000 -2.252 | 999.000 0.024
## 0.008 | 999.000 -3.702 | 999.000 0.000
## 0.008 | 999.000 -3.702 | 999.000 0.000
## 0.008 | 999.000 -3.702 | 999.000 0.000
## 0.020 | 999.000 2.777 | 999.000 0.005
## 0.020 | 999.000 2.777 | 999.000 0.005
## 0.020 | 999.000 2.777 | 999.000 0.005
## 0.341 | -1.517 0.418 | 0.129 0.676
## 0.222 | 3.887 1.729 | 0.000 0.084
## 0.019 | 3.283 2.630 | 0.001 0.009
## 0.004 | 3.701 3.089 | 0.000 0.002
## 0.061 | 3.008 5.143 | 0.003 0.000
##
##
## Parameters unique to model 1: 0
## -----------------------------
##
## None
##
##
## Parameters unique to model 2: 0
## -----------------------------
##
## None
##
##
## ==============
Here, the first model in the comparison (Model 2 with 3 classes; restricted, with covariances fixed to zero) is nested in the second (Model 4 with 3 classes, which adds equal covariances). The chi-square difference test, which assumes the models are nested, shows a significant improvement in fit for Model 4 over Model 2, despite the added parameters.
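For reference, the scaled difference test reported above is computed from the loglikelihoods (LL), MLR scaling correction factors (c), and parameter counts (p) of the nested (0) and comparison (1) models:

\[ cd = \frac{p_0 c_0 - p_1 c_1}{p_0 - p_1}, \qquad TRd = \frac{-2(LL_0 - LL_1)}{cd} \]

with TRd referred to a chi-square distribution on \(p_1 - p_0\) degrees of freedom (here, 6).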
source("plot_lpa.txt")
plot_lpa(model_name = output_pisa$model_3_class_4.out)
Save figure
ggsave(here("figures", "model3_profile4.png"), dpi = "retina", bg = "white", height=5, width=8, units="in")
Use Mplus to calculate k-class confidence intervals (note: change the syntax to match your chosen k-class model):
classification <- mplusObject(
TITLE = "LPA - Calculated k-Class 95% CI",
VARIABLE =
"usevar = broad_interest-self_efficacy;
classes = c1(4);",
ANALYSIS =
"estimator = ml;
type = mixture;
starts = 0;
processors = 10;
optseed = 468036; ! This seed is taken from chosen model output
bootstrap = 1000;",
MODEL =
"
! This is copied and pasted from the chosen model input
%c1#1%
broad_interest (vbroad_interest);
enjoyment (venjoyment);
instrumental_mot (vinstrumental_mot);
self_efficacy (vself_efficacy);
broad_interest WITH enjoyment (broad_interestWenjoyment);
broad_interest WITH instrumental_mot (broad_interestWinstrumental_mot);
broad_interest WITH self_efficacy (broad_interestWself_efficacy);
enjoyment WITH instrumental_mot (enjoymentWinstrumental_mot);
enjoyment WITH self_efficacy (enjoymentWself_efficacy);
instrumental_mot WITH self_efficacy (instrumental_motWself_efficacy);
%c1#2%
broad_interest (vbroad_interest);
enjoyment (venjoyment);
instrumental_mot (vinstrumental_mot);
self_efficacy (vself_efficacy);
broad_interest WITH enjoyment (broad_interestWenjoyment);
broad_interest WITH instrumental_mot (broad_interestWinstrumental_mot);
broad_interest WITH self_efficacy (broad_interestWself_efficacy);
enjoyment WITH instrumental_mot (enjoymentWinstrumental_mot);
enjoyment WITH self_efficacy (enjoymentWself_efficacy);
instrumental_mot WITH self_efficacy (instrumental_motWself_efficacy);
%c1#3%
broad_interest (vbroad_interest);
enjoyment (venjoyment);
instrumental_mot (vinstrumental_mot);
self_efficacy (vself_efficacy);
broad_interest WITH enjoyment (broad_interestWenjoyment);
broad_interest WITH instrumental_mot (broad_interestWinstrumental_mot);
broad_interest WITH self_efficacy (broad_interestWself_efficacy);
enjoyment WITH instrumental_mot (enjoymentWinstrumental_mot);
enjoyment WITH self_efficacy (enjoymentWself_efficacy);
instrumental_mot WITH self_efficacy (instrumental_motWself_efficacy);
%c1#4%
broad_interest (vbroad_interest);
enjoyment (venjoyment);
instrumental_mot (vinstrumental_mot);
self_efficacy (vself_efficacy);
broad_interest WITH enjoyment (broad_interestWenjoyment);
broad_interest WITH instrumental_mot (broad_interestWinstrumental_mot);
broad_interest WITH self_efficacy (broad_interestWself_efficacy);
enjoyment WITH instrumental_mot (enjoymentWinstrumental_mot);
enjoyment WITH self_efficacy (enjoymentWself_efficacy);
instrumental_mot WITH self_efficacy (instrumental_motWself_efficacy);
!CHANGE THIS SECTION TO YOUR CHOSEN k-CLASS MODEL
%OVERALL%
[C1#1](c1);
[C1#2](c2);
[C1#3](c3);
Model Constraint:
New(p1 p2 p3 p4);
p1 = exp(c1)/(1+exp(c1)+exp(c2)+exp(c3));
p2 = exp(c2)/(1+exp(c1)+exp(c2)+exp(c3));
p3 = exp(c3)/(1+exp(c1)+exp(c2)+exp(c3));
p4 = 1/(1+exp(c1)+exp(c2)+exp(c3));",
OUTPUT = "cinterval(bcbootstrap)",
usevariables = colnames(pisa),
rdata = pisa)
classification_fit <- mplusModeler(classification,
dataout=here("mplus", "class.dat"),
modelout=here("mplus", "class.inp") ,
check=TRUE, run = TRUE, hashfilename = FALSE)
Create table
source("diagnostics_table")
class_output <- readModels(here("mplus", "class.out"))
diagnostics_table(class_output)
Model Classification Diagnostics

| k-Class | k-Class Proportions | 95% CI | mcaPk | AvePPk | OCCk |
|---|---|---|---|---|---|
| Class 1 | 0.251 | [0.212, 0.294] | 0.258 | 0.946 | 52.276 |
| Class 2 | 0.475 | [0.428, 0.521] | 0.471 | 0.971 | 37.007 |
| Class 3 | 0.209 | [0.173, 0.244] | 0.207 | 0.983 | 218.844 |
| Class 4 | 0.064 | [0.046, 0.087] | 0.064 | 0.996 | 3641.625 |
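For interpretation: mcaPk is the modal class assignment proportion, AvePPk is the average posterior class probability among observations modally assigned to class k, and OCCk is the odds of correct classification,

\[ OCC_k = \frac{AvePP_k / (1 - AvePP_k)}{\hat{\pi}_k / (1 - \hat{\pi}_k)} \]

where \(\hat{\pi}_k\) is the model-estimated proportion for class k. OCC values well above 5 suggest good class separation and assignment accuracy.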
Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 621-638.
Masyn, K. E. (2013). Latent class analysis and finite mixture modeling. In T. D. Little (Ed.), The Oxford handbook of quantitative methods (pp. 551-611). Oxford University Press.
Miller, J. D., Hoffer, T., Suchner, R., Brown, K., & Nelson, C. (1992). LSAY codebook. Northern Illinois University.
Muthén, B. O., Muthén, L. K., & Asparouhov, T. (2017). Regression and mediation analysis using Mplus. Muthén & Muthén.
Muthén, L. K., & Muthén, B. O. (1998-2017). Mplus user’s guide (8th ed.). Muthén & Muthén.
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing. http://www.R-project.org/
Rosenberg, J. M., van Lissa, C. J., Beymer, P. N., Anderson, D. J., Schell, M. J., & Schmidt, J. A. (2019). tidyLPA: Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software [R package]. https://data-edu.github.io/tidyLPA/
Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686