Latent Profile & Latent Class Models

Applied Categorical & Nonnormal Data Analysis


Introduction

Cluster analysis techniques are not the only way to find unobserved groupings in your data. In fact, from several perspectives cluster analysis may not be the best way to determine these groupings. Several latent variable approaches are available. In this unit we will explore two of them: latent profile analysis and latent class analysis.

The advantage of these approaches over cluster analysis is that they are model based and generate probabilities of group membership. The models can be tested and their goodness of fit analyzed. The downside is that they require specialized software that is more complex to run than typical statistical packages. We will demonstrate these techniques using the Mplus software from Muthén & Muthén. We will also use Stata for descriptive and subsidiary analyses.

The latent profile analysis will use continuous predictors and the latent class analysis will use binary predictors. For the continuous predictors we will use the reading, writing, math, science and social studies test scores from the hsb6a dataset. For the binary predictors we will do median splits on each of the tests to create the 0/1 variables hiread, hiwrite, himath, hisci and hiss.
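The median split described above can be sketched in a few lines. This is an illustration in Python, not the Stata code used to build hsb6a: the function name and the scores are made up, and it assumes that "high" means strictly above the median, since the notes do not say how ties at the median were handled.

```python
from statistics import median

def median_split(scores):
    """Return 1 for scores above the median and 0 otherwise (assumed tie rule)."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]

# Hypothetical reading scores, not taken from hsb6a.
read = [28.3, 44.2, 52.1, 57.4, 76.0, 49.8]
hiread = median_split(read)
print(hiread)  # → [0, 0, 1, 1, 1, 0]
```

With a different tie rule (for example, at or above the median) the share of 1s can move away from one half when many students share the median score.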

Looking at the data

    use hsb6a
    describe

    Contains data from hsb6a.dta
      obs:           600                          highschool and beyond (600 cases)
     vars:            23                          24 Oct 2003 14:18
     size:        31,200 (99.0% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    id              int    %9.0g
    gender          byte   %9.0g       gl
    race            byte   %12.0g      rl
    ses             byte   %9.0g       sl
    sch             byte   %9.0g       scl
    prog            byte   %9.0g       pl
    locus           float  %9.0g                  locus of control
    concept         float  %9.0g                  self-concept
    mot             float  %9.0g                  motivation
    career          byte   %14.0g      cl         career choice
    read            float  %9.0g                  reading score
    write           float  %9.0g                  writing score
    math            float  %9.0g                  math score
    sci             float  %9.0g                  science score
    ss              float  %9.0g                  social studies score
    hiread          byte   %9.0g
    hiwrite         byte   %9.0g
    himath          byte   %9.0g
    hisci           byte   %9.0g
    hiss            byte   %9.0g

    sum read write math sci ss hiread hiwrite himath hisci hiss

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            read |       600    51.90183    10.10298       28.3         76
           write |       600    52.38483    9.726455       25.5       67.1
            math |       600      51.849    9.414736       31.8       75.5
             sci |       600    51.76333    9.706179         26       74.2
              ss |       600    52.04567    9.879228       25.7       70.5
    -------------+--------------------------------------------------------
          hiread |       600        .525    .4997913          0          1
         hiwrite |       600         .54    .4988133          0          1
          himath |       600    .4966667    .5004061          0          1
           hisci |       600    .5266667     .499705          0          1
            hiss |       600    .6483333     .477889          0          1

A 2 Class Latent Profile Model

    Data:
      File is I:\mplus\hsb6.dat ;
    Variable:
      Names are
        id gender race ses sch prog locus concept mot career read write math
        sci ss hiread hiwrite himath hisci hiss academic;
      Usevariables are
        read write math sci ss ;
      classes = c(2);
    Analysis:
      Type=mixture;
    MODEL:
      %C#1%
      [read math sci ss write  * 30 ];
      %C#2%
      [read math sci ss write  * 60];
    OUTPUT:
      TECH8;
    SAVEDATA:
      file is lca_ex1.txt ;
      save is cprob;
      format is free;

    THE MODEL ESTIMATION TERMINATED NORMALLY

    TESTS OF MODEL FIT

    Loglikelihood

              H0 Value                       -5213.102

    Information Criteria

              Number of Free Parameters             16
              Akaike (AIC)                   10458.203
              Bayesian (BIC)                 10517.464
              Sample-Size Adjusted BIC       10466.721
                (n* = (n + 2) / 24)
              Entropy                            0.865

    FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
    BASED ON ESTIMATED POSTERIOR PROBABILITIES

      Class 1        123.03223          0.41011
      Class 2        176.96777          0.58989

    CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

    Class Counts and Proportions

      Class 1              120          0.40000
      Class 2              180          0.60000

    Average Class Probabilities by Class

                      1        2

      Class 1     0.961    0.039
      Class 2     0.043    0.957

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

    CLASS 1

     Means
        READ              43.151    0.820     52.641
        WRITE             44.524    1.024     43.485
        MATH              43.860    0.757     57.947
        SCI               43.322    1.051     41.239
        SS                45.119    0.946     47.707

     Variances
        READ              49.035    4.175     11.745
        WRITE             44.303    3.927     11.283
        MATH              45.062    3.768     11.958
        SCI               48.986    5.184      9.450
        SS                55.410    4.445     12.465

    CLASS 2

     Means
        READ              57.915    0.847     68.403
        WRITE             58.115    0.625     93.039
        MATH              57.136    0.800     71.386
        SCI               56.729    0.668     84.953
        SS                57.220    0.723     79.137

     Variances
        READ              49.035    4.175     11.745
        WRITE             44.303    3.927     11.283
        MATH              45.062    3.768     11.958
        SCI               48.986    5.184      9.450
        SS                55.410    4.445     12.465

    LATENT CLASS REGRESSION MODEL PART

     Means
        C#1               -0.364    0.179     -2.032

    QUALITY OF NUMERICAL RESULTS

         Condition Number for the Information Matrix              0.462E-03
           (ratio of smallest to largest eigenvalue)
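The information criteria in this output can be checked by hand from the log-likelihood. The sketch below is in Python rather than Mplus and assumes the usual definitions, AIC = -2LL + 2k and BIC = -2LL + k ln(n), with the sample-size adjusted BIC replacing n by n* = (n + 2) / 24 as printed in the listing. The function name is ours, and n = 300 is read off the class counts (120 + 180) rather than stated directly in the output.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC, sample-size adjusted BIC) from a log-likelihood."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    n_star = (n_obs + 2) / 24  # adjustment shown in the Mplus listing
    adj_bic = deviance + n_params * math.log(n_star)
    return aic, bic, adj_bic

# Values copied from the 2 class latent profile output; n = 300 is inferred.
aic, bic, adj_bic = information_criteria(loglik=-5213.102, n_params=16, n_obs=300)
print(f"AIC {aic:.3f}  BIC {bic:.3f}  adjusted BIC {adj_bic:.3f}")
```

The results agree with the listing up to rounding of the printed log-likelihood. The 16 free parameters are consistent with 5 means in each of the 2 classes, 5 variances held equal across classes, and 1 class mean.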

A 3 Class Latent Profile Model

    Data:
      File is I:\mplus\hsb6.dat ;
    Variable:
      Names are
        id gender race ses sch prog locus concept mot career read write math
        sci ss hiread hiwrite himath hisci hiss academic;
      Usevariables are
        read write math sci ss ;
      classes = c(3);
    Analysis:
      Type=mixture;
    MODEL:
      %C#1%
      [read math sci ss write  *30 ];
      %C#2%
      [read math sci ss write  *45];
      %C#3%
      [read math sci ss write  *60];
    OUTPUT:
      TECH8;
    SAVEDATA:
      file is lca_ex2.txt ;
      save is cprob;
      format is free;

    THE MODEL ESTIMATION TERMINATED NORMALLY

    TESTS OF MODEL FIT

    Loglikelihood

              H0 Value                       -5100.544

    Information Criteria

              Number of Free Parameters             22
              Akaike (AIC)                   10245.087
              Bayesian (BIC)                 10326.571
              Sample-Size Adjusted BIC       10256.800
                (n* = (n + 2) / 24)
              Entropy                            0.877

    FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
    BASED ON ESTIMATED POSTERIOR PROBABILITIES

      Class 1         98.08460          0.32695
      Class 2        137.86474          0.45955
      Class 3         64.05066          0.21350

    CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

    Class Counts and Proportions

      Class 1               99          0.33000
      Class 2              138          0.46000
      Class 3               63          0.21000

    Average Class Probabilities by Class

                      1        2        3

      Class 1     0.961    0.039    0.000
      Class 2     0.021    0.940    0.039
      Class 3     0.000    0.068    0.932

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

    CLASS 1

     Means
        READ              41.866    0.614     68.208
        WRITE             43.080    0.870     49.514
        MATH              42.447    0.549     77.337
        SCI               41.409    0.748     55.358
        SS                44.232    0.819     54.010

     Variances
        READ              33.867    3.334     10.159
        WRITE             40.042    4.168      9.607
        MATH              28.667    2.980      9.619
        SCI               34.199    3.411     10.027
        SS                48.355    4.323     11.185

    CLASS 2

     Means
        READ              53.058    0.726     73.044
        WRITE             55.195    0.677     81.493
        MATH              52.704    0.683     77.191
        SCI               53.195    0.600     88.727
        SS                53.377    0.745     71.657

     Variances
        READ              33.867    3.334     10.159
        WRITE             40.042    4.168      9.607
        MATH              28.667    2.980      9.619
        SCI               34.199    3.411     10.027
        SS                48.355    4.323     11.185

    CLASS 3

     Means
        READ              64.588    0.949     68.070
        WRITE             61.318    0.624     98.232
        MATH              63.667    0.907     70.167
        SCI               62.043    0.873     71.064
        SS                62.139    0.827     75.163

     Variances
        READ              33.867    3.334     10.159
        WRITE             40.042    4.168      9.607
        MATH              28.667    2.980      9.619
        SCI               34.199    3.411     10.027
        SS                48.355    4.323     11.185

    LATENT CLASS REGRESSION MODEL PART

     Means
        C#1                0.426    0.201      2.120
        C#2                0.767    0.196      3.901

    QUALITY OF NUMERICAL RESULTS

         Condition Number for the Information Matrix              0.461E-03
           (ratio of smallest to largest eigenvalue)
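The means in the LATENT CLASS REGRESSION MODEL PART are on a logit scale, with the last class serving as the reference class whose logit is fixed at zero. A minimal Python sketch of the conversion to class proportions follows; the function name is ours, and the inputs are the C#1 and C#2 estimates from the 3 class output.

```python
import math

def class_proportions(logit_means):
    """Convert class logits (last class as reference) to proportions."""
    # The reference class has logit 0, which contributes exp(0) = 1.
    weights = [math.exp(m) for m in logit_means] + [1.0]
    total = sum(weights)
    return [w / total for w in weights]

# C#1 and C#2 from the 3 class latent profile output.
props = class_proportions([0.426, 0.767])
for k, p in enumerate(props, start=1):
    print(f"Class {k}: {p:.3f}")
```

This gives roughly 0.327, 0.460 and 0.213, in line with the proportions based on estimated posterior probabilities (0.32695, 0.45955, 0.21350). The small differences come from the three-decimal rounding of the printed logits.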

A 2 Class Latent Class Model

    Data:
      File is h:\mplus\hsb6.dat ;
    Variable:
      Names are
        id gender race ses sch prog locus concept mot career read write math
        sci ss hiread hiwrite himath hisci hiss academic;
      Usevariables are
        hiread hiwrite himath hisci hiss ;
      categorical = hiread hiwrite himath hisci hiss;
      classes = c(2);
    Analysis:
      Type=mixture;
    MODEL:
      %C#1%
      [hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1  *2 ];
      %C#2%
      [hiread$1 *-2 himath$1 *-2 hisci$1 *-2 hiss$1 *-2 hiwrite$1 *-2 ];
    OUTPUT:
      TECH8;
    SAVEDATA:
      file is lca_ex7.txt ;
      save is cprob;
      format is free;

    THE MODEL ESTIMATION TERMINATED NORMALLY

    TESTS OF MODEL FIT

    Loglikelihood

              H0 Value                        -849.157

    Information Criteria

              Number of Free Parameters             11
              Akaike (AIC)                    1720.315
              Bayesian (BIC)                  1761.057
              Sample-Size Adjusted BIC        1726.171
                (n* = (n + 2) / 24)
              Entropy                            0.815

    Chi-Square Test of Model Fit for the Latent Class Indicator Model Part

              Pearson Chi-Square

              Value                             44.642
              Degrees of Freedom                    20
              P-Value                           0.0012

              Likelihood Ratio Chi-Square

              Value                             45.747
              Degrees of Freedom                    20
              P-Value                           0.0009

    FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
    BASED ON ESTIMATED POSTERIOR PROBABILITIES

      Class 1        123.33019          0.41110
      Class 2        176.66981          0.58890

    CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

    Class Counts and Proportions

      Class 1              127          0.42333
      Class 2              173          0.57667

    Average Class Probabilities by Class

                      1        2

      Class 1     0.930    0.070
      Class 2     0.030    0.970

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

    CLASS 1

    CLASS 2

    LATENT CLASS INDICATOR MODEL PART

     Class 1

     Thresholds
        HIREAD$1           2.273    0.424      5.354
        HIWRITE$1          1.376    0.276      4.990
        HIMATH$1           2.081    0.399      5.209
        HISCI$1            2.035    0.411      4.947
        HISS$1             0.642    0.231      2.780

     Class 2

     Thresholds
        HIREAD$1          -1.540    0.264     -5.823
        HIWRITE$1         -1.488    0.244     -6.109
        HIMATH$1          -1.217    0.217     -5.616
        HISCI$1           -1.264    0.213     -5.927
        HISS$1            -2.047    0.279     -7.328

    LATENT CLASS REGRESSION MODEL PART

     Means
        C#1               -0.359    0.161     -2.231

    LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

     Class 1

     HIREAD
        Category 1         0.907    0.036     25.221
        Category 2         0.093    0.036      2.599
     HIWRITE
        Category 1         0.798    0.044     17.985
        Category 2         0.202    0.044      4.542
     HIMATH
        Category 1         0.889    0.039     22.555
        Category 2         0.111    0.039      2.816
     HISCI
        Category 1         0.884    0.042     21.036
        Category 2         0.116    0.042      2.748
     HISS
        Category 1         0.655    0.052     12.564
        Category 2         0.345    0.052      6.615

     Class 2

     HIREAD
        Category 1         0.177    0.038      4.592
        Category 2         0.823    0.038     21.417
     HIWRITE
        Category 1         0.184    0.037      5.031
        Category 2         0.816    0.037     22.288
     HIMATH
        Category 1         0.228    0.038      5.980
        Category 2         0.772    0.038     20.197
     HISCI
        Category 1         0.220    0.037      6.015
        Category 2         0.780    0.037     21.288
     HISS
        Category 1         0.114    0.028      4.043
        Category 2         0.886    0.028     31.304

    QUALITY OF NUMERICAL RESULTS

         Condition Number for the Information Matrix              0.654E-01
           (ratio of smallest to largest eigenvalue)
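The thresholds and the probability-scale results in this output carry the same information. For a binary indicator with a logit link, the probability of Category 1 (the lower value, here 0) is 1 / (1 + exp(-threshold)), and Category 2 takes the remainder. The Python sketch below applies this to the Class 1 thresholds; the function name is ours.

```python
import math

def threshold_to_probs(tau):
    """Return (P(Category 1), P(Category 2)) for a logit threshold tau."""
    p_cat1 = 1.0 / (1.0 + math.exp(-tau))
    return p_cat1, 1.0 - p_cat1

# Class 1 thresholds copied from the 2 class latent class output.
class1_thresholds = {
    "HIREAD": 2.273,
    "HIWRITE": 1.376,
    "HIMATH": 2.081,
    "HISCI": 2.035,
    "HISS": 0.642,
}
for name, tau in class1_thresholds.items():
    p1, p2 = threshold_to_probs(tau)
    print(f"{name:8s} Category 1 {p1:.3f}  Category 2 {p2:.3f}")
```

A large positive threshold therefore means a student in that class is very likely to fall in the low category, and a large negative threshold means the opposite. This is why the starting values of 2 and -2 in the MODEL command separate a low-scoring class from a high-scoring one.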

A 3 Class Latent Class Model

    Data:
      File is h:\mplus\hsb6.dat ;
    Variable:
      Names are
        id gender race ses sch prog locus concept mot career read write math
        sci ss hiread hiwrite himath hisci hiss academic;
      Usevariables are
        hiread hiwrite himath hisci hiss ;
      categorical = hiread hiwrite himath hisci hiss;
      classes = c(3);
    Analysis:
      Type=mixture;
    MODEL:
      %C#1%
      [hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1  *2 ];
      %C#2%
      [hiread$1 *0 himath$1 *0 hisci$1 *0 hiss$1 *0 hiwrite$1  *0 ];
      %C#3%
      [hiread$1 *-2 himath$1 *-2 hisci$1 *-2 hiss$1 *-2 hiwrite$1 *-2 ];
    OUTPUT:
      TECH8;
    SAVEDATA:
      file is lca_ex8.txt ;
      save is cprob;
      format is free;

    THE MODEL ESTIMATION TERMINATED NORMALLY

    TESTS OF MODEL FIT

    Loglikelihood

              H0 Value                        -839.066

    Information Criteria

              Number of Free Parameters             17
              Akaike (AIC)                    1712.132
              Bayesian (BIC)                  1775.096
              Sample-Size Adjusted BIC        1721.182
                (n* = (n + 2) / 24)
              Entropy                            0.682

    Chi-Square Test of Model Fit for the Latent Class Indicator Model Part

              Pearson Chi-Square

              Value                             21.369
              Degrees of Freedom                    14
              P-Value                           0.0925

              Likelihood Ratio Chi-Square

              Value                             25.564
              Degrees of Freedom                    14
              P-Value                           0.0294

    FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
    BASED ON ESTIMATED POSTERIOR PROBABILITIES

      Class 1         95.51732          0.31839
      Class 2        127.98211          0.42661
      Class 3         76.50058          0.25500

    CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

    Class Counts and Proportions

      Class 1               94          0.31333
      Class 2              130          0.43333
      Class 3               76          0.25333

    Average Class Probabilities by Class

                      1        2        3

      Class 1     0.913    0.087    0.000
      Class 2     0.074    0.826    0.099
      Class 3     0.000    0.163    0.837

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

    CLASS 1

    CLASS 2

    CLASS 3

    LATENT CLASS INDICATOR MODEL PART

     Class 1

     Thresholds
        HIREAD$1           2.883    0.671      4.296
        HIWRITE$1          1.735    0.418      4.150
        HIMATH$1           2.863    0.739      3.877
        HISCI$1            3.007    0.861      3.492
        HISS$1             0.991    0.319      3.106

     Class 2

     Thresholds
        HIREAD$1          -0.392    0.348     -1.128
        HIWRITE$1         -0.451    0.445     -1.013
        HIMATH$1          -0.258    0.342     -0.754
        HISCI$1           -0.453    0.269     -1.688
        HISS$1            -1.201    0.400     -2.999

     Class 3

     Thresholds
        HIREAD$1          -4.377    6.575     -0.666
        HIWRITE$1        -15.000    0.000      0.000
        HIMATH$1          -2.932    1.699     -1.726
        HISCI$1           -2.257    0.986     -2.289
        HISS$1            -3.761    2.143     -1.755

    LATENT CLASS REGRESSION MODEL PART

     Means
        C#1                0.222    0.398      0.558
        C#2                0.515    0.499      1.032

    LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

     Class 1

     HIREAD
        Category 1         0.947    0.034     28.108
        Category 2         0.053    0.034      1.574
     HIWRITE
        Category 1         0.850    0.053     15.951
        Category 2         0.150    0.053      2.815
     HIMATH
        Category 1         0.946    0.038     25.073
        Category 2         0.054    0.038      1.431
     HISCI
        Category 1         0.953    0.039     24.648
        Category 2         0.047    0.039      1.219
     HISS
        Category 1         0.729    0.063     11.577
        Category 2         0.271    0.063      4.298

     Class 2

     HIREAD
        Category 1         0.403    0.084      4.819
        Category 2         0.597    0.084      7.134
     HIWRITE
        Category 1         0.389    0.106      3.680
        Category 2         0.611    0.106      5.775
     HIMATH
        Category 1         0.436    0.084      5.177
        Category 2         0.564    0.084      6.702
     HISCI
        Category 1         0.389    0.064      6.090
        Category 2         0.611    0.064      9.582
     HISS
        Category 1         0.231    0.071      3.249
        Category 2         0.769    0.071     10.797

     Class 3

     HIREAD
        Category 1         0.012    0.081      0.154
        Category 2         0.988    0.081     12.253
     HIWRITE
        Category 1         0.000    0.000      0.000
        Category 2         1.000    0.000      0.000
     HIMATH
        Category 1         0.051    0.082      0.620
        Category 2         0.949    0.082     11.641
     HISCI
        Category 1         0.095    0.085      1.120
        Category 2         0.905    0.085     10.700
     HISS
        Category 1         0.023    0.048      0.477
        Category 2         0.977    0.048     20.530

    QUALITY OF NUMERICAL RESULTS

         Condition Number for the Information Matrix              0.323E-03
           (ratio of smallest to largest eigenvalue)
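The "save is cprob" option writes each student's posterior class probabilities to the SAVEDATA file. As a rough illustration of where such probabilities come from, the Python sketch below applies Bayes' rule to one response pattern, assuming the indicators are independent within a class (the standard latent class assumption). The function name is ours, and the inputs are the rounded class proportions and Category 2 probabilities from the 3 class output, so the results only approximate what Mplus saves.

```python
def posterior_class_probs(pattern, class_props, p_high):
    """Posterior class probabilities for one 0/1 response pattern.

    pattern     -- responses in the order hiread, hiwrite, himath, hisci, hiss
    class_props -- estimated class proportions
    p_high      -- p_high[k][j] = P(indicator j = 1 | class k)
    """
    joint = []
    for prop, probs in zip(class_props, p_high):
        likelihood = prop
        for u, p in zip(pattern, probs):
            likelihood *= p if u == 1 else (1.0 - p)
        joint.append(likelihood)
    total = sum(joint)
    return [value / total for value in joint]

# Rounded estimates copied from the 3 class latent class output.
class_props = [0.31839, 0.42661, 0.25500]
p_high = [
    [0.053, 0.150, 0.054, 0.047, 0.271],  # Class 1
    [0.597, 0.611, 0.564, 0.611, 0.769],  # Class 2
    [0.988, 1.000, 0.949, 0.905, 0.977],  # Class 3
]

all_high = posterior_class_probs([1, 1, 1, 1, 1], class_props, p_high)
all_low = posterior_class_probs([0, 0, 0, 0, 0], class_props, p_high)
print("all high:", [round(p, 3) for p in all_high])
print("all low: ", [round(p, 3) for p in all_low])
```

A student who is high on all five tests is placed in Class 3 with a probability of about 0.84 and in Class 2 with about 0.16, which is close to the Class 3 row of the average class probabilities table. A student who is low on all five tests falls almost entirely in Class 1.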


Categorical Data Analysis Course
Phil Ender -- 24apr03