********************************************************************************
* runmixregls - A PROGRAM TO RUN THE MIXREGLS MIXED-EFFECTS LOCATION SCALE
*               SOFTWARE FROM WITHIN STATA
*
*               George Leckie
*               Centre for Multilevel Modelling
*               University of Bristol
*
*               http://www.bristol.ac.uk/cmm/software/runmixregls/
********************************************************************************
* MIXREGLS was developed by Hedeker and Nordgren (2013).
*
* Hedeker, D. and Nordgren, R. 2013. MIXREGLS: A Program for Mixed-Effects
*   Location Scale Analysis. Journal of Statistical Software, 52(12), 1-38.
*   URL: http://www.jstatsoft.org/v52/i12.
********************************************************************************
* This do-file differs slightly from the one released with the JSS article.
* We have improved runmixregls over time, and this has required us to make
* a few edits to this do-file so that it still replicates the article's
* results. We document these changes below.

********************************************************************************
* SECTION 1. Introduction
********************************************************************************


********************************************************************************
* SECTION 2. Review of the mixed-effects location scale model
********************************************************************************


********************************************************************************
* SECTION 3. Installing runmixregls
********************************************************************************

* Install the runmixregls command from the Statistical Software Components
* (SSC) archive. You only need to run this command once.
* Uncomment this command if you wish to run it (i.e. remove the //).
//ssc install runmixregls

* If you have already installed runmixregls from SSC, you can check that
* you are using the latest version by typing the following command.
* Uncomment this command if you wish to run it (i.e. remove the //).
//adoupdate runmixregls



********************************************************************************
* SECTION 4. Syntax diagram
********************************************************************************

* View the runmixregls help file
help runmixregls



********************************************************************************
* SECTION 5. Example 1 - Reisby depression study
********************************************************************************

*-------------------------------------------------------------------------------
* SECTION 5.1. Data
*-------------------------------------------------------------------------------

* Load the Reisby data into memory
use "http://www.bristol.ac.uk/cmm/media/runmixregls/reisby", clear
// In contrast to the JSS article, here we refer to the version of the data 
// saved on the web.

* Describe the data contents
codebook, compact

* Recode missing values from -9 to Stata's system missing value
recode hamdep (-9 = .)

* Figure 1 - Spaghetti plot of depression trajectories plotted separately
* for non-endogenous (left panel) and endogenous subjects (right panel)
twoway (line hamdep week, connect(ascending)), xlabel(0(1)5) by(endog)



*-------------------------------------------------------------------------------
* SECTION 5.2. Model
*-------------------------------------------------------------------------------

* Declare the data to be panel data
xtset id

* Fit the model
runmixregls hamdep week endog endweek, ///
  between(endog) ///
  within(week endog)



*-------------------------------------------------------------------------------
* SECTION 5.3. Random effects ...
*-------------------------------------------------------------------------------

* Refit the model specifying the reffects() and residuals() options to
* retrieve the standardized empirical Bayes random effects and residuals,
* respectively
runmixregls hamdep week endog endweek, ///
   between(endog) ///
   within(week endog) ///
   reffects(theta) ///
   residuals(estd)
// In the original implementation of runmixregls as illustrated in the JSS 
// article, we specified the option reffects(theta1 theta2) to retrieve the 
// random location and scale effects. In the current implementation of 
// runmixregls, we have replaced this with reffects(theta) where runmixregls 
// will automatically name the two random effect variables theta0 and theta1.
   
* Rename the random location and scale effects to match the names given in the JSS article
rename theta1 theta2
rename theta0 theta1

* Fill in any missing values of theta1 and theta2
bysort id (theta1): replace theta1 = theta1[1]
bysort id (theta2): replace theta2 = theta2[1]
// In the original implementation of runmixregls, these would have been filled
// in automatically, which is why this step is not mentioned in the JSS
// article

* Manually generate the standardized residuals as this option is not working
* in the current implementation of runmixregls.

* First drop the original variable
drop estd

* Then save the predicted location
generate xb = _b[Mean:_cons] + _b[Mean:week]*week + _b[Mean:endog]*endog + _b[Mean:endweek]*endweek

* Next save the predicted between-subject random location variance
generate bsvar = exp(_b[Between:_cons] + _b[Between:endog]*endog)

* Then, save the within-subject residual variance 
generate wsvar = exp(_b[Within:_cons] + _b[Within:week]*week + _b[Within:endog]*endog + _b[Association:linear]*theta1 + _b[Scale:sigma]*theta2)

* Last, calculate the residual by subtraction and standardize it by dividing
* through by the within-subject residual standard deviation
generate estd = (hamdep - xb - sqrt(bsvar)*theta1)/sqrt(wsvar)
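
* As an informal check (not part of the JSS article), the manually computed
* standardized residuals should have a mean close to 0 and a standard
* deviation close to 1
summarize estd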
   
* Figure 2 - Histogram of standardized model residuals
histogram estd, width(0.5) start(-3) frequency

* Figure 3 - Bivariate scatter plot of estimated random-location and -scale
* effects
scatter theta2 theta1, mlabel(id)

* Keep only those variables needed in subsequent tables
keep id hamdep week theta1 theta2

* Rename the response variable
rename hamdep hd

* Reshape the data from "long form" to "wide form"
reshape wide hd, i(id) j(week)

* Set the display precision of theta1 and theta2 to three decimal places
format %4.3f theta1 theta2

* Sort the data in descending order by theta2
gsort -theta2

* List the two subjects with the highest scale estimates
list id theta2 hd? if inlist(_n, 1, 2)

* List the two subjects with the lowest scale estimates
list id theta2 hd? if inlist(_n, 65, 66)

* Sort the data in ascending order by theta2
sort theta2

* List the two subjects in the bottom left corner of Figure 3
list id theta1 theta2 hd? if inlist(id, 117, 347)

* List the subject in the bottom right corner of Figure 3
list id theta1 theta2 hd? if inlist(id, 345)

* List the subject in the top left corner of Figure 3
list id theta1 theta2 hd? if inlist(id, 505)

* List the four subjects in the top right corner of Figure 3
list id theta1 theta2 hd? if inlist(id, 607, 322, 328, 360)



*-------------------------------------------------------------------------------
* SECTION 5.4. Additional analyses...
*-------------------------------------------------------------------------------

* Reload the Reisby data into memory
use http://www.bristol.ac.uk/cmm/media/runmixregls/reisby, clear
// In contrast to the JSS article, here we refer to the version of the data 
// saved on the web.

* Recode missing values from -9 to Stata's system missing value
recode hamdep (-9 = .)

* Declare the data to be panel data
xtset id

* Refit the model, removing the non-significant association between the WS
* variance and the random-location effects and increasing the number of
* integration points to 15
runmixregls hamdep week endog endweek, ///
   between(endog) ///
   within(week endog) ///
   association(none) ///
   intpoints(15)

* List the estimation results saved by runmixregls
ereturn list

* List the stage 2 parameter estimates
matrix list e(b_2)

* Make a copy of the stage 2 parameter estimates, e(b_2)
matrix b_2 = e(b_2)

* Make a copy of the stage 2 variance-covariance matrix, e(V_2)
matrix V_2 = e(V_2)

* Store the stage 2 coefficient vector and variance-covariance matrix
ereturn post b_2 V_2

* Display the stage 2 estimates table
ereturn display



********************************************************************************
* SECTION 6. Example 2 - Positive mood study
********************************************************************************

*-------------------------------------------------------------------------------
* SECTION 6.1. Data
*-------------------------------------------------------------------------------

* Load the Posmood data into memory
use http://www.bristol.ac.uk/cmm/media/runmixregls/posmood, clear
// In contrast to the JSS article, here we refer to the version of the data 
// saved on the web.

* Describe the data contents
codebook, compact

* Generate a unique observation identifier within each subject
bysort subject: gen observation = _n

* Label the new variable
label var observation "Observation"

* Declare the data to be panel data
xtset subject observation

* Figure 4 - Spaghetti plot of positive mood scores plotted for the first
* 20 subjects
xtline posmood if subject<=20, byopts(style(compact))



*-------------------------------------------------------------------------------
* SECTION 6.2. Model 1
*-------------------------------------------------------------------------------

* Fit Model 1
runmixregls posmood alone genderf, ///
   between(alone genderf) ///
   within(alone genderf)

* Store the estimation results
estimates store model1

* Generate subject group indicator variable
generate group = .
replace group = 1 if genderf==0 & alone==0
replace group = 2 if genderf==0 & alone==1
replace group = 3 if genderf==1 & alone==0
replace group = 4 if genderf==1 & alone==1

* Define value labels for the variable group
label define grouplabel 1 "Male, not alone" ///
                        2 "Male, alone" ///
                        3 "Female, not alone" ///
                        4 "Female, alone"

* Attach these value labels to the variable group
label values group grouplabel

* Predict the BS variance for each subject
predictnl BSvariance = exp(xb(Between))

* Tabulate the BS variance by subject group
tabstat BSvariance, by(group) nototal format(%4.3f)

* Redisplay the current (active) estimation results, presenting the
* parameter names (the coefficient legend) rather than the usual statistics
* for the coefficients
runmixregls, coeflegend

* Predict the WS variance for each subject
predictnl WSvariance = exp(xb(Within) ///
  + 0.5 * (_b[Association:linear]^2 + _b[Scale:sigma]^2))

* Tabulate the WS variance by subject group
tabstat WSvariance, by(group) nototal format(%4.3f)

* Predict the ICC for each subject
predictnl ICC = exp(xb(Between))/(exp(xb(Between)) + exp(xb(Within) ///
  + 0.5 * (_b[Association:linear]^2 + _b[Scale:sigma]^2)))

* Tabulate the ICC by subject group
tabstat ICC, by(group) nototal format(%4.3f)
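
* As a quick consistency check (not in the JSS article), the predicted ICC
* should equal the predicted BS variance divided by the total (BS plus WS)
* variance computed above; ICCcheck is a temporary variable of our own
generate ICCcheck = BSvariance/(BSvariance + WSvariance)
assert reldif(ICC, ICCcheck) < 1e-6
drop ICCcheck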



*-------------------------------------------------------------------------------
* SECTION 6.3. Model 2
*-------------------------------------------------------------------------------

* Generate the interaction between alone and genderf
generate algenf = alone * genderf

* Fit Model 2
runmixregls posmood alone genderf algenf, ///
   between(alone genderf algenf) ///
   within(alone genderf algenf)

* Store the estimation results
estimates store model2

* Perform a likelihood ratio test comparing Model 1 and 2
lrtest model1 model2

* Predict the gender difference in mean mood when subjects are alone
lincom _b[Mean:genderf]  + _b[Mean:algenf]



*-------------------------------------------------------------------------------
* SECTION 6.4. Model 3
*-------------------------------------------------------------------------------

* Fit Model 3
runmixregls posmood alone genderf algenf, ///
   between(alone genderf algenf) ///
   within(alone genderf algenf) ///
   association(quadratic)

* Store the estimation results
estimates store model3

* Perform a likelihood ratio test comparing Model 2 and 3
lrtest model2 model3



*-------------------------------------------------------------------------------
* SECTION 6.5. Additional analyses...
*-------------------------------------------------------------------------------

* Figure 5 - Predicted WS variance functions plotted against random-location
* effect, separately for each subject group
twoway ///
   (function exp(_b[Within:_cons] ///
    + _b[Association:linear] * x + _b[Association:quadratic] * x^2 ///
    + 0.5 * (_b[Scale:sigma]^2)), range(-3 3)) ///
   (function exp(_b[Within:_cons] + _b[Within:alone] ///
    + _b[Association:linear] * x + _b[Association:quadratic] * x^2 ///
    + 0.5 * (_b[Scale:sigma]^2)), range(-3 3)) ///
   (function exp(_b[Within:_cons] + _b[Within:genderf] ///
    + _b[Association:linear] * x + _b[Association:quadratic] * x^2 ///
    + 0.5 * (_b[Scale:sigma]^2)), range(-3 3)) ///
   (function exp(_b[Within:_cons] + _b[Within:alone] ///
    + _b[Within:genderf] + _b[Within:algenf] ///
    + _b[Association:linear] * x + _b[Association:quadratic] * x^2 ///
    + 0.5 * (_b[Scale:sigma]^2)), range(-3 3)), ///
   ytitle(WS variance) ///
   xtitle(Random-location effect) ///
   legend(order(1 "Male, not alone" ///
                2 "Male, alone" ///
                3 "Female, not alone" ///
                4 "Female, alone"))



********************************************************************************
* SECTION 7. Simulation study
********************************************************************************

* Clear existing data and set initial random number seed for replication
* purposes
clear
set seed 21561561

* Perform the replications. Note: edit the 1/10 to 1/1000 to perform the
* full 1000 replications. We have reduced it to 10 replications here for
* illustrative purposes, as the full 1000 replications take a long time to
* run.
forvalues r = 1/10 {

   * Display current replication number
   display "Replication = `r'"

   * Clear any existing data from memory 
   clear
   
   * Specify the new dataset to have 250 records, one for each subject
   set obs 250
   
   * Generate a subject identifier variable
   generate subject = _n
   
   * Generate the single subject-level covariate
   generate z = rnormal(0, 1)

   * Generate the standardized random-location effects
   generate theta1 = rnormal(0, 1)

   * Generate the standardized random-scale effects
   generate theta2 = rnormal(0, 1)

   * Expand the data from one record per subject to ten records per subject,
   * one for each observation.
   expand 10
   
   * Generate the observation identifier variable
   bysort subject: generate obs = _n

   * Generate the single observation-level covariate
   generate x = rnormal(0, 1)
   
   * Generate the BS standard deviation for the random-location effects
   generate sigma_v = sqrt(exp(-2.3))

   * Generate the BS standard deviation for the random-scale effects
   generate sigma_w = 0.45

   * Generate the WS variance
   generate sigma2_e = exp(-0.61 + 0.1 * x + 0.1 * z + sigma_w * theta2)
   
   * Generate the residual errors
   generate e = rnormal(0, sqrt(sigma2_e))

   * Generate the response variable
   generate y = 0 + 0.5 * x + 0.2 * z + sigma_v * theta1 + e

   * Declare the data to be a panel
   xtset subject

   * Fit the mixed-effect location scale model
   runmixregls y x z, within(x z) association(none)

   * Clear the existing data from memory 
   clear

   * Change the number of records in the dataset to eight, one for each
   * parameter
   set obs 8
   
   * Generate a parameter identifier
   generate byte parameter = _n

   * Create a copy of the stage 2 coefficient vector
   matrix b_2 = e(b_2)'

   * Create a copy of the stage 2 variance-covariance matrix
   matrix V_2 = e(V_2)

   * Create a copy of the stage 3 coefficient vector
   matrix b_3 = e(b_3)'

   * Create a copy of the stage 3 variance-covariance matrix
   matrix V_3 = e(V_3)

   * Create a column vector of the stage 2 parameter standard errors as the
   * square roots of the leading diagonal of its variance-covariance matrix
   * (the Cholesky factor of a diagonal matrix is the diagonal matrix of the
   * square roots of its entries).
   matrix se_2 = vecdiag(cholesky(diag(vecdiag(V_2))))'

   * Create a column vector of the stage 3 parameter standard errors
   matrix se_3 = vecdiag(cholesky(diag(vecdiag(V_3))))'

   * Store stage 2 coefficient vector as a new variable
   svmat b_2

   * Store stage 2 standard error vector as a new variable
   svmat se_2

   * Store stage 3 coefficient vector as a new variable
   svmat b_3

   * Store stage 3 standard error vector as a new variable
   svmat se_3

   * Remove the 1 suffix from the newly created variable names
   rename (*1) (*)

   * Set the display precision of the four variables to three decimal places
   format %4.3f b_2 se_2 b_3 se_3
  
   * List the dataset of stage 2 and stage 3 model results
   list, abbreviate(9) separator(0)

   * Save the model results to disk, saving the replication number in the
   * filename
   save "rep_`r'.dta", replace
   
}

* Clear the existing data from memory 
clear

* Generate a new variable to hold the replication number
generate rep = .

* Append the replications. Note: edit 1/10 to 1/1000 to append together
* the full 1000 replications. You must first have run the simulation loop
* above with the full 1000 replications.
forvalues r = 1/10 {

   * Append the model results of the current replication to all previous
   * replications
   append using "rep_`r'"
   
   * Replace missing values of the replication identifier with the current
   * replication number
   replace rep = `r' if rep==.
   
}

* Generate a new variable which holds the true value associated with each
* parameter
generate truevalue = .
replace truevalue =  0.5  if parameter==1
replace truevalue =  0.2  if parameter==2
replace truevalue =  0.0  if parameter==3
replace truevalue = -2.3  if parameter==4
replace truevalue =  0.1  if parameter==5
replace truevalue =  0.1  if parameter==6
replace truevalue = -0.61 if parameter==7
replace truevalue =  0.45 if parameter==8

* Generate the percentage bias for each parameter. Note that the percentage
* bias is undefined (stored as missing) for parameter 3, whose true value is
* zero.
generate b_2_bias = 100 * (b_2 - truevalue)/truevalue
generate b_3_bias = 100 * (b_3 - truevalue)/truevalue

* Set the display precision of the percentage bias variables to be to the
* nearest integer
format %3.0f b_2_bias b_3_bias

* Generate coverage indicators for whether the true values are contained
* within the 95% confidence intervals
generate b_2_cov  = 100 * (inrange(truevalue, b_2 - invnorm(0.975) * se_2, ///
   b_2 + invnorm(0.975) * se_2))
generate b_3_cov  = 100 * (inrange(truevalue, b_3 - invnorm(0.975) * se_3, ///
   b_3 + invnorm(0.975) * se_3))
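
* Informal sanity check (not part of the article): each coverage indicator
* should equal either 0 (true value outside the 95% confidence interval) or
* 100 (true value inside it)
assert inlist(b_2_cov, 0, 100)
assert inlist(b_3_cov, 0, 100)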

* Set the display precision of the coverage indicator variables to be to
* the nearest integer
format %3.0f b_2_cov b_3_cov

* Average the model results for each parameter over the replications
collapse (mean) b_2 se_2 b_3 se_3 b_2_bias b_3_bias b_2_cov b_3_cov, ///
    by(parameter truevalue)

* Define value labels for the parameter identifier
label define parlabel 1 "beta_1" 2 "beta_2" 3 "beta_0" 4 "alpha_0" ///
   5 "tau_1" 6 "tau_2" 7 "tau_0" 8 "sigma_w"

* Attach these value labels to the parameter identifier
label values parameter parlabel

* List the averaged parameter estimates and the percentage bias statistics
list parameter truevalue b_2 b_3 b_2_bias b_3_bias, ///
   noobs abbreviate(9) separator(0)

* List the averaged standard errors
list parameter truevalue se_2 se_3, noobs abbreviate(9) separator(0)

* List the coverage rates
list parameter truevalue b_2_cov b_3_cov, noobs abbreviate(9) separator(0)



********************************************************************************
* SECTION 8. Conclusions
********************************************************************************



********************************************************************************
exit