*****************************************************************************
* Stata do-file to replicate results given in the following presentation:
*
* TITLE:	Running MLwiN from within Stata: The runmlwin command
* AUTHORS:	George Leckie and Chris Charlton
* MEETING:	E-stat
* LOCATION:	Bristol
* DATE:		07-04-2011
*
* George Leckie and Chris Charlton
* Centre for Multilevel Modelling, 2011
*****************************************************************************
* The analysis in this do-file replicates and extends Scotish neighbourhood
* study analysis presented in:
*
*   Raudenbush, S. W. (1993). A crossed random effects model for unbalanced
*      data with applications in cross-sectional and longitudinal research.
*      Journal of Educational and Behavioral Statistics, 18, 321-349.
*
* The data are provided with the following book:
*
*   Rabe-Hesketh, S. and Skrondal, A. (2008) Multilevel and Longitudinal
*      Modeling Using Stata, 2nd Edition, Stata Press.
*
*****************************************************************************
* To run this do-file in Stata you must be working with:
*
* (1) Stata 9.2 or higher
*
* (2) The latest version of MLwiN
*
* (3) The latest version of the runmlwin command
*
* runmlwin will inform you if you are working with out of date versions of
* Stata or the MLwiN software package.
*
* You must also tell runmlwin where MLwiN is installed. One way you can do
* this is by specifying the MLwiN path by defining a global macro called
* MLwiN_path. For example:
*
* . global MLwiN_path C:\Program Files\MLwiN v2.25\mlwin.exe
*
* See the "Remarks on installation instructions" section of the runmlwin help
* file for further information:
*
* . help runmlwin
*
*****************************************************************************

* Change this global macro to point to your MLwiN path
///global MLwiN_path C:\Program Files (x86)\MLwiN v2.25\mlwin.exe

* Open the Raudenbush (1993) Scotish neighbourhood study data
use "http://www.stata-press.com/data/mlmus2/neighborhood.dta", clear

* Replicate Table 1 of Raudenbush (1993). The table shows that the data are
* cross-classified. Students are said to be nested within the cross-
* classification of schools by neighbourhoods. To account for both school
* and neighbourhood effects in our models of student attainment, we will
* need to fit cross-classified multilevel model using runmlwin.
table neighid schid if inrange(neighid,26,38) | inrange(neighid,251,263) ///
	| inrange(neighid,793,800)

* Replicate Table 3 of Raudenbush (1993). The table gives summary statistics.
egen pickone = tag(neighid)

tabstat deprive if pickone==1, stat(mean sd) format(%4.3f) columns(stats)

tabstat attain p7vrq p7read dadocc daded momed dadunemp male, ///
	stat(mean sd) format(%4.3f) columns(stats)


* Generate a unique student identifer variable which will be the level 1
* unit identifier variable in the runmlwin command
gen studentid = _n

* Generate a variable cons to act as the constant or intercept variable in
* the runmlwin models
gen cons = 1

* Sort the data by students within neighbourhoods (otherwise runmlwin will
* complain when we try to fit the following model)
sort neighid studentid

* Fit a two-level (students within neighbourhoods) variances components
* model to attain. This model is refered to as model 1 in Table 4 of
* Raudenbush (1993). Note, you will need to click the "Resume Macro" button
* twice in MLwiN to return the model results to the Stata output window.
runmlwin attain cons, ///
	level2(neighid: cons) ///
	level1(studentid: cons) nopause

* Store the model results
estimates store model1a

* To introduce school level random effects into the model, we must
* specify a cross-classified model to respect the non-hierarchical data
* structure. First we generate 17 school binary indicator variables, one
* for each school in the study
quietly tabulate schid, generate(s)

* Next we need to define 16 constraints to constrain the 17 variances at
* level 3 of the following model to be equal. The resulting single variance
* will give the estimate of the school level variance.
forvalues s = 1/16 {

	constraint define `s' [RP3]var(s`s') = [RP3]var(s17)

}

* Fit the two-level cross-classified variances compoents model. This model
* is refered to as model 2 in Table 4 of Raudenbush (1993) 
runmlwin attain cons, ///
	level3(cons: s1-s17, diagonal) ///
	level2(neighid: cons) ///
	level1(studentid: cons) ///
	constraints(1/16) nopause

* Store the model results
estimates store model2a

* Generate a female dummy variable
gen female = 1 - male

* Fit the previous model where we adjust for a range of student and family
* characteristics. This model is refered to as model 3 in Table 4 of
* Raudenbush (1993) 
runmlwin attain cons p7vrq p7read ///
	dadocc daded momed dadunemp female deprive, ///
	level3(cons: s1-s17, diagonal) ///
	level2(neighid: cons) ///
	level1(studentid: cons) ///
	constraints(1/16) nopause

* Store the model results
estimates store model3a

* Compare the results across all three models
estimates table model1a model2a model3a, stats(deviance) ///
	keep("FP1:" "RP2:" "RP3:" "RP1:") ///
	b(%9.3f) style(oneline)


* We will now fit Model 1 by MCMC. Restore the IGLS results for this model
* as we will use these as starting values for MCMC estimation.
estimates restore model1a

* Fit Model 1 by MCMC
runmlwin attain cons, ///
	level2(neighid: cons) ///
	level1(studentid: cons) ///
	mcmc(on) initsprevious ///
	nopause

* Store the model results
estimates store model1b


* To demonstrate how to fit cross-classified models using MCMC, we reconsider
* the simpler variance components model, model 2. Before we fit this model
* using MCMC, we must give runmlwin an initial value for each parameter
matrix b = (0,.075,.15,.8)

* Fit model 2 using MCMC
runmlwin attain cons, ///
	level3(schid: cons) ///
	level2(neighid: cons) ///
	level1(studentid: cons) ///
	mcmc(cc) initsb(b) ///
	nopause

* Store the model results
estimates store model2b

* Specify starting values for fitting Model 3 by MCMC
matrix b = (0,0,0,0,0,0,0,0,0,.1,.1,.1)

* Fit Model 3 by MCMC
runmlwin attain cons p7vrq p7read ///
	dadocc daded momed dadunemp female deprive, ///
	level3(schid: cons) ///
	level2(neighid: cons) ///
	level1(studentid: cons) ///
	mcmc(cc) initsb(b) ///
	nopause

* Store the model results
estimates store model3b

* Compare the results across all three models
estimates table model1b model2b model3b, stats(deviance) ///
	keep("FP1:" "RP2:" "RP3:" "RP1:") ///
	b(%9.3f) style(oneline)




*****************************************************************************
exit