Residuals - FAQs
See also: Residuals slide presentation with voice-over plus transcript
My residuals column is too short
Example question: I calculated the level 2 residuals for my model (with student as level 1 and tutorial group as level 2). I have 2124 tutorial groups, and as expected I got 2124 residuals. But these residuals are all in the first 2124 rows of the worksheet. Since I have 3 students (level 1 units) per tutorial group (level 2 unit), the first 2124 rows should contain only 2124/3 = 708 of the residuals, with the remaining residuals spread through the rest of the dataset. The rows after 2124 have no value in the residuals columns. How can I get the correct values of the residuals?
The output that you describe here is correct. MLwiN worksheets are not organised with one row corresponding to one record across the whole worksheet (see the FAQ Does each row correspond to a single record right across the MLwiN worksheet?). When you use the Residuals window to calculate the residuals at any level, MLwiN produces just one entry in each output column for each unit at that level. The output is not padded with missing values in the other rows belonging to that unit in your original dataset. Instead, the values occupy consecutive places in their column with no gaps. So, for example, the residual in place 10 of c300 will be the residual for tutorial group 10, not the residual for tutorial group 4, even though row 10 of your original dataset corresponds to student 1 of tutorial group 4.
The residuals will be in the same order as the tutorial groups in your original dataset. To see which residual belongs to which tutorial group, you will need to create a short column which contains the names/codes of the tutorial groups but has only one entry per tutorial group (so 2124 entries in all).
To do this, from the Data manipulation menu select 'unreplicate'. The Take data window will appear. In the 'Take first entry in blocks defined by' drop down box, select the column which contains your tutorial groups, and also select this column under 'Input columns'. Under 'Output columns' select any free column. Select 'Add to action list' and then 'Execute'.
The column you chose under 'Output columns' will now have 2124 entries consisting of the code for each tutorial group in your dataset. The rows of this column will correspond to the rows of your residuals columns, so you will be able to see which residual corresponds to which tutorial group.
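The alignment described above can be illustrated outside MLwiN with a short Python sketch. It is not MLwiN code: the group codes, the residual values and the helper name take_first_in_blocks are all made up for illustration, and it assumes the dataset is sorted so that each tutorial group's students sit in consecutive rows.

```python
# Hypothetical data: 4 tutorial groups with 3 students each (12 level 1 rows).
group_col = [101, 101, 101, 102, 102, 102, 103, 103, 103, 104, 104, 104]

# Hypothetical level 2 residuals, laid out as in the residuals output column:
# one entry per tutorial group, in dataset order, with no gaps.
residuals = [0.42, -0.13, 0.05, -0.34]


def take_first_in_blocks(codes):
    """Keep the first entry of each run of consecutive equal codes.

    This mimics the idea of the 'unreplicate' step: one code per group.
    """
    short = []
    for i, code in enumerate(codes):
        if i == 0 or code != codes[i - 1]:
            short.append(code)
    return short


# Short column of group codes: same length and order as the residuals.
group_ids = take_first_in_blocks(group_col)
assert len(group_ids) == len(residuals)

# Pair each residual with its group code (assumes each code forms one block).
residual_by_group = dict(zip(group_ids, residuals))

# Row 10 of the dataset (index 9) is student 1 of the 4th group, so that
# student's group residual is the 4th entry of the residuals column, not the 10th.
row10_group = group_col[9]
row10_residual = residual_by_group[row10_group]

# Optionally spread the residuals back out to one value per level 1 row.
residual_per_row = [residual_by_group[g] for g in group_col]
```

Here group_ids plays the role of the short column created by 'unreplicate', and residual_per_row shows what the longer, one-value-per-student layout would look like if you needed it.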
Should the comparative SD output when I calculate the residuals be different for each row?
Yes, there will be a different comparative SD for each level 2 unit. This is because the comparative SD depends on the number of level 1 units within each level 2 unit and, in the case of slope residuals, also on the values of the explanatory variable the slope is associated with. To see how these things affect the value, see the formula for the 'comparative covariance matrix', which is given in Harvey Goldstein's book Multilevel Statistical Models on p53 of the 3rd edition or p58-59 of the 2nd edition (downloadable for free). You can see an example of how the comparative SDs differ between residuals if you look at the caterpillar plot in Chapter 3 of the User's Guide (on p49), where the error bars for the residuals are of different lengths. Each error bar simply represents the residual +/- 1.96 times the comparative SD calculated by the Residuals window, so bars of different lengths mean that the value of the comparative SD is different for each residual.
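As a rough illustration of the dependence on group size, here is a minimal Python sketch for the simplest case only: a random-intercept (variance components) model, with the fixed part treated as known. Under those assumptions the comparative variance of a level 2 residual is sigma2_u * (sigma2_e / n_j) / (sigma2_u + sigma2_e / n_j), where n_j is the number of level 1 units in group j. The variance values and function names below are hypothetical, and this is not the calculation MLwiN itself performs, which follows the more general formula in the book.

```python
import math


def comparative_sd(n_j, sigma2_u, sigma2_e):
    """Comparative SD of a level 2 intercept residual.

    Assumes a random-intercept model with the fixed part treated as known,
    so sampling variability of the fixed estimates is ignored.
    """
    level1_part = sigma2_e / n_j
    comp_var = sigma2_u * level1_part / (sigma2_u + level1_part)
    return math.sqrt(comp_var)


def error_bar(residual, sd, multiplier=1.96):
    """Interval residual +/- multiplier * SD, as drawn in a caterpillar plot."""
    return (residual - multiplier * sd, residual + multiplier * sd)


# Hypothetical variance estimates: level 2 (between groups) and level 1 (within).
sigma2_u, sigma2_e = 0.09, 0.56

# Groups of different sizes get different comparative SDs.
group_sizes = [2, 10, 50]
sds = [comparative_sd(n, sigma2_u, sigma2_e) for n in group_sizes]

for n, sd in zip(group_sizes, sds):
    low, high = error_bar(0.0, sd)
    print(f"n_j = {n:2d}: comparative SD = {sd:.3f}, bar width = {high - low:.3f}")
```

In this simplified setting, larger groups give smaller comparative SDs and therefore shorter error bars, which is the pattern of varying bar lengths seen in a caterpillar plot.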