Worksheet FAQs

Does each row correspond to a single record right across the MLwiN worksheet?

No. MLwiN worksheets do not follow the principle that one row of the worksheet should correspond to one record right across all columns. Instead, it is up to the user to be aware of how the data is arranged in each column.

For example, a user might start with an imported dataset in which each row corresponds to one level 1 unit. They might then create some variables in which each row still corresponds to one level 1 unit, for example by adding two of the original variables together, recoding a variable, or centring a variable around some value. They might then create further variables in which each row corresponds to a level 2 unit, for example by taking the mean of some variable for each level 2 unit and keeping only one record per level 2 unit. Or they might create a variable with one row per level 1 unit that contains observations on girls only, the boys' observations having been deleted from the variable.

MLwiN will not complain if the columns in a worksheet are not the same length, and it will not assume that the rows of the worksheet correspond to the same records right across the columns. The user must know what is in each column and use it appropriately: when entering variables into a model or using them to plot a graph, the user should be sure that each row corresponds to the same observation across all the variables involved.

Note that if you try to enter two variables of different lengths into a model, or plot them against each other in a graph, MLwiN will give an error message. If you do this with two variables of the same length whose rows do not correspond to the same observations, however, MLwiN will not know that there is a problem. This could happen if you did not include one of the variables when sorting your dataset, or if you have two groups with the same number of individuals, have created two sets of variables each containing observations for just one group, and are now using one variable from each set.
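The distinction between a detectable length mismatch and an undetectable misalignment can be illustrated outside MLwiN. The following is a minimal Python sketch, not MLwiN code: the dictionary of lists is a hypothetical stand-in for a worksheet, and the column names and the `same_length` helper are invented for illustration.

```python
# Hypothetical stand-in for a worksheet: each column is an independent list,
# so nothing forces row i to mean the same record in every column.
worksheet = {
    "pupil_score": [52, 61, 47, 70, 58, 66],  # one row per pupil (level 1)
    "school_id":   [1, 1, 1, 2, 2, 2],        # one row per pupil (level 1)
    "school_mean": [160 / 3, 194 / 3],        # one row per school (level 2)
    "girls_score": [61, 70, 66],              # girls only
    "boys_score":  [52, 47, 58],              # boys only
}


def same_length(ws, names):
    """Return True if all the named columns have the same number of rows."""
    return len({len(ws[name]) for name in names}) == 1


# Different lengths: a mismatch that a simple length check can detect.
level1_vs_level2 = same_length(worksheet, ["pupil_score", "school_mean"])

# Same length, but row i of girls_score and row i of boys_score belong to
# different pupils. A length check passes even though pairing them is wrong.
girls_vs_boys = same_length(worksheet, ["girls_score", "boys_score"])

print(level1_vs_level2, girls_vs_boys)
```

The second check passing is exactly the situation the user has to guard against, since no length-based test can tell that the rows refer to different individuals.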

Does the order of the data in the worksheet matter?

For some purposes the order of the rows of data in the worksheet is not important. If you want to run a model, however, it is very important that the data is arranged so that all the cases belonging to the same highest level unit are on adjacent rows; that within highest level units, all the cases belonging to the same unit at the next level down are on adjacent rows; and so on, right down to cases belonging to the same level 2 unit being on adjacent rows within level 3 units. In other words, the data should be sorted by level 1 units within level 2 units within ... within highest level units.

The exception to this is if you are using MCMC to fit a cross-classified model, since this does not rely on the order of the cases to determine which unit each case belongs to at each level. Even then it is wise to sort the data in this way as far as possible, since IGLS, when it is run, will then give better starting values for MCMC.
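The adjacency requirement can be checked before the data is imported. The sketch below is plain Python rather than MLwiN commands, and it assumes a hypothetical three-level dataset with identifier columns named `school`, `class` and `pupil`; the helper names are also invented for illustration.

```python
from itertools import groupby


def is_contiguous(keys):
    """True if every distinct key occupies a single unbroken run of rows."""
    runs = [key for key, _ in groupby(keys)]
    return len(runs) == len(set(runs))


def is_nested_sorted(rows, id_columns):
    """Check adjacency at each level, highest level first.

    At each level the key combines that level's ID with the IDs of all
    higher levels, so lower-level IDs may be reused across higher units.
    """
    for depth in range(1, len(id_columns) + 1):
        keys = [tuple(row[c] for c in id_columns[:depth]) for row in rows]
        if not is_contiguous(keys):
            return False
    return True


# Hypothetical data: identifiers listed from highest level to lowest.
levels = ["school", "class", "pupil"]
rows = [
    {"school": 1, "class": 1, "pupil": 1},
    {"school": 2, "class": 1, "pupil": 2},
    {"school": 1, "class": 2, "pupil": 3},
    {"school": 1, "class": 1, "pupil": 4},
    {"school": 2, "class": 1, "pupil": 5},
]

# Sorting on the identifiers, highest level first, is one way to make
# units at every level adjacent.
sorted_rows = sorted(rows, key=lambda r: tuple(r[c] for c in levels))

print(is_nested_sorted(rows, levels), is_nested_sorted(sorted_rows, levels))
```

Note that only adjacency is being checked here: the units do not need to appear in increasing order of their identifiers, and sorting is simply a convenient way of achieving adjacency at every level.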

Chapter 8 of the User's Guide to MLwiN (available to download for free) gives details of how to sort the data in MLwiN. Alternatively, the data can be sorted in another package before importing to MLwiN, but the user should then check after importing that the data is indeed correctly sorted. See also the FAQ: Implausible results or convergence problems.

Other questions about worksheets

See also more about large datasets and getting data in and out
