Large datasets
How do I know MLwiN can handle the model based on my own data?
The capacity of MLwiN is determined entirely by the memory of your PC.
Number of records x number of 'variables' = number of values to store
Memory required to store the data: number of values x 8 (bytes per value) = memory in bytes
Memory in bytes / 1024 = memory in kilobytes
Memory in kilobytes / 1024 = memory in megabytes
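The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name and the example figures (250,000 records and 10 variables) are assumptions chosen for the example, and the result covers data storage only, not any extra memory needed while a model is being estimated.

```python
def estimate_memory_mb(n_records, n_variables, bytes_per_value=8):
    """Estimate the memory needed to store a dataset, in megabytes.

    Follows the steps above: values = records x variables,
    bytes = values x 8, then divide by 1024 twice.
    """
    n_values = n_records * n_variables
    memory_bytes = n_values * bytes_per_value
    memory_kilobytes = memory_bytes / 1024
    memory_megabytes = memory_kilobytes / 1024
    return memory_megabytes


# Hypothetical example: 250,000 records with 10 variables each.
example_mb = estimate_memory_mb(250_000, 10)
print(f"{example_mb:.1f} MB")  # prints "19.1 MB"
```

Compare the result with the memory available on your PC to judge whether the data are likely to fit.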
I'm having problems importing a large dataset from SAS
Example question: "I'm trying to get a dataset of 2.5 million records from SAS into MLwiN. I've tried to use a SAS macro and also to just export my data to a text file. When I use the macro to try to get my data into MLwiN, it says that it's 'scanning data'; it slows down at about 600,000 and stops at approximately 740,000. It doesn't stop at the same place every time (for example, once it stopped at 733,800 and once at 759,700). An error message pops up in a window: 'EXE file has encountered a problem and needs to close. We are sorry for the inconvenience.' When I just try to import the text file, the same error message pops up. Wondering if you have any thoughts?"
I would suggest importing a subset of your data, e.g. a random sub-sample of your higher-level units. Even if you manage to get all the data in, all but the simplest models would take a very long time to converge. A side benefit of using a sub-sample is that you can check that you haven't overfitted your model to it by testing the final model on alternative sub-samples of the data. When model building you will want a considerably smaller sub-sample of your data, say 25,000; if the model is not too complex, you can then estimate your final model on, say, 250,000.
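One way to draw such a sub-sample before importing is to sample whole higher-level units and keep every record that belongs to them. The sketch below uses only the Python standard library; the function name, the "cluster" field and the toy data are hypothetical, and in practice you might do the same thing in SAS or another package before export.

```python
import random


def sample_clusters(records, cluster_key, n_clusters, seed=0):
    """Keep all records belonging to a random sample of higher-level units."""
    # Collect the distinct cluster identifiers (sorted for reproducibility).
    cluster_ids = sorted({rec[cluster_key] for rec in records})
    rng = random.Random(seed)
    chosen = set(rng.sample(cluster_ids, n_clusters))
    # Retain every lower-level record within the chosen clusters.
    return [rec for rec in records if rec[cluster_key] in chosen]


# Toy data (illustrative only): 700 clusters with 5 individuals each.
records = [{"cluster": c, "person": i} for c in range(700) for i in range(5)]

subset = sample_clusters(records, "cluster", 70, seed=1)
print(len(subset))  # 70 clusters x 5 individuals = 350
```

Calling the function again with a different seed gives an alternative sub-sample on which the final model can be re-tested.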
I'm having problems working with a large dataset in MLwiN. Does MLwiN have a limit to the size of dataset that it can handle?
Example question: I am doing 2-level logistic modeling using MLwiN. My data has >250,000 individuals within >700 clusters. I have less than 10 covariates.
Try using a random sub-sample of your data. With only 2 levels, 700 level 2 units and 10 covariates, you do not need nearly as many as 250,000 observations to get precise estimates. You will be able to work a lot more quickly with a smaller sample and therefore be able to explore more avenues, potential models, etc.
… basically, it doesn't work for the whole dataset in MLwiN (even single-level logistic), although it works for a smaller subset of the data (about 5,000). I haven't noticed any problems importing the whole dataset.
You can check whether the entire dataset has been imported properly by comparing the summary statistics: are they the same in MLwiN as in your other stats package?
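As a rough illustration of that check, the sketch below computes a few summary statistics for one variable and compares two sets of them within a tolerance. The function names and the toy values are assumptions for the example; in practice you would compare the figures reported by MLwiN with those from your other package, and confirm that both report the same kind of standard deviation.

```python
import statistics


def summarise(values):
    """Return basic summary statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def same_summary(a, b, tol=1e-6):
    """True if two summaries agree on every statistic within tol."""
    return a.keys() == b.keys() and all(abs(a[k] - b[k]) <= tol for k in a)


# Toy values (illustrative only): a complete column and a truncated import.
full = [1.0, 2.0, 3.0, 4.0]
truncated = full[:3]

print(same_summary(summarise(full), summarise(full)))       # True
print(same_summary(summarise(full), summarise(truncated)))  # False
```

A mismatch in the record count is usually the quickest sign that an import stopped part-way through.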
Can MLwiN handle such a huge dataset?
Yes, but it depends of course on the spec of your computer. See system requirements. The 64-bit version of MLwiN is far more likely to be able to handle data of this size.