Q: (Multiple imputation with minimal SAS) I want to use multiple imputation, but I don't want to do my analyses in SAS.

A: Of the major packages, SAS has the best support for multiple imputation. But it's possible to do your analyses in other packages.

As commonly practiced, multiple imputation analysis consists of three steps:

  1. Imputation. Create multiple copies of the data set. In each copy, fill in missing values with plausible random imputations.
  2. Analysis. Analyze each imputed data set separately, using complete-data methods.
  3. Synthesis. Combine the results of the separate analyses, using formulas that account for variation within and between the imputed data sets.

Step 1 is the hardest part. Here you need to use SAS PROC MI or other specialized imputation software. If you use SAS PROC MI, the software outputs a single data file containing all of the imputed data sets, stacked one after another. The output file includes a variable called _imputation_ that identifies which imputed data set each row belongs to: _imputation_=1 for the first imputed data set, _imputation_=2 for the second, and so on.

In step 2, you can continue to use SAS or switch to your favorite analysis software. Fit the same model to each of the imputed data sets. If you have 10 imputed data sets--i.e., 10 imputations--you'll get 10 sets of results.
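As an illustration of step 2 outside SAS, here is a minimal Python sketch. It assumes the stacked PROC MI output has been exported to CSV with the _imputation_ column intact. The column y, the toy values, and the function name analyze_each_imputation are hypothetical, and the "model" is just a mean with its standard error, standing in for whatever analysis you actually run.

```python
import csv
import io
import math
import statistics
from collections import defaultdict

# Hypothetical stacked file: two imputed copies of a three-row data set.
STACKED_CSV = """_imputation_,y
1,2.0
1,4.0
1,6.0
2,2.0
2,5.0
2,8.0
"""

def analyze_each_imputation(rows, value_col="y", index_col="_imputation_"):
    """Run the same complete-data analysis on every imputed data set.

    Returns {imputation number: (estimate, standard error)}.
    """
    # Split the stacked rows into one list of values per imputed data set.
    groups = defaultdict(list)
    for row in rows:
        groups[int(row[index_col])].append(float(row[value_col]))

    results = {}
    for k in sorted(groups):
        values = groups[k]
        estimate = statistics.mean(values)
        # Standard error of a mean: sample SD divided by sqrt(n).
        std_error = statistics.stdev(values) / math.sqrt(len(values))
        results[k] = (estimate, std_error)
    return results

rows = list(csv.DictReader(io.StringIO(STACKED_CSV)))
per_imputation = analyze_each_imputation(rows)
for k, (estimate, std_error) in per_imputation.items():
    print(k, estimate, round(std_error, 4))
```

With 10 imputations the dictionary would hold 10 (estimate, standard error) pairs, one per imputed data set.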

In step 3, you combine your 10 sets of results to get a summary giving estimates, standard errors, and t tests that reflect variation within and between the imputed data sets. The formulas for this summary are straightforward, but there's no reason for everyone to work them out for themselves. Again, the formulas are implemented in SAS, but for the SAS-averse I have written a series of Excel spreadsheets that have the formulas built in. The example spreadsheet described here is designed for analyses using 10 imputations.
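For readers who want to see the combining formulas spelled out, the following Python sketch applies the standard combining rules from Rubin (1987) to one parameter. It is not a transcription of the spreadsheets. The function name pool_rubin and the toy inputs are hypothetical, and the degrees of freedom here are the large-sample version, without any small-sample adjustment.

```python
import math
import statistics

def pool_rubin(estimates, std_errors):
    """Combine m point estimates and standard errors for one parameter."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors, m >= 2")

    q_bar = statistics.mean(estimates)                    # pooled estimate
    within = statistics.mean([se ** 2 for se in std_errors])  # within-imputation variance
    between = statistics.variance(estimates)              # between-imputation variance
    total = within + (1 + 1 / m) * between                # total variance
    pooled_se = math.sqrt(total)

    # Large-sample degrees of freedom (Rubin 1987); infinite when the
    # estimates do not vary at all across imputations.
    if between > 0:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = math.inf

    return {
        "estimate": q_bar,
        "std_error": pooled_se,
        "t": q_bar / pooled_se,  # t statistic for H0: parameter = 0
        "df": df,
    }

# Toy example with three imputations (hypothetical numbers).
pooled = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(pooled)
```

Turning the t statistic into a p-value requires a t distribution with the reported degrees of freedom, which the Python standard library does not provide, so that step is left out of the sketch.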

The spreadsheet consists of 11 worksheets. Paste your 10 sets of results into the worksheets called Imput1, Imput2, etc. The multiple imputation summary appears in the worksheet titled MI. You also need to fill in the yellow cells on the MI worksheet. In these cells, you tell the formulas how many imputed data sets you're using (10) and how many degrees of freedom the analysis would have if the data were complete. The complete-data degrees of freedom matter only for small data sets (Barnard & Rubin 1999). If you have a large data set and don't know the degrees of freedom, just fill in a large number like 5000.
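To show why the complete-data degrees of freedom matter only in small data sets, here is a Python sketch of the small-sample adjustment in Barnard & Rubin (1999). The function name barnard_rubin_df and the variance values are hypothetical, and whether the spreadsheet computes the adjustment in exactly this form is an assumption.

```python
def barnard_rubin_df(m, between, total, df_complete):
    """Adjusted degrees of freedom for one parameter (Barnard & Rubin 1999).

    m           -- number of imputations
    between     -- between-imputation variance of the estimate
    total       -- total variance: within + (1 + 1/m) * between
    df_complete -- degrees of freedom the analysis would have with complete data
    """
    # Share of the total variance that is due to missing data.
    gamma = (1 + 1 / m) * between / total
    if gamma == 0:
        # No between-imputation variation: nothing to adjust.
        return float(df_complete)

    df_old = (m - 1) / gamma ** 2  # large-sample df (Rubin 1987)
    df_obs = (df_complete + 1) / (df_complete + 3) * df_complete * (1 - gamma)
    return 1 / (1 / df_old + 1 / df_obs)

# Hypothetical variances: within = 1.0, between = 0.5, m = 10 imputations.
m, within, between = 10, 1.0, 0.5
total = within + (1 + 1 / m) * between

df_large = barnard_rubin_df(m, between, total, df_complete=5000)
df_small = barnard_rubin_df(m, between, total, df_complete=20)
print(round(df_large, 2), round(df_small, 2))
```

When the complete-data degrees of freedom are large, the adjusted value stays close to the large-sample figure, which is why a placeholder like 5000 does little harm. When they are small, the adjusted value drops below the complete-data degrees of freedom.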

The spreadsheet reports two numbers that will be unfamiliar to some users.

I have written spreadsheets for 2, 3, 4, 5, or 10 imputations. More than 10 imputations are rarely needed (Rubin 1987). For simple guidelines on choosing a number of imputations, see von Hippel (2005).

References

Barnard, J. and Rubin, D.B. 1999. "Small-sample degrees of freedom with multiple imputation." Biometrika 86(4), 948-955.

Rubin, D.B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

von Hippel, P.T. 2005. "How Many Imputations Are Needed? A Comment on Hershberger and Fisher (2003)." Structural Equation Modeling 12(2), 334-335.