Q: (Using PROC MI and PROC MIAnalyze) I'm learning SAS PROC MI and SAS PROC MIAnalyze for producing and analyzing multiply imputed data sets. What are the basics on using these, and what are some common difficulties and workarounds?

A: SAS provides detailed documentation for both procedures. In addition, my PowerPoint slides on missing data include an example that applies PROC MI and PROC MIAnalyze to a three-variable data set that is missing values for age and weight. (See pages 17-21 of the slides.)

Below are some of the basics.

Analyses using this software proceed in three basic steps: imputation, analysis, and synthesis. Common questions and problems are discussed after the three steps.

  1. Imputation. First you fill in the missing values with multiple imputations.
    PROC MI
      DATA=/*data set with missing values*/
      OUT=/*data set with values imputed*/
      NIMPUTE=/*# of imputations per missing value*/;
      VAR /*...variables in imputation model...*/;
    RUN;

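    Here is some interpretation: DATA= names the data set with missing values, OUT= names the data set that will hold the imputed values, NIMPUTE= sets the number of imputations, and VAR lists the variables in the imputation model. The output data set stacks the imputed versions of the data and identifies each one with the variable _IMPUTATION_.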
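
    As a concrete illustration, here is a minimal sketch of the imputation step for a small data set like the one in the slides. The data set names (patients, patients_mi), the third variable (height), the choice of five imputations, and the seed value are all assumptions made for this example, not details taken from the slides.

    ```sas
    /* Hypothetical input: WORK.PATIENTS, with age and weight partly missing. */
    /* SEED= fixes the random number stream so the imputations can be reproduced. */
    PROC MI DATA=patients OUT=patients_mi NIMPUTE=5 SEED=12345;
      VAR age weight height;
    RUN;

    /* Check the result: each value of _IMPUTATION_ should have as many rows as PATIENTS. */
    PROC FREQ DATA=patients_mi;
      TABLES _IMPUTATION_;
    RUN;
    ```

    With NIMPUTE=5, the output data set should contain five stacked copies of the original rows, one copy per value of _IMPUTATION_.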

  2. Analysis. Next you fit your model just as you would if the data were complete. Using the BY statement, you fit the model separately for each version of the dataset.
    PROC /*REG or LOGISTIC or...*/
      DATA=/*imputed data set*/;
      MODEL /*dependent variable*/ = /*independent variables*/;
      ODS OUTPUT
        /*parameter estimate keyword*/=parameters
        /*parameter covariance keyword*/=parameter_covariances;
      BY _IMPUTATION_;
    RUN;

    The ODS statement uses the Output Delivery System to create two new data sets: one that contains your parameter estimates, and one that contains the variances and covariances among those estimates. Each holds separate estimates for each imputed data set. These estimates will be used in the final step.

    Unfortunately the ODS keywords are not consistent across procedures. You may need to look in the procedure documentation to find out the appropriate keywords. For some older procedures, you can create output data sets without using ODS. There are some examples of which keywords go with various parameters starting on page 8 of this document.
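
    One way to discover the ODS keywords without searching the documentation is to turn on ODS tracing, which writes the name of every output table to the SAS log. The sketch below does this for PROC REG, and then shows the non-ODS route for the same procedure using the OUTEST= and COVOUT options. The data set and variable names are hypothetical and serve only as an illustration.

    ```sas
    /* Route 1: list the ODS table names in the log. For PROC REG with the   */
    /* COVB option, the log should include ParameterEstimates and CovB.      */
    ODS TRACE ON;
    PROC REG DATA=patients_mi;
      MODEL weight = age height / COVB;
      BY _IMPUTATION_;
    RUN;
    QUIT;
    ODS TRACE OFF;

    /* Route 2: skip ODS and write estimates and covariances to one data set. */
    PROC REG DATA=patients_mi OUTEST=reg_estimates COVOUT NOPRINT;
      MODEL weight = age height;
      BY _IMPUTATION_;
    RUN;
    QUIT;
    ```

    A data set written with OUTEST= and COVOUT holds both the estimates and their covariances, so it would be passed to PROC MIAnalyze with the DATA= option rather than with PARMS= and COVB=.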

  3. Synthesis of results. In the final step, you combine the results from the different imputed data sets. The inputs to this step are the estimates, variances, and covariances for the different imputed data sets. In the previous step, you saved these into data sets called parameters and parameter_covariances.
    PROC MIAnalyze
      PARMS=parameters
      COVB=parameter_covariances;
      VAR intercept /*regressors*/;
    RUN;

    The output is a single set of estimates and standard errors, as well as confidence intervals and t tests. The standard errors account for the variation across imputed data sets, as well as the usual sampling variation.
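
    The way the two sources of variation are combined can be written out with Rubin's rules for a single parameter. The symbols are introduced here for illustration: m is the number of imputations, \hat{Q}_i is the estimate from the i-th imputed data set, and \hat{U}_i is its squared standard error. This is a sketch of the basic form only, without any small-sample adjustment to the degrees of freedom.

    ```latex
    \begin{aligned}
    \bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
      && \text{combined estimate} \\
    \bar{U} &= \frac{1}{m}\sum_{i=1}^{m}\hat{U}_i
      && \text{within-imputation variance (usual sampling variation)} \\
    B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2
      && \text{between-imputation variance} \\
    T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
      && \text{total variance; the standard error is } \sqrt{T} \\
    \nu &= (m-1)\Bigl[1+\frac{\bar{U}}{(1+1/m)\,B}\Bigr]^{2}
      && \text{degrees of freedom for the } t \text{ test}
    \end{aligned}
    ```

    The t tests and confidence intervals then compare (\bar{Q} - Q)/\sqrt{T} to a t distribution with \nu degrees of freedom.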

Common questions and problems.