Q: (Using PROC MI and PROC MIAnalyze)
I'm learning SAS PROC MI and SAS PROC MIAnalyze for producing and analyzing
multiply imputed data sets. What are the basics of using them, and what are some
common difficulties and workarounds?
A:
SAS provides a detailed description and documentation. In addition, my PowerPoint
slides on missing data include an example that applies PROC MI and MIAnalyze to a
3-variable dataset that is missing values for age and weight (see pages 17-21 of
the slides).
Below are some of the basics.
Analyses using this software proceed in three basic steps: imputation, analysis, and
synthesis. Some common questions and problems are discussed after the three steps.
- Imputation. First you fill in the missing values with multiple
imputations.
PROC MI
DATA=/*data set with missing values*/
OUT=/*data set with values imputed*/
NIMPUTE=/*# of imputations per missing value*/;
VAR /*...variables in imputation model...*/;
RUN;
Here is some interpretation:
- After the DATA= option, you give the data set that you want to impute.
- After the OUT= option, you name the data set that will contain the imputed values.
The output data set will contain NIMPUTE versions of the original data set, each
version with different imputed values. The different versions of the data set are
indexed by a variable called _IMPUTATION_, which SAS creates.
- In the VAR statement, you list all the variables with missing values
along with all the variables that may be helpful for imputing missing values.
This list should include all the variables in your analysis model (below), as well as any
auxiliary variables that, while not part of your analysis, may be helpful in imputing
the variables that are.
- Analysis. Next you fit your model just as you would
if the data were complete. Using the BY statement, you fit the model separately for each version of the dataset.
PROC /*REG or LOGISTIC or...*/
DATA=/*imputed data set*/;
MODEL /*dependent variable*/ = /*independent variables*/
/ COVB; /*in REG and LOGISTIC, COVB requests the covariance matrix of the estimates*/
ODS OUTPUT
/*parameter estimate keyword*/=parameters
/*parameter covariance keyword*/=parameter_covariances;
BY _IMPUTATION_;
RUN;
The ODS OUTPUT statement uses the Output Delivery System to create two new data sets:
one that contains your parameter estimates, and one that contains the variances and
covariances among those estimates. Each holds separate estimates for each imputed
data set. These estimates will be used in the final step.
Unfortunately, the ODS keywords are not consistent across procedures. You may need to look in the procedure documentation to find the appropriate keywords.
For some older procedures, you can create output data sets without using ODS.
There are some examples of which keywords go with various parameters starting on page 8 of this document.
- Synthesis of results.
In the final step, you combine the results from the different imputed data sets.
The inputs to this step are the estimates, variances, and covariances
for the different imputed data sets. In the previous step, you saved these into data sets
called parameters and parameter_covariances.
PROC MIAnalyze
PARMS=parameters
COVB=parameter_covariances;
VAR intercept /*regressors*/ ;
RUN;
The output is a single set of estimates and
standard errors, as well as confidence intervals and t tests. The standard errors
account for the variation across imputed data sets, as well as the usual sampling
variation.
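As a concrete illustration of the three steps, here is a minimal sketch for a linear regression. The data set name mydata, the variables weight, age, and height, the seed, and the number of imputations are all hypothetical. Instead of ODS, the sketch uses PROC REG's OUTEST= and COVOUT options, which save the estimates and their covariance matrix in a single data set that PROC MIANALYZE reads through its DATA= option. It also assumes SAS 9 or later, where (to my knowledge) the statement that lists the parameters in PROC MIANALYZE is named MODELEFFECTS rather than VAR.

```sas
/* Step 1: imputation.
   mydata, weight, age, and height are hypothetical names. */
PROC MI DATA=mydata OUT=imputed NIMPUTE=5 SEED=12345;
   VAR weight age height;
RUN;

/* Step 2: analysis.
   OUTEST= with COVOUT saves the estimates and their covariance
   matrix for each imputed data set; NOPRINT suppresses the
   printed output for the individual fits. */
PROC REG DATA=imputed OUTEST=estimates COVOUT NOPRINT;
   MODEL weight = age height;
   BY _Imputation_;
RUN;

/* Step 3: synthesis.
   Assumes SAS 9 or later; older releases used VAR here. */
PROC MIANALYZE DATA=estimates;
   MODELEFFECTS Intercept age height;
RUN;
```

Setting SEED= makes the imputations reproducible from one run to the next. For procedures without an OUTEST= option, the ODS approach described above is the one to use.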
Common questions and problems.
- Q: How do I get summary statistics such as R squared?
A: MIAnalyze doesn't provide these, but you can get them pretty easily.
For each imputed data set, there is an estimate of R squared. Just average these estimates.
- Q: I'm getting an error regarding failure to converge because of singularity.
A: Singularity is another word for perfect collinearity. I have written a page on
common sources and remedies.
- Q: I'm getting an error regarding failure to impute a value within the bounds that I have specified.
A: This error is the reason that MI's options for bounding imputed values are not
particularly helpful. I recommend imputing without bounds and truncating the values
afterward.
- Q: PROC MI software is designed to impute conditionally normal variables. How can I impute categorical variables?
A: This is a fairly tricky question. My advice is given here.
- Q: What if I want to do my analyses in another program, such as Stata?
A: I can easily write a spreadsheet that will combine multiply imputed results. The
first person to approach me with this problem will get a spreadsheet tailored to their
needs.
- Q: Is Stata's impute command just as good?
A: No. It doesn't account for random variation, so it will impute the same value every time. Multiple imputation is based on imputing several random values, and
accounting for the variation among them.
- Q: Is SPSS's Missing Values Analysis (MVA) just as good?
A: MVA has serious problems, which I summarize in a published review.
It is not suitable for sociological research.
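Two of the answers above are easy to sketch in code: truncating imputed values instead of using bounds, and averaging R squared across the imputed data sets. In this sketch the data set imputed and the variables weight, age, and height are hypothetical, and the lower limit of 0 for age is only an example. It assumes the analysis is done in PROC REG, whose RSQUARE option adds the variable _RSQ_ to the OUTEST= data set.

```sas
/* Truncate imputed values that fall outside a plausible range.
   The limit of 0 for age is a hypothetical example. The MISSING
   check matters because SAS treats a missing value as smaller
   than any number. */
DATA imputed_truncated;
   SET imputed;
   IF NOT MISSING(age) AND age < 0 THEN age = 0;
RUN;

/* Fit the model to each imputed data set.
   RSQUARE adds _RSQ_ to the OUTEST= data set. */
PROC REG DATA=imputed_truncated OUTEST=fit RSQUARE NOPRINT;
   MODEL weight = age height;
   BY _Imputation_;
RUN;

/* Average R squared across the imputed data sets. */
PROC MEANS DATA=fit MEAN;
   VAR _RSQ_;
RUN;
```

The fit data set has one row per imputed data set, so the mean reported by PROC MEANS is the simple average of the R squared estimates.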