Q: (Combining estimates from imputed datasets) Using SAS, I'd like to combine results from an analysis of multiply imputed data, but I don't have the covariance matrix of the estimates.

A: The following macro does the same thing as the MIANALYZE procedure, but does not require the covariance matrix of the estimates. Instead, the macro only requires point estimates and standard errors from each imputed data set.

(Note: Beginning with version 9, SAS PROC MIANALYZE no longer requires the covariance matrix of the estimates. This reduces the need for my macro.)


PROGRAMMER: Paul von Hippel

MOST RECENT REVISION: August 26, 2005

PURPOSE: Given a file containing estimates and standard errors from several imputed data sets, return a single set of valid estimates and standard errors

OUTPUT: For each parameter, an estimate, standard error, t statistic, fraction of missing information, degrees of freedom, p value, and the lower and upper bounds of a confidence interval (LCL, UCL).
(All formulas from Little and Rubin, 2002. Degrees of freedom uses the improved finite-sample formula of Barnard and Rubin, 1999.)
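
For reference, the combining rules can be summarized as follows, as they are implemented in the macro code at the end of this page. The symbols are shorthand chosen for this summary rather than the notation of the cited sources: m is the number of imputations, \hat{Q}_i and SE_i are the estimate and standard error from imputed data set i, \nu_{com} is the DF_COMP argument, and c is the CONFIDENCE argument. The names in parentheses are the corresponding variables in the macro.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
  && \text{(Est)} \\
W &= \frac{1}{m}\sum_{i=1}^{m} SE_i^{2}
  && \text{(within\_var)} \\
B &= \Bigl(1+\frac{1}{m}\Bigr)\,\frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^{2}
  && \text{(between\_var)} \\
T &= W + B
  && \text{(total\_var)} \\
\gamma &= B / T
  && \text{(frac\_missing)} \\
\nu_{m} &= \frac{m-1}{\gamma^{2}}
  && \text{(df\_rubin)} \\
\nu_{obs} &= \frac{\nu_{com}+1}{\nu_{com}+3}\,(1-\gamma)\,\nu_{com}
  && \text{(df\_obs)} \\
\nu &= \Bigl(\frac{1}{\nu_{m}}+\frac{1}{\nu_{obs}}\Bigr)^{-1}
  && \text{(df)} \\
t &= \bar{Q}/\sqrt{T}, \qquad
\bar{Q} \pm t_{\,1-(1-c)/2,\;\nu}\,\sqrt{T}
  && \text{(t, LCL, UCL)}
\end{align*}
```

Note that the macro folds the factor (1 + 1/m) into between_var, so B above is not the raw variance of the estimates across imputations. When \gamma = 0 the macro skips \nu_m and sets \nu = \nu_{obs}.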

KNOWN LIMITATION: The estimates will be listed in alphabetical order.

EXAMPLE OF USAGE:

First make, say, 5 imputed copies of your data:

PROC MI DATA=mydata OUT=imputed NIMPUTE=5;
 VAR y x1 x2 etc;
RUN;

Then run 5 regressions, one for each imputed data set:

PROC REG DATA=imputed;
 MODEL y = x1 x2 etc;
 BY _IMPUTATION_;
 ODS OUTPUT PARAMETERESTIMATES=ests_imp;
RUN;

Note that the line

ODS OUTPUT PARAMETERESTIMATES=ests_imp;
invokes the Output Delivery System (ODS) to create a results file that you've named ests_imp. (The name ests_imp is just an example. You can assign the results file any name you want.) The results file contains point estimates and standard errors from the 5 analyses of the 5 imputed data sets. In the REG procedure, a suitable results file can be obtained using the ODS keyword PARAMETERESTIMATES (as above). But if you're using a different procedure, a different keyword might be needed.
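
If you're not sure which ODS keyword a procedure uses, one way to find out is to turn on ODS tracing, which writes the name of every output table to the SAS log. The sketch below uses PROC LOGISTIC purely as an illustration; the outcome ybin and the predictors are hypothetical, and the table name you need may differ for other procedures.

```sas
/* Sketch: list the ODS table names that a procedure produces. */
ODS TRACE ON;
PROC LOGISTIC DATA=imputed;
 MODEL ybin = x1 x2;   /* ybin is a hypothetical binary outcome */
 BY _IMPUTATION_;
RUN;
ODS TRACE OFF;

/* After reading the table name from the log, request that table. */
PROC LOGISTIC DATA=imputed;
 MODEL ybin = x1 x2;
 BY _IMPUTATION_;
 ODS OUTPUT PARAMETERESTIMATES=ests_imp;
RUN;
```

Column names in the results file can also differ from one procedure to another, which is one more reason to print the file before combining the estimates.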

It's a really good idea to have a look at the results file. This way you can verify that it exists, and see what names have been assigned to the different columns of output.

PROC PRINT DATA=ests_imp;
RUN;

The %MI_ANALYZE macro will combine the 5 sets of results into a single set of valid estimates:

%MI_ANALYZE (
 OUTESTS=mi_ests,
 INESTS=ests_imp,
 EST=estimate,
 SE=stderr,
 LABEL=variable
);

%MI_ANALYZE has 5 required arguments (OUTESTS, INESTS, EST, SE, LABEL). You need to fill in appropriate values for these arguments; the values used above (in lowercase) will not be appropriate in all settings.

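Here is what the arguments mean:

OUTESTS: the name of the file where the combined estimates will be stored. (If omitted, the macro code uses mi_ests.)
INESTS: the results file containing the estimates and standard errors from each imputed data set (ests_imp in the example above).
EST: the column of the results file that contains the point estimates.
SE: the column of the results file that contains the standard errors.
LABEL: the column of the results file that identifies each parameter. Estimates are combined separately for each value of this column.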

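The following arguments, which were not used in this example, are optional:

DF_COMP: the degrees of freedom that the analysis would have had if the data were complete. It is used in the Barnard and Rubin degrees-of-freedom formula. Default: 5000.
CONFIDENCE: the confidence level for the confidence interval. Default: .95.
PRINT: set to 1 to print the combined estimates; any other value suppresses printing. Default: 1.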
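
As an illustration, a call that sets the optional arguments might look like the sketch below. The values are hypothetical: DF_COMP=97 assumes a complete data set of 100 observations and a regression with 3 coefficients (intercept, x1, x2), and CONFIDENCE=.90 requests a 90% interval.

```sas
/* Sketch: same call as above, with the optional arguments filled in. */
%MI_ANALYZE (
 OUTESTS=mi_ests,
 INESTS=ests_imp,
 EST=estimate,
 SE=stderr,
 LABEL=variable,
 DF_COMP=97,      /* hypothetical: 100 observations - 3 coefficients */
 CONFIDENCE=.90,  /* 90% confidence interval instead of the default 95% */
 PRINT=0          /* store the results in mi_ests without printing them */
);
```

With PRINT=0 the combined estimates are still written to mi_ests, so you can print or post-process them later.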

References

Little, R.J.A., and Rubin, D.B. (2002). Statistical Analysis with Missing Data, 2nd ed. New York: Wiley.

Barnard, J., and Rubin, D.B. (1999). "Small-sample degrees of freedom with multiple imputation." Biometrika 86(4), 948-955.


Macro code


%macro mi_analyze (inests=, outests=mi_ests, df_comp=5000,
 est=, se=, label=, confidence=.95, print=1);
data work.mi_input;
 set &inests;
 SESq = &se**2;
run;
proc sort data=work.mi_input;
  by &label;
run;
proc means data=work.mi_input mean var n;
 var &est SESq;
 by &label;
 ods output Summary=&outests;
run;
data &outests;
 set &outests;
 Est = &est._mean;
 num_imputations = &est._n;
 within_var = SESq_Mean;
 between_var = (1+1/num_imputations) * &est._Var;
 total_var = within_var + between_var;
 SE = sqrt (total_var);
 t = Est / SE;
 frac_missing = between_var / total_var; /* Can be 0 or 1 in rare cases */
 df_comp = &df_comp;
 df_obs = (&df_comp+1)/(&df_comp+3) * (1-frac_missing) * &df_comp;
 if frac_missing=0 then do;
  df_rubin=5000;
  df=df_obs;
 end;
 else do;
  df_rubin = (num_imputations - 1) / frac_missing**2; /* Rubin (1987). OK for large samples. */
  df = (1/df_rubin + 1/df_obs)**-1; /* Barnard & Rubin (1999). Better, esp. in small samples */
 end;
 p = 2 * (1-probt(abs(t),df));
 ci_half_width = se * tinv (1-(1-&confidence)/2, df);
 lcl = est - ci_half_width;
 ucl = est + ci_half_width;
 keep &label est se frac_missing num_imputations df_comp df lcl ucl t p;
run;
%if &print=1 %then %do;
proc print data=&outests;
run;
%end;
proc datasets library=work;
 delete mi_input;
run; quit;
%mend mi_analyze;