Q: (Regression with sampling weights):
"I'm fitting a regression model to a survey that includes sampling weights. Should I use the weights in my analysis?"

A: We should distinguish between two types of sampling weights. By reading the survey documentation, you can often determine which type of weight you are dealing with. The following classification and advice come from Winship and Radbill (1994):

  1. One type of weight is based entirely on recorded variables such as race and poverty. If your model incorporates these variables, including appropriate nonlinear and interaction terms, then you shouldn't do a weighted regression. In fact, a weighted regression will needlessly increase your standard errors, thereby reducing the power of statistical tests and widening confidence intervals.
  2. Another type of weight is not based just on recorded variables, but is meant to compensate for sample features such as selection bias, or cluster sampling with probability proportional to size. If you have this type of weight, then you should do a weighted regression, since no recorded variables can replace the weights.
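
To illustrate the cost of unnecessary weighting described for the first type of weight, here is a minimal sketch in Python with NumPy. It assumes a correctly specified linear model with independent, constant-variance errors, and it computes the exact sampling standard deviation of the coefficients with and without weights. The variable `poor`, the weight values, and the helper `exact_coef_sd` are hypothetical, chosen only for illustration.

```python
import numpy as np

def exact_coef_sd(X, w=None, sigma=1.0):
    """True sampling SD of (weighted) least-squares coefficients when the
    model is correct and errors are independent with constant SD `sigma`."""
    X = np.asarray(X, dtype=float)
    if w is None:
        w = np.ones(X.shape[0])          # unweighted case
    w = np.asarray(w, dtype=float)
    XtWX_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    middle = X.T @ ((w ** 2)[:, None] * X)
    cov = sigma ** 2 * XtWX_inv @ middle @ XtWX_inv
    return np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 500
poor = rng.integers(0, 2, size=n)        # hypothetical recorded variable
x = rng.normal(size=n)                   # hypothetical second regressor
X = np.column_stack([np.ones(n), poor, x])

# Assumption: the weights depend only on `poor`, which is already in the model.
w = np.where(poor == 1, 4.0, 1.0)

sd_unweighted = exact_coef_sd(X)
sd_weighted = exact_coef_sd(X, w)

# Ratios of 1 or more mean the weighted estimates are no more precise,
# and usually less precise, than the unweighted ones.
print(sd_weighted / sd_unweighted)
```

Under these assumptions the ratio can never fall below 1 (this is the Gauss-Markov theorem), so the weights buy nothing and cost precision. The sketch says nothing about the second type of weight, where the unweighted model is not correctly specified.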

Suppose you have the first type of weight, but are not sure that your model incorporates all appropriate variables, interactions, and nonlinearities. You can check and improve your model using the following procedure (DuMouchel and Duncan 1983; see also Winship and Radbill 1994).

  1. Do not carry out a weighted analysis. Instead, include the weight variable as a regressor, and allow the weight variable to interact with all the other regressors in your model.
  2. If your model is correctly specified, then the weight variable and all its interactions will be statistically insignificant.
  3. If the weight variable is significant, then your model probably omits an important variable. Consult the survey documentation to see what variables the weights are based on, and add those variables to your model.
  4. If the weight variable has a significant interaction, then your model probably omits an important interaction. For example, if weights are based on race and gender, and weights have a significant interaction with education, then your model probably needs an interaction between race and education, or between gender and education.
  5. Continue respecifying your model until none of the terms involving weight are significant. Then drop the weight terms and carry out an ordinary unweighted analysis.
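
The procedure above can be sketched in Python with NumPy. This is one possible way to operationalize it, not necessarily the exact test used in the cited papers: it fits the model with and without the weight terms and forms a joint F statistic for all terms involving the weight. The function names, the simulated data, and the variables `x` (education) and `g` (a group indicator that the weights are based on) are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an unweighted least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def weight_terms_f_test(X, y, w):
    """Joint F statistic for the weight and its interactions with every
    column of X (X has no intercept column). Returns (F, df1, df2)."""
    n = X.shape[0]
    base = np.column_stack([np.ones(n), X])
    full = np.column_stack([base, w, X * w[:, None]])
    # Use ranks, because weight terms can be collinear with the regressors
    # when the weights are an exact function of variables in the model.
    rank_base = np.linalg.matrix_rank(base)
    rank_full = np.linalg.matrix_rank(full)
    df1, df2 = rank_full - rank_base, n - rank_full
    if df1 == 0:
        raise ValueError("weight terms add nothing to this model")
    rss0, rss1 = rss(base, y), rss(full, y)
    F = ((rss0 - rss1) / df1) / (rss1 / df2)
    return F, df1, df2

# Simulated survey (all values are illustrative assumptions).
rng = np.random.default_rng(1)
n = 1000
g = rng.integers(0, 2, size=n).astype(float)   # group the weights are based on
x = rng.normal(size=n)                         # e.g. education
w = 1.0 + 3.0 * g                              # weight depends only on g
y = 1.0 + 2.0 * x + 1.5 * g + rng.normal(size=n)

# Model that omits g: the weight terms should stand out.
F_bad, df1_bad, df2_bad = weight_terms_f_test(x[:, None], y, w)
# Model that includes g: the weight terms should add little.
F_ok, df1_ok, df2_ok = weight_terms_f_test(np.column_stack([x, g]), y, w)

print(F_bad, df1_bad, df2_bad)
print(F_ok, df1_ok, df2_ok)
```

To turn each F statistic into a p-value, compare it with an F distribution on `df1` and `df2` degrees of freedom (for example with `scipy.stats.f.sf`). A joint test only says whether any weight term matters; to decide which variable or interaction to add, you would still inspect the individual weight terms as described in steps 3 and 4.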

If you must use sampling weights, you should be aware that many software packages do not use weights properly in standard error calculations. Winship and Radbill (1994) found that incorrect standard-error formulas were used in the basic regression routines of SPSS, Systat, Stata, and SAS. The resulting biases were not easy to predict; the true standard errors could be smaller or larger than those reported. Stata and SAS now incorporate correct standard-error formulas into their respective svyreg and SURVEYREG procedures, but SPSS still has not implemented the correct formula. (I don't know about Systat.) A common expedient is to use SPSS with the weights rescaled to have a mean of 1. This helps, but does not fully correct the problem.
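
The following Python sketch (NumPy only) shows why the scale of the weights matters for some standard-error formulas. It assumes, for illustration, that the problematic formula treats each sampling weight as a frequency count, and it compares that with a Huber-White style sandwich estimator. The sandwich estimator is swapped in here as a simple weight-aware alternative; it ignores strata and clusters, and it is not claimed to be the formula used by any particular package. All function names and data are hypothetical.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares: returns coefficients, residuals, (X'WX)^-1."""
    XtWX_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = XtWX_inv @ (X.T @ (w * y))
    return beta, y - X @ beta, XtWX_inv

def naive_se(X, y, w):
    """SEs that treat weights as frequency counts, so the apparent sample
    size is sum(w). The result depends on the scale of the weights."""
    _, resid, XtWX_inv = wls_fit(X, y, w)
    s2 = (w * resid ** 2).sum() / (w.sum() - X.shape[1])
    return np.sqrt(np.diag(s2 * XtWX_inv))

def sandwich_se(X, y, w):
    """Sandwich (robust) SEs for the weighted fit, without any
    adjustment for strata or clusters."""
    _, resid, XtWX_inv = wls_fit(X, y, w)
    scores = X * (w * resid)[:, None]
    return np.sqrt(np.diag(XtWX_inv @ (scores.T @ scores) @ XtWX_inv))

rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
w = rng.uniform(100.0, 4000.0, size=n)   # assumed population-scale weights
w1 = w / w.mean()                        # the "mean of 1" expedient

se_naive_raw = naive_se(X, y, w)         # far too small: acts as if n = sum(w)
se_naive_rescaled = naive_se(X, y, w1)   # acts as if n = 400
se_robust_raw = sandwich_se(X, y, w)
se_robust_rescaled = sandwich_se(X, y, w1)

print(se_naive_raw, se_naive_rescaled)
print(se_robust_raw, se_robust_rescaled)
```

In this sketch the sandwich standard errors are identical for raw and rescaled weights, while the frequency-count standard errors change with the scale. Rescaling to a mean of 1 removes the inflated sample size, but the rescaled frequency-count formula and the sandwich formula are still different calculations, which is consistent with the remark that the expedient does not fully correct the problem.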

The advice above applies not only to ordinary linear regression, but "also...to probit, logit, and other types of generalized linear models" (Winship and Radbill 1994).

References

Winship, Christopher, and Larry Radbill. 1994. "Sampling Weights and Regression Analysis." Sociological Methods and Research 23(2):230-257.

DuMouchel, William H., and Greg Duncan. 1983. "Using Sample Survey Weights in Multiple Regression Analyses of Stratified Samples." Journal of the American Statistical Association 78(383):535-542.