Q: (Recoding a quantitative variable into categories) My model includes a regressor X (for example age) that is measured in quantitative units (for example years). Results using this regressor are confusing. When I break X into artificial categories (20-29, 30-39, etc.), it seems to affect the response variable. But when I use X as originally coded, the effect goes away.

A: What you most likely have is a non-linear relationship. Your first result tells you that X has an effect. Your second result tells you that the effect is not linear.
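    As a rough illustration of how both results can occur at once, the sketch below simulates a U-shaped effect of age. The data, the coefficient 0.05, the noise level, and the helper name ols_slope are all assumptions made up for this example, not taken from any real study.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x (hypothetical helper)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the sketch is reproducible
ages = list(range(20, 70))

# Assumed U-shaped response: lowest in mid-life, higher at both ends.
y = [0.05 * (a - 44.5) ** 2 + rng.gauss(0, 1) for a in ages]

# Age as originally coded: the fitted straight line is nearly flat.
slope = ols_slope(ages, y)

# Age in ten-year categories: the category means differ a great deal.
decade_means = {}
for lo in range(20, 70, 10):
    vals = [yi for a, yi in zip(ages, y) if lo <= a < lo + 10]
    decade_means[f"{lo}-{lo + 9}"] = sum(vals) / len(vals)

print(f"linear slope: {slope:.3f}")
for label, mean in decade_means.items():
    print(label, round(mean, 2))
```

    In this simulated case the straight line averages the downward and upward halves of the curve into a slope near zero, while the category means still pick up the difference between the ends and the middle.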
    Non-linearity is actually quite common even when it does not make itself obvious this way. Non-linearity is an often overlooked aspect of model specification. Even if a model contains all plausible confounding variables, those variables may not be fully controlled if their effects are wrongly assumed to be linear.
    A common visual check for non-linearity is to fit the model using X in its original form, then plot the residuals against X. Sometimes the residual pattern will be curved, but since residuals are by definition noisy, it may be difficult to see any but the most obvious curvature.
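    The residual check can be sketched numerically as follows. The simulated data and coefficients are assumptions for illustration. Averaging the residuals within bins of X is a stand-in for the plot, used here because bin means are less noisy than individual residuals; it is one possible aid, not a standard prescribed by the text above.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible
x = [20 + 0.5 * i for i in range(100)]  # hypothetical ages 20 to 69.5

# Assumed truth: an upward trend plus a curve that a straight line will miss.
y = [10 + 0.3 * xi + 0.02 * (xi - 45) ** 2 + rng.gauss(0, 2) for xi in x]

# Fit y = a + b*x by ordinary least squares, using X in its original form.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Mean residual within each bin of X. A curved relationship shows up as
# bin means that change sign systematically across the range of X.
bins = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
binned = []
for lo, hi in bins:
    r = [ri for xi, ri in zip(x, resid) if lo <= xi < hi]
    binned.append(sum(r) / len(r))

for (lo, hi), m in zip(bins, binned):
    print(f"X in [{lo}, {hi}): mean residual {m:+.2f}")
```

    With these simulated data the mean residual is positive at both ends of the range and negative in the middle, which is the numerical counterpart of a curved residual plot.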
    There are a few different ways to model non-linear effects:

  1. Break the regressor into a few different categories. You have already tried this. Whether it's satisfactory depends on how plausible the categories are. It makes a lot of sense to categorize a variable like education, since there are natural breakpoints when people complete certain degrees. Categorizing age makes less sense, since someone at the top of one category (age 29) may be little different from someone at the bottom of the next (age 30).
  2. Add polynomial terms such as X^2 and X^3. This is the most common textbook recommendation, but it may not be so easy to interpret, and it may not fit very well for large values of X. (When X is large, X^2 and X^3 are very large.)
  3. Use a simple transformation such as log(X) or 1/X. Again, a common textbook recommendation. While some relationships make a lot of sense on the transformed scale, others may be difficult to fit or interpret. Think about whether the expected relationship justifies the transformation you propose to use.
  4. Use a spline transformation. Splines are probably under-used in social research. They are flexible enough to capture a variety of non-linear relationships, but allow you to impose sensible constraints--for example, when you're sure that the effect of X is smooth and never changes direction. Splines are very helpful when you can't think of a simple transformation that captures the expected shape of the relationship, or when you're not sure what shape to expect. They are implemented in SAS PROC TRANSREG; the short article by Smith (1979) is a nice introduction. NB: Because splines are so flexible, you will need to plot the fitted spline transformation before you interpret the results.
  5. Use a local regression method. Local regression is even more flexible than spline transformation, but consequently easier to over-fit and harder to interpret. One type of local regression model is implemented in an experimental SAS procedure, PROC LOESS. A variety of local regression models are discussed in Hastie & Tibshirani (1990).
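
    The first four options can be compared on simulated data with a sketch like the one below. Everything in it is an assumption made for illustration: the levelling-off "true" curve, the knots at ages 35 and 55, the simple linear (hinge) spline basis, and the helper names fit_r2 and hinge. It is not the method used by PROC TRANSREG, and local regression is left out to keep the example short.

```python
import math
import random

def fit_r2(columns, y):
    """Least-squares fit of y on the given columns plus an intercept.

    Solves the normal equations by Gaussian elimination and returns
    the in-sample R-squared. Hypothetical helper for this sketch.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):  # forward elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        s = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - s) / A[i][i]
    fitted = [sum(bi * xi for bi, xi in zip(beta, row)) for row in X]
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

rng = random.Random(2)  # fixed seed so the sketch is reproducible
ages = list(range(20, 70))
# Assumed truth: the response rises quickly with age, then levels off.
y = [30 * (1 - math.exp(-(a - 20) / 10)) + rng.gauss(0, 2) for a in ages]

z = [(a - 45) / 10 for a in ages]  # centred, scaled age keeps powers small

def hinge(knot):
    """Linear spline term: zero below the knot, rising linearly above it."""
    return [max(0.0, a - knot) / 10 for a in ages]

specs = {
    "linear": [z],
    "categories": [[1.0 if lo <= a < lo + 10 else 0.0 for a in ages]
                   for lo in (30, 40, 50, 60)],  # 20-29 is the reference
    "cubic": [z, [v ** 2 for v in z], [v ** 3 for v in z]],
    "log": [[math.log(a) for a in ages]],
    "spline": [z, hinge(35), hinge(55)],
}
r2 = {name: fit_r2(cols, y) for name, cols in specs.items()}
for name, value in r2.items():
    print(f"{name:10s} R-squared = {value:.3f}")
```

    In-sample R-squared can only stay the same or rise when terms are added to the linear model, so it favours the more flexible specifications by construction. Treat the comparison as a rough guide, and weigh it against plausibility and interpretability as discussed in the list above.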



References

Hastie, TJ & Tibshirani, RJ. (1990). Generalized additive models. London: Chapman & Hall.
Smith, PL. (1979). Splines as a useful and convenient statistical tool. The American Statistician 33(2): 57-62.