/*
 THE FLIGNER-POLICELLO TEST:
 A SAS MACRO FOR COMPARING TWO GROUPS
 ASSUMING NEITHER NORMALITY NOR EQUAL VARIANCES

 PROGRAMMER: Paul von Hippel
*/

/*
 PURPOSE: Researchers often wish to test whether two groups differ in central tendency.
 There are a number of tests for doing this, but researchers should be aware of the
 assumptions on which these tests are based:
  (1) The pooled t-test assumes normality and equal variances.
  (2) Welch's t-test assumes normality but does not assume equal variances.
  (3) The Wilcoxon and the equivalent Mann-Whitney test do not assume normality,
      but do assume equal spread and equal shape.
 If the population is not normal, the t-tests may approach validity for large samples,
 but are not valid for small samples.

 When the researcher wishes to assume neither normality nor equal variances, none of the
 tests above is appropriate. Instead, the Fligner-Policello test should be considered.
 The Fligner-Policello test assumes neither normality nor equal variances. It does not
 even assume that the two distributions have similar shape. It does make one assumption,
 which is satisfied when (but not only when) the distributions are symmetric.

 Unfortunately, the Fligner-Policello test has not, to my knowledge, been implemented in
 commercial software. The SAS macro below is intended to make the Fligner-Policello
 test as accessible to SAS users as are the Wilcoxon and t-tests (which are implemented
 in the SAS procedures TTEST and NPAR1WAY).
*/

/*
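 USE: The macro is called as follows:
  %fligner_policello (data=mydataset, group=a_vs_b, variable=y)
 where mydataset is the name of your data set, a_vs_b is the name of a variable indicating
 which observations are in which group, and y is the name of the variable that you think
 may differ across the two groups. The group variable should take exactly two values,
 since the macro uses only the first two groups. mydataset should be a one-level name,
 since it is also used to build the names of work data sets.
 Note that the macro sorts mydataset by the group variable in place, and that it creates
 the data sets combined_rank_mydataset, grouped_rank_mydataset, placements, and
 summary_stats.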

 As an illustration, at the bottom of this file the macro is applied to data from a study
 of plasma glucose values in healthy and lead-poisoned Canada geese. The data are taken
 from Hollander & Wolfe (1999), Table 4.6, who in turn took them from March et al. (1976).
 For this data set, the results obtained from the macro agree with those obtained in
 Hollander & Wolfe's (1999, pp. 136-8) illustrative hand calculation.
*/

/*
 REFERENCES
 Fligner, MA & Policello, GE. (1981). Robust rank procedures for the Behrens-Fisher problem.
  _Journal of the American Statistical Association_ 76(373): 162-168.
 Hollander, M & Wolfe, DA. (1999). _Nonparametric statistical methods_ (2nd ed.). New York: Wiley.
 March, GL, John, TM, McKeown, BA, Sileo, L, & George, JC. (1976). The effects of lead
  poisoning on various plasma constituents in the Canada goose. _J. Wildl. Dis._ 12:14-19.
*/

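%macro fligner_policello (data=, group=, variable=);
/* Suppress listing output from the intermediate steps; the listing is reopened before the results are printed. */
ods listing close;
/* Sort by group so that the BY-group processing below works. Note: this sorts &data in place. */
proc sort data=&data;
 by &group;
run;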

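/* Rank each observation within the combined sample (midranks for ties). */
proc rank data=&data out=combined_rank_&data ties=mean;
 var &variable;
 ranks rank_combined;
run;

/* Rank each observation within its own group (midranks for ties). */
proc rank data=combined_rank_&data out=grouped_rank_&data ties=mean;
 var &variable;
 ranks rank_group;
 by &group;
run;

/* The placement of an observation is the number of observations in the OTHER group
   that are smaller than it (a tie counts one half). It equals the combined-sample
   rank minus the within-group rank. */
data placements;
 set grouped_rank_&data;
 placement = rank_combined - rank_group;
run;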

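/* Per-group mean, sum, corrected sum of squares (CSS), and count of the placements, saved through ODS. */
proc means data=placements mean sum css n;
 var placement;
 by &group;
 ods output Summary=summary_stats;
run;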

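proc iml;
 use summary_stats;
 /* Columns: 1 = mean, 2 = sum, 3 = CSS. Rows: 1 = first group, 2 = second group. */
 read all var { placement_mean placement_sum placement_css } into mean_sum_css;
 read all var { &group } into group;
 read all var { placement_n } into sample_size;
 /* Statistic = (sum1 - sum2) / (2 * sqrt(CSS1 + CSS2 + mean1*mean2)), computed from the placements. */
 numerator = mean_sum_css[1,2] - mean_sum_css[2,2];
 denominator = 2 * sqrt(
  mean_sum_css[1,3] + mean_sum_css[2,3]
  + mean_sum_css[1,1]*mean_sum_css[2,1]
 );
 fligner_policello = numerator / denominator;
 /* Standard normal approximation to the null distribution of the statistic. */
 asymptotic_one_tailed_p = 1 - probnorm(abs(fligner_policello));
 asymptotic_two_tailed_p = 2*asymptotic_one_tailed_p;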
 ods listing;
 /* Four periods: the first ends the macro variable reference, leaving "..." in the title. */
 title "For variable &variable....";
 print group sample_size;
 print fligner_policello;
 print "A positive value suggests that the first group listed above has a larger median. A negative value suggests that the second group has a larger median.";
 print asymptotic_one_tailed_p;
 print asymptotic_two_tailed_p;
 print "NOTE: If both sample sizes are less than 12, the asymptotic p-value may be a poor approximation.";
 print "Correct one-tailed critical values for small samples can be found in Hollander & Wolfe, _Nonparametric statistical methods_, 2nd ed., Table A7.";
 print "(For this statistic, one-tailed p-values are half as large as two-tailed p-values.)";
quit;
%mend fligner_policello;

data geese;
 input health $ plasma;
 datalines;
  healthy 297
  healthy 340
  healthy 325
  healthy 227
  healthy 277
  healthy 337
  healthy 250
  healthy 290
  poisoned 293
  poisoned 291
  poisoned 289
  poisoned 430
  poisoned 510
  poisoned 353
  poisoned 318
 ;
run;
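
/*
 CHECK (an illustrative sketch, not part of the original macro): the macro obtains each
 placement as a difference of ranks. The step below instead counts placements directly
 from their definition, so the two approaches can be compared on the geese data. The
 table name placement_check and the column placement_sum are names made up for this
 illustration. The query assumes that no value is repeated within a group, which holds
 for these data; with repeated values, the GROUP BY would merge rows.
 Counting by hand, the placements sum to 16 for the healthy geese and 40 for the
 poisoned geese (together 8*7 = 56, the number of healthy-poisoned pairs).
*/
proc sql;
 /* For each bird, count the birds in the other group with a smaller value (ties count 1/2). */
 create table placement_check as
 select a.health, a.plasma,
        sum(case when b.plasma < a.plasma then 1
                 when b.plasma = a.plasma then 0.5
                 else 0 end) as placement
 from geese as a, geese as b
 where a.health ne b.health
 group by a.health, a.plasma;

 /* Placement totals by group, for comparison with the sums used inside the macro. */
 select health, sum(placement) as placement_sum
 from placement_check
 group by health;
quit;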

%fligner_policello (data=geese, group=health, variable=plasma);
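
/*
 COMPARISON (an illustrative sketch, not part of the original macro): for reference, the
 tests listed under PURPOSE can be run on the same data with the standard procedures.
 PROC TTEST reports both the pooled and the Satterthwaite (Welch) results; PROC NPAR1WAY
 with the WILCOXON option reports the Wilcoxon rank-sum test. The title text is made up
 for this illustration; it replaces the title left in effect by the macro.
*/
title "Geese data: t-tests and Wilcoxon test for comparison";

proc ttest data=geese;
 class health;
 var plasma;
run;

proc npar1way data=geese wilcoxon;
 class health;
 var plasma;
run;

/* Clear the title so that it does not carry over to later output. */
title;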