** STATA Heteroskedasticity Example Run
log using x:\Het_example_out.log, replace
infix age 5-6 ed 7-8 sex 13 race 14 region16 15 size 19-22 bwlaw 24 class10 29-30 ///
    memnum 31-32 contact 49 activism 50-51 using "s:\703 Kaufman\bwmarry."

** COMMENT reverse direction of class variable, create college dummy variable
generate class = 11 - class10
generate eddum = ed > 12 & ed < .
summarize

** COMMENT set list of names of independent variables
global ivars "age ed eddum sex size class"

** COMMENT OLS results
regress memnum $ivars, beta
predict resmemst, rstand
predict resmem, res
predict predmem, xb
generate ressq = resmemst^2

** COMMENT Plot Squared Residual against suspicious variables: Ed, Eddum & Sex
scatter ressq ed
scatter ressq ed if ressq<20, name(education_no_outliers)
scatter ressq eddum if ressq<20, name(college_no_outliers)
scatter ressq sex if ressq<20, name(sex_no_outliers)

** COMMENT Illustrative: Plots against other variables Age, Size & Class
scatter ressq age if ressq<20, name(age_no_outliers)
scatter ressq size if ressq<20, name(size_no_outliers)
scatter ressq class if ressq<20, name(soc_class_no_outliers)

** COMMENT Get Mean Squared Residual by Ed, Eddum & Sex Categories
recode ed (0/3=1) (4/6=2) (7/9=3) (10/12=4) (13/15=5) (16/18=6) (19/20=7), gen(edcat)
tabulate edcat, summarize(ressq) mean obs
tabulate eddum, summarize(ressq) mean obs
tabulate sex, summarize(ressq) mean obs

** COMMENT Illustrative: Mean Squared Residual by other variables
recode age (18/27=1) (28/37=2) (38/47=3) (48/57=4) (58/67=5) (68/89=6), gen(agecat)
recode size (0/50=1) (51/99=2) (100/199=3) (200/299=4) (300/399=5) (400/499=6) ///
    (500/599=7) (600/699=8) (700/799=9) (800/899=10) (900/999=11) (1000/8000=12), gen(sizecat)
tabulate agecat, summarize(ressq) mean obs
tabulate sizecat, summarize(ressq) mean obs
tabulate class, summarize(ressq) mean obs

** COMMENT BP tests on OLS residuals
estat hettest ed eddum, mtest(bon)
estat hettest sex age size class, mtest(noadj)

** COMMENT Try unequal error variance as a linear function of ed
mgls memnum $ivars, zvars(ed) zcon(yes) hvar(hetwgt)
glsbeta memnum
predict resgls1, res
bpgls resgls1 hetwgt ed
bpgls resgls1 hetwgt eddum
bpgls resgls1 hetwgt "ed eddum"
bpgls resgls1 hetwgt age
bpgls resgls1 hetwgt class

** COMMENT Try unequal error variance by high versus low ed
mgls memnum $ivars, zvars(eddum) zcon(yes) hvar(hetwgt)
glsbeta memnum
predict resgls2, res
bpgls resgls2 hetwgt ed
bpgls resgls2 hetwgt eddum
bpgls resgls2 hetwgt "ed eddum"
bpgls resgls2 hetwgt age
bpgls resgls2 hetwgt class

** COMMENT White's test for heteroskedasticity
quietly regress memnum $ivars, beta
estat imtest, white

** COMMENT Goldfeld-Quandt test for heteroskedasticity
quietly regress memnum age ed sex size class if eddum==1
scalar var1 = e(rmse)^2
scalar df1 = e(df_r)
quietly regress memnum age ed sex size class if eddum==0
scalar var2 = e(rmse)^2
scalar df2 = e(df_r)
scalar list var1 var2
ftest var1 df1 var2 df2 "Goldfeld-Quandt for College vs Not"

** COMMENT Calculate Corrected OLS Var(B) & Display Corrected results
quietly regress memnum $ivars
predict resols, res
matrix b = e(b)
quietly generate varhet = 1/hetwgt
quietly summarize varhet
scalar trcome = r(sum)
matrix accum xpx = $ivars
matrix xpxinv = invsym(xpx)
matrix accum hold = $ivars [pweight=varhet]
matrix xpixomx = xpxinv*hold
scalar trce = trcome - trace(xpixomx)
matrix accum ssres = resols
scalar sighat = ssres[1,1]/trce
matrix varols = sighat*xpixomx*xpxinv
ereturn post b varols
display _newline(2) as text "OLS coefficients and Omega Corrected OLS Var(b)" _continue
ereturn display

** COMMENT Calculate Long & Ervin's HC3 estimate of OLS Var(B)
regress memnum $ivars, hc3

** COMMENT White's estimate (HC0) of OLS Var(B)
quietly regress memnum $ivars
drop resols
predict resols, res
generate resolssq = resols^2
matrix accum xpeex = $ivars [pweight=resolssq]
matrix whvar = xpxinv*xpeex*xpxinv
matrix b = e(b)
ereturn post b whvar
display _newline(2) as text "OLS coefficients and HC0: White's Corrected OLS Var(b)" _continue
ereturn display

** COMMENT Test model misspecification as an alternative to heteroskedasticity
** COMMENT Try parabolic Ed
generate edsq = ed^2
quietly regress memnum $ivars edsq
estat hettest ed eddum edsq, mtest(bon)

** COMMENT Try log Ed
generate lned = log(ed+1)
quietly regress memnum age lned eddum sex size class
estat hettest lned eddum, mtest(bon)

** COMMENT Try Ed by College interaction
generate edcoll = ed*eddum
quietly regress memnum $ivars edcoll
estat hettest ed eddum edcoll, mtest(bon)

** COMMENT Try parabolic Age
generate agesq = age^2
quietly regress memnum $ivars agesq
estat hettest ed eddum agesq, mtest(bon)

** COMMENT Try parabolic Class
generate classsq = class^2
quietly regress memnum $ivars classsq
estat hettest ed eddum classsq, mtest(bon)

** COMMENT Sex by Ed, Eddum Interaction
generate sexed = sex*ed
generate sexeddum = sex*eddum
quietly regress memnum $ivars sexed sexeddum
estat hettest ed eddum sexed sexeddum, mtest(bon)

** MGLS with syntax for multi-variable linear form and no constant to estimate heteroskedasticity
generate eddum2 = 1 - eddum
mgls memnum $ivars, zvars(eddum eddum2) zcon(no) hvar(hetwgt)
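
** COMMENT Illustrative sketch, not part of the original run: the Goldfeld-Quandt
** F ratio from the test above, computed with built-in functions only. It assumes
** the scalars var1, df1, var2 and df2 stored in the Goldfeld-Quandt step are
** still in memory. The names gq_F, gq_dfn, gq_dfd and gq_p are hypothetical, and
** the output may differ in form from what the ftest command reports.
** The larger subgroup variance goes in the numerator so that F >= 1.
if var1 >= var2 {
    scalar gq_F = var1/var2
    scalar gq_dfn = df1
    scalar gq_dfd = df2
}
else {
    scalar gq_F = var2/var1
    scalar gq_dfn = df2
    scalar gq_dfd = df1
}
** Ftail() returns the upper-tail probability of the F distribution
scalar gq_p = Ftail(gq_dfn, gq_dfd, gq_F)
display as text "Goldfeld-Quandt F(" gq_dfn ", " gq_dfd ") = " as result %8.3f gq_F ///
    as text "   upper-tail p = " as result %6.4f gq_p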