I have a question regarding heteroscedasticity in binary logistic regression, and would be most grateful for any kind of feedback.

I am building a credit risk model using binary logistic regression on a set of panel data. If any of you are familiar with Moody’s RiskCalc model for private firms, I am basically using the same approach.

Recently I came across a paper that briefly mentioned that heteroscedasticity (extra-binomial variation) in panel data might cause problems in binary logistic regression. I have searched papers and other literature for answers, but it all seems quite confusing. Some seem to believe that heteroscedasticity isn’t a problem in logistic regression. Others say heteroscedasticity is a potential problem that must be handled by estimating the model with GEE instead of standard logistic regression.

So my question to you is quite simple: do you agree that heteroscedasticity can cause such serious problems that it warrants the use of GEE instead of a standard model? What are the costs of using a standard model in the presence of heteroscedasticity? What tests could be used to detect the presence of heteroscedasticity?

The reason I am asking is that I’ve already developed my model. It took me several months of work, including data preparation, manual forward stepwise selection of variables (I didn’t have access to any good code that could automate the process) and out-of-time validation. The final model did generate an out-of-time result that’s better than chance (measured in AR/ROC/AUC). So it does work, even though it certainly could be improved. But as you might guess, I am quite reluctant to make any changes unless they are absolutely necessary.

Any input is much appreciated!

Best regards,

Majkel

PS.

For those of you who are interested, here’s a short description of my panel data:

I’ve used data stretching from 2001 to 2007 as my estimation sample. The data consists of a set of accounting ratios. Each accounting ratio makes up a separate vector (so I have one vector for all net income/assets ratios observed between 2001 and 2007, for example, a second vector for all debt/equity ratios, and so on), and each observation is matched with a dichotomous variable indicating whether the firm with that particular ratio defaulted within 1 year of the observation.

DS.

To specify matters a bit: as I mentioned above, we’ve used GLM estimation (SPSS), but we did in fact try to re-estimate our final model using GEE. The use of GEE generated the same weights but considerably lower standard errors.
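
To make the comparison concrete, here is a minimal sketch in plain Python on simulated data (not my actual sample). It fits an ordinary logistic regression and then computes cluster-robust (sandwich) standard errors by firm, which is what GEE reports under an independence working correlation; in that special case the point estimates are identical to the standard fit and only the standard errors change. Whether our SPSS run used that working correlation is an assumption here, and all names and parameter values (simulate_panel, fit_logit, cluster_robust_cov, the simulated coefficients) are hypothetical.

```python
import math
import random


def simulate_panel(n_firms=300, n_years=7, seed=0):
    """Hypothetical firm-year rows: (firm_id, ratio, default_flag)."""
    rng = random.Random(seed)
    rows = []
    for firm in range(n_firms):
        firm_effect = rng.gauss(0.0, 1.0)  # unobserved, shared by all years of a firm
        firm_level = rng.gauss(0.0, 1.0)   # makes the ratio persistent within a firm
        for _ in range(n_years):
            ratio = 0.7 * firm_level + 0.7 * rng.gauss(0.0, 1.0)
            eta = -2.0 - 0.8 * ratio + firm_effect
            p = 1.0 / (1.0 + math.exp(-eta))
            rows.append((firm, ratio, 1 if rng.random() < p else 0))
    return rows


def prob(beta, x):
    """Fitted default probability for intercept beta[0] and slope beta[1]."""
    return 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))


def inverse_information(rows, beta):
    """Inverse of the 2x2 Fisher information (the model-based covariance)."""
    h00 = h01 = h11 = 0.0
    for _, x, _ in rows:
        p = prob(beta, x)
        w = p * (1.0 - p)
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    return ((h11 / det, -h01 / det), (-h01 / det, h00 / det))


def fit_logit(rows, n_iter=25):
    """Newton-Raphson fit; returns (beta, model-based covariance)."""
    beta = (0.0, 0.0)
    for _ in range(n_iter):
        inv = inverse_information(rows, beta)
        g0 = sum(y - prob(beta, x) for _, x, y in rows)
        g1 = sum((y - prob(beta, x)) * x for _, x, y in rows)
        beta = (beta[0] + inv[0][0] * g0 + inv[0][1] * g1,
                beta[1] + inv[1][0] * g0 + inv[1][1] * g1)
    return beta, inverse_information(rows, beta)


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))


def cluster_robust_cov(rows, beta, inv):
    """Sandwich covariance inv * M * inv, with M built from per-firm score sums."""
    scores = {}
    for firm, x, y in rows:
        resid = y - prob(beta, x)
        s = scores.setdefault(firm, [0.0, 0.0])
        s[0] += resid
        s[1] += resid * x
    m00 = sum(s[0] * s[0] for s in scores.values())
    m01 = sum(s[0] * s[1] for s in scores.values())
    m11 = sum(s[1] * s[1] for s in scores.values())
    meat = ((m00, m01), (m01, m11))
    return matmul2(matmul2(inv, meat), inv)


rows = simulate_panel()
beta, inv = fit_logit(rows)
robust = cluster_robust_cov(rows, beta, inv)
naive_se = [math.sqrt(inv[i][i]) for i in range(2)]
robust_se = [math.sqrt(robust[i][i]) for i in range(2)]
print("coefficients (intercept, slope):", beta)
print("model-based standard errors:   ", naive_se)
print("cluster-robust standard errors:", robust_se)
```

The robust errors can come out larger or smaller than the model-based ones depending on how observations within a firm are correlated, so the sketch only illustrates the mechanics and says nothing about which direction to expect on real data.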