Sparse Cells, Inflated Odds: Bias Reduction Methods for Complex Survey Data with an Application to Hygiene Availability

DOI:

https://doi.org/10.29020/nybg.ejpam.v18i4.7150

Keywords:

Complex survey design, Sparse data, Firth penalized logistic regression (FPLR), Survey-weighted logistic regression (SLR), Multiple indicator cluster survey (MICS)

Abstract

Complex survey designs, such as UNICEF’s Multiple Indicator Cluster Surveys (MICS), often generate sparse cells across strata, leading to inflated odds ratios (ORs), wide confidence intervals (CIs), and convergence failures in survey-weighted logistic regression (SLR). Using Sudan MICS 2014 data on 16,679 households, this study demonstrates a practical workflow for mitigating sparse-data bias: (1) SLR for population-representative estimates, (2) Firth penalized logistic regression (FPLR) to reduce bias from sparse cells, and (3) geographic collapsing of states into broader regions to improve stability. Results show that implausibly large state-level ORs under SLR (e.g., OR = 153.19 for Central Darfur) were substantially reduced after FPLR and regional collapsing. Substantively, access to handwashing facilities showed strong gradients by education, wealth, and geography, while the sex of the household head showed no effect. This study illustrates that combining design-based estimation, penalized likelihood, and collapsing strategies yields more stable inference in complex surveys with sparse data, offering practical guidance for applied researchers.
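
Step (2) of the workflow can be illustrated with a minimal pure-Python sketch of Firth penalized logistic regression. This is not the authors' code: it assumes an unweighted sample (survey weights, strata, and clusters are ignored), fits by Fisher scoring on Firth's modified score without step-halving, and the 2×2 counts and helper names (`firth_logit`, `rows_from_table`, `_solve`) are hypothetical. With an empty cell the unpenalized maximum-likelihood OR is infinite, whereas the Firth estimate stays finite; for a single binary covariate it coincides with adding 0.5 to every cell of the table.

```python
import math

# Sketch only: unweighted data, complex survey design ignored (assumption of this example).


def _solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def firth_logit(X, y, max_iter=100, tol=1e-10):
    """Firth (Jeffreys-prior) penalized logistic regression via Fisher scoring.

    X: list of rows, each with a leading 1.0 for the intercept; y: list of 0/1 outcomes.
    Modified score: U*_r = sum_i (y_i - pi_i + h_i * (0.5 - pi_i)) * x_ir,
    where h_i is the i-th diagonal element of the weighted hat matrix.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        eta = [sum(b * xv for b, xv in zip(beta, row)) for row in X]
        pi = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [q * (1.0 - q) for q in pi]
        # Fisher information X'WX
        info = [[sum(w[i] * X[i][r] * X[i][s] for i in range(n)) for s in range(p)]
                for r in range(p)]
        # Inverse information, built column by column from unit-vector solves
        cols = [_solve(info, [1.0 if k == j else 0.0 for k in range(p)]) for j in range(p)]
        inv = [[cols[s][r] for s in range(p)] for r in range(p)]
        # Hat-matrix diagonal: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = [w[i] * sum(X[i][r] * inv[r][s] * X[i][s] for r in range(p) for s in range(p))
             for i in range(n)]
        score = [sum((y[i] - pi[i] + h[i] * (0.5 - pi[i])) * X[i][r] for i in range(n))
                 for r in range(p)]
        step = _solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta


def rows_from_table(a, b, c, d):
    """Expand a hypothetical 2x2 table into records.

    a/b = exposed with/without the outcome, c/d = unexposed with/without the outcome.
    """
    X = [[1.0, 1.0] for _ in range(a + b)] + [[1.0, 0.0] for _ in range(c + d)]
    y = [1] * a + [0] * b + [1] * c + [0] * d
    return X, y


# Hypothetical sparse table: all 5 exposed records have the outcome (b = 0),
# so the ML odds ratio (a*d)/(b*c) is infinite.
X, y = rows_from_table(5, 0, 10, 90)
beta = firth_logit(X, y)
# Haldane-corrected value: (5.5 * 90.5) / (0.5 * 10.5), about 94.81
print(f"Firth OR = {math.exp(beta[1]):.2f}")
```

The penalized estimate remains large, which is why the workflow pairs penalization with collapsing sparse strata rather than relying on either alone. In applied work an established implementation such as the R package logistf would normally be preferred over a hand-rolled solver.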

Published

2025-11-05

Issue

Vol. 18 No. 4 (2025)

Section

Mathematical Statistics

How to Cite

Sparse Cells, Inflated Odds: Bias Reduction Methods for Complex Survey Data with an Application to Hygiene Availability. (2025). European Journal of Pure and Applied Mathematics, 18(4), 7150. https://doi.org/10.29020/nybg.ejpam.v18i4.7150