Robust Missing Data Methods for Categorical Regression
Small Business Information
MARTINGALE RESEARCH CORPORATION, 3112 CONESTOGA DRIVE, PLANO, TX, 75074
Abstract
DESCRIPTION (provided by applicant): Improved methods for obtaining robust statistical inferences from categorical regression models in the presence of missing data and model misspecification would be an invaluable tool for the epidemiological and health care research communities. At present, epidemiological models are typically designed to identify patterns of alcohol-related symptoms, define criteria for alcohol use disorders, and evaluate policies regulating the use and distribution of alcoholic beverages. Such models frequently rely on datasets that contain incomplete data. While commercially available statistical software provides some automated missing value procedures (e.g., data imputation, Expectation-Maximization), further theoretical and empirical research is required to develop more robust statistical methods. In its Phase I feasibility study, Martingale Research successfully developed robust estimation and inference algorithms that combine recent advances in stochastic estimation, asymptotic statistics, and generalized logistic regression, and that are suited to categorical regression modeling for epidemiological problems in the presence of missing data and model misspecification. These results were verified in simulation studies, and the methods were applied to an alcohol-related research problem. Additionally, new theory that unifies missing data and model misspecification was developed to support new robust missing data inferential statistics. Phase II research will extend the Phase I findings to develop and implement new robust missing data methods for categorical regression modeling in the areas of: i) hypothesis testing on parameter estimates, ii) standard error estimation, iii) model selection criteria, and iv) specification testing.
The Phase II experimental design will use Monte Carlo bootstrap simulation methods to evaluate the missing data methods on representative alcohol-related databases. Specifically, the simulation studies will empirically characterize the appropriateness of the large-sample assumptions for both consistent estimation and statistical inference. These simulation study methodologies, in conjunction with the new robust missing data methods, will be integrated into a prototype user-friendly standalone software package to support epidemiological and health-related regression modeling. In summary, Phase II research will establish the essential technical foundation for Phase III commercialization, with the long-term objective of providing a suite of new missing data handling methods as an advanced statistical tool for regression modeling that improves epidemiological and health-related research.
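As a rough illustration of the kind of simulation study described above, the following Python sketch checks whether large-sample confidence intervals for a logistic regression slope attain their nominal coverage when some records are missing. It is not the applicant's method: it assumes data missing completely at random, uses complete-case analysis as a stand-in for a missing data procedure, uses plain Monte Carlo replication rather than bootstrapping, and uses sandwich (misspecification-robust) standard errors. The function names `fit_logistic` and `coverage_study`, the sample sizes, and the true coefficients are all hypothetical choices made for the example.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by Newton-Raphson.

    Returns the coefficient estimates and sandwich (robust) standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])           # negative Hessian of the log-likelihood
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    # Sandwich covariance A^{-1} B A^{-1}, evaluated at the final estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    A = X.T @ (X * w[:, None])                  # negative Hessian
    scores = X * (y - p)[:, None]               # per-observation score vectors
    B = scores.T @ scores                       # outer product of scores
    A_inv = np.linalg.inv(A)
    cov = A_inv @ B @ A_inv
    return beta, np.sqrt(np.diag(cov))


def coverage_study(n=500, n_reps=200, miss_rate=0.3,
                   true_beta=(-0.5, 1.0), seed=0):
    """Estimate coverage of nominal 95% Wald intervals for the slope.

    Assumption: records are dropped completely at random (MCAR) and the
    model is fit to the remaining complete cases only.
    """
    rng = np.random.default_rng(seed)
    true_beta = np.asarray(true_beta, dtype=float)
    covered = 0
    for _ in range(n_reps):
        x = rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        p = 1.0 / (1.0 + np.exp(-X @ true_beta))
        y = rng.binomial(1, p).astype(float)

        observed = rng.random(n) > miss_rate    # MCAR missingness indicator
        beta_hat, se = fit_logistic(X[observed], y[observed])

        lower = beta_hat[1] - 1.96 * se[1]
        upper = beta_hat[1] + 1.96 * se[1]
        covered += int(lower <= true_beta[1] <= upper)
    return covered / n_reps


if __name__ == "__main__":
    print("Empirical coverage of 95% interval:", coverage_study())
```

An empirical coverage close to 0.95 would suggest that the large-sample approximation is adequate at that sample size and missingness rate; repeating the study with a smaller `n` or a larger `miss_rate` shows how quickly the approximation degrades.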
* Information listed above is at the time of submission.