Robust Missing Data Methods for Categorical Regression

Award Information
Agency:
Department of Health and Human Services
Branch:
n/a
Amount:
$1,853,953.00
Award Year:
2004
Program:
SBIR
Phase:
Phase II
Contract:
2R44AA013768-02
Award Id:
60432
Agency Tracking Number:
AA013768
Solicitation Year:
n/a
Solicitation Topic Code:
n/a
Solicitation Number:
n/a
Small Business Information
MARTINGALE RESEARCH CORPORATION, 3112 CONESTOGA DRIVE, PLANO, TX, 75074
Hubzone Owned:
N
Minority Owned:
N
Woman Owned:
N
Duns:
n/a
Principal Investigator:
STEVEN HENLEY
(972) 881-8370
STEVENH@MARTINGALE-RESEARCH.COM
Business Contact:
STEVEN HENLEY
(972) 881-8370
STEVENH@MARTINGALE-RESEARCH.COM
Research Institute:
n/a
Abstract
DESCRIPTION (provided by applicant): Improved methods for obtaining robust statistical inferences from categorical regression models in the presence of missing data and model misspecification would be an invaluable tool for the epidemiological and health care research communities. Presently, epidemiological models are typically designed to identify patterns of alcohol-related symptoms, define criteria for alcohol use disorders, and evaluate policies regulating the use and distribution of alcoholic beverages. Such models frequently rely on datasets that contain incomplete data. While commercially available statistical software provides some automated missing value procedures (e.g., data imputation, Expectation-Maximization), further theoretical and empirical research is required to develop more robust statistical methods. In its Phase I feasibility study, Martingale Research successfully developed robust estimation and inference algorithms that combine recent advances in stochastic estimation, asymptotic statistics, and generalized logistic regression, and that are suited to categorical regression modeling for epidemiological problems in the presence of missing data and model misspecification. These results were verified in simulation studies, and the methods were applied to an alcohol-related research problem. Additionally, new theory that unifies missing data and model misspecification was developed to support new robust missing data inferential statistics. Phase II research will extend the Phase I findings to develop and implement new robust missing data methods for categorical regression modeling in the areas of: i) hypothesis testing on parameter estimates, ii) standard error estimation, iii) model selection criteria, and iv) specification testing. The Phase II experimental design will use Monte Carlo bootstrap simulation methods to evaluate the missing data methods on representative alcohol-related databases.
Specifically, the simulation studies will empirically characterize the appropriateness of the large-sample assumptions for both consistent estimation and statistical inference. These simulation methodologies, together with the new robust missing data methods, will be integrated into a prototype user-friendly standalone software package to support epidemiological and health-related regression modeling. In summary, Phase II research will establish the essential technical foundation for Phase III commercialization, with the long-term objective of providing a suite of new missing data handling methods as an advanced statistical tool for regression modeling that improves epidemiological and health-related research.

* information listed above is at the time of submission.
