Robust Classification Methods for Categorical Regression

Award Information
Agency: Department of Health and Human Services
Branch: N/A
Contract: 9R44CA139607-02A1
Agency Tracking Number: AA014302
Amount: $2,735,266.00
Phase: Phase II
Program: SBIR
Awards Year: 2008
Solicitation Year: N/A
Solicitation Topic Code: N/A
Solicitation Number: N/A
Small Business Information
101 East Park Blvd., Suite 600, Plano, TX, 75074
DUNS: 174995134
HUBZone Owned: N
Woman Owned: N
Socially and Economically Disadvantaged: N
Principal Investigator
 () -
Business Contact
Phone: () -
Research Institution
DESCRIPTION (provided by applicant): Improving statistical methods to provide better classification performance and new analytical capabilities for categorical regression would be invaluable to the medical and health care research communities. Categorical regression models (e.g., binary logistic, multinomial logistic) are used extensively to identify patterns of alcohol-related symptoms, screen for disorders, and assess policies. In addition, such models are used extensively in other areas of research such as mental illness, cancer, traumatic injuries, and AIDS-related pathologies. However, many such models are developed with inadequate support to fully analyze and exploit the intrinsically probabilistic nature of their results. This is of critical importanc e as health researchers, clinicians, and administrators are often faced with classification decisions using categorical regression models to identify unacceptable risks, adequate outcomes, and acceptable guidelines for screening, diagnoses, treatment, and quality of care. Commercially available statistical software does not offer sophisticated methods for robust estimation of posterior probabilities in the presence of model misspecification, missing covariates, and nonignorable missing data generating proce sses. Such robust missing data handling methods provide natural mechanisms for dealing with verification bias and modeling correlated, longitudinal, or survey data with complex sampling designs. Moreover, commercially available statistical software does no t provide automated methods for using estimated posterior probabilities to make optimal classification decisions with respect to different optimality criteria. In particular, automated features such as optimizing multiple decision criteria (allocation rule s) that trade off specificity against sensitivity, decision threshold confidence intervals, statistical tests for evaluating correct specification of posterior probabilities, statistical tests for comparing competing classifier thresholds, and methods for multi-outcome classification and inference are not readily available. Phase II research will extend Phase I findings for binary logistic regression to develop and implement automated robust classification methods for multinomial logistic regression modelin g, which also applies to the larger class of nonlinear categorical regression models that output posterior probabilities. The Phase II software prototype will provide: 1) new user-selectable robust decision threshold estimators, 2) robust confidence interv als on decision threshold estimators, 3) new classifier threshold comparison tests, 4) new outcome probability specification tests, 5) efficient missing data handling methods in the presence of nonignorable nonresponse data, and 6) second-order analytic an d simulation-based Bayesian methods for improved small sample and rare event outcome probability estimation. These new methodologies will be integrated into a prototype user-friendly software package, evaluated with extensive simulation studies, and then a pplied to real world classification problems encountered in: alcohol, mental illness (depression, bipolar, schizophrenia), cancer (prostate), trauma (emergency room), and infectious disease (AIDS) through collaborations with domain experts in those respect ive fields. In summary, Phase II research will establish the essential technical foundation for Phase III commercialization with the objective of providing a suite of new classification analysis methods as an advanced statistical tool that improves epidemi ologic, clinical, and public health research.

* Information listed above is at the time of submission. *

Agency Micro-sites

SBA logo
Department of Agriculture logo
Department of Commerce logo
Department of Defense logo
Department of Education logo
Department of Energy logo
Department of Health and Human Services logo
Department of Homeland Security logo
Department of Transportation logo
Environmental Protection Agency logo
National Aeronautics and Space Administration logo
National Science Foundation logo
US Flag An Official Website of the United States Government