Robust Classification Methods for Categorical Regression
Small Business Information
101 East Park Blvd., Suite 600, Plano, TX, 75074
Abstract
DESCRIPTION (provided by applicant): Improving statistical methods to provide better classification performance and new analytical capabilities for categorical regression would be invaluable to the medical and health care research communities. Categorical regression models (e.g., binary logistic, multinomial logistic) are used extensively to identify patterns of alcohol-related symptoms, screen for disorders, and assess policies. In addition, such models are used extensively in other areas of research such as mental illness, cancer, traumatic injuries, and AIDS-related pathologies. However, many such models are developed with inadequate support to fully analyze and exploit the intrinsically probabilistic nature of their results. This is of critical importance as health researchers, clinicians, and administrators are often faced with classification decisions using categorical regression models to identify unacceptable risks, adequate outcomes, and acceptable guidelines for screening, diagnoses, treatment, and quality of care. Commercially available statistical software does not offer sophisticated methods for robust estimation of posterior probabilities in the presence of model misspecification, missing covariates, and nonignorable missing data generating processes. Such robust missing data handling methods provide natural mechanisms for dealing with verification bias and modeling correlated, longitudinal, or survey data with complex sampling designs. Moreover, commercially available statistical software does not provide automated methods for using estimated posterior probabilities to make optimal classification decisions with respect to different optimality criteria.
In particular, automated features such as optimizing multiple decision criteria (allocation rules) that trade off specificity against sensitivity, decision threshold confidence intervals, statistical tests for evaluating correct specification of posterior probabilities, statistical tests for comparing competing classifier thresholds, and methods for multi-outcome classification and inference are not readily available. Phase II research will extend Phase I findings for binary logistic regression to develop and implement automated robust classification methods for multinomial logistic regression modeling, which also applies to the larger class of nonlinear categorical regression models that output posterior probabilities. The Phase II software prototype will provide: 1) new user-selectable robust decision threshold estimators, 2) robust confidence intervals on decision threshold estimators, 3) new classifier threshold comparison tests, 4) new outcome probability specification tests, 5) efficient missing data handling methods in the presence of nonignorable nonresponse data, and 6) second-order analytic and simulation-based Bayesian methods for improved small sample and rare event outcome probability estimation. These new methodologies will be integrated into a prototype user-friendly software package, evaluated with extensive simulation studies, and then applied to real world classification problems encountered in: alcohol, mental illness (depression, bipolar, schizophrenia), cancer (prostate), trauma (emergency room), and infectious disease (AIDS) through collaborations with domain experts in those respective fields. In summary, Phase II research will establish the essential technical foundation for Phase III commercialization with the objective of providing a suite of new classification analysis methods as an advanced statistical tool that improves epidemiologic, clinical, and public health research.
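The idea of an allocation rule that trades off specificity against sensitivity can be illustrated with a minimal sketch. This is not the project's software or its robust estimators: the function names, the `weight` parameter, and the weighted-sum criterion are assumptions chosen only to show how a decision threshold might be picked from estimated posterior probabilities for a binary outcome.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Classify as positive when the posterior probability >= threshold."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def best_threshold(
    probs: Sequence[float], labels: Sequence[int], weight: float = 0.5
) -> Tuple[float, float]:
    """Return (threshold, score) maximizing weight*sens + (1-weight)*spec.

    Hypothetical criterion for illustration: weight=0.5 is equivalent to
    maximizing Youden's J; other weights favor sensitivity or specificity.
    Candidate thresholds are the observed probabilities; ties keep the
    smallest threshold.
    """
    best_t, best_score = 0.0, float("-inf")
    for t in sorted(set(probs)):
        sens, spec = sensitivity_specificity(probs, labels, t)
        score = weight * sens + (1.0 - weight) * spec
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    # Toy posterior probabilities and true outcomes (made up for the demo).
    probs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    labels = [0, 0, 0, 1, 1, 1]
    print(best_threshold(probs, labels))
```

Changing `weight` shifts the selected threshold toward higher sensitivity or higher specificity, which is the kind of user-selectable optimality criterion the abstract refers to; confidence intervals and comparison tests for the threshold are outside the scope of this sketch.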
* information listed above is at the time of submission.