Markov Chain Monte Carlo and Exact Logistic Regression

Award Information
Agency: Department of Health and Human Services
Branch: N/A
Contract: N/A
Agency Tracking Number: 1R43CA093112-01
Amount: $113,111.00
Phase: Phase I
Program: SBIR
Awards Year: 2001
Solicitation Year: N/A
Solicitation Topic Code: N/A
Solicitation Number: N/A
Small Business Information
HUBZone Owned: N
Woman Owned: N
Socially and Economically Disadvantaged: N
Principal Investigator
Business Contact
Phone: (617) 661-2011
Research Institution
DESCRIPTION (provided by applicant): Logistic regression is a very popular model for the analysis of binary data, with widespread applicability in the physical, behavioral and biomedical sciences. Parameter inference for this model is usually based on maximizing the unconditional likelihood function. However, unconditional maximum likelihood inference can produce inconsistent point estimates, inaccurate p-values and inaccurate confidence intervals for small or unbalanced data sets, and for data sets with a large number of parameters relative to the number of observations. Sometimes the method fails entirely because no estimates can be found that maximize the unconditional likelihood function. A methodologically sound alternative that has none of these drawbacks is the exact conditional approach, in which one generates the permutation distributions of the sufficient statistics for the parameters of interest, conditional on fixing the sufficient statistics of the remaining nuisance parameters at their observed values. The major stumbling block to this approach is the heavy computational burden it imposes. Monte Carlo methods attempt to overcome this problem by sampling from the reference set of possible permutations instead of enumerating them all. Two competing Monte Carlo methods are network-based sampling and Markov Chain Monte Carlo (MCMC) sampling. Network sampling suffers from memory limitations, while MCMC sampling can produce incorrect results if the Markov chain is not ergodic or if the process is not in the steady state. We propose a novel approach that combines network and MCMC sampling, drawing on the strengths of each while overcoming their individual limitations. We propose to implement this hybrid network-MCMC method in our LogXact software and as an external procedure in the SAS system.
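As a rough illustration of the exact conditional idea described above, the following Python sketch treats the simplest case: a logistic model with an intercept (the nuisance parameter) and a single covariate of interest. In that case the sufficient statistic for the intercept is the number of events, sum(y), and the sufficient statistic for the slope is t = sum(x_i * y_i). Under the null hypothesis of zero slope, every response vector with the observed number of events is equally likely, so the reference set can be enumerated directly or sampled. The function names, the toy data, and the simple swap-based Markov chain are assumptions made for illustration only; this is not the network algorithm, the proposed hybrid network-MCMC method, or anything taken from LogXact.

```python
from fractions import Fraction
from itertools import combinations
import random


def exact_conditional_dist(x, n_events):
    """Null distribution of t = sum(x_i * y_i) given sum(y_i) = n_events.

    Enumerates every placement of the events, so it is only feasible
    for very small data sets (illustrative assumption).
    """
    counts = {}
    for ones in combinations(range(len(x)), n_events):
        t = sum(x[i] for i in ones)
        counts[t] = counts.get(t, 0) + 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}


def exact_p_value(x, y):
    """One-sided exact p-value P(T >= t_obs | sum(y)) under zero slope."""
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    dist = exact_conditional_dist(x, sum(y))
    return sum(p for t, p in dist.items() if t >= t_obs)


def mcmc_p_value(x, y, n_steps=20000, burn_in=1000, seed=0):
    """Estimate the same p-value by sampling instead of enumerating.

    Each step swaps one event with one non-event, which keeps sum(y)
    fixed. The proposal is symmetric and the target is uniform on the
    reference set, so every proposed swap is accepted.
    """
    rng = random.Random(seed)
    ones = [i for i, v in enumerate(y) if v == 1]
    zeros = [i for i, v in enumerate(y) if v == 0]
    if not ones or not zeros:
        return 1.0  # reference set holds a single response vector
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    t = t_obs
    hits = 0
    for step in range(burn_in + n_steps):
        a = rng.randrange(len(ones))
        b = rng.randrange(len(zeros))
        i, j = ones[a], zeros[b]
        ones[a], zeros[b] = j, i      # move the event from unit i to unit j
        t += x[j] - x[i]              # update the statistic incrementally
        if step >= burn_in and t >= t_obs:
            hits += 1
    return hits / n_steps


# Hypothetical toy data: six units, events at the three largest x values.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1]
print(exact_p_value(x, y))   # 1/20: only 1 of the 20 placements reaches t = 12
print(mcmc_p_value(x, y))    # Monte Carlo estimate of the same quantity
```

With only an intercept as the nuisance parameter, the swap chain can reach every member of the reference set. With several nuisance parameters, simple moves of this kind may fail to connect the reference set, which is one form of the ergodicity problem the description mentions.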
PROPOSED COMMERCIAL APPLICATION: There is great demand for logistic regression software that can handle small, sparse or unbalanced data sets by exact methods. Our LogXact package is the only software that can provide exact inference for data sets that are not "toy problems". Yet even LogXact quickly breaks down on moderate-sized problems. The new generation of hybrid network-MCMC algorithms will handle substantially larger problems that nevertheless need exact inference. The commercial potential is considerable, since such data sets are common in scientific studies.

* Information listed above is at the time of submission. *
