Markov Chain Monte Carlo and Exact Logistic Regression

Award Information
Agency:
Department of Health and Human Services
Branch:
n/a
Amount:
$113,111.00
Award Year:
2001
Program:
SBIR
Phase:
Phase I
Contract:
n/a
Award Id:
53900
Agency Tracking Number:
1R43CA093112-01
Solicitation Year:
n/a
Solicitation Topic Code:
n/a
Solicitation Number:
n/a
Small Business Information
675 MASSACHUSETTS AVE, CAMBRIDGE, MA, 02139
Hubzone Owned:
N
Minority Owned:
N
Woman Owned:
N
Duns:
n/a
Principal Investigator:
CYRUS MEHTA
Business Contact:
(617) 661-2011
MEHTA@CYTEL.COM
Research Institute:
n/a
Abstract
DESCRIPTION (provided by applicant): Logistic regression is a very popular model for the analysis of binary data, with widespread applicability in the physical, behavioral and biomedical sciences. Parameter inference for this model is usually based on maximizing the unconditional likelihood function. However, unconditional maximum likelihood inference can produce inconsistent point estimates, inaccurate p-values and inaccurate confidence intervals for small or unbalanced data sets, and for data sets with a large number of parameters relative to the number of observations. Sometimes the method fails entirely because no estimates exist that maximize the unconditional likelihood function. A methodologically sound alternative that has none of these drawbacks is the exact conditional approach, in which one generates the permutation distributions of the sufficient statistics for the parameters of interest, conditional on fixing the sufficient statistics of the remaining nuisance parameters at their observed values. The major stumbling block to this approach is the heavy computational burden it imposes. Monte Carlo methods attempt to overcome this problem by sampling from the reference set of possible permutations instead of enumerating them all. Two competing Monte Carlo methods are network-based sampling and Markov Chain Monte Carlo (MCMC) sampling. Network sampling suffers from memory limitations, while MCMC sampling can produce incorrect results if the Markov chain is not ergodic or if the process has not reached its steady state. We propose a novel approach that combines network and MCMC sampling, drawing on the strengths of each and overcoming their individual limitations. We propose to implement this hybrid network-MCMC method in our LogXact software and as an external procedure in the SAS system.
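As a rough illustration of the exact conditional idea described above, the following minimal Python sketch handles the simplest possible case: one covariate of interest and an intercept as the only nuisance parameter. Under those assumptions the sufficient statistic for the intercept is the number of successes, and the reference set is every outcome vector with that same number of successes. The function names, the toy data and the plain permutation sampler are all hypothetical choices made for this sketch. It is neither the network algorithm nor the MCMC sampler named in the abstract, and it is not how LogXact is implemented.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
import random


def conditional_distribution(x, n_successes):
    """Enumerate the reference set: every outcome vector with exactly
    n_successes ones. Returns counts of t1 = sum(x_i * y_i) over that set."""
    counts = Counter()
    for ones in combinations(range(len(x)), n_successes):
        counts[sum(x[i] for i in ones)] += 1
    return counts


def exact_p_value(x, y):
    """One-sided exact p-value P(T1 >= t1_obs | sum(y) fixed) under slope = 0."""
    t1_obs = sum(xi * yi for xi, yi in zip(x, y))
    counts = conditional_distribution(x, sum(y))
    tail = sum(c for t, c in counts.items() if t >= t1_obs)
    return Fraction(tail, sum(counts.values()))


def monte_carlo_p_value(x, y, n_samples=20000, seed=0):
    """Estimate the same tail probability by sampling from the reference set.
    Shuffling y keeps sum(y) fixed and is uniform over the reference set only
    because the intercept is the sole nuisance parameter in this sketch."""
    rng = random.Random(seed)
    t1_obs = sum(xi * yi for xi, yi in zip(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(y_perm)
        if sum(xi * yi for xi, yi in zip(x, y_perm)) >= t1_obs:
            hits += 1
    return hits / n_samples


# Toy data (made up for illustration): binary covariate, binary outcome.
x_demo = [0, 0, 0, 0, 1, 1, 1, 1]
y_demo = [0, 0, 0, 1, 1, 1, 1, 1]

print(exact_p_value(x_demo, y_demo))        # prints 1/14
print(monte_carlo_p_value(x_demo, y_demo))  # close to 1/14, about 0.071
```

Full enumeration here visits only 56 outcome vectors, but the reference set grows combinatorially with the number of observations, and conditioning on several nuisance statistics rules out simple shuffling. That is the computational burden that motivates the network and MCMC samplers discussed in the abstract.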
PROPOSED COMMERCIAL APPLICATION: There is great demand for logistic regression software that can handle small, sparse or unbalanced data sets by exact methods. Our LogXact package is the only software that can provide exact inference for data sets which are not "toy problems". Yet even LogXact quickly breaks down on moderate-sized problems. The new generation of hybrid network-MCMC algorithms will handle substantially larger problems that nevertheless need exact inference. The commercial potential is considerable, since such data sets are common in scientific studies.

* information listed above is at the time of submission.
