Markov Chain Monte Carlo and Exact Logistic Regression
Small Business Information
CYTEL SOFTWARE CORPORATION
675 MASSACHUSETTS AVE, CAMBRIDGE, MA, 02139
Abstract
DESCRIPTION (provided by applicant): Logistic regression is a very popular model for the analysis of binary data, with widespread applicability in the physical, behavioral and biomedical sciences. Parameter inference for this model is usually based on maximizing the unconditional likelihood function. However, unconditional maximum likelihood inference can produce inconsistent point estimates, inaccurate p-values and inaccurate confidence intervals for small or unbalanced data sets and for data sets with a large number of parameters relative to the number of observations. Sometimes the method fails entirely because no estimates can be found that maximize the unconditional likelihood function. A methodologically sound alternative that has none of the aforementioned drawbacks is the exact conditional approach, in which one generates the permutation distributions of the sufficient statistics for the parameters of interest, conditional on fixing the sufficient statistics of the remaining nuisance parameters at their observed values. The major stumbling block to this approach is the heavy computational burden it imposes. Monte Carlo methods attempt to overcome this problem by sampling from the reference set of possible permutations instead of enumerating them all. Two competing Monte Carlo methods are network-based sampling and Markov Chain Monte Carlo (MCMC) sampling. Network sampling suffers from memory limitations, while MCMC sampling can produce incorrect results if the Markov chain is not ergodic or if the process is not in the steady state. We propose a novel approach that combines network and MCMC sampling, draws upon the strengths of each and overcomes their individual limitations. We propose to implement this hybrid network-MCMC method in our LogXact software and as an external procedure in the SAS system.
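To make the exact conditional idea concrete, the following is a minimal Python sketch for the simplest possible case: one covariate of interest and an intercept as the only nuisance parameter. It is an illustration under stated assumptions, not the LogXact algorithm and not the proposed hybrid network-MCMC method. The function names and the toy data are hypothetical. With an intercept-only nuisance model, the reference set is every binary response vector with the observed number of successes, and under the null hypothesis (coefficient of interest equal to zero) each such vector is equally likely. The first function enumerates that set; the last one samples from it by plain random shuffling, which works only in this simple case and stands in for the more elaborate network and MCMC samplers described above.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
import random


def exact_conditional_distribution(x, n_successes):
    """Permutation distribution of T = sum(x_i * y_i) given sum(y_i) = n_successes.

    Assumes the null hypothesis beta = 0, so every response vector in the
    reference set has the same probability.
    """
    counts = Counter()
    # Each subset of positions holding the 1s is one member of the reference set.
    for ones in combinations(range(len(x)), n_successes):
        counts[sum(x[i] for i in ones)] += 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in counts.items()}


def exact_upper_p_value(x, y):
    """Exact one-sided p-value P(T >= t_obs) by full enumeration."""
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    dist = exact_conditional_distribution(x, sum(y))
    return sum(p for t, p in dist.items() if t >= t_obs)


def monte_carlo_upper_p_value(x, y, n_samples=20000, seed=0):
    """Estimate the same p-value by sampling the reference set.

    Shuffling y keeps sum(y) fixed, so each draw is uniform over the
    reference set. This is simple Monte Carlo, not network or MCMC sampling.
    """
    rng = random.Random(seed)
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(y_perm)
        if sum(xi * yi for xi, yi in zip(x, y_perm)) >= t_obs:
            hits += 1
    return hits / n_samples


# Hypothetical toy data: a binary covariate and a binary response.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 0, 1, 1, 1, 0]

p_exact = exact_upper_p_value(x, y)
p_mc = monte_carlo_upper_p_value(x, y)
print("exact p-value:", p_exact)
print("Monte Carlo estimate:", round(p_mc, 4))
```

Full enumeration visits every subset of the given size, so its cost grows combinatorially with the number of observations. That growth is the computational burden the abstract refers to, and it is the reason sampling from the reference set becomes attractive as problems get larger.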
PROPOSED COMMERCIAL APPLICATION: There is great demand for logistic regression software that can handle small, sparse or unbalanced data sets by exact methods. Our LogXact package is the only software that can provide exact inference for data sets that are not "toy problems". Yet even LogXact quickly breaks down on moderate-sized problems. The new generation of hybrid network-MCMC algorithms will handle substantially larger problems that nevertheless need exact inference. The commercial potential is considerable, since such data sets are common in scientific studies.
* information listed above is at the time of submission.