Markov Chain Monte Carlo and Exact Logistic Regression
Department of Health and Human Services
Agency Tracking Number:
Solicitation Topic Code:
Small Business Information
CYTEL SOFTWARE CORPORATION
CYTEL SOFTWARE CORPORATION, 675 MASSACHUSETTS AVE, CAMBRIDGE, MA, 02139
Socially and Economically Disadvantaged:
AbstractDESCRIPTION (provided by applicant): Today, software for fitting logistic regression models to binary data belongs in the toolkit of every professional biostatistician, epidemiologist, and social scientist. A natural follow-up to this development is the adoption of exact logistic regression by mainstream biostatisticians and data analysts for any setting in which the accuracy of a statistical analysis based on large-sample maximum likelihood theory is in doubt. Cutting-edge researchers in biometry and numerous other fields have already recognized that it is necessary to supplement inference based on large-sample methods with exact inference for small, sparse and unbalanced data. The LogXact software package developed by Cytel Software Corporation fills this need. It has been used since its inception in 1993 to produce exact inferences for data generated from a wide range fields including clinical trials, epidemiology, disease surveillance, insurance, criminology, finance, accounting, sociology and ecology. In all these applications exact logistic regression was adopted because the limitations of the corresponding asymptotic procedures were clearly recognized in advance by the investigators and the exact inference was computationally feasible. But most of the time it will not be obvious whether asymptotic or exact methods are applicable. Ideally one would prefer to run both types of analyses if there is any doubt about the appropriateness of the asymptotic inference. However, because of the computational limits of the exact algorithms, investigators are currently inhibited from attempting the exact analysis. There is uncertainty about the how long the computations will take and even whether they will produce any results at all before the computer runs out of memory. The current project eliminates this uncertainty by introducing a new generation of numerical algorithms that utilize network based Monte Carlo rejection sampling. The Phase 1 progress report has demonstrated that these new algorithms can speed up the computations by factors of 50 to 1000 relative to what is currently available in LogXact. More importantly they can predict how long a job will take so that the user may decide whether to proceed at once or at a better time. The Phase 2 effort aims to incorporate this new generation of computing algorithms into future versions of LogXact.
* information listed above is at the time of submission.