Methods Analyzing Categorical Data

Award Information
Agency: Department of Health and Human Services
Branch: N/A
Contract: 1 R43 CA64112-1,
Agency Tracking Number: 24887
Amount: $375,000.00
Phase: Phase II
Program: SBIR
Awards Year: 1997
Solicitation Year: N/A
Solicitation Topic Code: N/A
Solicitation Number: N/A
Small Business Information
675 Massachusetts Avenue, Cambridge, MA, 02139
HUBZone Owned: N
Woman Owned: N
Socially and Economically Disadvantaged: N
Principal Investigator
 Cyrus Mehta
 (617) 661-2011
Binary logistic regression and its extensions to unordered polytomous response, ordered polytomous response, and Poisson response are among the most popular mathematical models for the analysis of categorical data, with widespread applicability in the biomedical sciences. The usual method of inference for such models is unconditional maximum likelihood. For large, well-balanced data sets, or for models with only a few parameters, this approach is satisfactory. However, unconditional maximum likelihood estimation can produce inconsistent point estimates, inaccurate p-values, and inaccurate confidence intervals for small or imbalanced data sets, and for data sets with a large number of parameters relative to the number of observations. Sometimes the method fails entirely because no estimates can be found that maximize the unconditional likelihood function. A methodologically sound alternative that has none of the above drawbacks is the exact conditional approach. Here one estimates the parameters of interest by computing the exact permutation distributions of their sufficient statistics, conditional on the observed values of the sufficient statistics for the remaining "nuisance" parameters.

The major stumbling block to exact permutational inference has always been the heavy computational burden it imposes. Despite the availability of fast numerical algorithms for the exact computations, there are numerous instances where a data set is too large to be analyzed by the exact methods, yet too sparse or imbalanced for the maximum likelihood approach to be reliable. What is needed is a reliable Monte Carlo alternative to the exact conditional approach that can bridge the gap between the exact and asymptotic methods of inference. The problem is technically hard because conventional Monte Carlo methods lead to massive rejection of samples that do not satisfy the constraints of the conditional distribution. We will build a network sampling approach to the Monte Carlo problem that we believe is a major breakthrough for this difficult but important problem.
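
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the network sampling method proposed here. It assumes the simplest possible setting: a binary logistic model with an intercept (the nuisance parameter) and one binary covariate, tested under the null hypothesis that the covariate's coefficient is zero. The data vectors and the function names `exact_conditional_distribution` and `naive_acceptance_rate` are hypothetical and chosen only for illustration. The first function enumerates the exact permutation distribution of the sufficient statistic t = sum(x_i * y_i), conditional on the observed value of sum(y_i). The second shows why conventional Monte Carlo is wasteful: it draws responses unconditionally and counts how many happen to satisfy the conditioning constraint.

```python
from collections import Counter
from itertools import combinations
import random


def exact_conditional_distribution(x, m):
    """Null distribution of t = sum(x[i] * y[i]) given sum(y) == m.

    Under the null (covariate coefficient equal to zero), every binary
    response vector with exactly m ones is equally likely, so we simply
    enumerate all of them. This is feasible only for tiny data sets.
    """
    counts = Counter()
    for ones in combinations(range(len(x)), m):
        counts[sum(x[i] for i in ones)] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in sorted(counts.items())}


def naive_acceptance_rate(n, m, draws, rng):
    """Fraction of unconditional Bernoulli draws that satisfy sum(y) == m.

    Draws that miss the constraint are rejected; this is the waste the
    abstract refers to, shown here with a single nuisance constraint.
    """
    p = m / n
    accepted = 0
    for _ in range(draws):
        y = [1 if rng.random() < p else 0 for _ in range(n)]
        if sum(y) == m:
            accepted += 1
    return accepted / draws


# Hypothetical data: 8 subjects, binary covariate x, binary response y.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 0, 0, 0]

m = sum(y)                                  # sufficient statistic for the intercept
t_obs = sum(xi * yi for xi, yi in zip(x, y))  # sufficient statistic of interest

dist = exact_conditional_distribution(x, m)
# One-sided exact p-value: probability of a statistic at least as large as observed.
p_value = sum(prob for t, prob in dist.items() if t >= t_obs)

rate = naive_acceptance_rate(len(x), m, draws=20000, rng=random.Random(0))

print("exact conditional distribution:", dist)
print("one-sided exact p-value:", p_value)
print("naive Monte Carlo acceptance rate:", rate)
```

Even in this toy case most unconditional draws are discarded. Each additional nuisance parameter adds another constraint that a draw must satisfy exactly, so the acceptance rate shrinks quickly as the model grows, which is the difficulty a constraint-respecting sampler is meant to avoid.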

* Information listed above is at the time of submission. *
