Methods Analyzing Categorical Data
Small Business Information
Cytel Software Corp. (Currently CYTEL SOFTWARE CORPORATION)
675 Massachusetts Avenue, Cambridge, MA, 02139
Abstract
Binary logistic regression and its extensions to unordered polytomous response, ordered polytomous response, and Poisson response are among the most popular mathematical models for the analysis of categorical data, with widespread applicability in the biomedical sciences. The usual method of inference for such models is unconditional maximum likelihood. For large, well-balanced data sets, or for data with only a few parameters, this approach is satisfactory. However, unconditional maximum likelihood estimation can produce inconsistent point estimates, inaccurate p-values, and inaccurate confidence intervals for small or imbalanced data sets, and for data sets with a large number of parameters relative to the number of observations. Sometimes the method fails entirely because no estimates can be found that maximize the unconditional likelihood function. A methodologically sound alternative that has none of the above drawbacks is the exact conditional approach. Here one estimates the parameters of interest by computing the exact permutation distributions of their sufficient statistics, conditional on the observed values of the sufficient statistics for the remaining "nuisance" parameters. The major stumbling block to exact permutational inference has always been the heavy computational burden it imposes. Despite the availability of fast numerical algorithms for the exact computations, there are numerous instances where a data set is too large to be analyzed by the exact methods, yet too sparse or imbalanced for the maximum likelihood approach to be reliable. What is needed is a reliable Monte Carlo alternative to the exact conditional approach that can bridge the gap between the exact and asymptotic methods of inference. The problem is technically hard because conventional Monte Carlo methods lead to massive rejection of samples that do not satisfy the constraints of the conditional distribution. We will build a network sampling approach to the Monte Carlo problem that we believe is a major breakthrough for this difficult but important problem.
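The exact conditional idea described in the abstract can be illustrated with a minimal sketch. It assumes a logistic model with one nuisance intercept and one binary covariate, so that the sufficient statistic for the intercept is s = sum(y) and the one for the parameter of interest is t = sum(x*y). The function names and the toy data are hypothetical and chosen for illustration only; the brute-force enumeration and the naive rejection sampler shown here are not the network sampling approach proposed in the abstract.

```python
from itertools import combinations
from math import comb
import random


def exact_conditional_counts(x, s):
    """Count, for each t, the binary vectors y with sum(y) == s and sum(x*y) == t."""
    counts = {}
    for ones in combinations(range(len(x)), s):
        t = sum(x[i] for i in ones)
        counts[t] = counts.get(t, 0) + 1
    return counts


def exact_p_value(x, y):
    """One-sided exact conditional p-value under beta = 0: P(T >= t_obs | S = s)."""
    s = sum(y)
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    counts = exact_conditional_counts(x, s)
    total = comb(len(x), s)  # number of response vectors with sum(y) == s
    return sum(c for t, c in counts.items() if t >= t_obs) / total


def naive_acceptance_rate(n, s, draws, seed=0):
    """Fraction of unconstrained Bernoulli(0.5) draws that satisfy sum(y) == s."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(draws):
        y = [rng.random() < 0.5 for _ in range(n)]
        if sum(y) == s:
            kept += 1
    return kept / draws


# Hypothetical toy data: 8 subjects, binary covariate x, binary response y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]

p_exact = exact_p_value(x, y)
rate = naive_acceptance_rate(len(x), sum(y), draws=10_000)
print(f"exact one-sided p-value: {p_exact:.4f}")
print(f"naive sampler acceptance rate: {rate:.3f}")
```

With a single conditioning constraint and eight observations, the naive sampler still accepts a sizeable share of its draws. Each additional nuisance parameter adds another constraint that every sample must meet, which is how the rejection problem mentioned in the abstract arises, while the enumeration grows combinatorially with the size of the data set.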
* information listed above is at the time of submission.