Innovations in Teaching and Learning Genetics Edited by Patricia J. Pukkila


Do-It-Yourself Statistics: A Computer-Assisted Likelihood Approach to Analysis of Data From Genetic Crosses

Leonard G. Robbins^a
^a Dipartimento di Biologia Evolutiva, Università di Siena, 53100 Siena, Italy and Genetics Program and Department of Zoology, Michigan State University, East Lansing, Michigan 48824-1312

Corresponding author: Leonard G. Robbins, Università di Siena, Via P. A. Mattioli 4, 53100 Siena, Italy. E-mail: robbins{at}unisi.it

ABSTRACT


Graduate school programs in genetics have become so full that courses in statistics have often been eliminated. In addition, typical introductory statistics courses for the "statistics user" rather than the nascent statistician are laden with methods for analysis of measured variables, while genetic data are most often discrete numbers. These courses are often seen by students and genetics professors alike as largely irrelevant cookbook courses. The powerful methods of likelihood analysis, although commonly employed in human genetics, are much less often used in other areas of genetics, even though current computational tools make this approach readily accessible. This article introduces the MLIKELY.PAS computer program and the logic of do-it-yourself maximum-likelihood statistics. The program itself, course materials, and expanded discussions of some examples that are only summarized here are available at http://www.unisi.it/ricerca/dip/bio_evol/sitomlikely/mlikely.html.

As most of us still impress on our introductory genetics students, genetics started with the counting of offspring produced by crosses. Although many of us now spend a large fraction of our time at a chemical bench, crosses, and the discrete data they generate, remain a core tool in our work. Remarkably, the early synergism between genetics and statistics is now mostly absent from the pages of this journal. Virtually all of us are familiar with log of odds (LOD) score analysis of human genetic data, and most of us can do a χ² test against a priori expectations or a χ² contingency test. Nevertheless, in most articles in GENETICS that contain cross data there is either no statistical analysis at all or a transformation of the discrete data to frequencies, accompanied by confidence intervals and statistics, if any, that were originally devised for dealing with continuous variables. The questions we actually want to ask go well beyond the few methods of discrete-data analysis we have learned, so we either rely on an eyeball approach or fall back on the continuous-variable methods taught in the usual statistics courses. Given the loss of power and the plethora of mathematical and inferential pitfalls that the latter entails, the eyeball may often be the better instrument.

This need not be the case, and in many areas of research it is not the case. Areas as diverse as animal behavior, clinical trials, and signal and image processing are replete with powerful examples of discrete analysis. (The CCAR database, for example, contains nearly 3000 entries for "maximum-likelihood" for the 3 years 1993–1995.) There are, of course, examples in genetics as well: LOD scores (see CROW 1993 and MORTON 1995 for historical views of human gene mapping); sporadically appearing, but cumulatively numerous, applications of likelihood methods to problems in formal genetics (a far from exhaustive sample includes KASTENBAUM 1958; SANDLER and KASTENBAUM 1958; ROBBINS 1971, 1977, 1999; SNOW 1979; KING and MORTIMER 1991; LYCKEGAARD and CLARK 1991; HILLIKER et al. 1994; MCPEEK and SPEED 1995; ZHAO et al. 1995a,b); and widespread application by the mathematical sophisticates of population genetics, quantitative genetics, and numerical taxonomy. WEIR (1994, 1995) has also made a convincing case for the use of likelihood ratios in forensics, the newest area of applied population genetics. Yet the general picture in formal genetics may be seen with a simple count of a random issue of GENETICS (May 1992; vol. 131, no. 4). Excluding population and quantitative genetics articles, there were 13 reports containing discrete data. Of the 13, only 2 (significantly < 1/2; χ² = 8.07, 1 d.f., P ≈ 0.004) included any statistical analysis at all. Obviously, statistical tests may not have been needed in these articles; simple perusal of percentages can often be convincing. Nevertheless, methods for analysis of discrete data are available, their use is not difficult, and they can be revealing. They can also help us avoid designing complicated crosses that cannot, in the end, be analyzed.

Methods for discrete analysis have been a stepsister in statistics, but discrete multivariate analysis, given a thorough exegesis in the classic book by BISHOP et al. (1975a), allows the same rigorous approach to discrete data that conventional analysis of variance provides for measurement data. BISHOP et al. (1975a), however, is written for statisticians and can be intimidating. Perhaps that is why these methods have not found their way into most areas of genetics, even though they are commonly used in wealthier fields such as clinical trials, where professional statisticians are routinely members of the team and analytical power must be kept high to keep the number of human subjects low. For analysis of crosses, however, the full-blown artistry of discrete multivariate analysis is not usually needed.

The value of a multivariate mode of thinking is well illustrated by the erroneous presentation of the a priori χ² test in a popular introductory genetics textbook (GRIFFITHS et al. 1993; awkwardly corrected in GRIFFITHS et al. 1999). In their example, data from a test cross with an observed recombinant fraction under 50% are used to test for linkage. Instead of the appropriate test of a 1:1 ratio of parental:nonparental, however, GRIFFITHS et al. (1993) test for a 1:1:1:1 ratio among all four products of the test cross. Unfortunately, this is a test for Mendelian independent assortment and not a test of whether the recombinant fraction is statistically different from 50%. It compounds testing for linkage with testing for equal recovery of reciprocal products, two variables that really need to be separated. The authors assert that the sample data do not support linkage (χ² = 5.2, 3 d.f., P = 0.156), but done correctly there is, in fact, a significant indication of linkage (χ² = 5.0, 1 d.f., P = 0.025), while there are no significant differences (for example, there are no significant marker viability effects) in recovery of reciprocal products (χ² = 0.202, 2 d.f., P = 0.904). In the new edition, a contingency test of statistically independent recovery of the allelic combinations, in place of the a priori test for Mendelian independent assortment, yields a result (χ² = 5.02, 1 d.f., P = 0.025) very close to that of the simpler test for deviation from a 1:1 ratio of parental:nonparental.

The foregoing example does not illustrate a need for analysis of maximum likelihood, nor even for the contingency test used in GRIFFITHS et al. (1999); if the question had been correctly posed, a simple a priori χ² test would have been adequate. It does, however, illustrate how failing to separate different biological processes can lead one astray. Testing for linkage, a recombination fraction under 50%, is not the same as testing for all possible distortions of genotype frequencies.
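The difference between the two questions can be checked numerically. The following Python sketch runs both a priori tests on hypothetical counts (not the textbook's data), chosen so that the 3-d.f. test of 1:1:1:1 is nonsignificant while the 1-d.f. test of parental:nonparental is significant. `chi2_sf` is a helper written for this sketch (the closed-form χ² upper tail for integer degrees of freedom), not a library routine.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square with df d.f. > x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: start from the 1-d.f. tail and add the extra closed-form terms.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical test-cross counts: two parental and two recombinant classes.
parental = (60, 55)
recombinant = (42, 43)
n = sum(parental) + sum(recombinant)

# The wrong question for linkage: 1:1:1:1 among all four classes (3 d.f.).
four_way = chi_square(parental + recombinant, [n / 4] * 4)
# The right question: parental vs. nonparental, 1:1 (1 d.f.).
two_way = chi_square((sum(parental), sum(recombinant)), [n / 2] * 2)

print(f"1:1:1:1  chi2 = {four_way:.3f}, 3 d.f., P = {chi2_sf(four_way, 3):.3f}")
print(f"1:1      chi2 = {two_way:.3f}, 1 d.f., P = {chi2_sf(two_way, 1):.3f}")
```

Because the 1:1:1:1 test spreads the linkage signal over three degrees of freedom, together with two contrasts that here carry almost no deviation, it can miss linkage that the focused 1-d.f. test detects.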

Not every problem in formal genetics can be resolved just with clear thinking and χ² tests. The advent of powerful personal computers, however, makes the methods that are needed accessible to those who, like myself, are neither mathematicians, statisticians, nor professional programmers. Several of the most common questions geneticists must contend with can be asked: What are the best estimates of genetic parameters? Does a hypothesis adequately account for the observed effects? How can we test whether an experiment and control respond differently to a variable we're interested in, when both the experiment and control are also affected by some other variable? Is there significant variation in what we're scoring? Is there a correlation between two variables? How important is the correlation? Moreover, with the computer stripping away much of the mathematical complexity, the major task left for the geneticist is the clear definition of the question(s) to be asked.

With the hope of creating an enhanced awareness of these methods among the next generation of geneticists, a set of real-world examples and a program for numerical approximation of maximum likelihoods were used as the core of a graduate-level course offered first in 1996 at Michigan State University and again in 1998 at the University of Siena. The course presented a guide to this mode of analysis by means of examples, some already published and some new. In each case, I chose actual experiments rather than invented examples. For some of the examples, the genetics is nontrivial and the explanation of the crosses is lengthy, but this allows the student or reader to judge the value and difficulty of applying this method to real-world situations. The examples are as follows: (1) mapping a dominant mutation of reduced penetrance; (2) testing for a correlation between two chromosome-behavior phenotypes; (3) testing whether a meiotic mutant affects chiasma interference; and (4) testing for the effects of a gene on viability in the presence of confounding variables. The first two examples are covered here, while the latter two are only briefly described, with the full discussions included at the web site.

BASIC METHODOLOGY


Maximum-likelihood estimates and hypothesis testing:
In the following sections, computer-assisted techniques for estimating parameters (such as map distances), testing for goodness-of-fit, and comparing hypotheses are described. All of them are based on the method of maximum likelihood (FISHER 1922; EDWARDS 1992).

Suppose that the probability of getting an offspring of a given class is p and that N of these offspring were observed in an experiment. The likelihood of getting those N offspring is defined as $L = p^N$. Crosses yield multiple offspring classes, each with its own probability, but, because different offspring are independently produced, the likelihood for the entire experiment is the product of the likelihoods. For example,¹ a test cross involving two linked genes with recombination fraction $r$ between them yields CO crossover offspring and NCO noncrossover offspring, and the likelihood is $L = r^{CO}(1 - r)^{NCO}$. The value of $r$ that maximizes $L$ is the estimate of $r$ that we use; in this case it is $\hat{r} = CO/(NCO + CO)$. If we were dealing with a more complex situation with many parameters, we would want to find the values of all of the parameters that simultaneously maximize $L$. Because the logarithm of a number increases whenever the number increases, we can, with the same effect and usually more easily, find the parameter values that maximize the logarithm of $L$. In the following, the maximum values of these functions are denoted $\hat{L}$ and $\ln \hat{L}$.
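A minimal Python check of this example, assuming hypothetical counts CO = 30 and NCO = 170: a brute-force search over a grid of r values, working on the ln scale, lands on the closed-form estimate CO/(NCO + CO).

```python
import math


def log_likelihood(r: float, co: int, nco: int) -> float:
    """ln L = CO * ln(r) + NCO * ln(1 - r) for a two-point test cross."""
    return co * math.log(r) + nco * math.log(1 - r)


CO, NCO = 30, 170  # hypothetical counts

# Closed-form maximum-likelihood estimate.
r_hat = CO / (CO + NCO)

# Brute-force check: evaluate ln L on a fine grid in (0, 1) and keep the best point.
grid = [i / 10_000 for i in range(1, 10_000)]
r_grid = max(grid, key=lambda r: log_likelihood(r, CO, NCO))

print(f"closed form r = {r_hat}, grid search r = {r_grid}")
print(f"ln L-hat = {log_likelihood(r_hat, CO, NCO):.4f}")
```

Maximizing ln L rather than L avoids multiplying many small probabilities, which quickly underflows for realistic sample sizes.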

Most of the time we are interested not only in estimating the parameters, but also in testing whether a hypothesis provides a sufficient explanation for the experimental variation or whether there are significant differences between two (or more) hypotheses. For example, if we suspect a correlation between two variables, we would want to test three things. First, we need to test whether there is significant variation in these parameters in the first place. That is, does a hypothesis of no variation in one or the other parameter fail a goodness-of-fit test? Second, we will want to know whether a model that includes a correlation with slope other than zero is significantly better than the hypothesis of no variation (equivalent to a correlation with slope = 0). That is, we must compare two hypotheses. Third, we will want to know how much of the variation is explained by the correlation, i.e., its sufficiency; this is another goodness-of-fit test.

In many cases, the obvious test for goodness-of-fit is a straightforward χ². That is, we use the maximum-likelihood estimates of the parameters to find the probabilities of each class, multiply these probabilities by the appropriate total(s) to get expected numbers, and calculate χ² as a measure of the difference between observations and expectations. The degrees of freedom are then the number of independent observations less the number of parameters estimated from the data.
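As a sketch of this recipe, the following Python fits a one-parameter model to a hypothetical two-point test cross (four classes: two parental, two recombinant) in which reciprocal products are assumed to be recovered equally, so each parental class has probability (1 − r)/2 and each recombinant class r/2. For a single cross with k classes the total is fixed, so there are k − 1 independent observations; with one estimated parameter that leaves 2 d.f.

```python
def expected_counts(r: float, n: int) -> list:
    # One-parameter model: parental classes (1 - r)/2 each, recombinant classes r/2 each.
    probs = [(1 - r) / 2, (1 - r) / 2, r / 2, r / 2]
    return [p * n for p in probs]


observed = [60, 55, 42, 43]  # hypothetical: two parental, then two recombinant classes
n = sum(observed)

# Maximum-likelihood estimate of r under this model: recombinants / total.
r_hat = (observed[2] + observed[3]) / n
expected = expected_counts(r_hat, n)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
n_params = 1
df = (len(observed) - 1) - n_params  # independent observations minus estimated parameters

print(f"r_hat = {r_hat:.3f}, chi2 = {chi2:.3f}, {df} d.f.")
```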

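When we wish to compare two hypotheses, H1 and H2, however, a different measure is often more appropriate or more convenient. This is the G (also known as G²) statistic (BISHOP et al. 1975b, Chapter 4):

$G = 2\,(\ln \hat{L}_{H1} - \ln \hat{L}_{H2})$,

where H1 is the more general hypothesis, the one with more free parameters.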

G is distributed approximately as χ² with degrees of freedom equal to the difference between the numbers of parameters of the two hypotheses. The approximation to χ² is asymptotic and becomes more exact as sample size increases.

Note that in many cases a test for sufficiency, usually stated as a test for goodness-of-fit, can also be described as a comparison of two hypotheses. For example, if H1 includes m parameters (unknowns to be solved for), there are m independent observations (knowns), and the parameters can take any numerical value, then the maximum-likelihood estimates of the parameters are identical to what would be obtained by solving m equations in m unknowns. Because H1 merely describes all of the variation, there is no test for its sufficiency (aside from the possibility of getting utterly absurd parameter values), and calculating χ² will yield a value of 0. A comparison of another hypothesis, H2, to H1 by a G test is then logically equivalent to testing H2 for goodness-of-fit. The values of G for H2 vs. H1 and of χ² for goodness-of-fit of H2 will generally be the same, or very nearly so. For such tests, the choice of whether to use a χ² or a G test is largely a matter of convenience (if, for example, the values of $\ln \hat{L}_{H1}$ and $\ln \hat{L}_{H2}$ have already been found), esthetics, or habit.
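The near-equality of G and χ² can be seen with the same kind of hypothetical four-class test-cross data. In this Python sketch, H1 is the saturated hypothesis (each class probability equals its observed frequency, three free parameters) and H2 is the one-parameter model with equal recovery of reciprocal products. The multinomial coefficient is omitted from ln L because it cancels in the difference.

```python
import math


def multinomial_log_lik(counts, probs):
    # ln L without the multinomial coefficient, which cancels in any likelihood ratio.
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


observed = [60, 55, 42, 43]  # hypothetical: two parental, then two recombinant classes
n = sum(observed)

# H1: saturated; each class probability is its observed frequency (3 free parameters).
ln_L_H1 = multinomial_log_lik(observed, [o / n for o in observed])

# H2: one parameter r, equal recovery of reciprocal products.
r_hat = (observed[2] + observed[3]) / n
probs_H2 = [(1 - r_hat) / 2, (1 - r_hat) / 2, r_hat / 2, r_hat / 2]
ln_L_H2 = multinomial_log_lik(observed, probs_H2)

G = 2 * (ln_L_H1 - ln_L_H2)
chi2 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs_H2))
df = 3 - 1  # difference in the numbers of free parameters

print(f"G = {G:.4f}, chi2 = {chi2:.4f}, {df} d.f.")
```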

How can we find the parameter values that maximize ln L? For pedigree data, in years past we would have gone to MORTON's (1955) tables, but we would now most likely use one of the readily available LOD score computer programs (TERWILLIGER 1994). In some other situations, we might also be able to turn to the literature for an analytical solution. If the crosses do not correspond to an already worked-out situation, but we are skilled in calculus and linear algebra, we might try to find the partial derivatives of ln L with respect to each parameter, set them equal to zero, and solve the set of simultaneous equations. Failing that, as even a skilled mathematician sometimes will, we can turn to a computer to approximate the maximum by numerical methods. Indeed, if we are willing to travel this less elegant route, all we need to tell the computer is the probability for each offspring class, and the computer can do the rest.

The MLIKELY.PAS program:
MLIKELY.PAS is a Pascal program that, in its current version, is compiled under TURBO PASCAL 4.0 (Borland Intl.). It is not user friendly: it lacks a graphical user interface, does not support a mouse, and requires that the user convert a few equations into Pascal syntax and paste them into the program, which then must be compiled and run. It is, however, geneticist friendly. It can work with virtually any set of crosses, whether simple or complex. The expressions the user needs to write are most often direct translations of a Punnett square or logic tree, and running the program requires only answering a series of questions and entering the data. The heuristic used is brutally simple: the user provides first guesses of the parameter values or accepts the program's defaults, and the computer increases and decreases those values, moving sequentially through the list of parameters using ever smaller intervals, until it finds the maximum of ln L to whatever precision is desired. A few tricks are used to speed operation:

The likelihood surface may be smooth in some areas and rough in others. Where rough, large increments may miss a peak. Where smooth, however, large increments are more efficient. Hence, if the iteration process continues for several rounds at a given increment, the interval changes to a larger value.
To even out the sensitivity of parameters that are very small and very large, the increments are made as fractions of the previous guess (as long as that prior guess was not exactly zero).
To ensure that a path through the likelihood space can never be retraced, the proportions by which parameters are increased and decreased are not the same but are relatively prime.
The user can specify limited ranges for the parameters (for example, it makes no sense to try crossover frequencies outside the range 0 to 0.5).
The size of the multipliers, and the number of cycles at an increment before reverting to a larger one, were optimized for a problem somewhat more complex than any reported here.
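The search strategy can be sketched in Python as a coordinate-wise pattern search. This is an illustration under stated assumptions, not a transcription of MLIKELY.PAS: it steps through the parameters in turn, uses increments that are fractions of the current guess (absolute increments when a guess is exactly zero), halves the increment when no move helps, doubles it after several consecutive successful sweeps, and clamps each parameter to user-supplied bounds. The relatively prime up and down proportions are omitted, and the multipliers (×2, ÷2, growth after five sweeps, a 0.5 cap) are arbitrary choices, not MLIKELY's tuned values.

```python
import math
from typing import Callable, List, Sequence, Tuple


def pattern_search(
    loglik: Callable[[List[float]], float],
    start: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
    step: float = 0.1,
    min_step: float = 1e-12,
    grow_after: int = 5,
    max_evals: int = 1_000_000,
) -> Tuple[List[float], float]:
    """Coordinate-wise search for the maximum of ln L (a sketch, not MLIKELY itself)."""
    params = list(start)
    best = loglik(params)
    evals, streak = 1, 0
    while step > min_step and evals < max_evals:
        improved = False
        for i in range(len(params)):  # move through the parameters in turn
            lo, hi = bounds[i]
            # Relative increment, unless the current guess is exactly zero.
            scale = abs(params[i]) if params[i] != 0 else 1.0
            for direction in (1.0, -1.0):
                cand = params[:]
                cand[i] = min(max(params[i] + direction * step * scale, lo), hi)
                value = loglik(cand)
                evals += 1
                if value > best:
                    params, best, improved = cand, value, True
                    break
        if improved:
            streak += 1
            if streak >= grow_after:  # smooth region: try a larger increment
                step, streak = min(step * 2.0, 0.5), 0
        else:  # no move helped: refine the increment
            step, streak = step / 2.0, 0
    return params, best


def three_class_loglik(p: List[float]) -> float:
    # Hypothetical counts 50:30:20 in classes with probabilities p1, p2, 1 - p1 - p2.
    p1, p2 = p
    p3 = 1.0 - p1 - p2
    if min(p1, p2, p3) <= 0:
        return -math.inf
    return 50 * math.log(p1) + 30 * math.log(p2) + 20 * math.log(p3)


est, ln_L_hat = pattern_search(three_class_loglik, [0.2, 0.2], [(1e-9, 1.0), (1e-9, 1.0)])
print(est, ln_L_hat)
```

For the hypothetical three-class data the analytical maximum is at p1 = 0.5, p2 = 0.3, which the search should reproduce to several decimal places.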

Although incorporated piecemeal in MLIKELY.PAS, either intuitively or empirically, these procedures are not uncommon in optimization algorithms, and constraining parameter ranges is similar to the use of "hints" in speeding artificial intelligence schemes. MLIKELY.PAS provides output in a variety of formats (screen-readable, printable, and word-processor and spreadsheet importable) and saves the data in a reusable file so that they need be entered only once.

There are algorithms that can find a maximum more quickly. For example, the "optimizer" found in the QUATTRO PRO (Corel) spreadsheet package can estimate the derivatives first to speed the search for a maximum. MLIKELY.PAS is not, in any case, unreasonably slow. Iteration times are indicated in the examples that follow, in each case for runs on an 80486/33 computer with each parameter estimated to a precision of better than 1 part in 10⁸. Even with a less-than-state-of-the-art PC, the running time is most often far less than the thinking time needed to define the problem in the first place.

The simple heuristic used in MLIKELY.PAS can cause two problems that the user should be aware of. First, because the parameters are handled sequentially, if two or more of the starting guesses are impossible, i.e., if they give negative expected frequencies, the program will not find the maximum but will issue a warning message. Starting with more reasonable guesses is the cure for this problem. Second, a likelihood function may have more than one peak. As with other iterative peak-finding procedures, once in the neighborhood of a peak, even if it is not the highest in the entire landscape, the program may halt at that local maximum. It is even possible to have a model so badly structured that ln L is an oscillating function, such as a sine wave, but this is unlikely in any genetics application. A program designed to find likelihoods for only one class of problem can usually be rigged to avoid this. In contrast, although MLIKELY.PAS can be fooled by local maxima, and is not usable for every type of application to which maximum-likelihood analysis applies, it can accommodate any model for which one can write the probabilities of getting each observed class.

That multiple peaks in an iterative process can be dangerous has certainly been seen in the study of human molecular evolution; the primacy of a mitochondrial Eve, while appealing, was supported by a maximum-parsimony tree that was not unique (HEDGES et al. 1992; TEMPLETON 1992). In more than 20 years of using MLIKELY.PAS and its ancestors, however, I have never had a false-peak problem except when I made a gross mistake in writing the probabilities in the first place, set absurd bounds for the parameters, or, more often, made a typographic error in putting them into the program.

The generally good behavior of the iteration algorithm used in MLIKELY.PAS could be a result of mere luck, but has probably occurred because formal genetics problems, as opposed to problems in taxonomy, are often well structured even when they involve many parameters. For example, in describing recombination in several regions, there will be several single-crossover frequencies to be estimated, but all of them behave in an algebraically similar fashion.

The behavior of MLIKELY.PAS during the iteration process, as well as its output, can provide useful indications of potential problems. For example, the second example of this report, which considers testing for correlation using discrete data, includes a case of improperly bounding a parameter's search space. It is nevertheless good practice to start with several widely different sets of parameter guesses to check that you always end up at the same peak.
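The multiple-start check is easy to automate. This Python sketch uses a deliberately contrived ln L surface with two peaks of different heights (a toy function, not a genetic model) and a bare-bones hill climber; runs from different starting guesses end on different peaks, which is exactly the warning sign to look for.

```python
def hill_climb(f, x, step=0.5, tol=1e-10):
    """Simple 1-D hill climber: try x +/- step, halve the step when neither helps."""
    fx = f(x)
    while step > tol:
        for cand in (x + step, x - step):
            fc = f(cand)
            if fc > fx:
                x, fx = cand, fc
                break
        else:  # neither direction improved ln L
            step /= 2
    return x, fx


def toy_log_lik(x):
    # Hypothetical surface with two peaks of different heights (not a genetic model).
    return -((x * x - 1) ** 2) + 0.3 * x


starts = [-2.0, 0.1, 2.0]
results = [hill_climb(toy_log_lik, s) for s in starts]
for s, (x, fx) in zip(starts, results):
    print(f"start {s:+.1f} -> peak at x = {x:.4f}, ln L = {fx:.4f}")

peaks = {round(x, 3) for x, _ in results}
if len(peaks) > 1:
    print("Different starts reached different peaks; keep the highest and investigate.")
best_x, best_f = max(results, key=lambda r: r[1])
```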

MLIKELY.PAS also calculates χ² for a goodness-of-fit test of the hypothesis. The user must supply Pascal statements defining the sum(s) by which to multiply the probabilities to get expected numbers. For data from a single cross, this is simply the sum of all observations, and that variable is already calculated by the program, but for a series of crosses the sum for each cross must be specified. Because the χ² calculation is included in MLIKELY, it is often more convenient to use this test of goodness-of-fit rather than an equivalent G test when only a single hypothesis is being tested; only a single set of equations need be written. In contrast, the likelihood-ratio approach of a contrast between the hypothesis of interest and a foil that explains all of the variation requires writing (or editing) two versions of the equations, compiling and running the program twice, and then calculating G.

Inclusion of these calculations in MLIKELY serves another purpose as well: checking that the sum of the expected numbers equals the sum of the observed numbers. Moreover, examination of the listing of the χ² values of the individual cells gives a good check that the probabilities have been sensibly defined and accurately entered.

This article includes only enough information about the structure and running of the program to permit understanding how it serves the geneticist. MLIKELY (including all source code), sample sets of equations and data, and documentation files are available at the web site. Downloading carries two conditions: (1) neither the program nor any substantial part of the program may be used for commercial purposes or incorporated into another program without my written permission; and (2) any improvements made, or versions modified for other Pascal compilers, will be shared with me so that they can be incorporated in future releases.

EXAMPLES


Parameter estimation—mapping a mutant of reduced penetrance:
The top of Fig 1 illustrates a mouse genetics problem recently faced by J. Asher. [This question arose in work following from ASHER et al. 1996. Unfortunately, Dr. Asher died before the work could be completed.] In this cross, he wished to map mutation B, a dominant of reduced penetrance, with respect to two RFLP (and, therefore, codominant) markers. Meiosis produces noncrossovers, single crossovers, and double crossovers, but because B is not fully penetrant, some B-bearing progeny will be B⁺ in phenotype. For example, some of the A B C noncrossovers may be recovered as A + C phenotype progeny, equivalent to one of the double-crossover classes. As shown in the bottom panel of Fig 1, writing equations for the probabilities of DCO, SCO, and NCO, adding the effect of reduced penetrance to get the probabilities of each of the progeny types, and translating the algebraic description of this situation into Pascal syntax are straightforward. The Pascal version includes a preamble declaring the names of the variables that will be used and defining mnemonic designators for the distances (expressed as crossover frequencies), the coefficient of coincidence, and the penetrance in terms of the array of parameters provided in the program. It also includes a statement that finds the expected numbers for each class by multiplying the probabilities by the sum of the observations. The Pascal translation of the genetics is inserted into the MLIKELY.PAS program, which is then compiled and run. The input needed consists of the eight observations, which can be entered in response to questions posed by the program at run time or can be taken from a data file written (in ASCII text format) in advance.
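The reasoning in the Fig 1 caption extends to all eight phenotypic classes. The Python function below is a transcription of that algebra, not the author's Pascal (which appears only in the figure). It assumes the parental phase A B C / a + c implied by the text and caption, and the usual relations DCO = C·d₁·d₂, SCO₁ = d₁ − DCO, SCO₂ = d₂ − DCO, and NCO = 1 − SCO₁ − SCO₂ − DCO; the function name and class labels are illustrative.

```python
def fig1_class_probs(d1: float, d2: float, c: float, pen: float) -> dict:
    """Phenotype-class probabilities for the Fig 1 test cross (a sketch).

    Assumes the heterozygous parent is A B C / a + c; d1, d2 are crossover
    frequencies for the A-B and B-C intervals, c is the coefficient of
    coincidence, and pen is the penetrance of B.
    """
    dco = c * d1 * d2               # double crossovers
    sco1 = d1 - dco                 # single crossovers in A-B only
    sco2 = d2 - dco                 # single crossovers in B-C only
    nco = 1.0 - sco1 - sco2 - dco   # noncrossovers
    non = 1.0 - pen                 # B carriers that do not show B
    return {
        "A B C": 0.5 * nco * pen,
        "A + C": 0.5 * dco + 0.5 * nco * non,
        "A B c": 0.5 * sco2 * pen,
        "A + c": 0.5 * sco1 + 0.5 * sco2 * non,
        "a B C": 0.5 * sco1 * pen,
        "a + C": 0.5 * sco2 + 0.5 * sco1 * non,
        "a B c": 0.5 * dco * pen,
        "a + c": 0.5 * nco + 0.5 * dco * non,
    }


# Probabilities at the estimates reported below for this cross.
fit = fig1_class_probs(0.00634, 0.0525, 0.0, 0.6605)
for name, prob in fit.items():
    print(f"{name}: {prob:.5f}")
```

Multiplying each probability by the total number of progeny gives the expected numbers. With eight classes, seven independent observations, and four parameters, a goodness-of-fit χ² for this model has 3 d.f., as in the analysis reported for this cross.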


Figure 1. Mapping a dominant of reduced penetrance. (Top) Three genes are followed in a test cross. A and C are RFLP markers, while B (Sp^d; ASHER et al. 1996) is a dominant mutation of reduced penetrance. (Bottom) Conventional genetic notation describing this cross and a Pascal translation. There are four parameters: two distances (expressed as recombination fractions rather than centimorgans for calculation purposes), one coefficient of coincidence, and one penetrance. Because of reduced penetrance, individuals of different genotypes can have the same phenotype. For example, the a + c phenotypic class includes both a/a +/+ c/c genotype individuals (frequency ½NCO) and individuals who are genotypically a/a B/+ c/c but are nonpenetrant for B (frequency ½DCO(1 − P)). The Pascal version is inserted in MLIKELY.PAS, which is then compiled and run.

Starting with some wild guesses (d₁ = 0.1, d₂ = 0.1, C = 0.1, and P = 0.99), in less than a second the program finds the values of the two distances, the coefficient of coincidence, and the penetrance that maximize the ln likelihood of getting the observed results ($\hat{d}_1 = 0.00634$, $\hat{d}_2 = 0.0525$, $\hat{C} = 0.0$, and $\hat{P} = 0.6605$) and indicates, by the nonsignificant value of χ² = 3.547 (3 d.f., P = 0.315), that this model provides a sufficient description of the data.

We can also take this a step further and examine the precision of these estimates. For example, we may be most interested in distance d₁, the short A to B interval. What is the largest, or smallest, estimate of this distance that is still consistent with the data? To do this,² we (1) set a series of fixed values for d₁; (2) allow the program to find the values of the other parameters that maximize the likelihood; and then (3) compare the results with those for the maximum-likelihood estimate of d₁, i.e., when d₁ = 0.00634.

We can use MLIKELY.PAS for the first two steps by changing just four lines of code, so that d₁ is treated as a constant, Con[1], and repeating the iteration several times for different values of Con[1]. The data can be reentered, or the original data file may be modified in any text editor to change the number of parameters from 4 to 3 and the number of constants from 0 to 1.

We then need a statistic that allows us to compare the results. Two related comparisons are shown in Fig 2, one using the G statistic (Fig 2A) and the other using LOD scores (Fig 2B). MLIKELY.PAS does not itself calculate either G or LOD scores, but both of those statistics are easy to calculate and graph using a spreadsheet, and MLIKELY.PAS does provide a spreadsheet-importable (comma and space delimited) output file.


Figure 2. Maximum-likelihood-derived confidence intervals for the distance between genes A and B. (A) G-test comparisons; (B) LOD score comparisons. The equations describing the cross shown in Fig 1 were changed so that distance d₁ was treated as a constant. Maximum likelihoods were obtained for a series of values of d₁, and $\ln \hat{L}$ for each of these fixed-d₁ hypotheses was compared to $\ln \hat{L}$ for the variable-d₁ hypothesis. The peak of the curve occurs at the estimate of d₁ obtained under the variable-d₁ hypothesis, and the smallest and largest A-B map distances, in centimorgans, consistent with the data are those at which the curves cross the selected probability or LOD-score criterion (dotted lines). MLIKELY.PAS was used to find the $\ln \hat{L}$ values, and the spreadsheet-compatible output file was imported into Quattro Pro, which was then used to calculate values of the G statistic and LOD scores. Graphs were prepared using Corel Draw; calculated points are shown by tick marks, while the curves are Bezier interpolations.

The values of the maximum ln L's were obtained with MLIKELY.PAS, and a spreadsheet program (QUATTRO PRO; Corel) was used to find $G = 2\,[\ln \hat{L}_{(d_1\ \mathrm{variable})} - \ln \hat{L}_{(d_1\ \mathrm{fixed})}]$ for each fixed value of d₁. These results are shown in Fig 2A. There are four parameters when d₁ is allowed to vary, and three when it is fixed, giving 1 d.f., corresponding to P = 0.05 for G = 3.841 and P = 0.01 for G = 6.635. Thus, the 95% upper bound for d₁ is less than 3 map units, and the 99% upper bound is less than 4 map units. The probabilities provided by the G test correspond to those conventionally used in most hypothesis testing; they are the probabilities of getting a difference at least that large by chance alone.
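The confidence-bound logic can be sketched in Python for a simpler case, a two-point test cross with hypothetical counts CO = 30 and NCO = 170. With only one parameter there are no other parameters to re-maximize at each fixed value (step 2 of the procedure is empty), so G is computed directly from ln L; bisection stands in for reading the crossing point off a graph such as Fig 2A.

```python
import math


def log_lik(r: float, co: int, nco: int) -> float:
    return co * math.log(r) + nco * math.log(1 - r)


def g_stat(r_fixed: float, co: int, nco: int) -> float:
    """G = 2[ln L-hat(r variable) - ln L-hat(r fixed)]."""
    r_hat = co / (co + nco)
    return 2 * (log_lik(r_hat, co, nco) - log_lik(r_fixed, co, nco))


def bound(co: int, nco: int, crit: float, lo: float, hi: float) -> float:
    """Bisection for the r between lo and hi at which G equals crit.

    Assumes G - crit changes sign exactly once between lo and hi.
    """
    f_lo = g_stat(lo, co, nco) - crit
    for _ in range(200):
        mid = (lo + hi) / 2
        f_mid = g_stat(mid, co, nco) - crit
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


CO, NCO = 30, 170   # hypothetical counts
r_hat = CO / (CO + NCO)
CRIT_95 = 3.841     # chi-square, 1 d.f., P = 0.05
lower = bound(CO, NCO, CRIT_95, 1e-9, r_hat)
upper = bound(CO, NCO, CRIT_95, r_hat, 1 - 1e-9)
print(f"r_hat = {r_hat:.3f}; 95% interval {lower:.4f} to {upper:.4f}")
```

Unlike a symmetric "estimate ± 1.96 standard errors" interval, the likelihood-based interval can be asymmetric, which matters most for short distances near the boundary at zero.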

A different comparison, shown in Fig 2B, is often used in human genetics. LOD (log of odds) scores are the log₁₀ of the ratio of the likelihoods under two hypotheses or, equivalently, the difference between the log₁₀ L's. The ln L output of MLIKELY.PAS can be converted to base 10 by dividing by ln 10 ≈ 2.302585 (equivalently, multiplying by log₁₀ e ≈ 0.434294), and the LOD scores are found by subtraction. Note that the conventions used in pedigree analysis, a LOD of +3 to demonstrate linkage and −2 to exclude linkage, are substantially more stringent than the usual critical values. This stringency is reasonable when dealing with the tests of multiple hypotheses implicit in using a progressive accumulation of families to decide whether there is linkage, but it is overkill for most cross data. It is certainly overly stringent here, where we are already certain that the genes are linked.
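The conversion and its relation to G can be written out in a few lines of Python. Because G = 2(ln L̂₁ − ln L̂₂) and a LOD score is (ln L̂₁ − ln L̂₂)/ln 10, it follows that G = 2 ln 10 × LOD. The P value below uses the 1-d.f. χ² upper tail, which applies only when the two hypotheses differ by a single parameter and the asymptotic approximation holds; the function names are illustrative.

```python
import math

LN10 = math.log(10)  # about 2.302585


def ln_to_log10(ln_value: float) -> float:
    """Convert a natural-log likelihood to base 10 (divide by ln 10)."""
    return ln_value / LN10


def lod(ln_L1: float, ln_L2: float) -> float:
    """LOD score for H1 versus H2 from their maximum ln likelihoods."""
    return ln_to_log10(ln_L1) - ln_to_log10(ln_L2)


def lod_to_g(lod_score: float) -> float:
    """G = 2 (ln L1 - ln L2) = 2 ln(10) x LOD."""
    return 2 * LN10 * lod_score


g3 = lod_to_g(3.0)
p3 = math.erfc(math.sqrt(g3 / 2))  # chi-square upper tail for 1 d.f.
print(f"LOD 3 <-> G = {g3:.3f}; with 1 d.f., P = {p3:.5f}")
```

For a 1-d.f. comparison, a LOD of 3 thus corresponds to G ≈ 13.8, far beyond the G = 3.841 that marks P = 0.05.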

Variation and correlation—the relationship between experimental variables:
There are probably innumerable circumstances in which one observes two or more variable phenotypes and wants to know whether they are correlated. Where the phenotypes are metric, such as bristle lengths in Drosophila, conventional regression analysis can be appropriate, but regression analysis is also often used for counted variables, such as crossovers and disjunctional events, where a maximum-likelihood approach is more powerful and more revealing. To illustrate this, some unpublished data from my laboratory on the behavior of Rex are analyzed. The results of similar analyses may also be found in PALUMBO et al. (1994), and some extensions to this approach are used in a recent article (ROBBINS 1999) that deals with sex-chromosome disjunction and meiotic drive produced by ribosomal-RNA gene deficiencies.

Rex is a repeated, heterochromatically located element of Drosophila melanogaster. Acting maternally, it promotes recombination between ribosomal-RNA gene arrays (rDNA) during early embryonic mitoses (ROBBINS 1981; RASOOLY and ROBBINS 1991). We had repeatedly noted that crosses of Rex females also seem to produce more than the usual amount of sex-chromosome nondisjunction, amounting to ~1% exceptions, and had wondered whether this is also an effect of Rex or an extraneous phenomenon unrelated to the presence of Rex. The frequency of nondisjunction, though elevated, is low enough that mapping it to Rex would be an uninviting task. If not Rex related, this slight meiotic perturbation would also not be of much interest to us. Examination of data collected for other purposes, however, indicates that the frequencies of nondisjunction and rDNA recombination are correlated, suggesting that the two are functionally, even if not necessarily causally, related.

Those data came from crosses done along the way to mapping a suppressor of Rex, a Su(Rex). At one point in this process, a series of chromosomes that carried different segments of the X chromosome were tested for suppression of Rex activity. As this particular Su(Rex) turned out to be autosomal, each genotype tested actually consisted of several flies bearing the same X segment, but a random sampling of Su(Rex) and non-Su(Rex) autosomes. The results of these crosses are shown in Table 1. Not only does the frequency of rDNA recombination appear to be (and is) heterogeneous because of the different frequencies of the Su(Rex) in the 10 samples, but the frequency of nondisjunction varies as well. Are the two varying in a correlated fashion? We can find out by comparing the values of $\ln \hat{L}$ under three hypotheses:

H1: The frequencies of both nondisjunction and Rex-induced exchange are different in each cross.
H2: The frequency of Rex-induced exchange differs among crosses, but the frequency of nondisjunction is the same in all 10 crosses.
H3: The frequencies of nondisjunction and Rex-induced exchange are related as nondisjunction = m × (rDNA exchange) + b. (Note that a linear correlation is considered here, but a correlation of any other form could be evaluated just as easily.)

Table 1. Progeny recovered from crosses

There are three G-test comparisons to be made:

H1 "explains" all of the variation in the frequency of nondisjunction. H2 explains none of the variation in nondisjunction. Hence, the comparison of H1 vs. H2 tests whether there is statistically significant variation in the frequency of nondisjunction; it is equivalent to a goodness-of-fit test of H2.
H3 explains that part of the variation of nondisjunction that is linearly related to the frequency of Rex-induced exchange. H2 explains none of that variation. Hence, the comparison H3 vs. H2 is a measure of the variation explained by the correlation; it tests the significance of the correlation.
Last, H1 vs. H3 measures how much variation of nondisjunction is left unexplained after the relationship with Rex-induced exchange is accounted for. It tests the sufficiency of the correlation; it is equivalent to a goodness-of-fit test of H3.
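Each of these comparisons reduces to a G statistic: twice the difference in ln L between the more general and the more restricted hypothesis, referred to a χ² distribution whose degrees of freedom equal the difference in the number of free parameters. The Python below is a minimal sketch, not part of MLIKELY.PAS; the function names are hypothetical, and the χ² tail probability uses the standard recurrence for integer degrees of freedom so that no statistics library is needed. If XY and r are estimated separately for each of the 10 crosses under all three hypotheses, H1 has 30 free parameters, H2 has 21 and H3 has 22, giving 9 df for H1 vs. H2, 1 df for H3 vs. H2 and 8 df for H1 vs. H3.

```python
import math
from typing import Sequence, Tuple


def multinomial_lnl(counts: Sequence[int], probs: Sequence[float]) -> float:
    """ln L for one cross, omitting the multinomial coefficient (it cancels in G)."""
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square with df degrees of freedom > x), integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2             # exact for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1  # exact for df = 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def g_test(lnl_general: float, lnl_restricted: float, df: int) -> Tuple[float, float]:
    """G = 2 (ln L_general - ln L_restricted) and its chi-square tail probability."""
    g = 2.0 * (lnl_general - lnl_restricted)
    return g, chi2_sf(g, df)


if __name__ == "__main__":
    # Hypothetical ln L values for illustration only (not the Table 1 results).
    print(g_test(-1234.5, -1250.0, df=9))
```

The same `g_test` call serves all three comparisons; only the pair of ln L values and the degrees of freedom change.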

The first step needed for making these comparisons is writing the probabilities of each of the progeny classes. Unfortunately, as illustrated in Fig 3, there are some complications caused by the actual cross used:

One of the X chromosomes of the Rex females also carried a deficiency that is recessive lethal. Thus, some offspring genotypes die because of the presence of the lethal.


Figure 3. Meiotic nondisjunction in Rex/+ females and mitotic exchange between two rDNA arrays in their offspring. Normal disjunction (1 - n) yields both X/attached-XY and X/O zygotes, but half of the latter die because they carry the lethal rJ1 deficiency. A fraction (r) of the X/attached-XY zygotes are transformed to X/Y males or gynandromorphs by recombination between the two rDNA arrays of the attached-XY, but half of these also die because this exposes Df(1)w^rJ1. One-half of the products of nondisjunction also die because they are either nullo-X or metafemales.

The fathers carried an attached-XY and therefore produce attached-XY and 0 sperm, but the ratio of attached-XY:0 sperm is not 1:1; attached-XY/0 males produce an excess of 0 sperm.
Recoverable Rex-induced mitotic exchanges occur only in the embryos resulting from normal disjunction. The exchange product is an X/Y male (or gynandromorph), but if the X carries the lethal, it too dies. Thus, to completely describe each cross, we need parameters that describe (i) the frequency of nondisjunction (n); (ii) the frequency of Rex-induced exchange (r); and (iii) the proportion of sperm that carry the attached-XY (XY), and we must stay attentive to the classes that die.

The probabilities of the surviving genotypes among all zygotes, expressed in terms of n, r, and XY, are given at the top of Fig 4.

These are not, however, the probabilities of actually observing these offspring, because we do not observe the lethals, which are ½(1 - XY)(1 - n) + ½(XY)(r)(1 - n) + ½n. To get the probabilities among survivors, we must divide the probability of each surviving genotype by the total probability of survival, 1 - ½(1 - XY)(1 - n) - ½(XY)(r)(1 - n) - ½n.
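This bookkeeping can be sketched in a few lines of Python. The sketch assumes, from the classes described in Fig 3, that the survivors of each cross fall into four classes: regular X/attached-XY females, regular X/O males, X/Y exchange males (or gynandromorphs), and a single pooled class of nondisjunctional exceptions. The class labels and function names are hypothetical; the expressions actually used in the article are those in Fig 4. The zygote probabilities deliberately do not sum to one, and dividing by the total probability of survival turns them into expected fractions among survivors.

```python
from typing import Dict


def zygote_probs(xy: float, r: float, n: float) -> Dict[str, float]:
    """Probabilities of the surviving classes among ALL zygotes (assumed four-class
    scheme read from Fig 3); they sum to less than one because lethals are omitted."""
    return {
        "X/attached-XY females": xy * (1 - r) * (1 - n),
        "X/O males": 0.5 * (1 - xy) * (1 - n),         # half of X/O zygotes carry the lethal
        "X/Y exchange males": 0.5 * xy * r * (1 - n),  # half of exchange products die
        "exceptions": 0.5 * n,                         # half of nondisjunction products die
    }


def lethal_prob(xy: float, r: float, n: float) -> float:
    """Total probability of the classes that die, as given in the text."""
    return 0.5 * (1 - xy) * (1 - n) + 0.5 * xy * r * (1 - n) + 0.5 * n


def expected_counts(total: int, xy: float, r: float, n: float) -> Dict[str, float]:
    """Expected numbers of each class among the observed survivors of one cross."""
    survival = 1.0 - lethal_prob(xy, r, n)
    return {k: total * p / survival for k, p in zygote_probs(xy, r, n).items()}


if __name__ == "__main__":
    # Hypothetical parameter values and total, for illustration only.
    for name, e in expected_counts(1000, xy=0.45, r=0.05, n=0.01).items():
        print(f"{name:22s} {e:8.2f}")
```

Because the surviving and lethal probabilities together account for every zygote, the expected counts always add up to the observed total for the cross.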

The equations needed to find the maximum-likelihood estimates of the parameters under the three models, and the iteration times to find ln L, are shown in their Pascal incarnation in Fig 4. Because only the parameter values change from cross to cross, a single set of equations is contained within a loop. Only four lines must be changed to accommodate each of the hypotheses.


Figure 4. Correlation of two phenotypes associated with the Rex element of Drosophila melanogaster. (Top) The parameters used to describe this cross and the probabilities of the offspring types. Note that these are the probabilities among all zygotes, including those that are lethal, and do not sum to one. (Bottom) Pascal coding used to test for a correlation between the two phenotypes. Parameters are assigned in accord with three hypotheses: H1, that all parameters vary from cross to cross; H2, that the nondisjunction rate is the same in all crosses; and H3, that the nondisjunction rate is correlated with the rDNA exchange rate. Probabilities of each class among total zygotes are first calculated and then converted to expected fractions of each class among survivors by dividing by total surviving. Expected numbers are the expected fractions times the observed total for each cross. Iteration times for MLIKELY.PAS containing these equations are shown here, and the results are shown graphically in Fig 5.

Each cross yields four offspring classes, three of which are independent. Under H1, each cross is described by three separate parameters, so there is a unique algebraic solution for XY, r, and n in each cross.

If each of the three parameters is a probability with values between 0 and 1, MLIKELY.PAS must reach the same solutions, and the goodness-of-fit χ² at the end of the iteration process must be 0. Thus, even if the algebraic solutions for XY, r, or n were not reasonably obvious, MLIKELY.PAS would provide the solutions. In other words, whenever the number of parameters equals the number of independent observations, MLIKELY.PAS serves as a reasonably efficient equation solver.
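To make the "equation solver" point concrete, here is a minimal Python sketch. It assumes that each cross is scored as four classes, F regular females, M regular males, E X/Y exchange males and N pooled nondisjunctional exceptions, with the survival rules of Fig 3 (half of the X/O, exchange and nondisjunctional zygotes die). Under that assumption, doubling each count that represents only the surviving half of its zygote class gives r = 2E/(F + 2E), XY = (F + 2E)/(F + 2E + 2M) and n = 2N/(F + 2E + 2M + 2N). These closed forms and the function names are our own illustration, not the article's equations; substituting them back reproduces the observed counts, so χ² is 0.

```python
from typing import List, Sequence, Tuple


def h1_estimates(females: int, males: int, exchange_males: int,
                 exceptions: int) -> Tuple[float, float, float]:
    """Closed-form (XY, r, n) for one cross under the assumed four-class scheme.
    Classes of which only half survive are doubled to recover their zygote numbers."""
    regular_zygotes = females + 2 * exchange_males + 2 * males
    r = 2 * exchange_males / (females + 2 * exchange_males)
    xy = (females + 2 * exchange_males) / regular_zygotes
    n = 2 * exceptions / (regular_zygotes + 2 * exceptions)
    return xy, r, n


def expected_fractions(xy: float, r: float, n: float) -> List[float]:
    """Fractions among survivors: zygote probabilities divided by total survival."""
    z = [xy * (1 - r) * (1 - n), 0.5 * (1 - xy) * (1 - n),
         0.5 * xy * r * (1 - n), 0.5 * n]
    survival = sum(z)  # equals 1 minus the total lethal probability
    return [p / survival for p in z]


def chi_square(observed: Sequence[int], xy: float, r: float, n: float) -> float:
    """Pearson goodness-of-fit chi-square for one cross."""
    total = sum(observed)
    expected = [total * f for f in expected_fractions(xy, r, n)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    counts = [480, 290, 14, 5]  # hypothetical F, M, E, N counts, not Table 1
    est = h1_estimates(*counts)
    print(est, chi_square(counts, *est))  # chi-square is zero up to rounding error
```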

The algebraic solutions could turn out to be <0 or >1 either because of sampling variation or because the three-probability model is truly nonsensical. Were that the case, as long as the parameters are constrained to the default 0–1 range, MLIKELY.PAS would yield parameter estimates that do not match the calculated values, and we would get a positive χ² value. Either discrepancy (algebraic solutions that are <0 or >1, or a mismatch between the algebraic and numerical solutions) should certainly clue the investigator to question the adequacy of the model. For the data in Table 1, the algebraic solutions for XY, r, and n are all in the 0–1 range, running MLIKELY.PAS for H1 yields the same values, and the χ² for H1 is 0. Note, however, that the proportion of attached-XY-bearing sperm is not actually involved in the hypotheses to be compared, and it would have been legitimate to assume that the value of the parameter XY was the same for all 10 crosses instead of separately evaluating it for each cross. An appendix that considers the pros and cons of different ways of formulating H1 is included at the web site.

Under both H2 and H3, there are fewer parameters than independent observations. Thus, there is more than one set of possible solutions, and the maximum-likelihood estimates are (at least asymptotically) the minimum-variance, unbiased set. The estimates under the three hypotheses, and the G-test comparisons, are shown graphically in Fig 5. Under H1, we estimate the nondisjunction rates for each cross separately. Under H2, we obtain the maximum-likelihood estimate of a single nondisjunction rate for all of the crosses. Under H3, we obtain the maximum-likelihood estimates of the slope and intercept for correlated behavior of nondisjunction rate and rDNA exchange rate. In addition, Fig 5 shows the results that are obtained from conventional regression analysis that uses the frequencies of rDNA crossovers and nondisjunctional offspring rather than the actual progeny counts.


Figure 5. Results of maximum-likelihood and regression analyses of the correlation of nondisjunction and rDNA exchange. G-test comparisons of the results of the MLIKELY.PAS runs described in Fig 4 indicate that there is highly significant variation in nondisjunction among the crosses (H2 vs. H1), provide a single, highly significant estimate of the correlation of the two phenotypes (H2 vs. H3), and indicate that the correlation accounts for all but a nonsignificant fraction of the variation in nondisjunction (H3 vs. H1). Regression analysis, arbitrarily treating either phenotype as the independent variable, provides two different estimates of the correlation, either of which is significant but not highly so, and leaves a substantial fraction of the variation of nondisjunction unexplained.

Likelihood and regression analyses give slightly different estimates of the slope and intercept. Indeed, with regression analysis there are two equally sensible lines of least-squares fit, with the best estimate of the underlying parameters somewhere in between. Regression analysis assumes that the values of one variable, the independent variable, are chosen by the experimenter and are not subject to sampling error. That is not in fact true in this kind of experiment, where both variables are actually determined by the data. Unless we have reason to believe that one parameter is known with greater precision than the other, either can be used as the independent variable. Maximum likelihood, in contrast, gives a single solution that takes account of the effects of sampling variation on both variables.
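The two least-squares lines can be made concrete with a short Python sketch; the function names and the illustrative numbers are hypothetical, not the data of Fig 5. Regressing y on x gives slope b_yx = Sxy/Sxx; regressing x on y gives b_xy = Sxy/Syy, which, redrawn in the same (x, y) plane, has slope 1/b_xy. Both lines pass through the point of means, and because b_yx · b_xy = r², they coincide only when the correlation is perfect.

```python
from statistics import mean
from typing import Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept) for y = slope * x + intercept


def ls_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x; x is treated as free of sampling error."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_lines(x: Sequence[float], y: Sequence[float]) -> Tuple[Line, Line]:
    """Both least-squares lines, each expressed as y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    b_yx = ls_slope(x, y)           # y treated as the dependent variable
    b_xy = ls_slope(y, x)           # x treated as the dependent variable (undefined if Sxy = 0)
    slope_alt = 1.0 / b_xy          # the x-on-y line redrawn with y on the vertical axis
    return (b_yx, my - b_yx * mx), (slope_alt, my - slope_alt * mx)


if __name__ == "__main__":
    exchange = [0.02, 0.05, 0.08, 0.11, 0.15]          # illustrative values only
    nondisjunction = [0.004, 0.006, 0.012, 0.011, 0.018]
    print(two_lines(exchange, nondisjunction))
```

With positively correlated data the x-on-y line is always the steeper of the two, which is why the best estimate lies somewhere between them.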

In neither analysis was the intercept constrained to pass through the origin, but the maximum-likelihood estimate of the intercept is 0 and the intercepts of both regression lines are not significantly different from 0. The statistics, however, are quite different. First, the maximum-likelihood method allows us to isolate a single variable and test whether it shows significant experimental variation in the first place; this cannot be parsed out with regression analysis. Second, the method of maximum likelihood provides a far more powerful test of whether the correlation is significant. In this instance, the regression analysis points to a significant correlation, but only at the 0.025 level; the G test indicates that it is actually very highly significant indeed. In other words, regression analysis, by using frequencies rather than the observed numbers, has thrown away a lot of information. Third, the likelihood analysis provides a direct test of whether the correlation adequately explains the experimental variation. Here, the unexplained variation is not only small, it is statistically insignificant. Regression analysis also provides a measure, if not a direct test, of sufficiency. As long as the intercept is calculated rather than forced through the origin, R² is the fraction of the variance that is explained by the correlation. Here, with less than half of the variance explained by the (albeit significant) correlation, regression analysis suggests that a substantial fraction of the experimental variation has not been accounted for, while the likelihood analysis tells us that only an insignificant fraction of the variation remains unexplained. In large measure, this vagueness of the regression analysis results from the lack of fit between an experimental design that has two variables subject to sampling error and the regression assumption that one variable is error free.

A note is in order at this point about the need for care in defining the space within which MLIKELY.PAS searches for the maximum-likelihood solutions. In general, a slope and intercept can take on any positive or negative values, but allowing unconstrained iteration of the slope and intercept can lead to finding local and/or nonsensical bumps in the likelihood function. It is important to provide hints to the program in the form of constraints on the parameter ranges. Inspection of the data before running the program will generally suffice, and even if one fails to do that in advance, the absurdity of the result at a false maximum is quite evident. For these data, it is clear from inspection that the slope of the correlation must be positive. Given that, it is also evident that the intercept must be less than the maximum-likelihood estimate of the average rate of nondisjunction (H2). As long as either of these hints is provided to the program, by setting the lower bound of the slope to zero or the upper bound of the intercept to the value previously found for the average, iteration proceeds quickly to the true maximum. If, however, both a negative slope and an intercept greater than the average are allowed, and the initial guess of the intercept is greater than that average, iteration to a local maximum is possible. The conjunction of these errors will be obvious, however. If a negative slope is allowed and the initial guess of the intercept is set greater than the average nondisjunction rate but less than the highest observed rate, the false solution under H3 (correlation) will be identical to the solution under H2 (invariant nondisjunction rate). If a negative slope is allowed and the initial guess of the intercept is set greater than the highest observed nondisjunction rate, the false solution for H3 will be even worse: if plotted, the line will not even remotely approach the data points. Even if inappropriate bounds are set, however, MLIKELY.PAS finds the correct solution as long as the initial guess of the intercept is less than the average nondisjunction rate.
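The role of bounds and starting values can be seen in a toy one-parameter-at-a-time search. The Python below is an assumed, simplified analogue, not the MLIKELY.PAS algorithm itself: each parameter in turn is nudged up or down by its current step and clipped to its [lower, upper] range, and its step is halved when neither direction raises the log-likelihood. A hint such as "the slope cannot be negative" simply becomes `lower[i] = 0.0`. The optional `seed` visits the parameters in a shuffled order at every sweep, like the random-sequence option suggested in the Discussion. All names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def coordinate_search(f: Callable[[List[float]], float], start: Sequence[float],
                      lower: Sequence[float], upper: Sequence[float],
                      step: float = 0.1, tol: float = 1e-10,
                      seed: Optional[int] = None,
                      max_sweeps: int = 1_000_000) -> Tuple[List[float], float]:
    """Maximize f by varying one parameter at a time within [lower, upper].

    f should return -math.inf for impossible parameter values (for example a
    probability outside 0-1). With seed=None the parameters are visited in a
    fixed order, so a poor start can leave the search on a local peak.
    """
    x = [min(hi, max(lo, v)) for v, lo, hi in zip(start, lower, upper)]
    steps = [step] * len(x)
    order = list(range(len(x)))
    rng = random.Random(seed) if seed is not None else None
    best = f(x)
    sweeps = 0
    while max(steps) > tol and sweeps < max_sweeps:
        sweeps += 1
        if rng is not None:
            rng.shuffle(order)
        for i in order:
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[i] = min(upper[i], max(lower[i], x[i] + direction * steps[i]))
                value = f(trial)
                if value > best:       # accept any uphill move
                    x, best = trial, value
                    break
            else:
                steps[i] /= 2.0        # neither direction helped: refine this parameter
    return x, best


def toy_surface(p: List[float]) -> float:
    """Concave stand-in for a log-likelihood, with its peak at (0.3, 0.7)."""
    return -(p[0] - 0.3) ** 2 - (p[1] - 0.7) ** 2


if __name__ == "__main__":
    print(coordinate_search(toy_surface, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0]))
```

When the unconstrained peak lies outside the allowed range, the search stops at the boundary, which is exactly how a zero lower bound on a slope keeps it out of the nonsensical region.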

Further examples of the range of problems amenable to this approach:
The foregoing examples, estimating a parameter in the presence of nuisance variables and analysis of correlation, illustrate just two of the many problems in formal genetics that can be tackled using this approach. The web site, in addition to simpler, introductory examples, contains additional real-world examples that illustrate two hypothesis-testing problems that arise with regularity: (1) testing whether only a subset of parameters differ between a control and an experiment; and (2) taking account of sampling variation in control crosses done to evaluate confounding variables. In outline, those examples are as follows:

SANDLER et al. 1968 suggested that a useful classification of recombination-defective meiotic mutants could be based on whether a mutant reduces map distances without affecting the coefficient of coincidence, or whether it affects both recombination and interference. In this example (abbreviated from ROBBINS 1977), mutant and control recombination in four marked regions are compared. A simple contingency test shows that the mutant suppresses recombination, but parsing crossover frequencies and coefficients of coincidence using maximum-likelihood methods is necessary to test whether the mutant affects interference per se.
In the first example described in this article, it was possible to eliminate the effects of a nuisance variable (penetrance) using a single set of data. Frequently, however, the effect of a confounding variable has to be evaluated in a separate cross and, when an effect is found in the control, it must be taken into account in assessing the experiment. HEARN et al. 1991 wanted to determine whether chromosomal rearrangements that variegate for the heterochromatic visible lt also variegate for nearby lethals by testing whether viability of the rearrangement is sensitive to modifiers of variegation. A simple contingency test would have sufficed were it not for the possibility that the modifier might have an effect on viability separate from its effect on variegation of the lethal locus. Recognizing this, they did control crosses that lacked the variegating rearrangement to expose the effects of the modifier alone.

Differences in the control crosses must be removed before deciding whether there is an effect in the experimental crosses. A simple, but flawed, approach would be adjusting the numbers in the experiment based on the ratios observed in the control. Sampling errors, however, are inherent in the control as well as in the experiment, whereas "adjusting" the experimental data based on the controls assumes that the controls are error-free. The preferable approach, used by Hearn et al. and detailed in the example, is to construct a model for these viabilities and interactions and apply it simultaneously to all of the data.
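The logic of fitting control and experiment together can be shown with the simplest possible case. In this minimal Python sketch the two-class design and all names are hypothetical, not the model of HEARN et al. 1991. Each cross scores a test class against a reference class; in the control crosses their ratio reflects the modifier's effect alone, and in the experimental crosses any additional effect multiplies that ratio. Under the null hypothesis of no additional effect, both samples share one probability, whose maximum-likelihood estimate pools the two data sets; under the alternative, each sample has its own. The resulting G carries the sampling error of both the control and the experiment, which rescaling the experiment by the control ratio would ignore.

```python
import math
from typing import Tuple


def _lnl(successes: int, total: int, p: float) -> float:
    """Binomial ln L (without the binomial coefficient), treating 0 * ln 0 as 0."""
    out = 0.0
    if successes:
        out += successes * math.log(p)
    if total - successes:
        out += (total - successes) * math.log(1.0 - p)
    return out


def control_vs_experiment(test_c: int, ref_c: int,
                          test_e: int, ref_e: int) -> Tuple[float, int]:
    """G for H0 (the experiment shows only the control's effect) vs. H1 (an extra effect).

    Both hypotheses are fitted to the control and experimental counts simultaneously.
    """
    n_c, n_e = test_c + ref_c, test_e + ref_e
    pooled = (test_c + test_e) / (n_c + n_e)           # H0: one shared probability
    lnl_h0 = _lnl(test_c, n_c, pooled) + _lnl(test_e, n_e, pooled)
    lnl_h1 = _lnl(test_c, n_c, test_c / n_c) + _lnl(test_e, n_e, test_e / n_e)
    return 2.0 * (lnl_h1 - lnl_h0), 1                  # one extra parameter under H1
```

With more complicated designs, such as several genotypic classes or more than one modifier, the same approach applies: write both hypotheses for all crosses at once and compare their maximized ln L values.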

DISCUSSION


Maximum-likelihood analysis of data from crosses:
There are two themes running through the examples used to illustrate this approach. The first is the wide applicability of a simple numerical approximation approach to finding maximum-likelihood solutions. The second is the insight to be gained from partitioning of variation by even a primitive application of discrete multivariate analysis. In the teaching context, the first allows students to focus on the ideas without getting terribly involved in the mechanics, and the second forces a clear definition of the experimental design and the questions to be asked.

In many instances these ideas parallel each other, but that is certainly not always the case. For some problems, such as in the example of testing whether a meiotic mutant affects interference, only a test of a single hypothesis is needed, but finding the maximum-likelihood estimates of the multiple exchange and interference parameters is made easier by use of the computer. There are surely few geneticists who would be comfortable trying to solve a set of 14 simultaneous equations for the partial derivatives of L with respect to eight map distances and six coefficients of coincidence. Numerical analysis makes this kind of problem tractable.

There are also problems for which multiple hypotheses must be compared, but for which the maximum likelihood is readily found. For example, R. Morell recently posed the following. He was studying a human dominant of reduced penetrance for which genotypes could nevertheless be determined unambiguously by molecular means, even in many instances to the point of knowing whether the particular allele segregating was, for example, a frame-shift or a base substitution. Eyeball perusal of several pedigrees suggested that penetrance was not constant. There are several things worth examining in this situation. First, of course, is the question of whether these are significant differences in penetrance or merely stochastic variation. If there are significant differences, one might want to know, for example, whether penetrance is higher for clearly null alleles than for missense alleles or whether other loci affect expression of this trait. In other words, we need to ask, as we would in an analysis of variance were we following a measured variable rather than numbers of affected and unaffected individuals, whether there are significant differences in variation between and within groups. Testing a series of hypotheses was needed here, but, at the same time, there was no need to turn to numerical approximation to find the several ln L values. Only one variable was involved, penetrance, and the analytic solutions were easily found (MORELL et al. 1997).
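Because penetrance is a single binomial probability per group, its maximum-likelihood estimate is simply the observed proportion of affected carriers, and the first question (real differences or stochastic variation) becomes a G test with k - 1 degrees of freedom for k groups. The Python below is a minimal sketch with hypothetical names and counts; it is not a reproduction of the MORELL et al. 1997 analysis. If families are nested within allele classes, the family-level G splits exactly into a between-class G (allele classes vs. one common penetrance) plus a within-class remainder, which is the between- and within-group partition described above.

```python
import math
from typing import Sequence, Tuple


def _term(count: int, fitted: float, pooled: float) -> float:
    """count * ln(fitted / pooled), with the convention 0 * ln(.) = 0."""
    return count * math.log(fitted / pooled) if count else 0.0


def penetrance_heterogeneity(groups: Sequence[Tuple[int, int]]) -> Tuple[float, int]:
    """G for one common penetrance vs. a separate penetrance in each group.

    groups: (affected, genotyped carriers) per group. The ML estimates are the
    observed proportions, so no iteration is needed.
    """
    total_affected = sum(a for a, _ in groups)
    total = sum(n for _, n in groups)
    p0 = total_affected / total                         # common penetrance
    g = 0.0
    for affected, n in groups:
        p = affected / n                                # group-specific penetrance
        g += _term(affected, p, p0) + _term(n - affected, 1.0 - p, 1.0 - p0)
    return 2.0 * g, len(groups) - 1
```

For example, `penetrance_heterogeneity([(10, 10), (0, 10)])` gives G = 40 ln 2 ≈ 27.7 on 1 df.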

There are also situations outside of formal genetics in which this approach may be of value. For example, we have recently used this kind of analysis in measurement of ribosomal RNA gene copy number (P. CRAWLEY, unpublished results). Because copy numbers of a large number of genotypes were needed, dot-blot hybridizations, with a single-copy reprobe used to control for loading, were counted using a storage-phosphor screen device. The data were therefore in the form of discrete numbers (photons detected in each of many dots), and maximum-likelihood methods are appropriate for testing for differences among the genotypes. MLIKELY.PAS is not designed for the rather awkward bookkeeping involved in the complex data structure of multiple dots of multiple genotypes on multiple blots with probes that may differ in concentration and specific activity from run to run. Nevertheless, we used it to test the utility of this approach. It does work, and it certainly gives cleaner yes/no judgments of significance than does a chart of error bars.

Regardless of the particular questions being investigated, thereare several reasons why this approach to teaching statisticsis attractive, at least when a program like MLIKELY.PAS canbe used to preempt the need for great mathematical competence:

There is no need to adapt methods designed for other purposes or for continuous data. There are always assumptions in doing that, which may not be obvious to the casual user of a statistics cookbook and may not hold. In writing the probabilities of each observed class, any assumptions are at least made evident. It forces us to understand our own experiment, and when an assumption is faulty it often becomes glaringly obvious by outcomes such as a value of 1 for the best estimate of a parameter that is a probability.
In this approach, there is no need to learn a myriad of different procedures nor to understand the fine points of when they should or should not be used. Here, laziness, rather than necessity, is the mother of invention.
If maximum-likelihood solutions exist, the parameter estimates are, at least asymptotically, the minimum-variance unbiased estimates. There are some circumstances for which biased estimators exist that are nevertheless always closer to the population parameter (e.g., pseudo-Bayesian estimators). Except for those cases, however, the maximum-likelihood estimates will provide the most powerful tests of significance possible. The maximum-likelihood approach may reveal significant differences in a given-sized sample when a method transplanted from continuous-variable statistics would not.
This approach gives a more comprehensive picture of what is going on than any single test of significance. Much as in conventional analysis of variance with continuous data, we can assess not only the significance of a suspected agent, but the strength of its effect and its sufficiency as an explanation of the observations. A correlation, for example, can be both statistically significant and, at the same time, unimportant. Is the phenomenon real? Is it strong enough that a biologist should care about it at all? Is it of primary or secondary importance? A correlation that explains only 1% of the experimental variation is probably not of much biological importance even if it is significant at a 0.0001 level.

Of course, this approach, particularly the use of numericalmethods for solving a nearly unrestricted optimization problem,has its limitations as well. The MLIKELY.PAS algorithm in whichparameters are varied in a fixed order sometimes requires thatthe user have an idea of reasonable guesses to enter as startingpoints; it may not recover from entirely unreasonable ones.The possibility of finding local maxima, and of missing thetrue maximum likelihood by an amount that would affect one'sinferences, cannot be ruled out, even though it has been oflittle practical import in a fairly wide variety of applications.For some hypotheses, the interactions of the parameters canbe so complex that iteration to a solution takes longer thanis reasonable, even though no more than a few minutes are neededfor each of the examples discussed here. Finally, likelihoodmethods are applicable in many situations other than formalgenetics, but MLIKELY.PAS was written specifically with crossesin mind. MLIKELY only works for situations where the observationscome in the form of a multinomial sample whose probabilitiescan all be explicitly stated in terms of the parameters to beestimated.

Some improvements to MLIKELY.PAS can also be envisioned, bothfor teaching purposes and for research uses. The interface mightbe improved to allow the user to judge when the precision reachedis close enough to cease iteration, but a faster, current-generationcomputer makes the speed gain entirely trivial. The programcould allow an option of varying the parameters in random sequenceat each iterative step. At the cost of increased computationtime, it would be more likely to recover from absurd initialguesses and less likely to halt at a local peak. A pseudo-Bayesianapproach could be implemented for data sets that contain smallnumbers and many cells where the observed number is zero, byrunning two successive iterations, the first using the actualdata, and the second using a set of numbers biased toward theinitial maximum-likelihood expectations. Finally, MLIKELY.PAS'data-handling structures are not well suited to every applicationfor which the maximum-likelihood approach would be an improvementover what one sees in the biological literature. Hopefully,colleagues working in other areas of genetics and molecularbiology, or a programmer or two, will be intrigued enough bythe power of these methods to adapt them to those situationsas well.
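The second, pseudo-Bayesian pass proposed above might look like the following minimal Python sketch. It assumes the form of the pseudo-Bayes estimator described, as we understand it, by BISHOP et al. 1975a: observed counts x are shrunk toward the maximum-likelihood expectations by a data-derived constant K = (N² - Σx²)/Σ(x - Nλ)², where N is the total and λ are the expected cell probabilities. The function name is hypothetical, and MLIKELY.PAS does not contain this code. The smoothed counts keep the same total but contain no zeros wherever the model expects a positive number, and they would then be fed to the second round of iteration.

```python
from typing import List, Sequence


def pseudo_bayes_counts(observed: Sequence[float],
                        expected: Sequence[float]) -> List[float]:
    """Shrink observed cell counts toward model expectations (assumed form, see lead-in).

    observed: counts from the data; expected: ML expected counts (any scale).
    Returns smoothed counts with the same total as `observed`.
    """
    n = float(sum(observed))
    scale = float(sum(expected))
    lam = [e / scale for e in expected]                    # expected cell probabilities
    spread = sum((x - n * l) ** 2 for x, l in zip(observed, lam))
    if spread == 0.0:
        return [float(x) for x in observed]                # data match the model exactly
    k = (n * n - sum(x * x for x in observed)) / spread    # data-derived shrinkage constant
    w = n / (n + k)                                        # weight given to the raw counts
    return [w * x + (1.0 - w) * n * l for x, l in zip(observed, lam)]
```

The weight on the raw data grows as the data depart from the first-pass expectations, so well-fitting data are smoothed heavily and poorly fitting data hardly at all.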

FOOTNOTES

¹ With apologies to the mathematically oriented, who might read NCO as N times C times O, common genetic abbreviations are used to name variables rather than following mathematical convention. Hence, NCO is to be read as "noncrossovers."
² The procedure outlined here is decidedly inelegant and provides what is more properly termed a support interval rather than a conventional confidence interval, but it requires understanding only the basic concepts of statistical inference and does not require understanding variance and covariance nor knowledge of linear algebra. It is also practical.

ACKNOWLEDGMENTS

I am grateful to Joe Felsenstein and Richard Lenski for their critical, helpful, and encouraging comments and for their stamina in wading through a manuscript that originally combined the contents of both this report and a paper on meiotic drive. I am especially grateful to Joe Felsenstein for the encouragement and time he gave a then-young graduate student more than 25 years ago when I wrote the first version of the MLIKELY program. I am also grateful to Rob Morell and Ellen Swanson for their criticisms and for the patience they have shown during the long gestation of this article. Finally, I thank the two anonymous reviewers and the corresponding editor, who correctly suggested that the article would be more readable with some of the examples and the appendix removed to the website. Research in my laboratory has been supported by National Science Foundation grant MCB-9305846 and by start-up funds from the Università di Siena.

LITERATURE CITED


ASHER, J. H., JR., R. W. HARRISON, R. MORELL, M. L. CAREY and T. B. FRIEDMAN, 1996 Effects of Pax3 modifier genes on craniofacial morphology, pigmentation, and viability: a murine model of Waardenburg syndrome variation. Genomics 34: 285–298.

BISHOP, Y. M. M., S. E. FIENBERG and P. W. HOLLAND, 1975a Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA.

BISHOP, Y. M. M., S. E. FIENBERG and P. W. HOLLAND, 1975b Formal goodness of fit: summary statistics and model selection, pp. 123–175 in Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA.

CROW, J. F., 1993 Felix Bernstein and the first human marker locus. Genetics 133: 4–7.

EDWARDS, A. W. F., 1992 Likelihood (expanded edition). Johns Hopkins University Press, Baltimore.

FISHER, R. A., 1922 On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. Lond. A 222: 309–368.

GRIFFITHS, A. J. F., J. H. MILLER, D. T. SUZUKI, R. C. LEWONTIN and W. M. GELBART, 1993 Linkage I: basic eukaryotic chromosome mapping, pp. 132–135 in An Introduction to Genetic Analysis. W. H. Freeman, New York.

GRIFFITHS, A. J. F., W. M. GELBART, J. H. MILLER and R. C. LEWONTIN, 1999 Recombination of genes, pp. 147–151 in Modern Genetic Analysis. W. H. Freeman, New York.

HEARN, M. G., A. HEDRICK, T. A. GRIGLIATTI and B. T. WAKIMOTO, 1991 The effect of modifiers of position-effect variegation on the variegation of heterochromatic genes of Drosophila melanogaster. Genetics 128: 785–797.

HEDGES, S. B., S. KUMAR, K. TAMURA and M. STONEKING, 1992 Technical comment on human origins and analysis of mitochondrial DNA sequences. Science 255: 737–739.

HILLIKER, A. J., G. HARAUZ, A. G. REAUME, M. CLARK and A. CHOVNICK, 1994 Meiotic gene conversion tract length distribution within the rosy locus of Drosophila melanogaster. Genetics 137: 1019–1026.

KASTENBAUM, M. A., 1958 Estimation of relative frequencies of four sperm types in Drosophila melanogaster. Biometrics 14: 223–228.

KING, J. S. and R. K. MORTIMER, 1991 A mathematical model of interference for use in constructing linkage maps from tetrad data. Genetics 129: 597–601.

LYCKEGAARD, E. M. and A. G. CLARK, 1991 Evolution of ribosomal RNA gene copy number on the sex chromosomes of Drosophila melanogaster. Mol. Biol. Evol. 8: 458–474.

MCPEEK, M. S. and T. P. SPEED, 1995 Modelling interference in genetic recombination. Genetics 139: 1031–1044.

MORELL, R., T. B. FRIEDMAN, J. H. ASHER and L. G. ROBBINS, 1997 The incidence of deafness is non-randomly distributed among families segregating for Waardenburg syndrome Type 1 (WS1). J. Med. Genet. 34: 447–452.

MORTON, N. E., 1955 Sequential tests for the detection of linkage. Am. J. Hum. Genet. 7: 277–318.

MORTON, N. E., 1995 LODs past and present. Genetics 140: 7–12.

PALUMBO, G., S. BONACCORSI, L. G. ROBBINS and S. PIMPINELLI, 1994 Genetic analysis of Stellate elements of Drosophila melanogaster. Genetics 138: 1181–1197.

RASOOLY, R. S. and L. G. ROBBINS, 1991 Rex and a suppressor of Rex are repeated neomorphic loci in the Drosophila melanogaster ribosomal DNA. Genetics 129: 119–132.

ROBBINS, L. G., 1971 Nonexchange alignment: a meiotic process revealed by a synthetic meiotic mutant of Drosophila melanogaster. Mol. Gen. Genet. 110: 144–165.

ROBBINS, L. G., 1977 The meiotic effect of a deficiency in Drosophila melanogaster with a model for the effects of enzyme deficiency on recombination. Genetics 87: 655–684.

ROBBINS, L. G., 1981 Genetically induced mitotic exchange in the heterochromatin of Drosophila melanogaster. Genetics 99: 443–459.

ROBBINS, L. G., 1999 Are unpaired chromosomes spermicidal? A maximum likelihood analysis of segregation and meiotic drive in ribosomal-DNA deficient Drosophila melanogaster males. Genetics 151: 251–262.

SANDLER, L. and M. A. KASTENBAUM, 1958 A note on the frequency distribution of tetrads by rank in Drosophila melanogaster. Genetics 43: 215–222.

SANDLER, L., D. LINDSLEY, B. NICOLETTI and G. TRIPPA, 1968 Mutants affecting meiosis in natural populations of Drosophila melanogaster. Genetics 60: 525–558.

SNOW, R., 1979 Maximum likelihood estimation of linkage and interference from tetrad data. Genetics 92: 231–245.

TEMPLETON, A. R., 1992 Technical comment on human origins and analysis of mitochondrial DNA sequences. Science 255: 737.

TERWILLIGER, J. D., 1994 Handbook of Human Genetic Linkage. Johns Hopkins University Press, Baltimore.

WEIR, B. S., 1994 The effects of inbreeding on forensic calculations. Annu. Rev. Genet. 28: 597–621.

WEIR, B. S., 1995 DNA statistics in the Simpson matter. Nat. Genet. 11: 365–368.

ZHAO, H. Y., M. S. MCPEEK and T. P. SPEED, 1995a Statistical analysis of chromatid interference. Genetics 139: 1057–1065.

ZHAO, H. Y., T. P. SPEED and M. S. MCPEEK, 1995b Statistical analysis of crossover interference using the chi-square model. Genetics 139: 1045–1056.
