
Re: Kerberosaurus manakini



What I find disturbing is the "need" some people have to do a cladistic 
analysis of every new specimen/taxon when the material is clearly too 
fragmentary/incomplete to produce anything meaningful. Kerberosaurus is a 
case in point.
Ken

Kenneth Carpenter, Ph.D.
Curator of Lower Vertebrate Paleontology &
Chief Preparator
Dept. of Earth Sciences
Denver Museum of Natural History 
2001 Colorado Blvd.
Denver, CO 80205

Phone: (303)370-6392
Fax: (303)331-6492
email: KCarpenter@DMNS.org

For fun:
 http://dino.lm.com/artists/display.php?name=Kcarpenter


>>> Mickey Mortimer <Mickey_Mortimer111@msn.com> 17/Jun/04 >>>
Mike Milbocker wrote-

> >The authors state- "The purpose of the present paper is to establish the
> phylogenetic relationships of the new hadrosaurid genus Kerberosaurus, and
> not to propose a complete revision of Hadrosaurinae, which would require a
> more extended revision of North American specimens. Therefore, we only
> retained characters that can be directly observed on the material referred
> to as Kerberosaurus manakini at hand, keeping in mind that more exhaustive
> studies of hadrosaurid systematics are in preparation (J. J. Head and D. B.
> Weishampel, pers. comm.)."
> This is NOT a reason to limit character selection to those that can be coded
> for your fragmentary taxon.  This only skews the result of the analysis.
> Characters not codable for Kerberosaurus might nonetheless affect clades it
> belongs to.  WHY don't the authors realize this?<
>
> Are you complaining that the authors should have looked at other
> Kerberosaurus specimens to fill in the missing characters, or that they
> should have retained the characters but assigned them the state value
> "undetermined"? If you view this as a perturbation theory problem, it would
> be less parsimonious to destabilize a well-characterized phylogeny by
> introducing a taxon that is ill-determined. I think the authors took the
> only sensible approach with the limited data available. From perturbation
> theory, the small perturbation of leaving out the undetermined characters is
> preferred to introducing a lot of unconstrained variables into the system,
> which is what you do when you code characters as "undetermined" and build a
> systematics on this. Perhaps the authors even did this, and found the number
> of equally parsimonious trees to be increased by this approach, which would
> be consistent with perturbation theory.

I'm arguing they should have used characters not codable for Kerberosaurus
and coded them as "unknown" (?) in Kerberosaurus.  Regardless of what
perturbation theory says, studies have been done that show characters with a
poorly known distribution generally increase the accuracy of an analysis.
As Wiens (2003) states, "In general, adding the set of incomplete characters
increases accuracy relative to analyzing the set of complete characters
alone."  He finds it _can_ decrease accuracy, but only in cases where the
percent of coded taxa is low (~25%), and usually not when missing data is
confined to a monophyletic taxon.  The problems arise when disparate taxa
are the only complete ones, and long branch attraction (LBA) takes effect.
Still, Wiens states, "A limited set of simulations, however, showed that
even under conditions where LBA was maximized, the overall accuracy of the
trees was not greatly reduced by including incomplete characters".  As
Kerberosaurus is monophyletic by definition in Bolotsky and Godefroit's
analysis, and a character that couldn't be coded for it would still be coded
for 90% of the taxa, there would most likely be a positive effect from
adding characters not codable for Kerberosaurus.  The final quote from
Wiens' Recommendations For Empirical Studies section is quite enlightening-

"It should also be noted that the simulations in this study were designed to
include a worst-case scenario for including incomplete characters, and that
this scenario may be relatively unusual. For example, simulations based on
randomly distributing incomplete taxa on 16 and 64-taxon phylogenies suggest
that adding incomplete characters either improves or has little effect on
accuracy, and should be either beneficial or mostly harmless. In no case did
adding incomplete characters significantly decrease accuracy, but under many
conditions adding incomplete characters significantly increased accuracy."
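
To make the "code it as unknown (?)" option concrete, here is a minimal
sketch of how a Fitch-style parsimony count can treat a "?" cell. It is an
illustration only: the five-taxon tree, the taxon labels, and the codings are
hypothetical and are not taken from Bolotsky and Godefroit's matrix, and real
programs differ in their details.

```python
# Toy illustration only: the tree and codings below are hypothetical.

def fitch_steps(tree, codings):
    """Count the minimum state changes for one character on a rooted binary tree.

    tree    -- nested 2-tuples of taxon names, e.g. (("A", "B"), "C")
    codings -- dict mapping each taxon name to "0", "1", ... or "?" (unknown)
    """
    observed = {s for s in codings.values() if s != "?"}
    if not observed:
        return 0  # nothing is coded, so the character implies no changes

    def visit(node):
        if isinstance(node, str):  # a tip
            state = codings[node]
            # "?" is compatible with every observed state
            return (set(observed) if state == "?" else {state}), 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_steps + right_steps
        # no state in common: one change is needed on this part of the tree
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


tree = (("A", "B"), (("C", "K"), "D"))   # "K" stands in for a fragmentary taxon
unknown_in_k = {"A": "0", "B": "0", "C": "1", "K": "?", "D": "1"}
conflict_in_k = {"A": "0", "B": "0", "C": "1", "K": "0", "D": "1"}

print(fitch_steps(tree, unknown_in_k))   # the "?" costs nothing
print(fitch_steps(tree, conflict_in_k))  # a real conflicting coding adds a step
```

In this toy case the "?" adds no steps, so the character still supports
grouping C and D apart from A and B without forcing any placement on the
fragmentary taxon.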

> >Finally- "Because missing data may influence cladistic analysis in rather
> unpredictable ways (Platnick et al., 1991), we also left out taxa known to
> be too incomplete, or requiring systematic revision, such as Hadrosaurus
> foulkii Leidy, 1858, Kritosaurus navajovius Brown, 1910 (= Anasazisaurus
> horneri Hunt and Lucas, 1993 + Naashibitosaurus ostromi Hunt and Lucas,
> 1993), Lophorhothon atopus Langston, 1960, Claosaurus agilis (Marsh, 1872),
> Secernosaurus koerneri Brett-Surman, 1979, Aralosaurus tuberiferus
> Rozhdestvensky, 1968, or Shantungosaurus giganteus Hu, 1973."
> WHEN will people learn poorly coded taxa do NOT negatively influence
> results?  They _can_, but they can also have definite placements, or their
> unique character combinations can suggest better results.<
>
> I think you have been romanced by the black box. Phylogeny programs do not
> cope well with poorly coded taxa, and the authors are quite correct.

Phylogeny programs can cope with poorly coded taxa just fine.  Wiens finds,
"Thus, the limited completeness of a taxon is not a constraint on whether it
can be accurately placed in an analysis, but may be a constraint on whether
or not it will improve the estimate of relationships among the complete
taxa."  And furthermore, "Recent simulation results suggest that highly
incomplete taxa can be included and accurately placed in phylogenetic
analyses, given enough overall characters in the analysis.  In fact, the
level of completeness seems to be a poor criterion for deciding whether or
not to include a taxon, ...".  "Given enough characters in the analysis, it
is possible to have extremely accurate resolution when including taxa that
are only 5% complete and that have nearly 2000 missing data cells each."
"As long as enough characters are sampled in the incomplete taxa to
accurately place them on the tree, then the amount of missing data seems to
have little impact. This general result appears to be extremely robust to
changes in the simulation parameters, including the number of taxa
(16 vs. 64), tree shape
(fully asymmetric vs. fully symmetric), type of data (binary vs. DNA),
different ways of distributing missing data among characters (the same set
of characters incomplete in every incomplete taxa vs. incomplete characters
selected randomly in each incomplete taxon; Fig. 1), and different ways of
distributing incomplete taxa on the tree (randomly selected taxa vs.
evenly-spaced on the phylogeny)."
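
For a sense of scale, the "5% complete" figure can be worked through with a
small sketch. The 2000-character matrix size below is an assumption made for
illustration (it is not stated in the quotes above), and the row is a made-up
placeholder, not real data.

```python
def completeness(row):
    """Fraction of characters that are coded (anything other than '?')."""
    return sum(1 for state in row if state != "?") / len(row)


def missing_cells(row):
    """Number of '?' cells in one taxon's row of the matrix."""
    return sum(1 for state in row if state == "?")


# Hypothetical taxon: 100 coded characters out of an assumed 2000.
row = ["0"] * 100 + ["?"] * 1900

print(completeness(row))   # fraction coded
print(missing_cells(row))  # count of unknown cells
```

Under that assumption, a taxon that is 5% complete still has 100 coded
characters, which is the sense in which the absolute number of sampled
characters matters more than the percentage of missing cells.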

> >In any case, the analysis of so few characters, obviously designed to have
> a preconceived result (CI = .92), is of little use.  I mean, there are two
> (count them- two) discordant codings in the matrix (Maiasaura lacks character
> 7, unlike other hadrosaurines; lambeosaurines and saurolophins both have
> character 21).  The Sereno-esque "analyses" must stop!<
>
> Wow, one could say the same for your approach. The problem with orthodox
> cladistic coding is that it leaves out the numerous details that cannot be
> coded. I agree that including them can introduce subjectivity into the
> analysis, but when working with a sparse set of data, which most of
> paleontology does, the predictive power of cladistics is dubious as well.
> Cladistics is useful for illustrating the relationships among taxa for which
> the fewest number of state changes occur between related individuals
> averaged over all individuals. But to make the fantastic leap that this is
> the only way data can be scientifically analyzed is really just as
> subjective.

No, one couldn't say the same for my approach (include all characters and
taxa you can find), since I have far more characters (243 currently) and a
low CI (~.3-.4) which indicates my analysis is not biased towards a
particular result.  Only 14 of my character states aren't discordant.
Now, if you want to argue against the utility of cladistic analyses
themselves, that's a different subject.  First, I should note parsimony is
not the only method by which to judge a cladistic analysis, with likelihood
and others existing as well.  What do you mean by "numerous details that
cannot be coded"?  Even if including subjective details could increase the
accuracy of phylogenetic hypotheses, how are we to determine if/when we
should use them if they are subjective?  Where is the science?  I'll refrain
from arguing my point further until I know exactly what kind of uncodable
data you are referring to.
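
For readers unfamiliar with the CI values being compared above, this is a
rough sketch of the usual ensemble consistency index: the minimum possible
number of steps summed over all characters, divided by the steps actually
required on the tree. The step counts are invented for illustration and do
not come from either matrix discussed here.

```python
def consistency_index(min_steps, observed_steps):
    """Ensemble CI = sum of minimum possible steps / sum of observed steps.

    min_steps      -- per character, states minus one (1 for a binary character)
    observed_steps -- per character, steps actually required on the tree
    """
    if len(min_steps) != len(observed_steps):
        raise ValueError("need one observed step count per character")
    return sum(min_steps) / sum(observed_steps)


# Hypothetical: 10 binary characters, only one of which needs an extra step.
nearly_clean = consistency_index([1] * 10, [1] * 9 + [2])

# Hypothetical: 10 binary characters that each change three times on the tree.
homoplastic = consistency_index([1] * 10, [3] * 10)

print(round(nearly_clean, 2), round(homoplastic, 2))
```

A CI near 1 means almost no character conflicts with the tree, while a value
around .3 means most characters change several times.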

Wiens, 2003. Incomplete taxa, incomplete characters, and phylogenetic
accuracy: Is there a missing data problem? Journal of Vertebrate
Paleontology 23(2):297-310.
http://life.bio.sunysb.edu/ee/wienslab/wienspdfs/2003/jvp.pdf 
From http://life.bio.sunysb.edu/ee/wienslab/publicationpage.html 

Mickey Mortimer
Undergraduate, Earth and Space Sciences
University of Washington
The Theropod Database - http://students.washington.edu/eoraptor/Home.html