
Re: Kerberosaurus manakini



Jaime Headden wrote-

>   Very amazing, in fact, that so much "complete" data doesn't result in a
> high CI, as dictated previously, but a low value of completeness does the
> same. Indicating that completeness and CI lack as secure a relationship as
> was implied. Hence the misunderstanding that appears to be occurring. A
> misrepresentation of the _ordering_ of the characters was implied due to
> the statement that Bolotsky and Godefroit were "loading" their tree with
> only support, somehow validated by the arrangement of their characters;
> this is indicated (last post) as being incorrect.

Glad we cleared that up.

>>   As Mickey has shown, however, removing taxa with low percentage of
>> coded characters DOES alter the matrix, so there is an effect. However, less
>> tested in these analyses is the effect of removing those characters which
>> have less than 50% coded positions among all taxa.
>
> <No, that was tested by Wiens (2003) as well.  He concluded this generally
> has a negative effect on the accuracy.>
>
>   I have read the paper, as well as use and lack of use of wild-card taxa
> (other papers, including Norell, Wroblewsky, and Wheeler, etc.). The
> implication, oddly enough requiring a contradictory statement from Mickey,
> was that Mickey had pointed out Wiens' work. Hence, "As Mickey has shown."
> But I guess we need to be objectionable even when a criticism was never
> offered.

I do not have to be objectionable! ;-)
Seriously, the issue here is that you had two statements, and I was only
disagreeing with the second (i.e. the effect of removing poorly coded
characters is less tested).

> <By enforcing a less parsimonious topology, given your data set, yes.>
>
>   By enforcing ANY topology, trees with varying positions for the key
> taxa, such as enforcing an (Allosaurus (Ornithomimus (Tyrannosaurus
> (Deinonychus + bird)))) are not considered part of the data set; they are
> ignored. The topology will result in a lower consistency index of provided
> characters since some characters will in essence be given negative weight
> (or rather, some will be given positive weight, rather than null values in
> the matrix). I have never run a data set with an enforced topology except
> to assess variation of values and bootstrap support and arrangement of "hot"
> taxa (taxa that appear to "bounce" around the tree). To get the relatively
> high CI and shorter run, in fact some researchers, including our dear
> Holtz, have enforced topologies to test matrices. This is good for looking
> at testing the quality of the analysis, but not to find a "good"
> phylogeny.

Nope, you're wrong here, Jaime.  As a test, I ran the Atrociraptor matrix
(Currie and Varricchio, 2004) twice.  Once, unconstrained.  The CI of the
single most parsimonious tree was .8033, which is likely what the .80 value
the authors reported was based on.  Then, I ran it with the topology
(Troodontidae(Dromaeosaurus(Deinonychus,Velociraptor))) constrained.  This
topology was chosen because it is found in the unconstrained most
parsimonious tree.  Predictably, the CI of the constrained most parsimonious
tree was also .8033.  Constraining in itself does not affect the Consistency
Index.  Enforcing constraints does nothing to weight characters; it simply
reduces the number of trees that are tested to those which correspond to the
enforced topology.  At best the CI stays the same; if the constraint excludes
your MPTs, the shortest remaining tree is longer and the CI drops.  You will
never be able to increase your CI by enforcing a topology different from your
MPTs.
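
The same effect can be reproduced on a toy matrix.  The sketch below is not
the Atrociraptor matrix and not PAUP; it assumes a made-up five-taxon,
four-character binary matrix, unordered (Fitch) parsimony, and an exhaustive
search over rooted trees, and every function name in it is hypothetical.  It
compares the CI of the shortest tree found with no constraint, with a
constraint that matches the shortest tree, and with a constraint that
contradicts it.

```python
def fitch(tree, states):
    """Fitch down-pass for one unordered character; returns (state set, steps)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, ln), (rs, rn) = fitch(tree[0], states), fitch(tree[1], states)
    if ls & rs:
        return ls & rs, ln + rn
    return ls | rs, ln + rn + 1


def tree_length(tree, matrix):
    """Total number of steps over all characters on one tree."""
    nchar = len(next(iter(matrix.values())))
    return sum(fitch(tree, {t: row[i] for t, row in matrix.items()})[1]
               for i in range(nchar))


def min_steps(matrix):
    """Minimum conceivable steps: (number of states - 1) per character."""
    nchar = len(next(iter(matrix.values())))
    return sum(len({row[i] for row in matrix.values()}) - 1 for i in range(nchar))


def insert(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (taxon, tree)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insert(left, taxon):
            yield (new_left, right)
        for new_right in insert(right, taxon):
            yield (left, new_right)


def all_trees(taxa):
    """Yield every rooted binary tree on taxa as nested tuples."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for sub in all_trees(taxa[1:]):
        yield from insert(sub, taxa[0])


def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def has_clade(tree, clade):
    if isinstance(tree, str):
        return False
    return (leaves(tree) == clade
            or has_clade(tree[0], clade) or has_clade(tree[1], clade))


def best_ci(matrix, constraint=None):
    """CI of the shortest tree, optionally keeping only trees with a clade."""
    trees = all_trees(list(matrix))
    if constraint:
        trees = (t for t in trees if has_clade(t, frozenset(constraint)))
    shortest = min(tree_length(t, matrix) for t in trees)
    return min_steps(matrix) / shortest


# Invented codings for illustration only; character 4 is homoplastic.
MATRIX = {"out": "0000", "A": "1001", "B": "1100", "C": "1111", "D": "1110"}

unconstrained = best_ci(MATRIX)
same_clade = best_ci(MATRIX, {"C", "D"})   # clade present in the shortest tree
other_clade = best_ci(MATRIX, {"A", "C"})  # clade absent from the shortest tree
print(f"{unconstrained:.3f} {same_clade:.3f} {other_clade:.3f}")  # 0.800 0.800 0.667
```

The constraint only filters the set of trees searched, so the best it can do
is return the same shortest tree and the same CI.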

> <Really?  Why would you cull a character a posteriori?  If your matrix
> finds a topology despite "bad characters", why erase them from your
> published matrix?  It only ends up misrepresenting your analysis.  And why
> wouldn't you report this action in your Methods section?>
>
>   I don't use this option. However, the logic is as follows:
>
>   A) If your matrix affords that a given character provides change in the
> phylogeny with or without it, it is a "spandrel," lacking any informative
> use. In fact, such a character only increases the CI and lowers the HI
> indices, which may lead to false appraisal of the "better" analysis. The
> use of the higher CI problem faced by some versus a lower CI is felt as
> beneficial. Note: Kearney and Clark (2003) regard this issue in much
> better detail than does Wiens (2003).
>
>   Kearney, M. & Clark, J.M. 2003. Problems due to missing data in
>   phylogenetic analyses including fossils: A critical review. _Journal of
>   Vertebrate Paleontology_ 23 (2): 263-274.
>
>   Inclusion of excessive characters that provide no additional data does
> nothing BUT raise the CI. At which point do they become important to the
> production of a phylogeny? They can be listed under "supportive
> information" without ever being a part of the analysis.

This philosophy is nonsensical to me.  Nearly every character in a
phylogenetic analysis is a spandrel.  As long as there is one other
character to support the node(s) supported by the character in question, the
latter is a spandrel.  All you need is one character per node to make your
tree, fewer if some characters are ordered or homoplasious.  Indeed, you can
make a pectinate topology (A(B(C(D(E(F,etc.)))))) with a single ordered
character.  Everything else would be a spandrel in that analysis.  Spandrels
DO provide additional data because they provide support for clades.  It may
be unnecessary support, with the other characters present, but phylogenetic
analyses are about mustering up all the support you can for everything, and
seeing what wins out.  Nobody deletes spandrels from their analysis, because
they have the beneficial aspect of making support for clades stronger.  If
there were no spandrels, everything would have a Bremer support of 1, and
bootstraps would be pitiful in even the most complete datasets.  Spandrels
have no definite effect on CI (or HI, which is just 1 minus the CI), since
they could have high or low CIs themselves.  This means Sereno's biased
matrices can't be due to culling of spandrels.  Indeed, looking at his
matrices indicates he has numerous spandrels.  Fourteen of Bolotsky and
Godefroit's 21 characters are spandrels (1-5, 8-12, 14, 15, 17 and 19).
What you may be thinking of is "parsimony-uninformative characters".
These are a type of spandrel that cannot affect the topology in any
situation.  Apomorphies of a single OTU are an example (as you note below),
or characters coded the same for every taxon.  They are identified by PAUP
when the analysis is run, and do increase the CI if not excluded (though
PAUP gives an additional CI excluding uninformative characters).  This is
not a problem of Sereno's analyses though.  Wilson et al.'s basal theropod
analysis has no uninformative characters, for instance.
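
The claim that a single ordered character can produce a pectinate topology
can be checked by brute force.  This is a minimal sketch under stated
assumptions: five hypothetical taxa A to E, one ordered character with
invented states 0, 1, 2, 3, 3, a Farris-style interval down-pass for ordered
(Wagner) parsimony, and taxon A treated as the outgroup.  The function names
are illustrative and not taken from any program.

```python
def wagner(tree, states):
    """Down-pass for one ordered character; returns ((low, high), steps)."""
    if isinstance(tree, str):
        return (states[tree], states[tree]), 0
    (llo, lhi), lsteps = wagner(tree[0], states)
    (rlo, rhi), rsteps = wagner(tree[1], states)
    lo, hi = max(llo, rlo), min(lhi, rhi)
    if lo <= hi:                      # intervals overlap: no extra steps
        return (lo, hi), lsteps + rsteps
    return (hi, lo), lsteps + rsteps + (lo - hi)  # pay for the gap


def insert(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (taxon, tree)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insert(left, taxon):
            yield (new_left, right)
        for new_right in insert(right, taxon):
            yield (left, new_right)


def all_trees(taxa):
    """Yield every rooted binary tree on taxa as nested tuples."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for sub in all_trees(taxa[1:]):
        yield from insert(sub, taxa[0])


def newick(tree):
    """Canonical text form: leaves before subtrees, then alphabetical."""
    if isinstance(tree, str):
        return tree
    parts = sorted((newick(c) for c in tree),
                   key=lambda s: (s.startswith("("), s))
    return "(" + ",".join(parts) + ")"


# One ordered character, invented states; A plays the outgroup.
STATES = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 3}

lengths = {t: wagner(t, STATES)[1] for t in all_trees(list(STATES))}
shortest = min(lengths.values())
# Keep only shortest trees rooted on A, so rerootings are not counted twice.
mpts = sorted({newick(t) for t, n in lengths.items()
               if n == shortest and "A" in t})
print(shortest, mpts)  # 3 ['(A,(B,(C,(D,E))))']
```

Every other character added to such a matrix would be a spandrel in the sense
used above: it could only add support to nodes that already exist.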

>   B) If the character is scorable only for a single taxon (apomorphy), it
> also lacks any value to the matrix, and does nothing but increase branch
> length of a taxon from its neighbors.
>
>   These reasons would give you criteria for excluding a character
> post-analysis.

Agreed.  But they can't be why Sereno culls characters.
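
The arithmetic behind the uninformative-character effect is simple enough to
sketch.  The example below assumes the usual ensemble definition CI = (sum of
minimum steps) / (sum of observed steps) and HI = 1 - CI; the step counts are
invented, and the helper names are hypothetical.  A single-taxon apomorphy
always takes exactly its minimum of one step, so including it adds one to
both sums and pulls the CI toward 1.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa ('?' ignored)."""
    counts = Counter(state for state in column if state != "?")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def ensemble_ci(chars):
    """chars is a list of (minimum steps, observed steps) pairs."""
    return sum(m for m, _ in chars) / sum(s for _, s in chars)


# Invented (minimum, observed) step counts for four informative characters.
informative = [(1, 2), (1, 1), (1, 3), (1, 2)]
# Four single-taxon apomorphies: one step minimum, one step observed.
autapomorphies = [(1, 1)] * 4

ci_excluding = ensemble_ci(informative)
ci_including = ensemble_ci(informative + autapomorphies)
hi_excluding = 1 - ci_excluding

print(is_parsimony_informative("0011"))  # True
print(is_parsimony_informative("0001"))  # False (apomorphy of a single OTU)
print(is_parsimony_informative("1111"))  # False (same coding for every taxon)
print(f"{ci_excluding:.3f} {ci_including:.3f}")  # 0.500 0.667
```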

> Kearney and Clark (2003) and Wilkinson (2003) both discuss
> wildcard problems and how one can cull TAXA from the analysis due to lack
> of any effective change. Again, these can be applied outside the analysis.
> This would usually be effective in noting polytomies where effective
> beneficial information is absent.
> >snip<
>   One can therefore safely prune without missing or losing resolution or
> information both _a priori_ and _a posteriori_. It seems ridiculous, but
> it _is_ effective and no loss of information occurs at deleting taxa or
> characters that have NO benefit but to raise or decrease the indices,
> increase number of MPT's (most parsimonious trees for those who don't
> know) and, decrease bootstrap or jackknife values.

I completely agree.

> <Because there are valid reasons to think such characters are correlated,
> as you know.>
>
>   We _assume_ correlation, and we assume it is negligible. This is still
> _a priori_ exclusion of information that can have a phylogenetic
> importance.
> >snip<
> There is otherwise
> little reason to exclude them other than the "correlation" problem of two
> characters sharing states or "being" the same thing; aka, "splitting" of
> characters and over-weighting of a condition. It is still phylogenetically
> informative. We cull these _a priori_ because we think, as do some others
> for other characters or taxa, they are detrimental to the analysis or
> would enforce topologies we "know" are wrong (aka, tyrannosaurs as
> "carnosaurs").

I'm sure everyone thinks correlated characters are detrimental to analyses due
to the weighting problem.  But yes, it is a difficult and somewhat
subjective thing to cull them.

> <Sereno is obviously not just culling correlated characters from his
> matrix, because these would not affect the CI in most cases.>
>
>   Loss of a character that shows an effective change in the matrix or
> phylogeny or length of branches _will_ cause a change in CI.

True, I should have said "affect the CI _in a predictable and significant
way_ in most cases".  I'm assuming correlated characters are randomly
"connected" to both high CI characters and low CI characters.

> <So the "extraneous" correlated character will have a similar CI to its
> partner, and the resultant CI of the matrix will be similar to a matrix
> without the extraneous character.>
>
>   See points above, as this agrees with Wilkinson (2003) -- however, it
> does so only for the "extraneous" character, except that support values do
> change. Notably, relation of two characters with effectively 3 states
> between them can have a variable expression, as in the relationship of
> tooth loss and extent coded in separate characters:
>
>                  12
> Hesperornis      01
> Incisivosaurus   01
> Caudipteryx      11
>
> 1. Maxilla: teeth present anteriorly (0); absent (1).
> 2. Maxilla: teeth present caudally (0); absent (1).
>
>   Limited to this, the run finds a topology of (Caudi (Hesper +
> Incisivo)).

According to you (pers. comm.), you mistyped the matrix.  It should have
been-
                  12
 Hesperornis      10
 Incisivosaurus   00
 Caudipteryx      11
In this case, the run would find the topology (Incisivo(Hesper, Caudi)),
assuming an all zero outgroup to establish polarity.
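
The corrected two-character matrix is small enough to check exhaustively.
This is a minimal sketch, assuming unordered (Fitch) parsimony and the
all-zero outgroup mentioned above; it is not a PAUP run, and the function
names are hypothetical.  It scores all 15 rooted trees for the four taxa and
reports the shortest one rooted on the outgroup.

```python
def fitch(tree, states):
    """Fitch down-pass for one unordered character; returns (state set, steps)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, ln), (rs, rn) = fitch(tree[0], states), fitch(tree[1], states)
    if ls & rs:
        return ls & rs, ln + rn
    return ls | rs, ln + rn + 1


def tree_length(tree, matrix):
    """Total number of steps over all characters on one tree."""
    nchar = len(next(iter(matrix.values())))
    return sum(fitch(tree, {t: row[i] for t, row in matrix.items()})[1]
               for i in range(nchar))


def insert(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (taxon, tree)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insert(left, taxon):
            yield (new_left, right)
        for new_right in insert(right, taxon):
            yield (left, new_right)


def all_trees(taxa):
    """Yield every rooted binary tree on taxa as nested tuples."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for sub in all_trees(taxa[1:]):
        yield from insert(sub, taxa[0])


def newick(tree):
    """Canonical text form: leaves before subtrees, then alphabetical."""
    if isinstance(tree, str):
        return tree
    parts = sorted((newick(c) for c in tree),
                   key=lambda s: (s.startswith("("), s))
    return "(" + ",".join(parts) + ")"


# Codings from the corrected matrix, plus the assumed all-zero outgroup.
MATRIX = {
    "outgroup": "00",
    "Hesperornis": "10",
    "Incisivosaurus": "00",
    "Caudipteryx": "11",
}

lengths = {t: tree_length(t, MATRIX) for t in all_trees(list(MATRIX))}
shortest = min(lengths.values())
mpts = sorted({newick(t) for t, n in lengths.items()
               if n == shortest and "outgroup" in t})
print(shortest, mpts)
# 2 ['(outgroup,(Incisivosaurus,(Caudipteryx,Hesperornis)))']
```

Character 1 alone unites Hesperornis and Caudipteryx; character 2 is an
apomorphy of Caudipteryx and adds a step without affecting the topology.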

> Ignoring the other characters that would show convergence or
> latency, this can be treated as one character in which the matrix then
> represents a series of polymorphies that attempt to show the same form of
> data -- the "machine" attempts to treat it as a set of two characters:
>
>                  1
> Hesperornis      (12)
> Incisivosaurus   (01)
> Caudipteryx      (22)
>
> 1. Maxilla: teeth present anteriorly (0); teeth present caudally (1);
> teeth absent anteriorly (2); teeth absent caudally (3). "unordered"
>
>   The result places (Incisivo (Caudi + Hesper)) due to convergent loss of
> a portion of the maxillary dentition -- or maybe its synapomorphic loss?
>
>   Thus the use of "correlated" characters DOES alter the field.

Again (pers. comm.), you mistyped the matrix.  It should have been-
                  1
 Hesperornis      (12)
 Incisivosaurus   (01)
 Caudipteryx      (23)
Assuming a (01) outgroup (the equivalent of an all zero outgroup in the
prior matrix) results in the same as before.  Your example is invalid.

> I fail to
> see very many correlated characters that are noted ONLY for the ingroup
> that possess equal distribution. If such were there, I would suspect
> "weighting" or "loading" the data set. Some features that crop up with
> regards to size-related features include the presence of rugose scarring
> or processes not otherwise present in smaller forms. However, both
> tyrannosaurids and *Caudipteryx* show rugosities and "laminae" of the ilia
> that appear to be absent in most of the rest of Coelurosauria (I note a
> few other taxa also possess these, many of which are small, as well as
> large hadrosaurs and sauropods) -- a function of age? Such inclusion thus
> has a phylogenetic signal, as it would tie two taxa together outside of
> mutual size ranges.

Yes, correlated characters are fun.  I tend to include characters with
unequal distribution, even if they are claimed to be correlated.  Obviously,
they aren't strictly correlated, so they must include some additional
information.  But maybe something not obvious makes the character strictly
correlated with another one in _some_ clades, but not in others.  There's
little way to know this, and thus subjectivity in character choice raises
its head again.

> <I can see this happening in some matrices.  I _cannot_ see this happening
> in _every single_ matrix Sereno makes, regardless of the number of
> characters, taxa or which group he is examining.  So Sereno's culling is
> not due to character correlation, which leaves the question of his reason
> for culling characters open.>
>
>   Ask him why, then. He is, to my knowledge, in the country. Do this
> instead of making assumptions of about why he did it and mocking him or
> his work or similar works for things that one has no knowledge of, only
> assumptions.

Perhaps I will.  Still, our inability to fathom a defensible
reason for culling characters as Sereno does (he's not culling
uninformative characters, spandrels or correlated characters; removing the
first would lower CI, and removing the latter two has unpredictable effects
on CI) is not encouraging.

> <As for your oh-so-ambiguous reference to my coelurosaur analysis posts,
> you hit the nail on the head with your use of the term "casually". In the
> publication of my phylogeny, discussion will be included. Right now
> though, my posts just serve as updates to those interested.>
>
>   Why? I do not see people ask. The two people who seemed most eager for
> updates were Ken Kinman and Stephan Pickering who, to my knowledge, only
> considered the content of the phylogeny, rather than the support for them.
> Perhaps a separate message board where data will actually be presented and
> responded to. This is not meant to be mocking or insulting, but when data
> results are presented, usually one should cite their derivation or
> supporting data. This would be equivalent to making statements of
> relationship as in "I find this to be more likely," and run with the ball,
> in papers by Holtz or Sereno or Brochu, who always presented their data in
> full. This provides the analytical utility of such, whereas before they
> lack it.

Well, if people want me to stop, I will.  The last update generated a thread
with nine posts though, which isn't bad for this forum.

Mickey Mortimer
Undergraduate, Earth and Space Sciences
University of Washington
The Theropod Database - http://students.washington.edu/eoraptor/Home.html