Consistency index (was Re: Clarification of scope of paleoart->uses)
Sure it is. If they're in the matrix, they're already scored for
the N taxa, and the person adding taxon N+1 only has to score it
once, not N+1 times. You've already done the work, so why throw it
away?
I'm not saying throw it away. Mention in the text what autapomorphies
you've discovered that were previously unknown. But why put this
information into the matrix?
(This is assuming that you only use parsimony. Bayesian analyses do use
parsimony-uninformative characters to help determine the model of
evolution.)
> Keeping them in has disadvantages. It makes your matrix appear
> bigger than it is (...impressive as it is, of the 720 characters in
> the supermatrix by Sigurdsen & Green [2011] only 335 are
> informative; no surprise, because they only kept those 25 taxa, out
> of something like 110 or 120, that are represented in all three
> input matrices...) [...]
So state in your abstract how many of the characters are
parsimony-informative.
Great idea. Nobody does it.
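
For anyone who wants to report that number, here is a minimal sketch of how the count could be obtained. It assumes unordered characters, no polymorphic scores, and the usual rule of thumb that a character is parsimony-informative only if at least two states each occur in at least two taxa. The toy matrix, the taxon names and the function names are all hypothetical, made up for illustration; they are not taken from any published matrix or software.

```python
from collections import Counter

# Scores treated as missing or inapplicable (an assumption of this sketch).
MISSING = {"?", "-"}

def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(state for state in column if state not in MISSING)
    shared_states = [state for state, n in counts.items() if n >= 2]
    return len(shared_states) >= 2

def count_informative(matrix):
    """Count parsimony-informative characters in a taxon -> scores mapping."""
    columns = zip(*matrix.values())  # one tuple of states per character
    return sum(is_parsimony_informative(col) for col in columns)

# Hypothetical toy matrix: 4 taxa, 5 characters.
matrix = {
    "A": "00100",
    "B": "00010",
    "C": "11000",
    "D": "11?00",
}

total = len(next(iter(matrix.values())))
informative = count_informative(matrix)
print(f"{informative} of {total} characters are parsimony-informative")
```

In this toy case only the first two characters qualify; the third and fourth are autapomorphies of single taxa and the fifth is constant.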
> [...] and it increases the CI. Fine, PAUP* will give you the CI
> with and without parsimony-uninformative characters, but it seems
> to be normal to report the former instead of the latter and thus
> make the trees look more robust than they are. And of course, the
> bigger a matrix, the more opportunities there are for glitches.
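
To make the inflation concrete, here is a small sketch using the standard definition of the ensemble CI: the sum of the minimum possible steps over all characters divided by the sum of the steps actually observed on the tree. The step counts below are invented for illustration and the function name is hypothetical. An autapomorphy needs exactly one step on any tree, so each one adds 1 to both numerator and denominator and pulls the CI towards 1.

```python
def consistency_index(characters):
    """Ensemble CI = sum of minimum steps / sum of observed steps.

    `characters` is a list of (min_steps, observed_steps, informative)
    tuples; the observed steps would normally come from a tree search.
    """
    min_total = sum(m for m, _, _ in characters)
    observed_total = sum(s for _, s, _ in characters)
    return min_total / observed_total

# Hypothetical step counts: three informative binary characters with some
# homoplasy, plus two autapomorphies that fit any tree perfectly.
characters = [
    (1, 2, True),
    (1, 2, True),
    (1, 1, True),
    (1, 1, False),  # autapomorphy
    (1, 1, False),  # autapomorphy
]

ci_all = consistency_index(characters)
ci_informative = consistency_index([c for c in characters if c[2]])

print(f"CI with uninformative characters:    {ci_all:.3f}")
print(f"CI without uninformative characters: {ci_informative:.3f}")
```

With these made-up numbers the CI is 5/7 with the autapomorphies included and 3/5 without them, although the tree and the informative data are identical.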
A side-question: does anyone pay attention to CI? (In practice, it
seems to be basically a measure of how small the matrix is.) If any
number can top the Impact Factor for uninformativeness, it's surely
the CI.
I pay attention to the CI.
If it's insanely high, like the 0.8 to 0.9 of Sereno's early analyses,
this is a good reason to suspect that the characters were cherry-picked
(deliberately or just out of laziness!) to support the authors' pet
hypothesis or that other manipulations were going on.
If it's low for the size of the matrix, like the 0.49 of McGowan (2002,
Zool. J. Linn. Soc., albanerpetontids and origin of lissamphibians) for
a matrix of 19 ingroup taxa and 41 characters, that shows that the
matrix is "balanced" and not (or not much) biased towards any particular
hypothesis, even though it's so tiny that one should expect random
imbalances from this alone.
Finally, if it's insanely high but manipulation would be a very
unparsimonious assumption, I am suitably impressed. The one case I've seen
is Rexová et al. (2003, Cladistics). That's an analysis of a matrix with
85 Indo-European languages and 200 meanings. These meanings are taken
from a standardized list of 200 meanings that are considered "core
vocabulary" (words that are probably less easily borrowed than most
others -- body parts, basic kinship terms, personal pronouns...). The
aim of that study was to show that vocabulary data alone, without data
from grammar or from the sound system, are enough to reconstruct the
phylogeny of languages to a useful degree. Some historical linguists had
claimed that only morphology (grammar at the word level) is of any use,
which would mean that the phylogeny of families of isolating languages
(which lack grammatical endings or the like) would be impossible to
reconstruct; the CI of 0.84 proves them wrong. Indeed, this incredibly
high CI makes me think that core vocabulary could be used to look for
relatives of Indo-European, something very few people have ever
attempted and some, perhaps many, consider completely futile.