RE: Bootstrap question;
Tom,
Thanks again. I appreciate what you are saying, first because taxon exclusion
is typically a sin and second because every entry affects (more or less) every
other entry.
What I'm suggesting is this:
Start with 100 taxa. A heuristic search breaks them into two groups of 45 and
45 plus 10 outgroup taxa.
Knowing that the most derived taxa will have little effect on the most basal
taxa (after all, they've already contributed their DNA long before), is there
any problem when I delete the most derived 30 taxa from each clade, leaving the
10 outgroup taxa plus 15 basals from each clade? That makes 40 rather than 100.
Later analyses can each take up the rest.
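
For a sense of scale, here is a minimal sketch (not part of the original
exchange) of how much tree space shrinks when going from 100 taxa to 40. It
assumes fully bifurcating, unrooted trees, for which the number of distinct
topologies on n labeled taxa is (2n-5)!!. The function name is hypothetical,
and the count says nothing about how any particular search program actually
explores that space.

```python
from math import prod


def unrooted_binary_trees(n_taxa: int) -> int:
    """Count distinct unrooted, fully bifurcating trees on n_taxa labeled tips.

    Uses the standard result (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5).
    For fewer than 3 taxa there is only one possible tree.
    """
    if n_taxa < 3:
        return 1
    # Odd numbers 3, 5, ..., 2n - 5 (the leading factor 1 is implicit).
    return prod(range(3, 2 * n_taxa - 4, 2))


if __name__ == "__main__":
    for n in (40, 100):
        count = unrooted_binary_trees(n)
        # Report only the order of magnitude; the exact integers are enormous.
        print(f"{n} taxa: a {len(str(count))}-digit number of possible trees")
```

The reduced matrix is searched over a far smaller space, which is why it runs
faster, but that smaller space also excludes every arrangement involving the
60 deleted taxa.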
I mean, dino people are doing essentially the same thing by not including
parrots and hummingbirds in Cretaceous dino-bird analyses.
Right?
I don't think any of us are starting from scratch every time we select taxa
from Nature's long list to place into our short list.
I am a big fan of novel pairings and there's only one way to get them. Once
you've got them, heuristically, it would be a shame to take months of computer
time to confirm them, via bootstrap, if there can be found a better way.
In any case, I do like your EEKS.
David Peters
St. Louis
PS. I do think we can know when we're getting it right if a high percentage of
characters are shared by putative sister taxa. That is, if they look alike in
almost all respects. What you and I have both seen is too few taxa used in a
cladogram, so that certain taxa end up being 'related' by default (e.g., pteros
and dinos in that gawdawful 'Ornithodira').
PPS This has been the worst day I've seen for virus proliferation.
-----Original Message-----
From: "Thomas R. Holtz, Jr." <tholtz@geol.umd.edu>
Sent: May 23, 2005 9:40 AM
To: david peters <davidrpeters@earthlink.net>, dinosaur@usc.edu
Subject: RE: Bootstrap question; & a v? for Dan Varner
> From: owner-dinosaur@usc.edu [mailto:owner-dinosaur@usc.edu]On Behalf Of
> david peters
>
> The problem I've been having is replicate hangup, where a single replicate
> takes several hours with 99 to go. Very discouraging. Is there any sin to
> divide and conquer? i.e. splitting up the big cladogram into smaller parts
> for analysis? That appeared to give the right answer.
In two words: EEK and EEEK!!!
A) One doesn't know if an analysis gives the "right answer." That's the reason
for doing analyses: we don't know what the real branching order was like, and
so we use analytical techniques to approximate it.
B) Breaking a big cladogram up into smaller parts carries with it a major
problem: namely, you automatically restrict the possible sets of trees, and not
necessarily in a useful way. For example, if you were interested in testing the
position of alvarezsaurids, and most specifically the hypothesis that
alvarezsaurids were nested WITHIN birds closer to modern birds than is
Archaeopteryx, then to do this fairly you would have to run an analysis that
included both ornithomimosaurs and various groups of birds (and not just a
single Avialae OTU).
Or alternatively, if you hypothesized that Caudipteryx might not be a bird at
all, but was in fact basal to the whole dromaeosaurid-bird clade, then you
shouldn't break up the analysis such that you only include Caudipteryx in the
part that doesn't include any taxa outside the bird-dromaeosaurid clade!!
That both the above have been done by smart, well-respected paleontologists
doesn't mean that they were good analyses. In fact, it was only when they (and
others) did larger, more comprehensive analyses that we started to get results
that are more defensible.
Yes, long run times are boring. And occupy your computer. But that is part of
doing science. Just want to point out that REALLY big analyses (like the major
angiosperm ones, and some other 100s of OTU analyses out there) take months of
processing time. Them's the breaks, unfortunately.
Thomas R. Holtz, Jr.
Vertebrate Paleontologist
Department of Geology Director, Earth, Life & Time Program
University of Maryland College Park Scholars
Mailing Address:
Building 237, Room 1117
College Park, MD 20742
http://www.geol.umd.edu/~tholtz/
http://www.geol.umd.edu/~jmerck/eltsite
Phone: 301-405-4084 Email: tholtz@geol.umd.edu
Fax (Geol): 301-314-9661 Fax (CPS-ELT): 301-405-0796