
RE: Bootstrap question;



Tom,

Thanks again. I appreciate what you are saying, first because taxon exclusion 
is typically a sin and second because every entry affects (more or less) every 
other entry.

What I'm suggesting is this:

Start with 100 taxa. A heuristic search breaks them into two clades of 45 taxa 
each, plus 10 outgroup taxa.

Knowing that the most derived taxa will have little effect on the most basal 
taxa (after all, they've already contributed their DNA long before), is there 
any problem when I delete the most derived 30 taxa from each clade, leaving the 
10 outgroup taxa plus 15 basals from each clade? That makes 40 rather than 100. 
Later analyses can each take up the rest. 
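
For a sense of scale, here is a rough sketch of that arithmetic. It assumes 
the standard count of unrooted bifurcating trees for n labeled taxa, 
(2n-5)!!, as a stand-in for the size of the search space; it is not a model 
of actual run time. The function and variable names are made up for 
illustration.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Count distinct unrooted bifurcating trees for n labeled taxa: (2n-5)!!."""
    if n_taxa < 3:
        return 1  # fewer than 3 taxa admit only one unrooted topology
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product when n == 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


# Taxon counts from the proposal above (hypothetical variable names).
outgroup = 10
clade_size = 45
derived_removed_per_clade = 30

full_set = outgroup + 2 * clade_size
pruned_set = outgroup + 2 * (clade_size - derived_removed_per_clade)

for label, n in (("full", full_set), ("pruned", pruned_set)):
    digits = len(str(unrooted_tree_count(n)))
    print(f"{label}: {n} taxa, tree count has {digits} digits")
```

The pruned set shrinks the number of candidate trees enormously, but the 
count says nothing about whether the trees left out include the one that 
matters.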

I mean, dino people are doing essentially the same thing by not including 
parrots and hummingbirds in Cretaceous dino-bird analyses.  

Right? 

I don't think any of us are starting from scratch every time we select taxa 
from Nature's long list to place into our short list. 
I am a big fan of novel pairings and there's only one way to get them. Once 
you've got them, heuristically, it would be a shame to take months of computer 
time to confirm them, via bootstrap, if there can be found a better way.

In any case, I do like your EEKS. 

David Peters
St. Louis

PS. I do think we can know when we're getting it right if a high percentage of 
characters are shared by putative sister taxa. That is, if they look alike in 
almost all respects. What you and I have both seen is too few taxa used in a 
cladogram, so that certain taxa end up being 'related' by default (e.g., pteros 
and dinos in that gawdawful 'Ornithodira').


PPS This has been the worst day I've seen for virus proliferation.




-----Original Message-----
From: "Thomas R. Holtz, Jr." <tholtz@geol.umd.edu>
Sent: May 23, 2005 9:40 AM
To: david peters <davidrpeters@earthlink.net>, dinosaur@usc.edu
Subject: RE: Bootstrap question; & a v? for Dan Varner

> From: owner-dinosaur@usc.edu [mailto:owner-dinosaur@usc.edu]On Behalf Of
> david peters
>
> The problem I've been having is replicate hangup, where a single replicate
> takes several hours with 99 to go. Very discouraging. Is there any sin to
> divide and conquer? i.e. splitting up the big cladogram into smaller parts
> for analysis? That appeared to give the right answer.

In two words: EEK and EEEK!!!

A) One doesn't know if an analysis gives the "right answer." That's the reason
for doing analyses: we don't know what the real branching order was like, and
so we use analytical techniques to approximate it.

B) Breaking a big cladogram up into smaller parts carries with it a major
problem: namely, you automatically restrict the possible sets of trees, and not
necessarily in a useful way. For example, if you were interested in testing the
position of alvarezsaurids, and most specifically the hypothesis that
alvarezsaurids were nested WITHIN birds closer to modern birds than is
Archaeopteryx, then to do this fairly you would have to run an analysis that
included both ornithomimosaurs and various groups of birds (and not just a
single Avialae OTU).

Or alternatively, if you hypothesized that Caudipteryx might not be a bird at
all, but was in fact basal to the whole dromaeosaurid-bird clade, then you
shouldn't break up the analysis such that you only include Caudipteryx in the
part that doesn't include any taxa outside the bird-dromaeosaurid clade!!

That both the above have been done by smart, well-respected paleontologists
doesn't mean that they were good analyses. In fact, it was only when they (and
others) did larger, more comprehensive analyses that we started to get results
that are more defensible.

Yes, long run times are boring. And occupy your computer. But that is part of
doing science. Just want to point out that REALLY big analyses (like the major
angiosperm ones, and some other 100s of OTU analyses out there) take months of
processing time. Them's the breaks, unfortunately.

                Thomas R. Holtz, Jr.
                Vertebrate Paleontologist
Department of Geology           Director, Earth, Life & Time Program
University of Maryland          College Park Scholars
        Mailing Address:
                Building 237, Room 1117
                College Park, MD  20742

http://www.geol.umd.edu/~tholtz/
http://www.geol.umd.edu/~jmerck/eltsite
Phone:  301-405-4084    Email:  tholtz@geol.umd.edu
Fax (Geol):  301-314-9661       Fax (CPS-ELT): 301-405-0796