
[dinosaur] Phylogenetics in general was Re: Placental Mammal Diversification Across the K-Pg Boundary (free pdf)



 
Sent: Saturday, 30 November 2019, 22:36
From: "Ben Creisler" <bcreisler@gmail.com>
 
A new paper in open access:
 
Free pdf:

Mark S. Springer, Nicole M. Foley, Peggy L. Brady, John Gatesy and William J. Murphy (2019)
Evolutionary Models for the Diversification of Placental Mammals Across the KPg Boundary.
Frontiers in Genetics 10: 1241
doi: https://doi.org/10.3389/fgene.2019.01241
https://www.frontiersin.org/articles/10.3389/fgene.2019.01241/full


[...] At the same time, morphological cladistics has a poor track record of reconstructing higher-level relationships among the orders of placental mammals including the results of new pseudoextinction analyses that we performed on the largest available morphological data set for mammals (4,541 characters). [...]
 
Yes, to the extent that it has a track record. Arguably, it doesn't have any at all, because it's never been done right on eutherian phylogeny: the published matrices all suffer from containing too few taxa, too few characters, too many redundant characters, or two or three of these. For example, the famous matrix by O'Leary et al. (2013), the one with the 4,541 characters, contains only 86 terminal taxa, almost none of them extinct; it's really no surprise that the results are noticeably implausible in several respects, and that's before we get to wondering how many of the hundreds of tooth characters might really be the same.
 
I don't mean to simply scold my colleagues, mind you. As Springer et al. write in the section "Challenges for tip dating": "Finally, the collection of morphological data matrices is time[-]consuming and expensive relative to the amount of data returned, and is not practical for most taxa on the scale of O’Leary et al.’s (2013) data set with > 4,500 phenomic characters for 86 mammaliaform taxa. Nevertheless, the development of these data matrices is crucial for various aspects of timetree estimation, either indirectly for node dating approaches or directly for tip dating approaches."
 
That said, I wonder how expensive it really is. It costs a lot more in person-hours, but the work itself is much cheaper: no expensive machines, no expensive chemicals, no unbelievably expensive electrophoresis gel every week, no ultrafreezers, not even liquid nitrogen. Just pay someone's living costs and a few trips, and the work will get done.
 
~~~~~~~~~~~~~~~~~~~~~~~~
 
Anyway, Springer et al. cite the following paper which I had managed to overlook:
 
Thomas J. D. Halliday, Mario dos Reis, Asif U. Tamuri, Henry Ferguson-Gow, Ziheng Yang and Anjali Goswami (2019)
Rapid morphological evolution in placental mammals post-dates the origin of the crown group.
Proceedings of the Royal Society B (Biological Sciences) 286: 20182418
doi: https://doi.org/10.1098/rspb.2018.2418
https://royalsocietypublishing.org/doi/10.1098/rspb.2018.2418
 
It contains a total-evidence analysis (genomes and morphology) of eutherian phylogeny done with maximum likelihood. The matrix is modified from an earlier one that was analyzed with parsimony. In section 2b the paper states:
 
"Current phylogenetic software implementations do not correctly account for ascertainment bias (removal of parsimony-uninformative characters) for greater than two morphological character states (Z. Yang 2015, personal observation). Each multistate character of Halliday et al. [8] was therefore separated into two or more binary characters prior to further coding, expanding the character number to 748. Ordered characters cannot be reasonably split without being hugely non-independent. We reduced the number of states within such characters to two by combining states within the sequence, defining the break as the one which resulted in the most even division of taxa. For example, if a character had three states, ‘above’ (represented by 20 taxa), ‘equal’ (represented by 40 taxa) and ‘below’ (represented by 30 taxa), the new character would have states ‘above or equal to’ and ‘below’. Unordered multistate characters were split such that trait presence/absence was scored separately if applicable, while characters composed of multiple, associated observations were split into their component parts."
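The rule for ordered characters can be sketched in a few lines of Python. This is a minimal sketch that assumes the states are listed in their order together with the number of taxa showing each one. The function name `binarize_ordered` is hypothetical, and ties go to the earliest break only because the quoted text doesn't say how Halliday et al. handled them. Applied to the paper's own example, it reproduces the "above or equal to" versus "below" split.

```python
def binarize_ordered(state_counts):
    """Collapse an ordered multistate character into two states.

    state_counts: list of (state_name, number_of_taxa) in character order.
    Returns the two merged groups of states, breaking the sequence where
    the taxa are divided most evenly (ties go to the earliest break; this
    tie rule is an assumption, not taken from the paper).
    """
    if len(state_counts) < 2:
        raise ValueError("need at least two ordered states")
    total = sum(n for _, n in state_counts)
    best_cut, best_diff = None, None
    running = 0  # taxa in the states before the current break
    for cut in range(1, len(state_counts)):
        running += state_counts[cut - 1][1]
        diff = abs(running - (total - running))
        if best_diff is None or diff < best_diff:
            best_cut, best_diff = cut, diff
    names = [name for name, _ in state_counts]
    return names[:best_cut], names[best_cut:]


# The example from the quoted passage: 20, 40 and 30 taxa.
example = [("above", 20), ("equal", 40), ("below", 30)]
print(binarize_ordered(example))
# → (['above', 'equal'], ['below'])
```

The break after "equal" leaves 60 versus 30 taxa, which is more even than the 20 versus 70 left by the break after "above".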
 
Does anybody understand what this "ascertainment bias" is, and whether it applies to anything but maximum likelihood?
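For anyone wondering the same thing, here is a minimal sketch of the idea as it is usually described for likelihood models of morphological characters (the Mk model with an ascertainment correction, often called Mkv). If only characters that pass some filter were ever scored (here, parsimony-informative ones, as in the quoted parenthesis), the probability of each scored character is divided by the probability that a character evolving on the same tree would pass that filter at all. Everything below is hypothetical: the four-taxon tree, its branch lengths and all function names are made up for illustration. The brute-force sum over all k^n possible character patterns is only feasible for a handful of taxa; it shows the principle, not how any particular program implements it.

```python
import itertools
import math

# Hypothetical unrooted tree ((A,B),(C,D)); branch lengths are made up and
# measured in expected changes per character. X joins A and B, Y joins C and D.
TAXA = ["A", "B", "C", "D"]
PENDANT = {"A": 0.1, "B": 0.2, "C": 0.15, "D": 0.3}
INTERNAL = 0.05  # branch between X and Y


def mk_prob(i, j, t, k):
    """Probability of going from state i to state j along a branch of
    length t under the Mk model (k states, all changes equally likely)."""
    e = math.exp(-k * t / (k - 1))
    return 1 / k + (k - 1) / k * e if i == j else 1 / k - e / k


def pattern_likelihood(pattern, k):
    """P(tip states) summed over the unobserved states at X and Y,
    rooting at X with equal state frequencies 1/k."""
    a, b, c, d = pattern
    total = 0.0
    for x in range(k):
        for y in range(k):
            total += (1 / k
                      * mk_prob(x, a, PENDANT["A"], k)
                      * mk_prob(x, b, PENDANT["B"], k)
                      * mk_prob(x, y, INTERNAL, k)
                      * mk_prob(y, c, PENDANT["C"], k)
                      * mk_prob(y, d, PENDANT["D"], k))
    return total


def is_informative(pattern):
    """Parsimony-informative: at least two states each seen in >= 2 taxa."""
    counts = [pattern.count(s) for s in set(pattern)]
    return sum(1 for n in counts if n >= 2) >= 2


def informative_prob(k):
    """Probability that a character on this tree would be informative,
    i.e. would survive the removal of uninformative characters."""
    return sum(pattern_likelihood(p, k)
               for p in itertools.product(range(k), repeat=len(TAXA))
               if is_informative(p))


def corrected_log_likelihood(patterns, k):
    """Log-likelihood of a matrix of informative characters, each one
    conditioned on having passed the filter (the ascertainment correction)."""
    if not all(is_informative(p) for p in patterns):
        raise ValueError("matrix contains an uninformative character")
    p_inf = informative_prob(k)
    return sum(math.log(pattern_likelihood(p, k) / p_inf) for p in patterns)


matrix = [(0, 0, 1, 1), (1, 1, 0, 0), (0, 1, 0, 1)]  # made-up binary data
print(corrected_log_likelihood(matrix, k=2))
```

Without the correction, the model expects the constant and uninformative characters that were never scored, so the usual concern is that branch lengths get overestimated. The same enumeration works for k = 3 or more, but the set of patterns that count as uninformative then contains many more kinds of pattern than for binary characters.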