Missing data in large, multispecies datasets. The problem, the solutions, what to do and what NOT to do.
Large trait databases covering multiple species are a common source of data in studies of macroevolution and macroecology. These databases are never complete, and the amount of missing data varies.
I will discuss common statistical methods for dealing with the missing data problem, focussing on multiple imputation (formalised by Don Rubin in 1987). I will also present results of a simulation study in which I developed new data imputation methods that incorporate species' phylogenetic (evolutionary) relationships.
Arguably most hypotheses examined using these databases require phylogenetic information, and an imputation method should ideally anticipate the analysis that follows it, so phylogenetic imputation methods are sorely needed. I will finish with a discussion of future research directions.
About Mathematical biology seminars
We present regular seminars on diverse topics in mathematical biology. The seminars often show how dynamical systems, probability, or other mathematical techniques help us understand and manage biological systems, from microscopic cells to the world's largest ecosystems.
All are welcome, and past audiences have been diverse. Most attendees are applied mathematicians, but pure mathematicians, biologists, and other scientists often attend as well.
Talks should be pitched so that HDR (higher degree by research) students in mathematics and quantitative biology can follow the content.
These seminars are held at various times throughout the year.