# Statistical model selection, with a focus on high-dimensional data

**Project level:** PhD, Masters, Honours

A variety of assessment methods exist to aid in the choice of an optimal statistical model or the set of explanatory variables to be used with it. The ability to measure and store data increases yearly, leading to a growing demand for the analysis of high-dimensional data in many fields, e.g. genetics, marketing, finance and engineering. In most cases, the number of observations is insufficient to identify important details of the originating distribution with much accuracy. However, a great deal of useful analysis is still possible. This project seeks to determine the limits of such analysis and to produce methodology that allows accurate inferences to be drawn. It would build on methods such as those described in the following papers:

- Zhu, J.X., McLachlan, G.J., Ben-Tovim Jones, L., Wood, I.A. (2008). On Selection Biases with Prediction Rules Formed from Gene Expression Data. Journal of Statistical Planning and Inference, 138, 374-386.
- Wood, I.A., Visscher, P.M., Mengersen, K.L. (2007). Classification based upon gene expression data: bias and precision of error rates. Bioinformatics, 23, 1363-1370.