Cornell Math - MATH 675, Fall 2003
MATH 675
High Dimension Statistical Inference with Applications to Genomics
(Fall 2003)
Instructor: Gene Hwang
Many statistical concepts are useful in genomics. One particular problem in genomics (e.g., microarray data analysis) is that the number of populations, or genes, is large, so there is a huge number of hypotheses. How should so many hypotheses be tested simultaneously? We will discuss concepts such as the family-wise error rate (FWER), the false discovery rate (FDR) of Benjamini and Hochberg (1995, JRSS B), and Storey's papers on the positive FDR (pFDR). We will also discuss the cornerstone of multiple testing, the closed testing method, and a shortcut to it, the step-down testing procedure; see Westfall and Young (1993).
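To make the contrast between FDR and FWER control concrete, here is a minimal Python sketch. It is not taken from the course materials, and the function names are hypothetical. `benjamini_hochberg` implements the BH step-up rule: it rejects the k smallest p-values, where k is the largest rank with p_(k) <= k q / m. The usual validity condition is independent (or positively dependent) p-values. `holm` implements Holm's procedure, the simplest step-down method. It is the shortcut to closed testing when each intersection hypothesis is tested with Bonferroni. Westfall and Young's resampling-based step-down method is not implemented here.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p-value
    k_max = 0
    # Step-up: find the LARGEST rank k with p_(k) <= k*q/m (not the first failure).
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


def holm(pvalues, alpha=0.05):
    """Indices rejected by Holm's step-down procedure, controlling the FWER at alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = []
    # Step-down: compare the j-th smallest p-value (j = 0, 1, ...) with alpha/(m - j);
    # stop at the first hypothesis that cannot be rejected.
    for step, i in enumerate(order):
        if pvalues[i] <= alpha / (m - step):
            rejected.append(i)
        else:
            break
    return sorted(rejected)


if __name__ == "__main__":
    p = [0.3, 0.001, 0.035, 0.012, 0.025]
    print("BH rejects:  ", benjamini_hochberg(p))
    print("Holm rejects:", holm(p))
```

With these five p-values, BH at q = 0.05 rejects hypotheses 1, 2, 3 and 4. Holm at alpha = 0.05 rejects only 1 and 3, which illustrates how much more conservative FWER control is when many hypotheses are tested.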
What other statistical inference techniques may be useful for a large number of populations, or genes? The traditional approach of analyzing one population at a time, treating all populations as different, is too inefficient. It is interesting to discuss techniques that combine the observations from all populations. When the populations are similar, they "borrow strength" from each other; when they are very different, they go their separate ways. Shrinkage (empirical Bayes) techniques, or equivalently the best linear unbiased predictor (BLUP) in mixed models, can do this, so the course will spend some time on these techniques. We will discuss both point estimation and confidence interval construction. A new approach, the selected mean approach, appears promising and will be discussed.
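The "borrow strength" behavior can be seen in one classical shrinkage estimator, the positive-part James-Stein (Lindley) estimator that shrinks toward the grand mean. The following is a minimal Python sketch, and it is not necessarily the estimator used in the course. It assumes one observation X_i ~ N(theta_i, sigma^2) per population, independent, with sigma^2 known and at least four populations. The estimate is theta_hat_i = xbar + max(0, 1 - (p - 3) sigma^2 / S) (X_i - xbar), where S = sum (X_i - xbar)^2. The function name is hypothetical, and the BLUP, confidence intervals, and the selected mean approach are not shown.

```python
def james_stein_shrink(x, sigma2):
    """Positive-part James-Stein (Lindley) estimates shrinking toward the grand mean.

    Assumes x[i] ~ N(theta_i, sigma2) independently, sigma2 known, len(x) >= 4.
    """
    p = len(x)
    if p < 4:
        raise ValueError("need at least 4 populations")
    xbar = sum(x) / p
    s = sum((xi - xbar) ** 2 for xi in x)  # spread of the observations around xbar
    # b is the fraction of each deviation from the grand mean that is kept:
    # small spread -> b near 0 (pool everything); large spread -> b near 1 (go separate ways).
    b = max(0.0, 1.0 - (p - 3) * sigma2 / s) if s > 0 else 0.0
    return [xbar + b * (xi - xbar) for xi in x]


if __name__ == "__main__":
    print(james_stein_shrink([1.0, 1.2, 0.9, 1.1], sigma2=1.0))    # similar populations
    print(james_stein_shrink([0.0, 10.0, 20.0, 30.0], sigma2=1.0))  # very different ones
```

For the four similar observations, S = 0.05 is small relative to sigma^2 = 1, so every estimate is pulled all the way to the grand mean 1.05. For the widely spread observations, S = 500 and the kept fraction is 0.998, giving estimates of about 0.03, 10.01, 19.99 and 29.97, which stay almost where they were.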
Other topics include permutation tests, bootstrapping, and quantitative trait locus (QTL) identification.
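As an illustration of the first of these topics, here is a minimal Python sketch of an exact two-sided permutation test for a difference in means between two groups. One example would be the expression of a single gene under two conditions. It is a hypothetical example, not course code. It enumerates every split of the pooled sample, which is only feasible for small samples; with larger samples one would instead sample random permutations. Bootstrapping and QTL identification are not covered by this sketch.

```python
from itertools import combinations


def permutation_test(a, b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(a) + list(b)
    n, na = len(pooled), len(a)
    total = sum(pooled)
    observed = abs(sum(a) / na - sum(b) / (n - na))
    count = 0
    splits = 0
    # Under the null hypothesis of no group difference, every assignment of na of the
    # pooled values to group "a" is equally likely; enumerate all of them.
    for idx in combinations(range(n), na):
        sa = sum(pooled[i] for i in idx)
        diff = abs(sa / na - (total - sa) / (n - na))
        if diff >= observed - 1e-9:  # small tolerance for floating-point ties
            count += 1
        splits += 1
    return count / splits


if __name__ == "__main__":
    print(permutation_test([5.1, 5.3, 5.8], [4.0, 4.2, 4.4]))
```

Here every value in the first group exceeds every value in the second. Only the observed split and its mirror image reach the observed difference of 1.2 among the 20 possible splits, so the p-value is 2/20 = 0.1. This is the smallest two-sided p-value attainable with three observations per group.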