Department of Mathematics,
University of California San Diego
****************************
Colloquium
Lucas Janson
Stanford University
Model-free knockoffs for high-dimensional controlled variable selection
Abstract:
A common problem in modern statistical applications is to select, from a large set of candidates, a subset of variables which are important for determining an outcome of interest. For instance, the outcome may be disease status and the variables may be hundreds of thousands of single nucleotide polymorphisms on the genome. For data coming from low-dimensional ($n \ge p$) linear homoscedastic models, the knockoff procedure recently introduced by Barber and Cand\`es solves the problem by performing variable selection while controlling the false discovery rate (FDR). In this talk I will discuss an extension of the knockoff framework to arbitrary (and unknown) conditional models and any dimension, including $n < p$, allowing it to solve a much broader array of problems. This extension requires that the design matrix be random (independent and identically distributed rows) with a covariate distribution that is known, although the procedure appears to be robust to unknown/estimated distributions. No other procedure solves the variable selection problem in such generality, but in the restricted settings where competitors exist, I will demonstrate the superior power of knockoffs through simulations. Finally, applying the new procedure to data from a case-control study of Crohn’s disease in the United Kingdom resulted in twice as many discoveries as the original analysis of the same data.
Host: Ery Arias-Castro
November 30, 2016
3:00 PM
AP&M 6402
****************************