SAMSI 2009 Seminar: Algebraic Statistics and Experimental Design

January 2009
  Jan 11-17: Workshop Algebraic Statistical Models
  Jan 19: Organizational meeting, 12:00 P.M. Monday
  Jan 26: Mark Huber, 11:30 A.M. Monday

February 2009
  Feb 2: Giovanni Pistone, 11:30 A.M. Monday
  Feb 9: Wenjie Chen, 11:30 A.M. Monday
  Feb 16: Luis Garcia, 11:30 A.M. Monday
  Feb 23: Larry Cox, 11:30 A.M. Monday

March 2009
  Mar 1-7: Spring Break NCSU, no talks
  Mar 9: Adrian Dobra, 11:30 A.M. Monday
  Mar 16: Edwin O'Shea, 11:30 A.M. Monday
  Mar 23: Ruriko (Rudy) Yoshida, 11:30 A.M. Monday
  Mar 30: Simon Lunagomez, 11:30 A.M. Monday

April 2009
  Apr 6: Sesa Slavkovic, 11:30 A.M. Monday
  Apr 13: Saeid Yasamin, 11:30 A.M. Monday
  Apr 20: Mathias Drton, 11:30 A.M. Monday
  Apr 27: Eva Riccomagno, 11:30 A.M. Monday

May 2009
  May 4: Giovanni Pistone, 11:30 A.M. Monday

Jan 26: Mark Huber "Approximating the number of linear extensions of a poset"

Feb 2: Giovanni Pistone "Algebraic features of cumulants"

Feb 9: Wenjie Chen "Reverse engineering fMRI data"

Feb 16: Luis Garcia "Linear precision for toric patches in maximum likelihood estimation for toric models"
Abstract: In geometric modeling, linear precision is the ability of a patch to replicate affine functions. While classical Bézier patches possess linear precision, it is not clear which exotic patches (e.g. toric patches) have this property. In fact, every patch has a unique reparametrization having linear precision, but the resulting blending functions are not necessarily rational functions. I will give background and explain how linear precision is related to maximum likelihood estimation.
In particular, I will show that toric patches are linear projections of toric statistical models. I will also show how to use iterative proportional fitting to compute patches in geometric modeling. This is joint work with Frank Sottile.

Feb 23: Larry Cox "Using Linear Programming to Construct Markov moves in Contingency Tables"
Abstract: www.stat.duke.edu/~ihd/Cox.2.23.2009.abstract.pdf

Mar 9: Adrian Dobra "Statistical issues in the analysis of contingency tables"
Abstract: I discuss key questions relating to the analysis of categorical data: the validity of asymptotic approximations to the null distribution of test statistics, exact testing methods, sparsity, structural zeros, incompleteness, as well as log-linear model selection. Relevant topics I plan to cover include: computation of sharp integer bounds for cell entries, simulation from probability distributions on spaces of tables, Markov bases and Bayesian inference with conjugate priors. I will give several examples and talk about open problems.

Mar 16: Edwin O'Shea "Frequency of large gaps in small hierarchical models (in progress)"
Abstract: Examples of contingency tables on binary random variables with large integer programming gaps on the lower bounds of cell entries were constructed by Sullivant. We show that the marginals for which these constructed large gaps occur are exceptionally rare, thus reopening the question, as Sullivant put it, of "whether linear programming is an effective heuristic for detecting disclosures when releasing margins of multi-way tables." This notion of "exceptionally rare" is made precise through the language of standard pairs.
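
As a rough illustration of the "bounds on cell entries" that the Mar 9 and Mar 16 abstracts refer to, the sketch below computes the classical Fréchet bounds for a two-way table with fixed row and column sums and checks them by enumeration in the 2x2 case. It is not taken from either talk. In this simplest two-way setting the linear programming and integer bounds coincide, so no gap appears; the gaps discussed above arise only for multi-way tables with more complex margins. The function names and the margins are hypothetical, chosen only for the example.

```python
from itertools import product


def frechet_bounds(row_sums, col_sums):
    """Return {(i, j): (lower, upper)} for cell (i, j) of a two-way table
    of nonnegative integers with the given row and column sums."""
    n = sum(row_sums)
    if n != sum(col_sums):
        raise ValueError("row and column margins must have the same total")
    return {
        (i, j): (max(0, r + c - n), min(r, c))
        for (i, r), (j, c) in product(enumerate(row_sums), enumerate(col_sums))
    }


def brute_force_bounds_2x2(row_sums, col_sums):
    """Enumerate every 2x2 table with these margins (sanity check only)."""
    r0, r1 = row_sums
    c0, c1 = col_sums
    tables = []
    for a in range(min(r0, c0) + 1):
        b, c = r0 - a, c0 - a      # remaining cells are forced by the margins
        d = r1 - c
        if min(b, c, d) >= 0 and b + d == c1:
            tables.append(((a, b), (c, d)))
    return {
        (i, j): (min(t[i][j] for t in tables), max(t[i][j] for t in tables))
        for i in range(2)
        for j in range(2)
    }


# Hypothetical margins, chosen only for illustration.
rows, cols = (7, 3), (8, 2)
bounds = frechet_bounds(rows, cols)
checked = brute_force_bounds_2x2(rows, cols)
print(bounds)
print(bounds == checked)
```

For these margins only three tables are feasible, and the closed-form bounds agree with the enumerated minimum and maximum of every cell.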

Mar 23: Ruriko Yoshida "Statistical/machine learning methods for cophylogeny"
Abstract: In this talk, we apply kernels on tree structures, together with statistical methods such as Markov chain Monte Carlo (MCMC) and the bootstrap, to assess codivergence in gene trees. Such trees might be for hosts and parasites (or symbionts), or they may be for distinct, putatively orthologous genes in genomes. There are various reasons why even codiverging orthologs might not share the same phylogeny. Technical reasons include misalignments of the sequences, long-branch attraction (depending on the phylogenetic method used), or the possibility that some presumed orthologs are actually paralogs. But even if adequate precautions are taken against these technical problems, it is reasonable to expect some differences between gene trees. This is because species are interbreeding populations that maintain some degree of polymorphism. The mutations that give rise to observed polymorphisms, which are needed for phylogenetic inference, are independent of speciation events. The tendency for gene trees to approximate species trees is due to the tendency for haplotype (allele) lineages to fix shortly after speciation (on an evolutionary time scale). But if speciation events occur close in time, the precise topology of the gene lineages that eventually fix in the different species need not be that of the other gene lineages.
The basic innovations in this method are (1) studying biologically appropriate kernels on tree structures for classifying gene trees, (2) applying kernels for pairs of gene trees to allow rigorous tests of their codivergence or deviation from codivergence by classifying the "$\epsilon$ clouds" in the space of gene trees via kernel methods, and (3) applying such methods, formulated as max-margin classification and solved by classifiers such as support vector machines (SVMs), to the large number of genes available from genome sequences in order to better assess the history of speciation and genome evolution.

Mar 30: Simon Lunagomez "Parametrization of Conditional Independence Models via Geometric Graphs"
Abstract: We formulate a novel approach to infer conditional independence models or Markov structure of a multivariate distribution. Specifically, our objective is to place informative prior distributions over decomposable graphs and sample from the induced posterior distribution. The key idea we develop in this paper is a parametrization of decomposable graphs using the geometry of points in $\mathbb{R}^m$. This induces informative priors on decomposable graphs from specified priors on finite sets of points. Constructing graphs from finite point sets has been well studied in the fields of computational topology and random geometric graphs. We develop the framework underlying this idea and illustrate its performance on synthetic data.

Apr 6: Sesa Slavkovic "Confidential Contingency Tables Entries: Bounds, Counting & Sampling"
Abstract: The main focus of this talk is a partial information release strategy in which data owners, for confidentiality reasons, may opt to release relevant marginal and conditional tables along with the sample size instead of a full contingency table. We will discuss some results on the calculation of bounds on cell entries, and on the problems of counting and sampling tables from a fiber defined by given observed conditional values.
We will draw comparisons between marginal and conditional table releases, and discuss implications for disclosure risk assessment and data utility.

May 4: Giovanni Pistone "Algebraic statistical models in kriging"
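
Several abstracts above (Feb 23, Mar 9, Apr 6) mention Markov moves and sampling tables from a fiber. The sketch below shows the simplest instance of that idea, assuming a two-way table whose fiber is defined by fixed row and column sums; the fibers defined by conditional values in the Apr 6 talk are not covered. It uses the basic moves that add +1 and -1 on the corners of a 2x2 minor, which are known to connect all two-way tables with the same margins. The function names and the starting table are hypothetical. With rejected moves counted as staying in place, the walk targets the uniform distribution on the fiber; a Metropolis acceptance step would be needed for other targets.

```python
import random


def basic_move_step(table, rng):
    """Propose one basic move on a random 2x2 minor.

    The move adds 1 to cells (i1, j1), (i2, j2) and subtracts 1 from
    cells (i1, j2), (i2, j1). It is rejected (the table is returned
    unchanged) if a cell would become negative.
    """
    n_rows, n_cols = len(table), len(table[0])
    i1, i2 = rng.sample(range(n_rows), 2)   # ordered pair, so both signs occur
    j1, j2 = rng.sample(range(n_cols), 2)
    if table[i1][j2] == 0 or table[i2][j1] == 0:
        return table
    new = [list(row) for row in table]
    new[i1][j1] += 1
    new[i2][j2] += 1
    new[i1][j2] -= 1
    new[i2][j1] -= 1
    return new


def margins(table):
    """Row sums and column sums of a two-way table."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]


def walk(table, steps, seed=0):
    """Run the chain and record the distinct tables visited."""
    rng = random.Random(seed)
    visited = set()
    for _ in range(steps):
        table = basic_move_step(table, rng)
        visited.add(tuple(map(tuple, table)))
    return table, visited


# Hypothetical 2x3 starting table, chosen only for illustration.
start = [[3, 1, 2], [0, 4, 1]]
final, visited = walk(start, steps=500, seed=1)
print(margins(start), margins(final), len(visited))
```

Every table the walk visits has the same row and column sums as the starting table, which is the defining property of the fiber.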