A Unified Theory of Learning and Information (Dr. Ibrahim Alabdulmohsin, Saudi Aramco)

In this talk, I will introduce the notion of "uniform generalization" across all loss classes and show how it bridges the gap between statistical learning theory and information theory. Using uniform generalization, the phenomenon of overfitting can be fully captured by information-theoretic quantities, from which we derive uniform generalization bounds for finite hypothesis spaces, sample compression schemes, differential privacy, and finite domains, among others. We will prove that, unlike traditional guarantees of generalization in expectation, uniform generalization in expectation implies concentration with explicit rates. Moreover, we establish that, under the Axiom of Choice, the existence of an empirical risk minimization (ERM) rule that generalizes uniformly in expectation across all loss classes is equivalent to the assertion that the hypothesis space has a finite VC dimension, thus connecting the VC dimension with information theory. Similarly, asymptotic bounds are established for stochastic convex optimization in terms of the number of model parameters. Finally, we present two immediate applications of this theoretical work. First, we show that ERM over strongly convex loss classes can be trivially scaled to big data using a simple parallelization algorithm whose performance is provably equivalent to that of the full ERM learning rule. Second, we propose a simple information criterion for model selection on big data and demonstrate experimentally that it outperforms the popular Akaike information criterion (AIC) and the Schwarz Bayesian information criterion (BIC).
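For strongly convex losses, the simplest parallelization of this kind partitions the sample into shards, solves ERM independently on each shard, and averages the resulting parameter vectors. The abstract does not spell out the algorithm, so the sketch below is only an illustration of that averaging scheme under assumed details: ridge regression as the strongly convex loss class, and hypothetical helper names (ridge_erm, parallel_erm).

    import numpy as np

    def ridge_erm(X, y, lam=0.1):
        # Closed-form ERM for ridge regression, a strongly convex loss class:
        # w = argmin (1/n) * ||Xw - y||^2 + lam * ||w||^2
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * len(y) * np.eye(d), X.T @ y)

    def parallel_erm(X, y, k=4, lam=0.1):
        # Split the sample into k shards, solve ERM on each independently
        # (embarrassingly parallel), and average the shard solutions.
        shards = np.array_split(np.arange(len(y)), k)
        ws = [ridge_erm(X[idx], y[idx], lam) for idx in shards]
        return np.mean(ws, axis=0)

    # Toy check: the averaged shard solution stays close to the full ERM.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.1 * rng.normal(size=10_000)
    print(np.linalg.norm(parallel_erm(X, y) - ridge_erm(X, y)))  # small gap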
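The proposed information criterion itself is not given in the abstract; for reference, the two baselines it is compared against are the standard AIC and BIC, computed from the maximized log-likelihood, the number of parameters k, and the sample size n (lower values indicate a better model):

    import numpy as np

    def aic(log_lik, k):
        # Akaike information criterion: AIC = 2k - 2 ln(L-hat)
        return 2 * k - 2 * log_lik

    def bic(log_lik, k, n):
        # Schwarz Bayesian information criterion: BIC = k ln(n) - 2 ln(L-hat)
        return k * np.log(n) - 2 * log_lik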

Speakers

Ibrahim Alabdulmohsin

Saudi Aramco