E0 370: Statistical Learning Theory

August - November 2011

Department of Computer Science & Automation
Indian Institute of Science


Course Information

Instructor: Dr. Shivani Agarwal (shivani@csa)
Meeting times/venue: Tu-Th 11:30am-1:00pm, CSA 252 (Multimedia Classroom)
First class meeting: Tue Aug 9
Informational office hours: Thu Aug 4, 11:30am-1:00pm, CSA 201
Office hours: By appointment

Course Description

The course aims to provide an introduction to some of the tools, techniques, and results that constitute the theoretical foundations of modern machine learning. Topics planned to be covered include the following:

References: The course will not follow any single textbook; lecture notes will be made available online, and pointers to relevant literature will be provided. In addition, the following references may be useful as supplementary material:

Prerequisites: A strong foundation in probability and statistics, and some previous exposure to machine learning. Some background in linear algebra and optimization will be helpful.

Course information handout



Assignment 1

Guidelines: [hw1.pdf]
Templates: [hw1-template.tex] [hw1-template.pdf]

  1. David McAllester.
    Simplified PAC-Bayesian Margin Bounds.
    In Proceedings of the 16th Annual Conference on Learning Theory (COLT), 2003. [pdf]

    Related papers that may be helpful:

  2. Ralf Herbrich and Robert C. Williamson.
    Algorithmic Luckiness.
    Journal of Machine Learning Research, 3:175-212, 2002. [pdf]

Assignment 2

Guidelines: [hw2.pdf]
Templates: See Assignment 1 above

  1. Mehryar Mohri and Afshin Rostamizadeh.
    Stability Bounds for Stationary φ-mixing and β-mixing Processes.
    Journal of Machine Learning Research, 11:789-814, 2010. [pdf]

  2. Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan.
    Learnability, Stability and Uniform Convergence.
    Journal of Machine Learning Research, 11:2635-2670, 2010. [pdf]

Project Guidelines

Project reports are due by email before class on Nov 22. A good project report will
  1. clearly state and explain the problem/topic;
  2. provide a comprehensive, self-contained literature survey on the topic -- this means you must clearly explain and put into context related previous work, with all necessary details such as previous problem formulations and important results/proofs/techniques (much as you have been doing in your assignments, but focusing on the connections with your project); and
  3. describe any initial ideas or results you have developed.
Point 2 above is especially important -- remember that grades for your report will be based on what you write in the report (do not expect the instructor to read other papers to understand what you have written). Use the template below to prepare your report; the report is expected to be roughly 12-16 pages in length (any additional material needed to understand the report can be included in an appendix). Feel free to talk to the instructor if you have any questions.

Templates for project report: [project.tex] [project.pdf]


Scribe guidelines

Lecture | Date | Topic | Notes | Preliminary Scribe Notes (partial, in progress)
1 | Tue Aug 9 | Introduction: probabilistic and game-theoretic settings of machine learning | |
2 | Thu Aug 11 | Support vector machines | [2.pdf] |
3 | Tue Aug 16 | Uniform convergence and growth function/VC-entropy | [3.pdf] |
4 | Tue Aug 23 | VC-dimension and Sauer's lemma | [4.pdf] |
5 | Thu Aug 25 | Covering numbers, pseudo-dimension, and fat-shattering dimension | [5.pdf] |
6 | Tue Aug 30 | Margin analysis | [6.pdf] |
7 | Tue Sep 6 | Rademacher averages | [7.pdf] |
8 | Thu Sep 8 | Algorithmic stability | [8.pdf] |
9 | Tue Sep 13 | Model selection | [9.pdf] |
10 | Thu Sep 15 | Excess error, approximation error, and estimation error | [10.pdf] |
11 | Thu Sep 22 | Learnability: basic definitions, examples, results | [11.pdf] |
12 | Tue Sep 27 | Learnability: more examples and results | [12.pdf] |
13 | Thu Sep 29 | Learnability of decision trees | [13-reference] |
14 | Tue Oct 18 | Learnability of decision trees under the uniform distribution with queries, using Fourier analysis on the Boolean hypercube | |
15 | Thu Oct 20 | Boosting | [15.pdf] |
16 | Tue Oct 25 | Consistency of nearest neighbor methods | [16.pdf] |
17 | Thu Nov 3 | Consistency of convex risk minimization methods | |
18 | Tue Nov 8 | Online classification: perceptron and winnow | [18.pdf] |
19 | Tue Nov 15 | Online regression: gradient descent and exponentiated gradient | |
20 | Thu Nov 17 | Online learning from experts: weighted majority and hedge | | [20-scribe1.pdf]
21 | Thu Nov 24 | Online learning from experts: minimax regret | | [21-scribe2.pdf]
22 | Tue Nov 29 | Online-to-batch conversions | |

Reference for Lecture 13

References for Lecture 14

A more recent paper that explores agnostic learning of decision trees (as opposed to learning in the target-function setting), also with respect to the uniform distribution over the Boolean hypercube and with membership queries: