CS195-5: Introduction to Machine Learning

Fall 2006




Course Projects

Timeline

Wednesday, November 22
Abstract due (two pages).
Sunday, December 31
Project writeup is due. Up to 8 pages, in NIPS format. Look for "style files for creating camera-ready papers"; we recommend using LaTeX, but they also provide a template for Word.
No extensions!

Project format

The purpose of the project is to focus on a particular topic and explore it in relative depth. There are three major formats:

  1. A survey.

    A detailed survey of previous work (as reflected in the literature) on a clearly defined topic. This should focus on a relatively advanced topic, and therefore probably deal mostly with specialized publications (journal and conference papers) rather than textbook material. The survey should compare and contrast a number of approaches, and not just reiterate points made in each paper.

  2. A novel application of machine learning.

    This should be a non-trivial application. E.g., just taking a classification problem, applying an arbitrarily chosen classifier and getting a result (which may be arbitrarily bad) is not an acceptable project.

  3. Analysis of a machine learning algorithm or model.

    This can be empirical (well designed, in-depth study of the behavior of a model or algorithm which is not obvious) or theoretical, or both. This broad category includes non-trivial modifications of existing algorithms. E.g., you may identify a shortcoming in a standard technique and suggest a way to fix it, leading to improved behavior of that technique.

Projects in the last two categories could potentially lead to a novel contribution in machine learning and its applications, beyond this course.

Write-up instructions

The writeup should be written as a conference paper. This means that it should communicate the ideas, methods and (most important) conclusions concisely, but with enough detail for a reasonably knowledgeable reader to follow.

The project paper must include the following components.

  • Introduction that clearly states the problem (what it is, and why it is important and/or difficult), and outlines on an intuitive level the solution proposed in the paper.
  • Background, briefly describing related work and its relevance, emphasizing, if appropriate, similarities to and differences from what you are doing to solve the problem.
  • Technical description of the model/algorithm; this is the main "meat" of the paper.
  • Experimental evaluation, if relevant. This should be concise but provide enough information for the results to be reproduced with reasonable effort.
  • Conclusions. This is a very important part. In principle, a reader should be able to get a good idea of what you did, and of the take-home message of your work, by reading only the introduction and the conclusions sections.
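
One way to lay out the components listed above is sketched below as a LaTeX skeleton. This is only an illustrative outline, not an official template: the title, author, and section names are placeholders, and the line that loads the NIPS style package is left as a comment because the exact file name depends on the style files you download.

```latex
% Skeleton for the project writeup (illustrative sketch, not an official template).
% Assumption: the NIPS style package is loaded as described in the instructions
% shipped with the style files; the placeholder below must be filled in by you.
\documentclass{article}
% \usepackage{...}   % NIPS style package goes here

\title{Project Title}   % placeholder
\author{Your Name}      % placeholder

\begin{document}
\maketitle

\begin{abstract}
% Short summary of the problem, the approach, and the main conclusion.
\end{abstract}

\section{Introduction}
% What the problem is, why it is important and/or difficult,
% and an intuitive outline of the proposed solution.

\section{Background}
% Related work, its relevance, and how it compares with your approach.

\section{Technical Description}
% The model or algorithm: the main content of the paper.

\section{Experimental Evaluation}
% Concise, but with enough detail for the results to be reproduced.

\section{Conclusions}
% What you did and the take-home message.

\end{document}
```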

The contents of a survey paper will of course be slightly different. The introduction in a survey should clearly state the topic, and outline the main conclusions.

Data sets

The list below is provided simply to give you some ideas of areas in which there exist interesting data sets. This is by no means an exclusive list, and you are encouraged to use other data sets and problems as well.
    Data repositories. These contain many data sets, mostly small ones, suitable for a large-scale comparative study of algorithms or for testing a novel algorithm or model.
  • The Delve project at the University of Toronto
  • The UCI ML repository (has pointers to other databases as well)
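
As a rough illustration of the kind of comparative study such data sets support, the Python sketch below estimates held-out accuracy by k-fold cross-validation for two simple classifiers. Everything in it is an assumption made for illustration: the data are synthetic (two Gaussian blobs standing in for a repository data set), the function names are hypothetical, and the two classifiers (a majority-class baseline and 1-nearest-neighbor) were chosen only because they are easy to write down. Comparing against a trivial baseline is one simple way to tell whether a reported result is meaningful.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=50, seed=0):
    """Synthetic stand-in for a repository data set: two 2-D Gaussian blobs."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            x = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((x, label))
    rng.shuffle(data)
    return data


def majority_classifier(train):
    """Baseline: always predict the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def nearest_neighbor_classifier(train):
    """1-nearest-neighbor under squared Euclidean distance."""
    def predict(x):
        _, label = min(
            train,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)),
        )
        return label
    return predict


def cross_val_accuracy(fit, data, k=5):
    """Mean held-out accuracy over k folds (fold i holds out every k-th example)."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [ex for j, ex in enumerate(data) if j % k != i]
        predict = fit(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


data = make_toy_data()
results = {
    "majority baseline": cross_val_accuracy(majority_classifier, data),
    "1-nearest neighbor": cross_val_accuracy(nearest_neighbor_classifier, data),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

For a real study, `make_toy_data` would be replaced by code that loads a data set from one of the repositories, and the same evaluation loop would be repeated over several data sets and algorithms.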