Computer Science 241

Statistical Models in Natural-Language Processing

Professor: Eugene Charniak
Chief Cook and Bottle Washer: Matt Lease
Time: Monday 3:00 - 5:30
Mail Group
Room: CIT 345
Text: Foundations of Statistical Natural Language Processing by Christopher Manning and Hinrich Schütze, MIT Press, 1999

This course covers statistical methods for learning a natural language and applying the knowledge to specific tasks. Topics include: entropy and cross entropy of a language, hidden Markov models, Viterbi algorithm, forward-backward algorithm, trigram models, part-of-speech tagging, probabilistic context-free parsing, inside-outside algorithm, learning probabilistic context-free grammars, statistical models of syntactic disambiguation, statistical anaphora resolution, deriving semantic word classes from statistical properties, and word-sense disambiguation.

Grading is based primarily on the project and secondarily on the two in-class, 40-minute exams. Class participation will also be considered. The project is done in groups of 2-4 students. All groups work on the same project. Collaboration between groups is allowed (indeed encouraged), up to, but not including, sharing of code (unless explicitly authorized in class). This semester's project looks at the problem of clustering sentences.

Class Schedule, Fall 2006

All chapter and page references are to the course text.
Week of    Reading Assignments
Sept 10    Ch 14
Sept 17    Ch 2 (minus 2.1.10, 2.2.4), Ch 9 to 9.3.1
Sept 24    Ch 9, Ch 10
Oct 1      Ch 10
Oct 8      Ch 11
Oct 15     Exam, Ch 12
Oct 22     Ch 6
Oct 29     Ch 7
Nov 5      Ch 8
Nov 12     Exam
Nov 19     No Class
Nov 26     Project Discussions

Project Assignments

Computer files for the project can be found in /pro/dpg/cs241/.

Sept 18

Read in the stripped representations. Find the number of words that occur 5 or more times. What sentence includes the ??? occurrence of the word "stock"? Implement single-link clustering for word vectors. How well do the days of the week cluster? The stripped representation for WSJ sections 2-21 can be found in /pro/dpg/cs241/data/train.strip.
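
As a starting point, the counting and clustering steps might be sketched as below. This is only a minimal illustration under stated assumptions, not the expected solution: the function names (frequent_words, cosine_distance, single_link) are hypothetical, sentences are assumed to be already tokenized into lists of words (the layout of train.strip is not described here, so no file reading is shown), and cosine distance is one possible choice of distance between word vectors. How the word vectors themselves are built is left open.

```python
from collections import Counter
from itertools import combinations
import math


def frequent_words(sentences, min_count=5):
    """Return the set of words occurring at least min_count times.

    sentences: iterable of token lists (an assumed input format).
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


def cosine_distance(u, v):
    """1 - cosine similarity; an assumed choice of distance measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def single_link(vectors, num_clusters):
    """Agglomerative single-link clustering of word vectors.

    vectors: dict mapping word -> list of floats.
    Repeatedly merges the two clusters whose closest members are nearest,
    stopping when num_clusters clusters remain.
    """
    clusters = [{word} for word in vectors]
    while len(clusters) > num_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single link: cluster distance is the minimum pairwise distance.
            d = min(cosine_distance(vectors[a], vectors[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge cluster j into cluster i (i < j)
        del clusters[j]
    return clusters


# Toy vectors, invented purely for illustration (not course data).
toy_vectors = {
    "monday": [1.0, 0.0],
    "tuesday": [0.9, 0.1],
    "stock": [0.0, 1.0],
    "bond": [0.1, 0.9],
}
toy_clusters = single_link(toy_vectors, 2)
```

On the toy vectors, the two day names end up in one cluster and "stock" and "bond" in the other. This naive version recomputes all pairwise distances at every merge, which is fine for a few dozen words but will need caching or a smarter data structure for the full vocabulary.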


Eugene Charniak