
Thesis Proposal

 

"Unsupervised Bayesian Lexicalized Dependency Grammar Induction"

Will Headden

Wednesday, December 3, 2008 at 2:00 P.M.

Lubrano Conference Room (4th Floor CIT)

This thesis investigates learning dependency grammars for statistical natural language parsing from corpora without parse tree annotations. Most successful work in unsupervised dependency grammar induction has operated on part-of-speech tags, ignoring words and using extremely simple probabilistic models. However, supervised parsing has long shown the value of more sophisticated models that include lexical features. These more sophisticated models, however, require probability distributions with complex conditioning information, which must be smoothed to avoid sparsity issues.

In this work we explore several dependency grammars that make use of smoothing and limited lexical features. Our preliminary results yield the highest dependency induction performance on the Penn Treebank WSJ10 corpus to date. We propose lexicalizing the dependency grammars further. Additionally, while most previous work has learned from gold-standard part-of-speech tags or the single-best result of an unsupervised part-of-speech tagger, we instead propose to learn from words and induce the hidden states of the model together with dependencies. In sum, this proposed thesis seeks to extend unsupervised grammar induction by incorporating lexical conditioning information, by investigating smoothing in an unsupervised framework, and by integrating hidden state learning with dependency learning.

Host: Mark Johnson

