Tech Report CS-94-07

Context-Sensitive Statistics for Improved Grammatical Language Models

Eugene Charniak and Glenn Carroll

February 1994

Abstract:

We develop a language model using probabilistic context-free grammars (PCFGs) that is ``pseudo context-sensitive'' in that the probability that a non-terminal $N$ expands using a rule $r$ depends on $N$'s parent. We derive the equations for estimating the necessary probabilities using a variant of the inside-outside algorithm. We give experimental results showing that, beginning with a high-performance PCFG, one can develop a pseudo probabilistic context-sensitive grammar (PCSG) that yields significant performance gains. Analysis shows that the benefits from the context-sensitive statistics are localized, suggesting that we can use them to extend the original PCFG. Experimental results confirm both that this extension is feasible and that the resulting grammar retains the performance gains. This indicates that our scheme may be useful as a novel method for PCFG induction.
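To make the "pseudo context-sensitive" idea concrete, the sketch below contrasts a plain PCFG, which scores a rule with $P(r \mid N)$, against a model that scores it with $P(r \mid N, \mathrm{parent}(N))$. It is only an illustration under stated assumptions. The report estimates these probabilities with a variant of the inside-outside algorithm. This sketch substitutes simple relative-frequency counts over a tiny, hypothetical hand-parsed treebank. The tree encoding, the `TOP` pseudo-parent for the root, and every function name are illustrative choices, not taken from the report.

```python
from collections import Counter
from math import prod

# Hypothetical encoding: a tree is (label, child, child, ...); a word is a str.
ROOT_PARENT = "TOP"  # assumed pseudo-parent label for the root node


def rule_of(node):
    """Return (lhs, rhs) for an internal node; rhs holds the child labels."""
    label, *children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return label, rhs


def internal_nodes(tree, parent=ROOT_PARENT):
    """Yield (parent_label, node) for every internal node, top-down."""
    yield parent, tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from internal_nodes(child, tree[0])


def context(parent, lhs, use_parent):
    """Conditioning context: (N,) for a PCFG, (parent, N) for the pseudo-PCSG."""
    return (parent, lhs) if use_parent else (lhs,)


def estimate(trees, use_parent):
    """Relative-frequency estimates of P(rule | context) from observed trees.

    NOTE: a stand-in for illustration; the report uses an inside-outside variant.
    """
    joint, marginal = Counter(), Counter()
    for t in trees:
        for parent, node in internal_nodes(t):
            lhs, rhs = rule_of(node)
            ctx = context(parent, lhs, use_parent)
            joint[ctx, rhs] += 1
            marginal[ctx] += 1
    return {(ctx, rhs): n / marginal[ctx] for (ctx, rhs), n in joint.items()}


def tree_prob(tree, probs, use_parent):
    """Product of the conditioned probabilities of all rules used in the tree."""
    factors = []
    for parent, node in internal_nodes(tree):
        lhs, rhs = rule_of(node)
        factors.append(probs.get((context(parent, lhs, use_parent), rhs), 0.0))
    return prod(factors)


# Toy treebank (hypothetical): subject NPs are pronouns, object NPs are DT NN.
t1 = ("S", ("NP", ("PRP", "she")),
      ("VP", ("V", "saw"), ("NP", ("DT", "the"), ("NN", "dog"))))
t2 = ("S", ("NP", ("PRP", "he")),
      ("VP", ("V", "fed"), ("NP", ("DT", "a"), ("NN", "cat"))))
treebank = [t1, t2]

# Same words and rules as t1, but with the two NP expansions swapped.
t3 = ("S", ("NP", ("DT", "the"), ("NN", "dog")),
      ("VP", ("V", "saw"), ("NP", ("PRP", "she"))))

pcfg = estimate(treebank, use_parent=False)
pcsg = estimate(treebank, use_parent=True)

if __name__ == "__main__":
    for name, t in [("t1", t1), ("t3", t3)]:
        print(name, tree_prob(t, pcfg, False), tree_prob(t, pcsg, True))
```

On this toy data the plain PCFG assigns `t1` and `t3` the same probability, $(1/2)^6$, because it cannot tell a subject NP from an object NP. The parent-conditioned model gives `t1` probability $(1/2)^4$ and `t3` probability zero, since it has never seen an NP under S expand as DT NN. A real model would smooth these estimates rather than assign zero.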
