We evaluate probabilistic lexicalized tree-insertion grammars
(PLTIGs) on a classification task relevant for automatic speech recognition.
The baseline is a trigram model, smoothed with absolute discounting. Both
language models are trained on an unannotated corpus, consisting of 10 000
sentences collected from the English section of Wikipedia. For the evaluation,
an additional 150 random sentences were selected from the same source, and for
each of these, approximately 3000 variants were generated. Each variant sentence was
obtained by replacing an arbitrary word with a similar word, chosen to be at most
2 edits from the original word. In our experiments, the $N$-gram model preferred one
of these alternative sentences over the original in 43.1 percent of the cases, while the PLTIG
was mistaken in only 3 percent of the cases.
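
The evaluation protocol described above can be illustrated with a minimal sketch. It rests on several assumptions that the text does not confirm: "edits" is taken to mean Levenshtein distance over characters, replacement words are drawn from a given vocabulary, sentences are lists of tokens, variants are enumerated exhaustively rather than sampled, and a model counts as mistaken when it scores any variant strictly higher than the original. All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (assumed metric): insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similar_words(word, vocabulary, max_edits=2):
    """Vocabulary words, other than `word` itself, within `max_edits` edits of it."""
    return [v for v in vocabulary
            if v != word and edit_distance(word, v) <= max_edits]


def make_variants(sentence, vocabulary, max_edits=2):
    """All sentences that differ from `sentence` in exactly one replaced word."""
    variants = []
    for i, word in enumerate(sentence):
        for candidate in similar_words(word, vocabulary, max_edits):
            variants.append(sentence[:i] + [candidate] + sentence[i + 1:])
    return variants


def error_rate(score, test_items):
    """Fraction of (original, variants) pairs where some variant outscores the original.

    `score` is any function mapping a token list to a number, for example a
    log-probability under a language model (assumed interface).
    """
    mistakes = sum(
        1 for original, variants in test_items
        if any(score(v) > score(original) for v in variants)
    )
    return mistakes / len(test_items)
```

Under these assumptions, either language model can be plugged in as `score`, and `error_rate` then corresponds to the percentage of cases in which the model preferred an altered sentence.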