
UMINF 15.05

Probabilistic Lexicalized Tree-Insertion Grammars in Automatic Speech Recognition

We evaluate probabilistic lexicalized tree-insertion grammars (PLTIGs) on a classification task relevant to automatic speech recognition. The baseline is a trigram model, smoothed through absolute discounting. Both language models are trained on an unannotated corpus consisting of 10,000 sentences collected from the English section of Wikipedia. For the evaluation, an additional 150 random sentences were selected from the same source, and for each of these, approximately 3,000 variations were generated. Each variant sentence was obtained by replacing an arbitrary word with a similar word, chosen to be at most 2 edits away from the original. In our experiments, the N-gram model preferred one of these alternative sentences in 43.1 percent of the cases, while the PLTIG was mistaken in only 3 percent of the cases.
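
For illustration, the variant sentences described above could be produced along the following lines. This is a minimal sketch, not the authors' implementation: the vocabulary source, function names, and the edit-distance measure (Levenshtein) are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (assumed measure of 'edits')."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def generate_variants(sentence: list[str], vocabulary: list[str], max_edits: int = 2):
    """Yield variant sentences in which one word is replaced by a similar word,
    i.e. a vocabulary word at most `max_edits` edits away from the original."""
    for pos, word in enumerate(sentence):
        for candidate in vocabulary:
            if candidate != word and edit_distance(word, candidate) <= max_edits:
                yield sentence[:pos] + [candidate] + sentence[pos + 1:]
```

Each language model would then be asked to choose between the original sentence and its variants, counting how often a variant is preferred over the original.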

Keywords

Automatic speech recognition, formal grammars, probabilistic lexicalized tree-insertion grammars, language models, N-grams

Authors

Johanna Björklund and Marcus Karlsson
