We consider the problem of nding the occurrences of a pattern tree t in a treebank represented as a directed acyclic graph g, and propose a two-tiered technique, consisting of a preprocessing routine and a search algorithm. We assume that whereas the treebank itself is large and more or less static, the pattern tree is small and frequently updated. To model varying abstraction levels in the data, we work with partially ordered alphabets and compute simulation relations rather than equivalence relations. For instance, vertices and edges are labelled with elements from a pair of preorders S and P instead of unstructured alphabets. Under the above assumptions, we obtain a search algorithm that runs in time O(|S| + height(t)|t|(V/R)^2|P|), where (V/R) is the number of equivalence classes in the coarsest simulation relation Rg on the vertex set V of g. The size of the treebank thus only aects the running time of the search algorithm indirectly, and this is due to the groundwork done by the preprocessing routine in time O(|S| + height (g)|g|j(V/R)^2|P|).
Page Responsible: Frank Drewes 2024-11-21