|
Frederick Reiss et al. have shown that an algebraic approach to rule-based information extraction can yield a massive gain in performance over cascading regular expressions. The DIMA Group is currently developing an extension of a relational database based on these ideas (INDREX/MIA). This bachelor thesis describes the extension, the (hypergraph) schema used to store and query extracted data, and the difficult query syntax that results from this schema. It then proposes an extensible, domain-specific language that simplifies the construction of basic queries through designated environments integrated into SQL. The result is a shorthand notation for defining annotations, accessing their information, and constructing partial subgraphs via patterns.
|