<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "../../nlm/tax-treatment-NS0.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:tp="http://www.plazi.org/taxpub" article-type="research-article" dtd-version="3.0" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">109</journal-id>
      <journal-id journal-id-type="index">urn:lsid:arphahub.com:pub:3dc5f44e-8666-58db-bc76-a455210e8891</journal-id>
      <journal-title-group>
        <journal-title xml:lang="en">JUCS - Journal of Universal Computer Science</journal-title>
        <abbrev-journal-title xml:lang="en">jucs</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0948-695X</issn>
      <issn pub-type="epub">0948-6968</issn>
      <publisher>
        <publisher-name>Journal of Universal Computer Science</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3217/jucs-016-05-0833</article-id>
      <article-id pub-id-type="publisher-id">29643</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
        <subj-group subj-group-type="scientific_subject">
          <subject>H.3.3 - Information Search and Retrieval</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Evaluating Linear XPath Expressions by Pattern-Matching Automata</article-title>
      </title-group>
      <contrib-group content-type="authors">
        <contrib contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Silvasti</surname>
            <given-names>Panu</given-names>
          </name>
          <email xlink:type="simple">psilvast@cs.hut.fi</email>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Sippu</surname>
            <given-names>Seppo</given-names>
          </name>
          <xref ref-type="aff" rid="A2">2</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Soisalon-Soininen</surname>
            <given-names>Eljas</given-names>
          </name>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>
        <addr-line content-type="verbatim">Helsinki University of Technology, Helsinki, Finland</addr-line>
        <institution>Helsinki University of Technology</institution>
        <addr-line content-type="city">Helsinki</addr-line>
        <country>Finland</country>
      </aff>
      <aff id="A2">
        <label>2</label>
        <addr-line content-type="verbatim">University of Helsinki, Helsinki, Finland</addr-line>
        <institution>University of Helsinki</institution>
        <addr-line content-type="city">Helsinki</addr-line>
        <country>Finland</country>
      </aff>
      <author-notes>
        <fn fn-type="corresp">
          <p>Corresponding author: Panu Silvasti (<email xlink:type="simple">psilvast@cs.hut.fi</email>).</p>
        </fn>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2010</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>03</month>
        <year>2010</year>
      </pub-date>
      <volume>16</volume>
      <issue>5</issue>
      <fpage>833</fpage>
      <lpage>851</lpage>
      <uri content-type="arpha" xlink:href="http://openbiodiv.net/C5D13D44-B317-5755-9C16-E2F26CDBF274">C5D13D44-B317-5755-9C16-E2F26CDBF274</uri>
      <uri content-type="zenodo_dep_id" xlink:href="https://zenodo.org/record/7001151">7001151</uri>
      <permissions>
        <copyright-statement>Panu Silvasti, Seppo Sippu, Eljas Soisalon-Soininen</copyright-statement>
        <license license-type="creative-commons-attribution" xlink:href="" xlink:type="simple">
          <license-p>This article is freely available under the J.UCS Open Content License.</license-p>
        </license>
      </permissions>
      <abstract>
        <label>Abstract</label>
        <p>We consider the problem of efficiently evaluating a large number of XPath expressions, especially when they define subscriber profiles for filtering XML documents. For each document in an XML document stream, the task is to determine which profiles match the document. In this article we present a new general method for filtering with profiles expressed as linear XPath expressions with child operators (/), descendant operators (//), and wildcards (*). The filtering algorithm is based on a backtracking deterministic finite automaton derived from the classic Aho-Corasick pattern-matching automaton. The automaton has size linear in the sum of the sizes of the XPath filters, and the worst-case time bound of the algorithm is much less than that of simulating linear-size nondeterministic automata. Our algorithm has a predecessor that handles child and descendant operators but not wildcards, and that has been shown to be extremely efficient when a document type definition (DTD) is used to prune out all the wildcards and most of the descendant operators. In some cases, however, such as when the DTD is highly recursive, it may not be possible to prune out all wildcards without producing too large a set of filters. It is then important to have the full generality of an evaluation algorithm, such as the one presented in this article, that can also handle wildcards.</p>
      </abstract>
    </article-meta>
  </front>
</article>
