<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "../../nlm/tax-treatment-NS0.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:tp="http://www.plazi.org/taxpub" article-type="research-article" dtd-version="3.0" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">109</journal-id>
      <journal-id journal-id-type="index">urn:lsid:arphahub.com:pub:3dc5f44e-8666-58db-bc76-a455210e8891</journal-id>
      <journal-title-group>
        <journal-title xml:lang="en">JUCS - Journal of Universal Computer Science</journal-title>
        <abbrev-journal-title xml:lang="en">jucs</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0948-695X</issn>
      <issn pub-type="epub">0948-6968</issn>
      <publisher>
        <publisher-name>Journal of Universal Computer Science</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3217/jucs-023-11-1019</article-id>
      <article-id pub-id-type="publisher-id">23680</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
        <subj-group subj-group-type="scientific_subject">
          <subject>F.4.2 - Grammars and Other Rewriting Systems</subject>
          <subject>I.2.7 - Natural Language Processing</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>An Evaluation of Structured Language Modeling for Automatic Speech Recognition</article-title>
      </title-group>
      <contrib-group content-type="authors">
        <contrib contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Björklund</surname>
            <given-names>Johanna</given-names>
          </name>
          <email xlink:type="simple">johanna@cs.umu.se</email>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Cleophas</surname>
            <given-names>Loek</given-names>
          </name>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Karlsson</surname>
            <given-names>My</given-names>
          </name>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>
        <addr-line content-type="verbatim">Umeå University, Umeå, Sweden</addr-line>
        <institution>Umeå University</institution>
        <addr-line content-type="city">Umeå</addr-line>
        <country>Sweden</country>
      </aff>
      <author-notes>
        <fn fn-type="corresp">
          <p>Corresponding author: Johanna Björklund (<email xlink:type="simple">johanna@cs.umu.se</email>).</p>
        </fn>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2017</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>28</day>
        <month>11</month>
        <year>2017</year>
      </pub-date>
      <volume>23</volume>
      <issue>11</issue>
      <fpage>1019</fpage>
      <lpage>1034</lpage>
      <uri content-type="arpha" xlink:href="http://openbiodiv.net/ABBCC62C-BB17-5937-96AB-2F3E3B4AE915">ABBCC62C-BB17-5937-96AB-2F3E3B4AE915</uri>
      <uri content-type="zenodo_dep_id" xlink:href="https://zenodo.org/record/5505777">5505777</uri>
      <history>
        <date date-type="received">
          <day>23</day>
          <month>04</month>
          <year>2017</year>
        </date>
        <date date-type="accepted">
          <day>26</day>
          <month>11</month>
          <year>2017</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>Johanna Björklund, Loek Cleophas, My Karlsson</copyright-statement>
        <license license-type="creative-commons-attribution" xlink:href="" xlink:type="simple">
          <license-p>This article is freely available under the J.UCS Open Content License.</license-p>
        </license>
      </permissions>
      <abstract>
        <label>Abstract</label>
        <p>We evaluated probabilistic lexicalized tree-insertion grammars (PLTIGs) on a classification task relevant to automatic speech recognition. The baseline was a family of n-gram models tuned with Witten-Bell smoothing. The language models were trained on unannotated corpora consisting of 10,000 to 50,000 sentences collected from the English section of Wikipedia. For the evaluation, an additional 150 random sentences were selected from the same source, and for each of these, approximately 3,200 variants were generated. Each variant was obtained by replacing an arbitrary word with a similar word, chosen to be at most two character edits from the original. The evaluation task consisted of identifying the original sentence among the automatically constructed (and typically inferior) alternatives. In the experiments, the n-gram models outperformed the PLTIG model on the smaller data set, but as the amount of training data grew, the PLTIG model gave comparable results. While PLTIGs are more demanding to train, they have the advantage of assigning a parse structure to their input sentences. This is valuable for further algorithmic processing, for example summarization or sentiment analysis.</p>
      </abstract>
    </article-meta>
  </front>
</article>
