<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "../../nlm/tax-treatment-NS0.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:tp="http://www.plazi.org/taxpub" article-type="research-article" dtd-version="3.0" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">109</journal-id>
      <journal-id journal-id-type="index">urn:lsid:arphahub.com:pub:3dc5f44e-8666-58db-bc76-a455210e8891</journal-id>
      <journal-title-group>
        <journal-title xml:lang="en">JUCS - Journal of Universal Computer Science</journal-title>
        <abbrev-journal-title xml:lang="en">jucs</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0948-695X</issn>
      <issn pub-type="epub">0948-6968</issn>
      <publisher>
        <publisher-name>Journal of Universal Computer Science</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3217/jucs-014-18-2912</article-id>
      <article-id pub-id-type="publisher-id">29199</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
        <subj-group subj-group-type="scientific_subject">
          <subject>H.3.1 - Content Analysis and Indexing</subject>
          <subject>H.3.2 - Information Storage</subject>
          <subject>H.3.7 - Digital Libraries</subject>
          <subject>J.4 - SOCIAL AND BEHAVIORAL SCIENCES</subject>
          <subject>J.5 - ARTS AND HUMANITIES</subject>
          <subject>M.0 - KNOWLEDGE ACQUISITION</subject>
          <subject>M.4 - KNOWLEDGE MODELING</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>A Generic Architecture for the Conversion of Document Collections into Semantically Annotated Digital Archives</article-title>
      </title-group>
      <contrib-group content-type="authors">
        <contrib contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Lladós</surname>
            <given-names>Josep</given-names>
          </name>
          <email xlink:type="simple">josep@cvc.uab.es</email>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Karatzas</surname>
            <given-names>Dimosthenis</given-names>
          </name>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Mas</surname>
            <given-names>Joan</given-names>
          </name>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Sánchez</surname>
            <given-names>Gemma</given-names>
          </name>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>
        <addr-line content-type="verbatim">Universitat Autònoma de Barcelona, Barcelona, Spain</addr-line>
        <institution>Universitat Autònoma de Barcelona</institution>
        <addr-line content-type="city">Barcelona</addr-line>
        <country>Spain</country>
      </aff>
      <author-notes>
        <fn fn-type="corresp">
          <p>Corresponding author: Josep Lladós (<email xlink:type="simple">josep@cvc.uab.es</email>).</p>
        </fn>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2008</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>10</month>
        <year>2008</year>
      </pub-date>
      <volume>14</volume>
      <issue>18</issue>
      <fpage>2912</fpage>
      <lpage>2935</lpage>
      <uri content-type="arpha" xlink:href="http://openbiodiv.net/DD0C86CF-98F8-53AA-8CFB-CF6B9B079E01">DD0C86CF-98F8-53AA-8CFB-CF6B9B079E01</uri>
      <uri content-type="zenodo_dep_id" xlink:href="https://zenodo.org/record/7000491">7000491</uri>
      <permissions>
        <copyright-statement>Josep Lladós, Dimosthenis Karatzas, Joan Mas, Gemma Sánchez</copyright-statement>
        <license license-type="creative-commons-attribution" xlink:href="" xlink:type="simple">
          <license-p>This article is freely available under the J.UCS Open Content License.</license-p>
        </license>
      </permissions>
      <abstract>
        <label>Abstract</label>
        <p>Mass digitization of document collections, followed by further processing and semantic annotation, is an increasingly common activity among libraries and archives, serving preservation, browsing and navigation, and search. In this paper we propose a software architecture for converting large document collections into semantically annotated digital libraries. The proposed architecture recognizes two sources of knowledge in the conversion pipeline, namely document images and humans. The Image Analysis module and the Correction and Validation module cover the initial conversion stages. In the former, information is automatically extracted from document images. The latter involves human intervention at a technical level to define workflows and to validate the image processing results. The second stage, represented by the Knowledge Capture modules, requires information specific to the particular knowledge domain and generally calls for expert practitioners. These two principal conversion stages are coupled with a Knowledge Management module, which provides the means to organise the extracted and acquired knowledge. In terms of data propagation, the architecture follows a bottom-up process, starting with document image units, called terms, and progressively building meaningful concepts and their relationships. In the second part of the paper we describe a real scenario with historical document archives, implemented according to the proposed architecture.</p>
      </abstract>
    </article-meta>
  </front>
</article>
