
<rss version="0.91">
    <channel>
        <title>Latest Articles from JUCS - Journal of Universal Computer Science</title>
        <description>Latest 67 Articles from JUCS - Journal of Universal Computer Science</description>
        <link>https://lib.jucs.org/</link>
        <lastBuildDate>Sun, 14 Jun 2026 13:10:06 +0000</lastBuildDate>
        <generator>Pensoft FeedCreator</generator>
        <image>
            <url>https://lib.jucs.org/i/logo.jpg</url>
            <title>Latest Articles from JUCS - Journal of Universal Computer Science</title>
            <link>https://lib.jucs.org/</link>
            <description><![CDATA[Feed provided by https://lib.jucs.org/. Click to visit.]]></description>
        </image>
	
		<item>
		    <title>Duygu-Turk: A Context-Aware Sentiment Analysis Framework for Turkish, Based on Plutchik’s Emotion Model</title>
		    <link>https://lib.jucs.org/article/160588/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 32(4): 615-643</p>
					<p>DOI: 10.3897/jucs.160588</p>
					<p>Authors: Rabia Tintin, Sait Can Yucebas</p>
					<p>Abstract: This study presents Duygu-Turk, a novel deep learning-based sentiment analysis framework specifically designed for the Turkish language, which is characterized by its agglutinative and morphologically rich structure. Unlike conventional sentiment analysis models, which rely on coarse polarity classification (positive, negative, neutral) and integrate Turkish-specific linguistic features insufficiently, Duygu-Turk adopts a fine-grained classification approach based on Plutchik&rsquo;s Wheel of Emotions. The model identifies eight primary emotions, eight secondary emotions, and varying degrees of emotional intensity. Additionally, a non-monotonic logic mechanism is integrated to detect conditional sentiments, allowing for more context-sensitive classification. To enhance linguistic coverage, the model leverages morpho-semantic features, idiomatic expressions, suffixes, and contrastive conjunctions unique to Turkish. A new sentiment corpus consisting of 136,000 annotated Turkish sentences was constructed to train and validate the model. Experimental evaluations demonstrate that Duygu-Turk significantly outperforms transformer-based models such as BERT, DistilBERT, and ELECTRA, achieving F1 scores of 0.99 for polarity classification and 0.90 for multi-class emotion classification. These results highlight the model&rsquo;s potential as a robust and linguistically grounded solution for sentiment analysis in Turkish and other low-resource languages.</p>
					<p><a href="https://lib.jucs.org/article/160588/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/160588/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 Apr 2026 10:00:06 +0000</pubDate>
		</item>
	
		<item>
		    <title>A Robust Dot-focused Classification Approach to Convolutional Braille Recognition</title>
		    <link>https://lib.jucs.org/article/161636/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 32(4): 486-518</p>
					<p>DOI: 10.3897/jucs.161636</p>
					<p>Authors: Wicus J. van der Linden, Trienko L. Grobler, Lynette van Zijl</p>
					<p>Abstract: The effect of imbalanced data on the optical character recognition of Braille text is investigated by applying two techniques to a set of convolutional neural network image classification models. A multilabel classification framework is applied to identify the combination of Braille dots present in a character sample. This approach is compared to the multiclass classification framework prevalent in the literature, which directly identifies each sample as one of 64 possible Braille characters. Furthermore, data resampling methods are applied to investigate the impact of class imbalance on the multilabel and multiclass modelling approaches, respectively. The multilabel models are shown to achieve statistically significantly better performance than multiclass models across different data resampling strategies. This includes better generalisation to out-of-distribution testing data from different Braille language codes, as well as robust performance under experimental image augmentation conditions. Furthermore, while multiclass models achieve better performance when trained on resampled data than when trained without resampling, this performance increase fails to rival the performance of the multilabel classification models across metrics and resampling strategies.</p>
					<p><a href="https://lib.jucs.org/article/161636/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/161636/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 Apr 2026 10:00:02 +0000</pubDate>
		</item>
	
		<item>
		    <title>SNAP Framework: Linked Prediction Based Anomaly Prevention With Suspicious Nodes on Social Network Graph</title>
		    <link>https://lib.jucs.org/article/152114/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 31(13): 1538-1563</p>
					<p>DOI: 10.3897/jucs.152114</p>
					<p>Authors: Vahide Nida Kılıç, Esra Saraç Eşsiz</p>
					<p>Abstract: In previous studies, the focus has predominantly been on anomaly detection, with minimal attention given to anomaly prevention. However, anomaly prevention holds greater significance than anomaly detection. Preventing anomalous behavior before it occurs and identifying potential anomalies in advance to enable timely intervention is both challenging and crucial. In this study, a Suspicious Nodes Anomaly Prevention framework for anomaly prevention has been developed. First, a novel K-medoid based Salp Swarm Anomaly Detection method is proposed within the framework. This method reveals unclustered data by applying clustering and determines the boundaries of clusters using a nature-inspired algorithm that optimizes the threshold. Since threshold determination is an optimization problem, it aligns well with nature-inspired algorithms. Additionally, the Enron email dataset was selected as it is a real-world dataset with accessible content information. Initially, content and node features were extracted from the Enron email dataset. The proposed anomaly detection method was then applied separately to each of these features. Nodes identified as anomalous by one feature but normal by others were of particular interest. These nodes were labeled as &ldquo;suspicious nodes,&rdquo; and their connections were analyzed to detect potentially harmful email content. This framework fills a significant gap in the anomaly detection literature by contributing an unprecedented approach to anomaly prevention, offering early intervention capabilities in various sectors by identifying risks in advance. In this study, the proposed framework demonstrates high efficacy in detecting anomalies, achieving a True Positive Rate of 94% in node-based anomaly detection and 78% in content-based anomaly detection, indicating a robust capability for early intervention and risk identification.</p>
					<p><a href="https://lib.jucs.org/article/152114/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/152114/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Fri, 28 Nov 2025 14:00:06 +0000</pubDate>
		</item>
	
		<item>
		    <title>An Efficient Workload-balancing Algorithm for a Parallel Environment Using Hybrid Spatio-temporal Indexes</title>
		    <link>https://lib.jucs.org/article/164671/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 31(9): 928-945</p>
					<p>DOI: 10.3897/jucs.164671</p>
					<p>Authors: Claudio Gutiérrez-Soto, Marco A. Palomino, Patricio Galdames</p>
					<p>Abstract: In recent years, we have witnessed the proliferation of applications that generate thousands of terabytes of data per day, due to the explosive increase in storage capacity across various devices. As a consequence, a new concept called Data Deluge has emerged. Data deluge refers to the situation where the quantity of data generated exceeds the processing power available, and spatio-temporal data is no exception to this phenomenon. In this context, the efficient processing of spatio-temporal queries becomes crucial to address this challenge, as slow query processing can result in obsolete answers, which may lead to errors. Considering this dynamic context of storage and processing, we explore a new online workload algorithm in a distributed parallel environment using hybrid spatio-temporal indexes. This algorithm is able to update the indexes with the most appropriate data, aiming to achieve more efficient query processing. To measure the efficiency of this algorithm, we present its time complexity along with an empirical evaluation of its performance, considering processing time, number of accessed nodes, and communication costs. The empirical results show a significant reduction in processing time, communication costs, and number of accessed nodes.</p>
					<p><a href="https://lib.jucs.org/article/164671/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/164671/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Thu, 14 Aug 2025 16:00:04 +0000</pubDate>
		</item>
	
		<item>
		    <title>Examining Deep Learning Techniques for Ethical Artificial Intelligence: Cleansing Malicious Comments from Users</title>
		    <link>https://lib.jucs.org/article/128450/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 31(7): 735-755</p>
					<p>DOI: 10.3897/jucs.128450</p>
					<p>Authors: Ji Woong Yoo, Kyoung Jun Lee, Arum Park</p>
					<p>Abstract: The advancement of AI has heightened the significance of ethical concerns, particularly in managing negative user feedback such as malicious comments, which calls for thoughtful deliberation. The focus of this research is to explore the potential of deep learning techniques in addressing these issues and enhancing the ethical nature of AI systems. Specifically, we investigated the collection and processing of news comment data using the Long Short-Term Memory (LSTM) algorithm and the Word2Vec model. The primary objective was to evaluate how deep learning techniques can improve the quality of data obtained from news comments. Our findings demonstrate that deep learning models surpass CleanBot in accuracy and block rates for handling negative user comments, including malicious ones, enabling organizations to effectively manage such comments in online communities using AI-based methods. This study adds to the existing research by showing how advanced deep learning techniques can effectively identify and classify harmful comments by analyzing complex language patterns.</p>
					<p><a href="https://lib.jucs.org/article/128450/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/128450/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 28 Jun 2025 09:00:05 +0000</pubDate>
		</item>
	
		<item>
		    <title>PIMTABSA: A Personality influenced Multitask model for Aspect Based Sentiment Analysis using LSTM</title>
		    <link>https://lib.jucs.org/article/129212/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 31(6): 603-622</p>
					<p>DOI: 10.3897/jucs.129212</p>
					<p>Authors: M. Priadarsini, J. Akilandeswari</p>
					<p>Abstract: In the expanding field of sentiment analysis, the integration of personality prediction into aspect-based sentiment analysis (ABSA) represents a novel and promising approach to enhance the accuracy and depth of sentiment detection. This paper proposes a unique framework that leverages the Big Five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) alongside Long Short-Term Memory (LSTM) networks, under a multitask learning paradigm, to improve the performance of ABSA. This is, to the best of our knowledge, the first work to use personality traits as auxiliary tasks in order to capture the many subtle ways in which personality influences the expression of sentiment towards the different aspects of products or services. The model uses the LSTM component to model the sequential character of the text, which makes the extraction of aspect terms and sentiment polarities accurate. The proposed model adopts a multitask learning strategy to predict sentiments and personality traits simultaneously. Such joint learning enhances the model&#39;s understanding of textual context and sentiment expression. Thorough experiments on multiple benchmark datasets show that the proposed approach is competitive with the state of the art in aspect-based sentiment analysis and provides deep insights into personality prediction. The model obtained F1-scores of 79.78%, 83.67%, and 88.80% on the Twitter, Laptop, and Restaurant datasets, respectively. These results represent a significant improvement over existing methods in the literature. For instance, our model outperformed traditional approaches like RAM, which achieved 69.36% on the Twitter dataset, and even advanced techniques such as DualGCN+Bert, which scored 77.4% on Twitter. This research opens a new and meaningful opportunity for sentiment analysis applications: integrated into ABSA models, personality prediction advances applications ranging from personalized recommendation systems to nuanced market analysis tools. As far as we know, this study is the first attempt to utilise personality features to enhance sentiment prediction tasks.</p>
					<p><a href="https://lib.jucs.org/article/129212/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/129212/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 28 May 2025 10:00:04 +0000</pubDate>
		</item>
	
		<item>
		    <title>Detecting Suicidality from Reddit Posts Using a Hybrid CNN - LSTM Model</title>
		    <link>https://lib.jucs.org/article/119828/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 30(13): 1872-1904</p>
					<p>DOI: 10.3897/jucs.119828</p>
					<p>Authors: Seyedeh Aridis Ahadi, Kian Jazayeri, Sahand Tebyani</p>
					<p>Abstract: The identification of individuals who indicate suicidal behaviors on social media platforms has become more significant in recent years. The utilization of textual data may help in the development of systems aimed at predicting individuals&#39; mental health. This article proposes an integrated framework for the identification of suicidal thoughts in social media through the implementation of a layered classifier model consisting of a convolutional neural network (CNN) and a long short-term memory (LSTM) model. Various combinations of embedding techniques, activation functions, and solver algorithms are applied to the network. These combinations form 82 distinct methodologies, whose results are then compared. A collection of approximately 60,0000 user posts from 2018 to 2020 was compiled from Reddit for the study. The combination of TF-IDF (word embedding), RReLU (activation function), and Adam (solver algorithm) reached the highest overall performance. The model achieved an accuracy, F1 score, and AUC of 86%, with precision and recall scores of 91% and 82%, respectively. It was fitted in just 8.69 seconds, demonstrating its time efficiency as well. This approach has great potential for creating a real-life platform that not only reduces the social impacts of suicidality and mental illness, but also increases social access to mental health resources for all individuals.</p>
					<p><a href="https://lib.jucs.org/article/119828/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/119828/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 28 Dec 2024 10:00:06 +0000</pubDate>
		</item>
	
		<item>
		    <title>Insights into Low-Resource Language Modelling: Improving Model Performances for South African Languages</title>
		    <link>https://lib.jucs.org/article/118889/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 30(13): 1849-1871</p>
					<p>DOI: 10.3897/jucs.118889</p>
					<p>Authors: Ruan Visser, Trienko Grobler, Marcel Dunaiski</p>
					<p>Abstract: To address the gap in natural language processing for Southern African languages, our paper presents an in-depth analysis of language model development under resource-constrained conditions. We investigate the interplay between model size, pretraining objectives, and multilingual dataset composition in the context of low-resource languages such as Zulu and Xhosa. In our approach, we initially pretrain language models from scratch on specific low-resource languages using a variety of model configurations, and incrementally add related languages to explore the effect of additional languages on the performance of these models. We demonstrate that smaller data volumes can be effectively leveraged, and that the choice of pretraining objective and multilingual dataset composition significantly influences model performance. Our monolingual and multilingual models exhibit competitive, and in some cases superior, performance compared to established multilingual models such as XLM-R-base and AfroXLM-R-base.</p>
					<p><a href="https://lib.jucs.org/article/118889/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/118889/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 28 Dec 2024 10:00:05 +0000</pubDate>
		</item>
	
		<item>
		    <title>Interaction and Fusion of Rich Textual Information Network for Document-level Relation Extraction</title>
		    <link>https://lib.jucs.org/article/130588/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 30(8): 1112-1136</p>
					<p>DOI: 10.3897/jucs.130588</p>
					<p>Authors: Yu Zhong, Bo Shen, Tao Wang, Jinglin Zhang, Yun Liu</p>
					<p>Abstract: Detecting relations between entities across multiple sentences in a document, referred to as document-level relation extraction, poses a challenge in natural language processing. Graph networks have gained widespread application for their ability to capture long-range contextual dependencies in documents. However, previous studies have often been limited to using only two to three types of nodes to construct document graphs. This leads to insufficient utilization of the rich information within the documents and inadequate aggregation of contextual information. Additionally, relevant relationship labels often co-occur in documents, yet existing methods rarely model the dependencies of relationship labels. In this paper, we propose the Interaction and Fusion of Rich Textual Information Network (IFRTIN) that simultaneously considers multiple types of nodes. First, we utilize the structural, syntactic, and discourse information in the document to construct a document graph, capturing global dependency relationships. Next, we design a regularizer to encourage the model to capture dependencies of relationship labels. Furthermore, we design an Adaptive Encouraging Loss, which encourages well-classified instances to contribute more to the overall loss, thereby enhancing the effectiveness of the model. Experimental results demonstrate that our approach achieves a significant improvement on three document-level relation extraction datasets. Specifically, IFRTIN outperforms existing models by achieving an F1 score improvement of 0.67% on Dataset DocRED, 1.2% on Dataset CDR, and 1.3% on Dataset GDA. These results highlight the effectiveness of our approach in leveraging rich textual information and modeling label dependencies for document-level relation extraction.</p>
					<p><a href="https://lib.jucs.org/article/130588/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/130588/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 28 Aug 2024 16:00:07 +0000</pubDate>
		</item>
	
		<item>
		    <title>A New Performance Metric to Evaluate Filter Feature Selection Methods in Text Classification</title>
		    <link>https://lib.jucs.org/article/111675/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 30(7): 978-1005</p>
					<p>DOI: 10.3897/jucs.111675</p>
					<p>Authors: Rasim Çekik, Mahmut Kaya</p>
					<p>Abstract: High dimensionality and sparsity are the primary issues in text classification. The most effective way to solve the problem is to select a subset of features using feature selection approaches. The most common and effective methods used for this process are filter techniques. Various performance metrics such as Micro-F1, Macro-F1, and Accuracy are used to evaluate the performance of filter methods used for feature selection on datasets. Such methods depend on a classification algorithm. However, when selecting features in filter techniques, the information on the individual features is evaluated without considering the relationship between the features. In such an approach, the actual performance of the filter technique used in feature selection may not be determined. In such a case, the existing methods are insufficient for testing the validity of a proposed method. To this end, this study suggests a novel performance metric called Selection Error (SE) to determine the actual performance of filter techniques. The Selection Error metric allows us to analyze the information value of the selected features more accurately than existing methods without relying on a classifier. The feature selection performance of the filtering approaches was evaluated on six different datasets with both the Selection Error and traditional performance metrics. The results show a strong relationship between the proposed performance metric and the classification performance metrics. The Selection Error aims to contribute significantly to the literature by demonstrating the success of filter feature selection methods, regardless of classifier performance.</p>
					<p><a href="https://lib.jucs.org/article/111675/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/111675/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sun, 28 Jul 2024 16:00:06 +0000</pubDate>
		</item>
	
		<item>
		    <title>A BERT-GRU Model for Measuring the Similarity of Arabic Text</title>
		    <link>https://lib.jucs.org/article/111217/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 30(6): 779-790</p>
					<p>DOI: 10.3897/jucs.111217</p>
					<p>Authors: Rakia Saidi, Fethi Jarray, Didier Schwab</p>
					<p>Abstract: Semantic Textual Similarity (STS) aims to assess the semantic similarity between two pieces of text. As it is a challenging task in natural language processing, various approaches for STS in high-resource languages, such as English, have been proposed. In this paper, we are concerned with STS in low-resource languages such as Arabic. A baseline approach for STS is based on a vector embedding of the input text and the application of a similarity metric on the embedding space. In this contribution, we propose a cross-encoder neural network (Cross-BERT-GRU) to handle semantic similarity of Arabic sentences that benefits from both the strong contextual understanding of BERT and the sequential modeling capabilities of GRU. The architecture begins by inputting the BERT word embeddings for each word into a GRU cell to model long-term dependencies. Then, max pooling and average pooling are applied to the hidden outputs of the GRU cell, serving as the sentence-pair encoder. Finally, a softmax layer is utilized to predict the degree of similarity. The experimental results show a Spearman correlation coefficient of around 0.9 and that Cross-BERT-GRU outperforms the other BERT models in predicting the semantic textual similarity of Arabic sentences. The experimental results also indicate that the performance improves when data augmentation techniques are integrated.</p>
					<p><a href="https://lib.jucs.org/article/111217/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/111217/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Fri, 28 Jun 2024 16:00:04 +0000</pubDate>
		</item>
	
		<item>
		    <title>Automatic Sarcasm Detection on Cross-Platform Social Media Datasets: A GLoVe and Bi-LSTM Based Approach</title>
		    <link>https://lib.jucs.org/article/104790/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 30(5): 674-693</p>
					<p>DOI: 10.3897/jucs.104790</p>
					<p>Authors: Saima Farhan, Rubiya Shoukat, Aqsa Aslam</p>
					<p>Abstract: Sarcastic remarks on social media platforms have become commonplace, with people expressing their negative feelings in a seemingly positive or mocking way. This contradictory nature of sarcasm makes its detection a very challenging task. Many researchers have proposed solutions for automatic sarcasm detection on single-domain datasets. Most of them have considered only the content of the text and have ignored its context. However, the context of a text is the most important factor in determining whether it is sarcastic or not. This study aims to detect sarcastic remarks in a multi-domain dataset by using a Bi-LSTM model with pre-trained GloVe word embeddings, because both GloVe embeddings and Bi-LSTM are good at capturing contextual information from the provided data. The dataset is generated by concatenating three publicly available datasets: the gosh tweets, the news headlines dataset, and the sarcasm corpus v2 dataset. GloVe embeddings extract contextual and semantic features from the text, while the Bi-LSTM model is trained and tested on those features. The proposed model achieved an accuracy of 86.35% with 88% Recall, Precision, and F1-score. Different experiments were conducted to test the model&#39;s reliability. This study&#39;s findings indicate that the proposed model yields state-of-the-art or comparable results. The proposed study aids in improving the performance of sentiment analysis. It will also help individuals and organizations to identify the accurate sentiments of people about individuals or the products of an organization.</p>
					<p><a href="https://lib.jucs.org/article/104790/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/104790/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 May 2024 16:00:07 +0000</pubDate>
		</item>
	
		<item>
		    <title>Sentiment Analysis of Code-Mixed Text: A Comprehensive Review</title>
		    <link>https://lib.jucs.org/article/98708/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 30(2): 242-261</p>
					<p>DOI: 10.3897/jucs.98708</p>
					<p>Authors: Anne Perera, Amitha Caldera</p>
					<p>Abstract: Sentiment Analysis is the task of identifying and extracting the opinion expressed in a text to determine the writer&#39;s perception of an entity. Due to globalization, people often mix two or more languages and use phonetic typing and lexical borrowing in web communication. This concept is known as code-mixing. Although extracting the opinion of text written in monolingual languages is simple and straightforward, Sentiment Analysis of code-mixed text is challenging. Classifiers fail within the context of the code-mixed text as text may consist of creative writing, spelling variations, grammatical errors, and different word orders. Hence, SA of code-mixed text is an interesting, challenging, and popular research area. This paper presents the state-of-the-art in Sentiment Analysis of code-mixed text by discussing each concept in detail. The paper also discusses the focused areas, techniques used, limitations, and performances of the studies related to code-mixing.</p>
					<p><a href="https://lib.jucs.org/article/98708/">HTML</a></p>
					
					<p><a href="https://lib.jucs.org/article/98708/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 28 Feb 2024 16:00:06 +0000</pubDate>
		</item>
	
		<item>
		    <title>Artificial Intelligence as Catalyst for the Tourism Sector: A Literature Review</title>
		    <link>https://lib.jucs.org/article/101550/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 29(12): 1439-1460</p>
					<p>DOI: 10.3897/jucs.101550</p>
					<p>Authors: Anita Herrera, Ángel Arroyo, Alfredo Jiménez, Álvaro Herrero</p>
					<p>Abstract: The analysis of Artificial Intelligence techniques and models used in the tourism sector provides insightful information for the management and innovation of this industry. In this paper, we conduct a comprehensive review of the different Artificial Intelligence techniques and models applied to the tourism industry. Specifically, we present a categorization of Artificial Intelligence applications used in different areas of tourism. The results make it possible to recognize valid studies and useful tools for the activation and growth of the tourism sector, an industry that represents a significant increase in the Gross Domestic Product of various economies and supports the development of living conditions for their inhabitants. Artificial Intelligence applications generate more personalized travel experiences, improve the efficiency of tourism services, and strengthen the tourism competitiveness of the destination.</p>
					<p><a href="https://lib.jucs.org/article/101550/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/101550/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/101550/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Thu, 28 Dec 2023 08:00:03 +0000</pubDate>
		</item>
	
		<item>
		    <title>The evaluation of a semi-automatic authoring tool for knowledge extraction in the AC&amp;NL Tutor</title>
		    <link>https://lib.jucs.org/article/86745/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 29(8): 866-891</p>
					<p>DOI: 10.3897/jucs.86745</p>
					<p>Authors: Ani Grubišić, Slavomir Stankov, Branko Žitko, Ines Šarić-Grgić, Angelina Gašpar, Emil Brajković, Daniel Vasić</p>
					<p>Abstract: This paper describes and evaluates the performance of a semi-automatic authoring tool (SAAT) for knowledge extraction in the AC&amp;NL Tutor, highlighting its strengths and weaknesses. We assessed the accuracy of automatic annotation tasks (Part-of-Speech tagging, Named Entity Recognition, Dependency parsing, and Coreference Resolution) performed on a dataset of 160 sentences from unstructured Wikipedia text on a computer. We compared the automatic annotations to the gold standard, created after human post-editing and validation. Human-error analysis included 3769 words, 582 subsentences, 1129 questions, 917 propositions, 1020 concepts, and 667 relations. It resulted in an error type classification and a set of custom rules further used for automatic error identification and correction. The results showed that an average of 68.7% of the error corrections referred to CoreNLP performance and 31.3% to the SAAT extraction algorithms. Our main contributions include an integrated approach to the comprehensive pre-processing of the text, knowledge extraction, and visualization; the consolidated evaluation of natural language processing tasks and knowledge extraction output (sentences, subsentences, questions, concept maps); and the newly developed reference dataset.</p>
					<p><a href="https://lib.jucs.org/article/86745/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/86745/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/86745/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 28 Aug 2023 18:00:03 +0000</pubDate>
		</item>
	
		<item>
		    <title>Aggregating Users’ Online Opinions Attributes and News Influence for Cryptocurrencies Reputation Generation</title>
		    <link>https://lib.jucs.org/article/85610/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 29(6): 546-568</p>
					<p>DOI: 10.3897/jucs.85610</p>
					<p>Authors: Achraf Boumhidi, Abdessamad Benlahbib, El Habib Nfaoui</p>
					<p>Abstract: Reputation generation systems are decision-making tools used in different domains including e-commerce, tourism, social media events, etc. Such systems generate a numerical reputation score by analyzing and mining massive amounts of various types of user data, including textual opinions, social interactions, shared images, etc. Over the past few years, users have been sharing millions of tweets related to cryptocurrencies. Yet, no system in the literature was designed to handle the unique features of this domain with the goal of automatically generating reputation and supporting investors&rsquo; and users&rsquo; decision-making. Therefore, we propose the first financially oriented reputation system that generates a single numerical value from user-generated content on Twitter toward cryptocurrencies. The system processes the textual opinions by applying a sentiment polarity extractor based on the fine-tuned auto-regressive language model named XLNet. Also, the system proposes a technique to enhance sentiment identification by detecting sarcastic opinions through examining the contrast of sentiment between the textual content, images, and emojis. Furthermore, other features are considered, such as the popularity of the opinions based on the social network interactions (likes and shares), the intensity of the entity&rsquo;s demand within the opinions, and news influence on the entity. A survey experiment has been conducted by gathering numerical scores from 827 Twitter users interested in cryptocurrencies. Each selected user assigns 3 numerical assessment scores toward three cryptocurrencies. The average of those scores is considered ground truth. The experiment results show the efficacy of our model in generating a reliable numerical reputation value compared with the ground truth, which proves that the proposed system may be applied in practice as a trusted decision-making tool.</p>
					<p><a href="https://lib.jucs.org/article/85610/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/85610/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/85610/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 28 Jun 2023 12:00:03 +0000</pubDate>
		</item>
	
		<item>
		    <title>A Neuro Symbolic Approach for Contradiction Detection in Persian Text</title>
		    <link>https://lib.jucs.org/article/90646/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 29(3): 242-264</p>
					<p>DOI: 10.3897/jucs.90646</p>
					<p>Authors: Zeinab Rahimi, Mehrnoush Shamsfard</p>
					<p>Abstract: Detection of semantically contradictory sentences is a challenging and fundamental issue for some NLP applications, such as textual entailment recognition. In this study, contradiction means different types of semantic confrontation, such as negation, antonymy, and numerical. Due to the lack of sufficient data to apply precise machine learning and, specifically, deep learning methods to Persian and other low-resource languages, rule-based approaches are of great interest. Also, recently, the emergence of new methods such as transfer learning has opened up the possibility of deep learning for low-resource languages. This paper introduces a hybrid contradiction detection approach for detecting seven categories of contradictions in Persian texts: antonymy, negation, numerical, factive, structural, lexical and world knowledge. The proposed method consists of 1) a novel data mining method and 2) a transformer-based deep neural method for contradiction detection. Also, a simple baseline is presented for comparison. The data mining method uses frequent rule mining to extract appropriate contradiction detection rules employing a development set. Extracted rules are tested for different categories of contradictory sentences. In the first step, a classifier checks whether the rules work for an input sentence pair. Then, according to the result, rules are used for three categories of negation, numerical, and antonym. In this part, the highest F-measure is obtained for detecting the negation category (90%), the average F-measure for these three categories is 86%, and for the other four categories, in which the rules have a lower F-measure of 62%, the transformer-based method achieved 76%. The proposed hybrid approach has an overall F-measure of higher than 80%.</p>
					<p><a href="https://lib.jucs.org/article/90646/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/90646/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/90646/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 Mar 2023 10:30:04 +0000</pubDate>
		</item>
	
		<item>
		    <title>Leveraging Structural and Semantic Measures for JSON Document Clustering</title>
		    <link>https://lib.jucs.org/article/86563/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 29(3): 222-241</p>
					<p>DOI: 10.3897/jucs.86563</p>
					<p>Authors: Uma Priya D, P. Santhi Thilagam</p>
					<p>Abstract: In recent years, the increased use of smart devices and digital business opportunities has generated massive heterogeneous JSON data daily, making efficient data storage and management more difficult. Existing research uses different similarity metrics and clusters the documents to support the above tasks effectively. However, extant approaches have focused on either structural or semantic similarity of schemas. As JSON documents are application-specific, differently annotated JSON schemas are not only structurally heterogeneous but also differ by the context of the JSON attributes. Therefore, there is a need to consider the structural, semantic, and contextual properties of JSON schemas to perform meaningful clustering of JSON documents. This work proposes an approach to cluster heterogeneous JSON documents using the similarity fusion method. The similarity fusion matrix is constructed using structural, semantic, and contextual measures of JSON schemas. The experimental results demonstrate that the proposed approach outperforms the existing approaches significantly.</p>
					<p><a href="https://lib.jucs.org/article/86563/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/86563/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/86563/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 Mar 2023 10:30:03 +0000</pubDate>
		</item>
	
		<item>
		    <title>EntailClass: A Classification Approach to EntailSum and End-to-End Document Extraction, Identification, and Evaluation</title>
		    <link>https://lib.jucs.org/article/84647/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 29(1): 3-15</p>
					<p>DOI: 10.3897/jucs.84647</p>
					<p>Authors: Purvaja Balaji, Helena Merker, Amar Gupta</p>
					<p>Abstract: The novelty of zero-shot text classification can address the fundamental challenge of the lack of labeled training data. With the current plethora of multidisciplinary, unstandardized text data, scalable classification models favor unsupervised methods over their supervised counterparts. Overall, the aim is to automate the labelling of each sentence in an input document consisting of section titles and section text. We propose an end-to-end pipeline that includes a document parser, a text classification model called EntailClass, and finally an evaluator to determine balanced accuracy. The suggested pipeline employs a zero-shot approach to classify text within any desired set of aspects. Moreover, text sentences are paired with their section titles and chronological order is maintained within sentences of the same aspect. The proposed automated, three-step pipeline represents a step towards solving the challenge of text classification without the need for an individual dataset for each aspect. It also offers the potential for seamless integration into existing workflows. This zero-shot, generalizable pipeline has achieved 87.2% accuracy and outperformed other state-of-the-art models when applied to supervisory documents.</p>
					<p><a href="https://lib.jucs.org/article/84647/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/84647/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/84647/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 28 Jan 2023 10:30:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Natural Language Enhancement for English Teaching Using Character-Level Recurrent Neural Network with Back Propagation Neural Network based Classification by Deep Learning Architectures</title>
		    <link>https://lib.jucs.org/article/94162/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 28(9): 984-1000</p>
					<p>DOI: 10.3897/jucs.94162</p>
					<p>Authors: Zhiling Yang</p>
					<p>Abstract: Natural Language Processing (NLP) is an efficient method for enhancing educational outcomes. In educational settings, implementing NLP entails starting the learning process through natural acquisition. English teaching and learning have received increased attention from the relevant education departments as an integral aspect of the new curriculum reform. The environment of English teaching and learning is undergoing extraordinary changes as a result of the constant improvement and extension of teaching level and scale, as well as the growth of Internet information technology. As a result, the current research aims to look into techniques for efficiently using AI (artificial intelligence) apps to teach and learn English from the perspective of university students. This research can measure the levels as well as effectiveness of the employment of AI applications for teaching English based on deep learning techniques. There, the NLP based language enhancement has been carried out using Character-level recurrent neural network with back Propagation neural network (Cha_RNN_BPNN) based classification. With the help of this DL (deep learning) technique, it is possible to use AI methods to assist teachers in analysing and diagnosing students&#39; English learning behaviour, replacing teachers in part to answer students&#39; questions in a timely manner, and automatically grading assignments during the English teaching process. Experimental analysis shows Word Perplexity, Flesch-Kincaid (F-K) Grade Level for Readability, Cosine Similarity for Semantic Coherence, gradient change of NN, validation accuracy, and training accuracy of the proposed technique.</p>
					<p><a href="https://lib.jucs.org/article/94162/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/94162/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/94162/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 28 Sep 2022 10:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>English Teaching in Artificial Intelligence-based Higher Vocational Education Using Machine Learning Techniques for Students’ Feedback Analysis and Course Selection Recommendation</title>
		    <link>https://lib.jucs.org/article/94160/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 28(9): 898-915</p>
					<p>DOI: 10.3897/jucs.94160</p>
					<p>Authors: Xin Ma</p>
					<p>Abstract: Higher vocational education is a self-contained method of higher education that is aligned with global productivity and economic development. Its goal is to develop talented workers who contribute significantly to the economy and industry. Teaching analysis, teaching strategy, teaching practice, and assessment are all part of the course design process in higher vocational education. Teaching assessment is one of the most effective methods for improving the quality of course teaching among teaching processes. This research proposes novel techniques in English teaching based on artificial intelligence for course selection based on students&#39; feedback. Here, the dataset has been collected based on the students&rsquo; feedback on courses for Higher Vocational Education in English teaching. This dataset has been processed to remove invalid data, missing values, and noise. The processed data features have undergone dimensionality reduction integrated with a K-means neural network. The extracted features have then been classified with higher accuracy using a recursive elimination-based convolutional neural network. Based on this feedback data classification, recommendation for courses in Higher Vocational Education in English teaching has been suggested. The experimental analysis shows various students&#39; feedback dataset validation and training in terms of accuracy of 96%, precision of 92%, recall of 93%, RMSE of 68%, and computational time of 65%.</p>
					<p><a href="https://lib.jucs.org/article/94160/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/94160/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/94160/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 28 Sep 2022 10:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>TwitterBulletin: An Intelligent and Real-Time Automated News Categorization Tool for Twitter</title>
		    <link>https://lib.jucs.org/article/69377/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 28(4): 345-377</p>
					<p>DOI: 10.3897/jucs.69377</p>
					<p>Authors: Sedef Demirci, Seref Sagiroglu</p>
					<p>Abstract: Social media platforms have become popular news sources thanks to their immense popularity and high speed of information dissemination. Using these platforms is essential for news organizations and journalists to track and discover news in the digital journalism age. However, the abundance of meaningless data and the lack of organization on these platforms make it difficult for journalists to reach valuable news. In this paper, we create, to the best of our knowledge, the first public dataset containing a large number of real-world Turkish news tweets belonging to different news categories. We propose an artificial intelligence-based two-step approach to assist journalists in accessing the news shared by various sources on social media under the relevant categories like politics (elections, riots, etc.), health (pandemic, covid-19, etc.), etc. via a single platform by reducing the possibility of overlooking needed information. In the first step, we propose a novel machine learning based model for collecting and categorizing news posts on social media. We implement several traditional machine learning and deep learning based algorithms and evaluate their classification performance in terms of accuracy, precision, recall, and F1 score. In the second step, we develop a software tool, named TwitterBulletin, which automatically retrieves Turkish news tweets and groups them under news categories in real time by using the CNN classifier which achieves the best performance in the first step. The results show that the overall accuracy rate of TwitterBulletin is reasonably high and satisfactory despite the challenge of classifying short tweets.</p>
					<p><a href="https://lib.jucs.org/article/69377/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/69377/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/69377/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Thu, 28 Apr 2022 10:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Authorship Studies and the Dark Side of Social Media Analytics</title>
		    <link>https://lib.jucs.org/article/23994/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 26(1): 156-170</p>
					<p>DOI: 10.3897/jucs.2020.009</p>
					<p>Authors: Patrick Juola</p>
					<p>Abstract: The computational analysis of documents to learn about their authorship (also known as authorship attribution and/or authorship profiling) is an increasingly important area of research and application of technology. This paper discusses the technology, focusing on its application to social media in a variety of disciplines. It includes a brief survey of the history as well as three tutorial case studies, and discusses several significant applications and societal benefits that authorship analysis has brought about. It further argues, though, that while the benefits of this technology have been great, it has created serious risks to society that have not been sufficiently considered, addressed, or mitigated.</p>
					<p><a href="https://lib.jucs.org/article/23994/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23994/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23994/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 Jan 2020 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Quality Assessment of Photographed 3D Printed Flat Surfaces Using Hough Transform and Histogram Equalization</title>
		    <link>https://lib.jucs.org/article/22621/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 25(6): 701-717</p>
					<p>DOI: 10.3217/jucs-025-06-0701</p>
					<p>Authors: Jarosław Fastowicz, Krzysztof Okarma</p>
					<p>Abstract: Automatic visual quality assessment of objects created using additive manufacturing processes is one of the hot topics in the Industry 4.0 era. As 3D printing becomes more and more popular, also for everyday home use, a reliable visual quality assessment of printed surfaces attracts great interest. One of the most obvious reasons is the possibility of saving time and filament in the case of detected low printing quality, as well as correction of some smaller imperfections during the printing process. A novel method presented in the paper can be successfully applied for the assessment of flat surfaces almost independently of the filament's colour. It utilizes the assumption about the regularity of the layers visible on the printed high quality surfaces as straight lines, which can be extracted using Hough transform. However, for various colours of filaments some preprocessing operations should be conducted to allow a proper line detection for various samples. In the proposed method the additional brightness compensation has been used together with Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm. Results obtained for the database of 88 photos of 3D printed samples, together with their scans, are encouraging and allow a reliable quality assessment of 3D printed surfaces for various colours of filaments.</p>
					<p><a href="https://lib.jucs.org/article/22621/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/22621/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/22621/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Fri, 28 Jun 2019 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Fast Binarization of Unevenly Illuminated Document Images Based on Background Estimation for Optical Character Recognition Purposes</title>
		    <link>https://lib.jucs.org/article/22616/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 25(6): 627-646</p>
					<p>DOI: 10.3217/jucs-025-06-0627</p>
					<p>Authors: Hubert Michalak, Krzysztof Okarma</p>
					<p>Abstract: One of the key operations during the image preprocessing step in Optical Character Recognition (OCR) algorithms is image binarization. Although for uniformly illuminated images, obtained typically by flatbed scanners, the use of a single global threshold may be sufficient for further recognition of individual characters, it cannot be applied directly in the case of non-uniformly lit document images. Such a problem may occur when capturing photos of documents in unknown lighting conditions, making proper text recognition impossible in some parts of the image. Since the application of popular adaptive thresholding methods, e.g. Niblack, Sauvola and their modifications, based on the analysis of the neighbourhood of each pixel is time consuming, a faster solution might be the division of images into blocks or elimination of non-uniform background. Such an approach can be considered as a balanced solution filling the gap between global and local adaptive thresholding. The solution proposed in the paper, useful also for various mobile devices due to limited computational requirements, is based on the approximation of lighting distribution of the background using the reduced resolution images. The proposed method makes it possible to obtain very good OCR results, superior to typical adaptive binarization algorithms both in terms of the resulting OCR accuracy and computational efficiency.</p>
					<p><a href="https://lib.jucs.org/article/22616/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/22616/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/22616/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Fri, 28 Jun 2019 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Open Domain Targeted Sentiment Classification Using Semi-Supervised Dynamic Generation of Feature Attributes</title>
		    <link>https://lib.jucs.org/article/23705/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 24(11): 1582-1603</p>
					<p>DOI: 10.3217/jucs-024-11-1582</p>
					<p>Authors: Shadi Abudalfa, Moataz Ahmed</p>
					<p>Abstract: Microblogging services have grown significantly and enable people to conveniently share their sentiments (opinions) on matters of concern. Such sentiments have shown an impact on many fields such as economics and politics. Different sentiment analysis approaches have been proposed in the literature to automatically predict sentiments shared in micro-blogs (e.g., tweets). A class of such approaches predicts opinion towards a specific target (entity); this class is referred to as target-dependent sentiment classification. Another class, called open domain targeted sentiment classification, extracts targets from the micro-blog and predicts sentiment towards them. In this research work, we propose a new semi-supervised learning technique for developing open domain targeted sentiment classification using smaller amounts of labelled data. To the best of our knowledge, our model represents the first semi-supervised technique that is proposed for open domain targeted sentiment classification. Additionally, we propose a new supervised learning model for improving accuracy of open domain targeted sentiment classification. Moreover, we show for the first time that SVM HMM is able to improve accuracy of open domain targeted sentiment classification. Experimental results show that our proposed technique outperforms other prominent techniques available in the literature.</p>
					<p><a href="https://lib.jucs.org/article/23705/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23705/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23705/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 28 Nov 2018 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>EduRP: an Educational Resources Platform based on Opinion Mining and Semantic Web</title>
		    <link>https://lib.jucs.org/article/23699/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 24(11): 1515-1535</p>
					<p>DOI: 10.3217/jucs-024-11-1515</p>
					<p>Authors: Maritza López, Giner Alor-Hernández, José Sánchez-Cervantes, María del Pilar Salas-Zárate, Mario Paredes-Valverde</p>
					<p>Abstract: Educational platforms have become important tools for e-learning; nonetheless, finding the appropriate educational resources to use often represents a tedious task for learners. Opinions in the educational domain are important information for decision making; they allow teachers to improve the teaching process and enable students to decide on the best educational resources. The large amount of data that is generated daily on the Web makes it difficult, however, to analyze opinions manually. Multiple opinion mining approaches are being proposed as a solution to this problem; this research work introduces EduRP, an education platform that integrates opinion mining techniques and ontology-based user profiling techniques. We specifically propose an opinion mining approach for Spanish text which consists of three main steps: 1) collect opinions from the EduRP platform, 2) process the opinions to normalize the text, and 3) obtain the polarity of the opinions using a machine learning approach. We also propose a profile customization approach that uses Semantic Web technologies, specifically ontologies, to integrate socio-demographic data from different social networks and from the platform itself. Finally, we assess the performance of our system under precision, recall, and F-measure metrics, obtaining average values of 81.85%, 81.80% and 81.54%, respectively.</p>
					<p><a href="https://lib.jucs.org/article/23699/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23699/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23699/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 28 Nov 2018 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>GerIE - An Open Information Extraction System for the German Language</title>
		    <link>https://lib.jucs.org/article/22920/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 24(1): 2-24</p>
					<p>DOI: 10.3217/jucs-024-01-0002</p>
					<p>Authors: Akim Bassa, Mark Kroll, Roman Kern</p>
					<p>Abstract: Open Information Extraction (OIE) allows relations to be extracted from a text without the need for domain-specific training data. To date, most of the research on OIE has focused on the English language and little or no research has been conducted on other languages, including German. To tackle this problem, we developed GerIE, an OIE system for the German language. We surveyed the literature on OIE in order to identify concepts that may apply to the German language. Our system is based on the output of a German dependency parser and a number of handcrafted rules to extract the propositions. To evaluate the system, we created two dedicated datasets: one derived from news articles and the other devised from texts from an encyclopedia. Our system achieves F-measures of up to 0.89 for correctly-preprocessed sentences.</p>
					<p><a href="https://lib.jucs.org/article/22920/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/22920/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/22920/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sun, 28 Jan 2018 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Comparative Evaluation of Algorithms for Sentiment Analysis over Social Networking Services</title>
		    <link>https://lib.jucs.org/article/23437/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 23(8): 755-768</p>
					<p>DOI: 10.3217/jucs-023-08-0755</p>
					<p>Authors: Akrivi Krouska, Christos Troussas, Maria Virvou</p>
					<p>Abstract: Twitter is a highly popular social networking service and a web-based communication platform with millions of users exchanging public messages daily, namely tweets, expressing their opinions and feelings towards various issues. Twitter represents one of the largest and most dynamic datasets for data mining and sentiment analysis. Therefore, Twitter Sentiment Analysis constitutes a prominent and active research area with significant applications in industry and academia. The purpose of this paper is to provide a guideline for choosing optimal algorithms for sentiment analysis services. In this context, five well-known learning-based classifiers (Naive Bayes, Support Vector Machine, k-Nearest Neighbor, Logistic Regression and C4.5) and a lexicon-based approach (SentiStrength) have been evaluated based on confusion matrices, using three different datasets (OMD, HCR and STS-Gold) and two test models (percentage split and cross validation). The results demonstrate the superiority of Naive Bayes and Support Vector Machine regardless of datasets and test methods.</p>
					<p><a href="https://lib.jucs.org/article/23437/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23437/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23437/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 28 Aug 2017 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Sentiment Classification of Spanish Reviews: An Approach based on Feature Selection and Machine Learning Methods</title>
		    <link>https://lib.jucs.org/article/23209/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 22(5): 691-708</p>
					<p>DOI: 10.3217/jucs-022-05-0691</p>
					<p>Authors: Mario Paredes-Valverde, Jorge Limon-Romero, Diego Tlapa, Yolanda Baez-Lopez</p>
					<p>Abstract: Sentiment analysis aims to extract users' opinions from review documents. Nowadays, there are two main approaches for sentiment analysis: semantic orientation and machine learning. Sentiment analysis approaches based on Machine Learning (ML) methods work over a set of features extracted from the users' opinions. However, the high dimensionality of the feature vector reduces the effectiveness of this approach. In this sense, we propose a sentiment classification method based on feature selection mechanisms and ML methods. The present method uses a hybrid feature extraction method based on POS pattern and dependency parsing. The features obtained are enriched semantically through common-sense knowledge bases. Then, a feature selection method is applied to eliminate the noisy and irrelevant features. Finally, a set of classifiers is trained in order to classify unknown data. To prove the effectiveness of our approach, we have conducted an evaluation in the movies and technological products domains. Also, our proposal was compared with well-known methods and algorithms used in the sentiment classification field. Our proposal obtained encouraging results based on the F-measure metric, ranging from 0.786 to 0.898 for the aforementioned domains.</p>
					<p><a href="https://lib.jucs.org/article/23209/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23209/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23209/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sun, 1 May 2016 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis</title>
		    <link>https://lib.jucs.org/article/23824/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 21(13): 1708-1725</p>
					<p>DOI: 10.3217/jucs-021-13-1708</p>
					<p>Authors: Enrique Flores, Alberto Barrón-Cedeño, Lidia Moreno, Paolo Rosso</p>
					<p>Abstract: Nowadays, the Internet is the main source to get information from blogs, encyclopedias, discussion forums, source code repositories, and more resources which are available just one click away. The temptation to re-use these materials is very high. Even source codes are easily available through a simple search on the Web. There is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled version. When dealing with cross-language source code re-use, traditional approaches can deal only with the programming languages supported by the compiler. We assume that a source code is a piece of text, with its syntax and structure, so we aim at applying models for free text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models, being able to distinguish between re-used and related source codes with a high performance.</p>
					<p><a href="https://lib.jucs.org/article/23824/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23824/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23824/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 28 Dec 2015 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files</title>
		    <link>https://lib.jucs.org/article/23116/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 21(4): 604-635</p>
					<p>DOI: 10.3217/jucs-021-04-0604</p>
					<p>Authors: Hassan Saneifar, Stéphane Bonniol, Pascal Poncelet, Mathieu Roche</p>
					<p>Abstract: Log files generated by computational systems contain relevant and essential information. In some application areas like the design of integrated circuits, log files generated by design tools contain information which can be used in management information systems to evaluate the final products. However, the complexity of such textual data raises some challenges concerning the extraction of information from log files. Log files are usually multi-source, multi-format, and have a heterogeneous and evolving structure. Moreover, they usually do not respect natural language grammar and structures even though they are written in English. Classical methods of information extraction such as terminology extraction methods are particularly irrelevant to this context. In this paper, we introduce our approach EXTERLOG to extract terminology from log files. We detail how it deals with the specific features of such textual data. The performance is emphasized by favoring the most relevant terms of the domain based on a scoring function which uses a Web and context based measure. The experiments show that EXTERLOG is a well-adapted approach for terminology extraction from log files.</p>
					<p><a href="https://lib.jucs.org/article/23116/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23116/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23116/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 1 Apr 2015 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>An Approach to Skew Detection of Printed Documents</title>
		    <link>https://lib.jucs.org/article/23103/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 20(4): 488-506</p>
					<p>DOI: 10.3217/jucs-020-04-0488</p>
					<p>Authors: Darko Brodić, Carlos A. B. Mello, Čedomir Maluckov, Zoran Milivojevic</p>
					<p>Abstract: In this paper, we propose an approach to estimate the text skew for printed documents. This is an important step to prevent errors in further stages of an automatic document processing system (as text segmentation). Our approach is based on the statistical analysis of the height of the connected components. In a nutshell, our algorithm is comprised of four steps: (i) removal of redundant data; (ii) establishment of the connected components, which represent filled convex hulls around each text element; (iii) enlargement of these components using morphological erosion; (iv) removal of the largest connected component to identify the first estimation of text skew. According to it, the connected components are enlarged by oriented morphological erosion and the longest of them is extracted. Statistical moments are applied to this longest component to evaluate its orientation and the global text skew of the document is identified. At the end of this process, the original document is rotated back based on the calculated angle. The performance of the proposed algorithm is examined by testing on a custom dataset. The results support the robustness of our approach.</p>
					<p><a href="https://lib.jucs.org/article/23103/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23103/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23103/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 1 Apr 2014 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Teaching Innova Project: the Incorporation of Adaptable Outcomes in Order to Grade Training Adaptability</title>
		    <link>https://lib.jucs.org/article/23627/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 19(11): 1500-1521</p>
					<p>DOI: 10.3217/jucs-019-11-1500</p>
					<p>Authors: Ángel Fidalgo, María Sein-Echaluce, Dolores Lerís, Oscar Castañeda</p>
					<p>Abstract: The education project presented in this paper endeavors to study the feasibility of incorporating adaptive systems into LMS systems, by using them both in the training and learning process and at work. This case study is aimed at employability and job post improvement. For this purpose, we have created a process that is flexible both to the student pattern and to the job pattern. The developed process is adaptable both to the student (via the incorporation of an adaptable system with an LMS system) and to the job model (via an adaptable system to the knowledge management). The evaluation was qualitative and measured the process (feasibility to apply adaptive systems) and the efficiency of the method (applicability and employability). The functionality of the specific developed tools allowed us to grade the degree of adaptability in the training process, to dynamically vary the training plan from the student's actions and to identify the resources that best met the job needs.</p>
					<p><a href="https://lib.jucs.org/article/23627/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23627/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23627/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 1 Jun 2013 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>A Decoupled Architecture for Scalability in Text Mining Applications</title>
		    <link>https://lib.jucs.org/article/23015/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 19(3): 406-427</p>
					<p>DOI: 10.3217/jucs-019-03-0406</p>
					<p>Authors: Jorge Villalon, Rafael Calvo</p>
					<p>Abstract: Sophisticated Text Mining features such as visualization, summarization, and clustering are becoming increasingly common in software applications. In Text Mining, documents are processed using techniques from different areas which can be very expensive in computation cost. This poses a scalability challenge for real-life applications in which users' behavior cannot be entirely predicted. This paper proposes a decoupled architecture for document processing in Text Mining applications that allows applications to be scalable for large corpora and real-time processing. It contributes a software architecture designed around these requirements and presents TML, a Text Mining Library that implements the architecture. An experimental evaluation of its scalability using a standard corpus is also presented, along with empirical evidence on its performance as part of an automated feedback system for writing tasks used by real students.</p>
					<p><a href="https://lib.jucs.org/article/23015/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23015/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23015/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Fri, 1 Feb 2013 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Learning to Classify Neutral Examples from Positive and Negative Opinions</title>
		    <link>https://lib.jucs.org/article/23918/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 18(16): 2319-2333</p>
					<p>DOI: 10.3217/jucs-018-16-2319</p>
					<p>Authors: María-Teresa Martín-Valdivia, Arturo Montejo-Ráez, Alfonso Ureña-López, Mohammed Saleh</p>
					<p>Abstract: Sentiment analysis is a challenging research area due to the rapid increase of subjective texts populating the web. There are several studies which focus on classifying opinions into positive or negative. Corpora are usually labeled with a star-rating scale. However, most of the studies neglect to consider neutral examples. In this paper we study the effect of using neutral sample reviews found in an opinion corpus in order to improve a sentiment polarity classification system. We have performed different experiments using several machine learning algorithms in order to demonstrate the advantage of taking the neutral examples into account. In addition we propose a model to divide neutral samples into positive and negative ones, in order to incorporate this information into the construction of the final opinion polarity classification system. Moreover, we have generated a corpus from Amazon in order to prove the convenience of the system. The results obtained are very promising and encourage us to continue researching along this line and consider neutral examples as relevant information in opinion mining tasks.</p>
					<p><a href="https://lib.jucs.org/article/23918/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23918/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23918/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 Aug 2012 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>P Systems with Shuffle Operation and Catalytic-Like Rules</title>
		    <link>https://lib.jucs.org/article/23794/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 18(13): 1782-1801</p>
					<p>DOI: 10.3217/jucs-018-13-1782</p>
					<p>Authors: Yunyun Niu, Jinbang Xu, K. G. Subramanian, Rosni Abdullah</p>
					<p>Abstract: Shuffle operation on trajectories is useful in modeling parallel composition of words and languages. In this work, a new class of P systems with shuffle operation and catalytic-like rules is presented. Such a system has a membrane structure, where language-objects and shuffle-operation rules are placed in its regions. It can be used as a language generator. In this study, we propose a variant P system with shuffle operation on string-language objects. Some comparison results are obtained, which show that the power of shuffle operation is enlarged in the framework of P systems. Moreover, string-language objects are extended to array-language objects, and another variant P system with shuffle operation on picture-language objects is introduced. We also illustrate how to generate picture languages by using this kind of device.</p>
					<p><a href="https://lib.jucs.org/article/23794/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23794/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23794/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sun, 1 Jul 2012 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Improving the Extraction of Text in PDFs by Simulating the Human Reading Order</title>
		    <link>https://lib.jucs.org/article/23164/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 18(5): 623-649</p>
					<p>DOI: 10.3217/jucs-018-05-0623</p>
					<p>Authors: Ismael Hasan, Javier Parapar, Álvaro Barreiro</p>
					<p>Abstract: Text preprocessing and segmentation are critical tasks in search and text mining applications. Due to the huge number of documents that are exclusively presented in PDF format, most of the Data Mining (DM) and Information Retrieval (IR) systems must extract content from the PDF files. On some occasions this is a difficult task: the result of the extraction process from a PDF file is plain text, and it should be returned in the same order as a human would read the original PDF file. However, current tools for PDF text extraction fail in this objective when working with complex documents with multiple columns. For instance, this is the case of official government bulletins with legal information. In this task, it is mandatory to get correct and ordered text as a result of the application of the PDF extractor. It is very common for a legal article in a document to refer to a previous article, and they should be offered in the right sequential order. To overcome these difficulties we have designed a new method for extraction of text in PDFs that simulates the human reading order. We evaluated our method and compared it against other PDF extraction tools and algorithms. Evaluation of our approach shows that it significantly outperforms the results of the existing tools and algorithms.</p>
					<p><a href="https://lib.jucs.org/article/23164/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23164/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23164/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Thu, 1 Mar 2012 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Key Person Analysis in Social Communities within the Blogosphere</title>
		    <link>https://lib.jucs.org/article/23086/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 18(4): 577-597</p>
					<p>DOI: 10.3217/jucs-018-04-0577</p>
					<p>Authors: Anna Zygmunt, Piotr Bródka, Przemysław Kazienko, Jarosław Koźlak</p>
					<p>Abstract: Identifying key persons active in social groups in the blogosphere is performed by means of social network analysis. Two main independent approaches are considered in the paper: (i) discovery of the most important individuals in persistent social communities and (ii) regular centrality measures applied either to social groups or the entire network. A new method is proposed for separating groups that are stable over time and fulfil given conditions on the activity level of their members. Furthermore, a new concept for extracting user roles and key persons in such groups is also presented. This new approach was compared to the typical clustering method and the structural node position measure applied to rank users. The experimental studies have been carried out on real two-year blogosphere data.</p>
					<p><a href="https://lib.jucs.org/article/23086/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23086/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23086/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 Feb 2012 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Computational Analysis of Medieval Manuscripts: A New Tool for Analysis and Mapping of Medieval Documents to Modern Orthography</title>
		    <link>https://lib.jucs.org/article/23978/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 18(20): 2750-2770</p>
					<p>DOI: 10.3217/jucs-018-20-2750</p>
					<p>Authors: Mushtaq Ahmad, Stefan Gruner, Muhammad Afzal</p>
					<p>Abstract: Medieval manuscripts or other written documents from that period contain valuable information about people, religion, and politics of the medieval period, making the study of medieval documents a necessary pre-requisite to gaining in-depth knowledge of medieval history. Although tool-less study of such documents is possible and has been ongoing for centuries, much subtle information remains locked in such manuscripts unless it gets revealed by effective means of computational analysis. Automatic analysis of medieval manuscripts is a non-trivial task mainly due to non-conforming styles, spelling peculiarities, or lack of relational structures (hyper-links), which could be used to answer meaningful queries. Natural Language Processing (NLP) tools and algorithms are used to carry out computational analysis of text data. However, due to the high percentage of spelling variations in medieval manuscripts, NLP tools and algorithms cannot be applied directly for computational analysis. If the spelling variations are mapped to standard dictionary words, then application of standard NLP tools and algorithms becomes possible. In this paper we describe a web-based software tool CAMM (Computational Analysis of Medieval Manuscripts) that maps medieval spelling variations to a modern German dictionary. Here we describe the steps taken to acquire, reformat, and analyze data, produce putative mappings as well as the steps taken to evaluate the findings. At the time of the writing of this paper, CAMM provides access to 11275 manuscripts organized into 54 collections containing a total of 242446 distinctly spelled words. CAMM accurately corrects the spelling of 55% of the verifiable words. CAMM is freely available at http://researchworks.cs.athabascau.ca/.</p>
					<p><a href="https://lib.jucs.org/article/23978/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/23978/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/23978/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 1 Feb 2012 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Let Me Tell You a Story - On How to Build Process Models</title>
		    <link>https://lib.jucs.org/article/29893/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 17(2): 276-295</p>
					<p>DOI: 10.3217/jucs-017-02-0276</p>
					<p>Authors: João Carlos de A R Gonçalves, Flávia Santoro, Fernanda Baião</p>
					<p>Abstract: Process Modeling has been a very active research topic for the last decades. One of its main issues is the externalization of knowledge and its acquisition for further use, as this remains deeply related to the quality of the resulting process models produced by this task. This paper presents a method and a graphical supporting tool for process elicitation and modeling, combining the Group Storytelling technique with the advances of Text Mining and Natural Language Processing. The implemented tool extends its previous versions with several functionalities to facilitate group storytelling by the users, as well as to improve the results of the acquired process model from the stories.</p>
					<p><a href="https://lib.jucs.org/article/29893/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29893/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29893/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Fri, 28 Jan 2011 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>An OCR Free Method for Word Spotting in Printed Documents: the Evaluation of Different Feature Sets</title>
		    <link>https://lib.jucs.org/article/29876/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 17(1): 48-63</p>
					<p>DOI: 10.3217/jucs-017-01-0048</p>
					<p>Authors: Israel Rios, Alceu Britto Jr, Alessandro Koerich, Luis Eduardo Soares Oliveira</p>
					<p>Abstract: An OCR free word spotting method is developed and evaluated under a strong experimental protocol. Different feature sets are evaluated under the same experimental conditions. In addition, a tuning process in the document segmentation step is proposed which provides a significant reduction in terms of processing time. For this purpose, a complete OCR-free method for word spotting in printed documents was implemented, and a document database containing document images and their corresponding ground truth text files was created. A strong experimental protocol based on 800 document images allows us to compare the results of the three feature sets used to represent the word image.</p>
					<p><a href="https://lib.jucs.org/article/29876/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29876/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29876/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 1 Jan 2011 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>A New Approach to Water Flow Algorithm for Text Line Segmentation</title>
		    <link>https://lib.jucs.org/article/29875/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 17(1): 30-47</p>
					<p>DOI: 10.3217/jucs-017-01-0030</p>
					<p>Authors: Darko Brodić, Zoran Milivojevic</p>
					<p>Abstract: This paper proposes a new approach to water flow algorithm for the text line segmentation. Original method assumes hypothetical water flows under a few specified angles to the document image frame from left to right and vice versa. As a result, unwetted image frames are extracted. These areas are of major importance for text line segmentation. Method modifications mean extension values of water flow angle and unwetted image frames function enlargement. Results are encouraging due to text line segmentation improvement which is the most challenging process stage in document image processing.</p>
					<p><a href="https://lib.jucs.org/article/29875/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29875/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29875/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 1 Jan 2011 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Text Line Detection and Segmentation: Uneven Skew Angles and Hill-and-Dale Writing</title>
		    <link>https://lib.jucs.org/article/29873/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 17(1): 16-29</p>
					<p>DOI: 10.3217/jucs-017-01-0016</p>
					<p>Authors: Ergina Kavallieratou, Fotis Daskas</p>
					<p>Abstract: In this paper a line detection and segmentation technique is presented. The proposed technique is an improved version of an older one. The experiments have been performed on the training dataset of the ICDAR 2009 handwriting segmentation contest in order to be able to compare, objectively, the performance of the two techniques. The improvement between the older and newer version is more than 24% while the average extra CPU time cost is less than 200 ms per page.</p>
					<p><a href="https://lib.jucs.org/article/29873/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29873/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29873/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 1 Jan 2011 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>XML Database Transformations</title>
		    <link>https://lib.jucs.org/article/29847/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 16(20): 3043-3072</p>
					<p>DOI: 10.3217/jucs-016-20-3043</p>
					<p>Authors: Klaus-Dieter Schewe, Qing Wang</p>
					<p>Abstract: Database transformations provide a unifying umbrella for queries and updates. In general, they can be characterised by five postulates, which constitute the database analogue of Gurevich's sequential ASM thesis. Among these postulates the background postulate supposedly captures the particularities of data models and schemata. For the characterisation of XML database transformations the natural first step is therefore to define the appropriate tree-based backgrounds, which draw on hereditarily finite trees, tree algebra operations, and extended document type definitions. This defines a computational model for XML database transformation using a variant of Abstract State Machines. Then the incorporation of weak monadic second-order logic provides an alternative computational model called XML machines. The main result is that these two computational models for XML database transformations are equivalent.</p>
					<p><a href="https://lib.jucs.org/article/29847/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29847/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29847/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 1 Nov 2010 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Geometric Point Pattern Matching in the Knuth-Morris-Pratt Way</title>
		    <link>https://lib.jucs.org/article/29739/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 16(14): 1902-1911</p>
					<p>DOI: 10.3217/jucs-016-14-1902</p>
					<p>Authors: Esko Ukkonen</p>
					<p>Abstract: Given finite sets P and T of points in the Euclidean space R<sup>d</sup>, the point pattern matching problem studied in this paper is to find all translations f ∈ R<sup>d</sup> such that P + f ⊆ T. A fast search algorithm with some variants is presented for point patterns P that have regular grid-like geometric shape. The algorithm is analogous to the Knuth-Morris-Pratt algorithm of string matching. The time requirement of the search is O(r|T|) where r is the grid dimension of P. Pattern P has grid dimension r = 1 if it consists of evenly spaced points on a line. In general, a pattern P is an r-dimensional grid if it has for some p ∈ P and e<sub>1</sub>, ... , e<sub>r</sub> ∈ R<sup>d</sup> and positive integers m<sub>1</sub>, ... , m<sub>r</sub> a representation P = {p + i<sub>1</sub>e<sub>1</sub> + ⋅⋅⋅ + i<sub>r</sub>e<sub>r</sub> | 0 ≤ i<sub>j</sub> ≤ m<sub>j</sub>} where the i<sub>j</sub>'s are integers. Both P and T are given to the search algorithm in the lexicographic order.</p>
					<p><a href="https://lib.jucs.org/article/29739/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29739/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29739/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 28 Jul 2010 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Adaptive Binarization of Unconstrained Hand-Held Camera-Captured Document Images</title>
		    <link>https://lib.jucs.org/article/29563/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 15(18): 3343-3363</p>
					<p>DOI: 10.3217/jucs-015-18-3343</p>
					<p>Authors: Syed Bukhari, Faisal Shafait, Thomas Breuel</p>
					<p>Abstract: This paper presents a new adaptive binarization technique for degraded hand-held camera-captured document images. State-of-the-art locally adaptive binarization methods are sensitive to the values of free parameters. This problem is more critical when binarizing degraded camera-captured document images because of distortions like non-uniform illumination, bad shading, blurring, smearing and low resolution. We demonstrate in this paper that local binarization methods are not only sensitive to the selection of free parameter values (either found manually or automatically), but also sensitive to the use of constant free parameter values for all pixels of a document image. Some ranges of free parameter values are better for foreground regions and other ranges are better for background regions. To overcome this problem, we present an adaptation of a state-of-the-art local binarization method such that two different sets of free parameter values are used for foreground and background regions respectively. We present the use of ridge detection for rough estimation of foreground regions in a document image. This information is then used to calculate an appropriate threshold using different sets of free parameter values for the foreground and background regions respectively. Evaluation of the method using an OCR-based measure and a pixel-based measure shows that our method achieves better performance than state-of-the-art global and local binarization methods.</p>
					<p><a href="https://lib.jucs.org/article/29563/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29563/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29563/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 28 Dec 2009 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Semantic Information in Medical Information Systems: Utilization of Text Mining Techniques to Analyze Medical Diagnoses</title>
		    <link>https://lib.jucs.org/article/29287/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 14(22): 3781-3795</p>
					<p>DOI: 10.3217/jucs-014-22-3781</p>
					<p>Authors: Andreas Holzinger, Regina Geierhofer, Felix Mödritscher, Roland Tatzl</p>
					<p>Abstract: Most information in hospitals is still only available in text format and the amount of this data is immensely increasing. Consequently, text mining is an essential area of medical informatics. With the aid of statistical and linguistic procedures, text mining software attempts to dig out (mine) information from plain text. The aim is to transform data into information. However, for the efficient support of end users, facets of computer science alone are insufficient; the next step consists of making the information both usable and useful. Consequently, aspects of cognitive psychology must be taken into account in order to enable the transformation of information into knowledge of the end users. In this paper we describe the design and development of an application for analyzing expert comments on magnetic resonance images (MRI) diagnoses by applying a text mining method in order to scan them for regional correlations. Consequently, we propose a calculation of significant co-occurrences of diseases and defined regions of the human body, in order to identify possible risks for health.</p>
					<p><a href="https://lib.jucs.org/article/29287/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29287/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29287/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sun, 28 Dec 2008 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>An Evaluation Technique for Binarization Algorithms</title>
		    <link>https://lib.jucs.org/article/29211/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 14(18): 3011-3030</p>
					<p>DOI: 10.3217/jucs-014-18-3011</p>
					<p>Authors: Pavlos Stathis, Ergina Kavallieratou, Nikos Papamarkos</p>
					<p>Abstract: Document binarization has been an active research area for many years. The choice of the most appropriate binarization algorithm for each case has proved to be a very difficult procedure itself. In this paper, we propose a new technique for the validation of document binarization algorithms. Our method is simple in its implementation and can be performed on any binarization algorithm since it doesn't require anything more than the binarization stage. As a demonstration of the proposed technique, we use the case of degraded historical documents. Then we apply the proposed technique to 30 binarization algorithms. Experimental results and conclusions are presented.</p>
					<p><a href="https://lib.jucs.org/article/29211/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29211/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29211/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 1 Oct 2008 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Systematic Characterisation of Objects in Digital Preservation: The eXtensible Characterisation Languages</title>
		    <link>https://lib.jucs.org/article/29201/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 14(18): 2936-2952</p>
					<p>DOI: 10.3217/jucs-014-18-2936</p>
					<p>Authors: Christoph Becker, Andreas Rauber, Volker Heydegger, Jan Schnasse, Manfred Thaller</p>
					<p>Abstract: During the last decades, digital objects have become the primary medium to create, shape, and exchange information. However, in contrast to analog objects such as books that directly represent their content, digital objects are not usable without a corresponding technical environment. The fast changes in these environments and in formats and technologies mean that digital documents have a short lifespan before they become obsolete. Digital preservation, i.e. actions to ensure longevity of digital information, thus has become a pressing challenge. The dominant strategies prevailing today are migration and emulation; for each strategy, different tools are available. When converting an object to a different representation, a validation of the content is needed to verify that the transformed objects are still authentically representing the same intellectual content. This validation so far is largely done manually, which is infeasible for large collections. Preservation planning supports decision makers in reaching accountable decisions by evaluating potential strategies against well-defined requirements. Especially the evaluation of different migration tools for digital preservation has to rely on validating the converted objects and thus on an analysis of the logical structure and the content of documents. Existing approaches for characterising and describing objects do not attempt to fully extract the informational content of digital objects and thus are not sufficient for an in-depth validation of transformed content. This paper describes the eXtensible Characterisation Languages (XCL) that support the automatic validation of document conversions and the evaluation of migration quality by hierarchically decomposing a document and representing documents from different sources in an abstract XML language. The description language XCDL provides an abstract representation of digital content in XML, while the extraction language XCEL allows an extraction engine to create such an abstract description by mapping file format structures to XCDL concepts. We present the context of the development of these languages and tools and describe the overall concept and features of the languages. We further give examples and show how the languages can be applied to the evaluation of digital preservation solutions in the context of preservation planning.</p>
					<p><a href="https://lib.jucs.org/article/29201/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/29201/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/29201/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Wed, 1 Oct 2008 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Table-form Extraction with Artefact Removal</title>
		    <link>https://lib.jucs.org/article/28942/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 14(2): 252-265</p>
					<p>DOI: 10.3217/jucs-014-02-0252</p>
					<p>Authors: Luiz Antônio Pereira Neves, João De Carvalho, Jacques Facon, Flávio Bortolozzi</p>
					<p>Abstract: In this paper we present a novel methodology to recognize the layout structure of handwritten filled table-forms. Recognition methodology includes locating line intersections, correcting wrong intersections produced by what we call artefacts (overlapping data, broken segments and smudges), extracting correct table-form cells and using as little previous table-form knowledge as possible. To improve layout structure recognition, a novel artefact identification and deletion method is also proposed. To evaluate the effectiveness of the methodology, a database composed of 350 handwritten filled table-form images damaged by different types of artefacts was used. Experiments show that the artefact identification method improves performance of the table-forms structure extractor that reached a success rate of 85%.</p>
					<p><a href="https://lib.jucs.org/article/28942/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/28942/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/28942/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 28 Jan 2008 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Informatics for Historians: Tools for Medieval Document XML Markup, and their Impact on the History-Sciences</title>
		    <link>https://lib.jucs.org/article/28938/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 14(2): 193-210</p>
					<p>DOI: 10.3217/jucs-014-02-0193</p>
					<p>Authors: Benjamin Burkard, Georg Vogeler, Stefan Gruner</p>
					<p>Abstract: This article is a revised and extended version of [VBG, 07]. We conjecture that the digitalization of historical text documents as a basis of data mining and information retrieval for the purpose of progress in the history sciences is urgently needed. We present a novel, specialist XML tool-suite supporting the working historian in the transcription of original medieval charters into a machine-readable form, and we also address some latest developments which can be found in the field since the publication of [VBG, 07].</p>
					<p><a href="https://lib.jucs.org/article/28938/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/28938/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/28938/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 28 Jan 2008 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Using an Evolving Thematic Clustering in a Text Segmentation Process</title>
		    <link>https://lib.jucs.org/article/28935/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 14(2): 178-192</p>
					<p>DOI: 10.3217/jucs-014-02-0178</p>
					<p>Authors: Sylvain Lamprier, Tassadit Amghar, Bernard Levrat, Frederic Saubion</p>
					<p>Abstract: The thematic text segmentation task consists in identifying the most important thematic breaks in a document in order to cut it into homogeneous passages. We propose in this paper an algorithm for linear text segmentation on general corpuses. It relies on an initial clustering of the sentences of the text. This preliminary partitioning provides a global view on the sentence relations existing in the text, considering the similarities in a group rather than individually. The method, so-called ClassStruggle, is based on the distribution of the occurrences of the members of each class. During the process, the clusters then evolve, by considering a notion of proximity and of layout in the text, with the aim of creating groups that contain only sentences related to the same topic development. Finally, boundaries are created between sentences belonging to two different classes. First experimental results are promising: ClassStruggle appears to be very competitive compared with existing methods.</p>
					<p><a href="https://lib.jucs.org/article/28935/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/28935/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/28935/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 28 Jan 2008 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>A Collaborative Biomedical Research System</title>
		    <link>https://lib.jucs.org/article/28562/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 12(1): 80-98</p>
					<p>DOI: 10.3217/jucs-012-01-0080</p>
					<p>Authors: Adel Taweel, Alan Rector, Jeremy Rogers</p>
					<p>Abstract: The convergence of need between improved clinical care and post genomics research presents a unique challenge to restructuring information flow so that it benefits both without compromising patient safety or confidentiality. The CLEF project aims to link up health care with bioinformatics to build a collaborative research platform that enables more effective biomedical research. In that, it addresses various barriers and issues, including privacy both by policy and by technical means, towards establishing its eventual system. It makes extensive use of language technology for information extraction and presentation, and its shared repository is based around coherent "chronicles" of patients' histories that go beyond traditional health record structure. It makes use of a collaborative research workbench that encompasses several technologies and uses many tools providing a rich platform for clinical researchers.</p>
					<p><a href="https://lib.jucs.org/article/28562/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/28562/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/28562/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 28 Jan 2006 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Modelling and Implementing Pre-built Information Spaces. Architecture and Methods for Process Oriented Knowledge Management</title>
		    <link>https://lib.jucs.org/article/28393/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 11(4): 605-633</p>
					<p>DOI: 10.3217/jucs-011-04-0605</p>
					<p>Authors: Karsten Böhm, Wolf Engelbach, Joerg Härtwig, Martin Wilcken, Martin Delp</p>
					<p>Abstract: Process-oriented Knowledge Management aims to provide adequate information for employees, especially in weakly structured and information-intensive business processes. Besides a technical software solution, which uses a pre-structured, context-aware and collaborative information space that combines processes, domain specific semantic structures and document parts, this requires a methodology to model the process and other context-dimensions, such as roles. Moreover, a guideline and clear service modules are necessary to introduce process-oriented Knowledge Management in companies, especially in small and medium-sized enterprises (SME). Such solutions were developed in the cooperative research project PreBIS (Pre-Build Information Space).</p>
					<p><a href="https://lib.jucs.org/article/28393/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/28393/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/28393/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Thu, 28 Apr 2005 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>ADDS: A Document-Oriented Approach for Application Development</title>
		    <link>https://lib.jucs.org/article/28301/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 10(9): 1302-1324</p>
					<p>DOI: 10.3217/jucs-010-09-1302</p>
					<p>Authors: José Sierra, Alfredo Fernández-Valmayor, Baltasar Fernández-Manjón, Antonio Navarro</p>
					<p>Abstract: This paper proposes a document oriented paradigm to the development of content-intensive, document-based applications (e.g. educational and hypermedia applications, and knowledge based systems). According to this paradigm, the main aspects of this kind of applications can be described by means of documents. Afterwards, these documents are marked up using descriptive domain-specific markup languages and applications are produced by the automatic processing of these marked documents. We have used this paradigm to improve the maintenance and portability of content-intensive educational and hypermedia applications. ADDS (Approach to Document-based Development of Software) is an approach to software development based on the document oriented paradigm. A key feature of ADDS is that formulation of domain-specific markup languages is a dynamic and eminently pragmatic activity, and markup languages evolve according to the authoring needs of the different participants in the development process (domain experts and developers). The evolutionary nature of markup languages in ADDS leads to OADDS (Operationalization in ADDS), the proposed operationalization model for the incremental development of modular markup language processors. Finally, the document-oriented paradigm can also be applied in the construction of OADDS processors that are also described using marked documents. This paper presents our ADDS approach, including the operationalization model and its implementation as an object-oriented framework. The application of our document-oriented paradigm to the construction of OADDS processors is also presented.</p>
					<p><a href="https://lib.jucs.org/article/28301/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/28301/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/28301/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 Sep 2004 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Etiquette, Empathy and Trust in Communities of Practice: Stepping-Stones to Social Capital</title>
		    <link>https://lib.jucs.org/article/28208/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 10(3): 294-302</p>
					<p>DOI: 10.3217/jucs-010-03-0294</p>
					<p>Authors: Jennifer Preece</p>
					<p>Abstract: Creating online communities of practice involves much more than creating software. Software houses online communities of practice activities but social interactions also depend on who is involved, what their goals are, their personalities and the community's norms and policies. By paying attention to these sociability issues, community members can influence how their community develops. Norms that lead to good online etiquette, empathy and trust between community members provide stepping-stones for social capital development.</p>
					<p><a href="https://lib.jucs.org/article/28208/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/28208/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/28208/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sun, 28 Mar 2004 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Automatic Discovery and Aggregation of Compound Names for the Use in Knowledge Representations</title>
		    <link>https://lib.jucs.org/article/28035/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 9(6): 530-541</p>
					<p>DOI: 10.3217/jucs-009-06-0530</p>
					<p>Authors: Christian Biemann, Uwe Quasthoff, Karsten Böhm, Christian Wolff</p>
					<p>Abstract: Automatic acquisition of information structures like Topic Maps or semantic networks from large document collections is an important issue in knowledge management. An inherent problem with automatic approaches is the treatment of multiword terms as single semantic entities. Taking company names as an example, we present a method for learning multiword terms from large text corpora exploiting their internal structure. Through the iteration of a search step and a verification step the single words typically forming company names are learnt. These name elements are used for recognizing compounds in order to use them for further processing. We give some evaluation of experiments on company name extraction and discuss some applications.</p>
					<p><a href="https://lib.jucs.org/article/28035/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/28035/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/28035/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 28 Jun 2003 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Components of a Model of Context-Sensitive Hypertexts</title>
		    <link>https://lib.jucs.org/article/27913/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 8(10): 924-943</p>
					<p>DOI: 10.3217/jucs-008-10-0924</p>
					<p>Authors: Alexander Mehler</p>
					<p>Abstract: On the background of rising Intranet applications the automatic generation of adaptable, context-sensitive hypertexts becomes more and more important [El-Beltagy et al., 2001]. This observation contradicts the literature on hypertext authoring, where Information Retrieval techniques prevail, which disregard any linguistic and context-theoretical underpinning. As a consequence, resulting hypertexts do not manifest those schematic structures, which are constitutive for the emergence of text types and the context-mediated understanding of their instances, i.e. natural language texts. This paper utilizes Systemic Functional Linguistics (SFL) and its context model as a theoretical basis of hypertext authoring. So called Systemic Functional Hypertexts (SFHT) are proposed, which refer to a stratified context layer as the proper source of text linkage. The purpose of this paper is twofold: First, hypertexts are reconstructed from a linguistic point of view as a kind of supersign, whose constituents are natural language texts and whose structuring is due to intra- and intertextual coherence relations and their context-sensitive interpretation. Second, the paper prepares a formal notion of SFHTs as a first step towards operationalization of fundamental text linguistic concepts. On this background, SFHTs serve to overcome the theoretical poverty of many approaches to link generation.</p>
					<p><a href="https://lib.jucs.org/article/27913/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/27913/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/27913/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 28 Oct 2002 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>An Object-oriented Approach to Design, Specification, and Implementation of Hyperlink Structures Based on Usual Software Development</title>
		    <link>https://lib.jucs.org/article/27911/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 8(10): 892-912</p>
					<p>DOI: 10.3217/jucs-008-10-0892</p>
					<p>Authors: Alexander Fronk</p>
					<p>Abstract: Different models and methodologies for the development of hypermedia systems and applications have emerged in recent years. Software-technical methods and principles enriched with ideas mainly driven from the applications' needs are often sponsor to those models and methodologies. Hence, they deal with very specific problems occurring in the hypermedia domain, thereby extending design notations like UML or State Charts and adapting them to modeling this domain. In the present paper, we propose a very usual software-technical approach to the development of hyperlink structures which form the basis for navigation in hyperdocuments. Our approach uses standard UML, algebraic specification and object-oriented implementation to cover the construction of hyperlink structures, from design through to specification and realization. We thereby equate the development of hypermedia documents with usual software development. Instead of adapting software engineering and notations to hypermedial concerns, we adapt the latter to the former and show the advantages of this approach.</p>
					<p><a href="https://lib.jucs.org/article/27911/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/27911/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/27911/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Mon, 28 Oct 2002 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Topic Map Generation Using Text Mining</title>
		    <link>https://lib.jucs.org/article/27889/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 8(6): 623-633</p>
					<p>DOI: 10.3217/jucs-008-06-0623</p>
					<p>Authors: Karsten Böhm, Gerhard Heyer, Uwe Quasthoff, Christian Wolff</p>
					<p>Abstract: Starting from text corpus analysis with linguistic and statistical analysis algorithms, an infrastructure for text mining is described which uses collocation analysis as a central tool. This text mining method may be applied to different domains as well as languages. Some examples taken from large reference databases motivate the applicability to knowledge management using declarative standards of information structuring and description. The ISO/IEC Topic Map standard is introduced as a candidate for rich metadata description of information resources and it is shown how text mining can be used for automatic topic map generation.</p>
					<p><a href="https://lib.jucs.org/article/27889/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/27889/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/27889/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Fri, 28 Jun 2002 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>The Roles of Video in the Design, Development, and Use of Interactive Electronic Conference Proceedings</title>
		    <link>https://lib.jucs.org/article/27502/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 4(6): 604-628</p>
					<p>DOI: 10.3217/jucs-004-06-0604</p>
					<p>Authors: Samuel Rebelsky, Fillia Makedon, P. Metaxas, Peter Gloor</p>
					<p>Abstract: In this paper, we discuss the design and development of a particular type of electronic publication that has gained recent popularity: electronic conference proceedings. We suggest that modern electronic proceedings should provide a high degree of interactivity. To support such interactivity, proceedings should include an extensive collection of features and diverse multimedia components. Features appropriate for electronic proceedings include annotation, presentation, and retrieval mechanisms. Conference papers and multimedia reproductions of conference presentations with features that allow readers to manipulate these reproductions particularly enhance the interactivity of electronic proceedings. Experience from interactive proceedings the authors have designed is also discussed. Special attention is given to the multiple roles video elements can and should play in interactive proceedings.</p>
					<p><a href="https://lib.jucs.org/article/27502/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/27502/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/27502/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sun, 28 Jun 1998 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Structured Parallel Computation in Structured Documents</title>
		    <link>https://lib.jucs.org/article/27323/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 3(1): 42-68</p>
					<p>DOI: 10.3217/jucs-003-01-0042</p>
					<p>Authors: D. Skillicorn</p>
					<p>Abstract: Document archives contain large amounts of data to which sophisticated queries are applied. The size of archives and the complexity of evaluating queries make the use of parallelism attractive. The use of semantically-based markup such as SGML makes it possible to represent documents and document archives as data types. We present a theory of trees and tree homomorphisms, modelling structured text archives and operations on them, from which it can be seen that: many apparently unrelated tree operations are homomorphisms, homomorphisms can be described in a simple parameterised way that gives standard sequential and parallel implementations for them, and certain special classes of homomorphisms have parallel implementations of practical importance. In particular, we develop an algorithm for path expression search, a novel powerful query facility for structured text, taking time logarithmic in the text size. This algorithm is the first example of a new algorithm discovered using homomorphic skeletons over data types.</p>
					<p><a href="https://lib.jucs.org/article/27323/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/27323/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/27323/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Tue, 28 Jan 1997 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Evaluating and Improving WWW-Aided Instruction</title>
		    <link>https://lib.jucs.org/article/27314/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 2(12): 829-841</p>
					<p>DOI: 10.3217/jucs-002-12-0829</p>
					<p>Authors: Samuel Rebelsky</p>
					<p>Abstract: A growing number of instructors are putting course resources on the World Wide Web (WWW) [Berners-Lee et al. 1994], from simple course descriptions through traditional printed handouts to complete "classroom-free" classes ([Team Web 1995] provides a broad sampling of such resources). However, there appears to be a paucity of evaluation of WWW based classroom resources. Do they help or do they hurt? Which materials are more valuable or less valuable? How do students react to the web? This paper describes the design, evaluation, redesign, and re-evaluation of a number of course webs that incorporate a wide range of resources (including readings, notes, transcriptions, and traditional handouts) and media (including text, images, and audio). This paper generalizes student reactions to webs for two introductory Computer Science courses [Rebelsky 1994] [Rebelsky 1996], incorporating additional comments from students in advanced courses. Key Words: Multimedia Information Systems [Evaluation/Methodology], Computer Uses in Education, World-Wide Web, Hypertext Document Design and Preparation, Computer Science Education, Computer Literacy.</p>
					<p><a href="https://lib.jucs.org/article/27314/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/27314/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/27314/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 28 Dec 1996 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>The Dortmund Family of Hypermedia Models - Concepts and their Application</title>
		    <link>https://lib.jucs.org/article/27206/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 2(1): 34-56</p>
					<p>DOI: 10.3217/jucs-002-01-0034</p>
					<p>Authors: Klaus Tochtermann, Gisbert Dittrich</p>
					<p>Abstract: This paper presents the Dortmund Family of Hypermedia Models (DFHM). Existing formal models for hypermedia mostly lack flexibility and adaptability, and often no more than one existing system conforms to such a model. The DFHM overcomes this drawback by means of optional and alternative data types. The conformance of a hypermedia system to the DFHM can be conditionalised upon one member of the family. The DFHM has been formalised in VDM, but the aim of this paper is to give an informal overview of the main concepts. Therefore, any formalisms are omitted here. The first part of the paper deals with hypermedia fundamentals from a conceptual perspective. Apart from basic concepts, e.g. nodes and links, structuring concepts, e.g. views, folders and others, are also discussed in detail. Some examples are given to convey how models for existing hypermedia systems can be derived from the DFHM. The second part demonstrates the power of these concepts by introducing the main features of a hypermedia system that has been developed for use in educational settings. This hypermedia system is based upon a member of the DFHM.</p>
					<p><a href="https://lib.jucs.org/article/27206/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/27206/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/27206/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sun, 28 Jan 1996 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>Contained Hypermedia</title>
		    <link>https://lib.jucs.org/article/27170/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 1(10): 687-705</p>
					<p>DOI: 10.3217/jucs-001-10-0687</p>
					<p>Authors: Erik Duval, Henk Olivié, Nick Scherbakov</p>
					<p>Abstract: We propose a new hypermedia data model, called CHM for Contained HyperMedia. Our model is based on set-oriented data structuring, with a strong emphasis on automatic maintenance of link integrity. In this paper, the CHM model is presented in detail: data structuring, navigational facilities and authoring support are all covered. We will also explain how we have integrated support for the CHM model in Home, our Hypermedia Object Management Environment, publicly accessible through the World-Wide Web.</p>
					<p><a href="https://lib.jucs.org/article/27170/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/27170/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/27170/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sat, 28 Oct 1995 00:00:00 +0000</pubDate>
		</item>
	
		<item>
		    <title>HOME: An Environment for Hypermedia Objects</title>
		    <link>https://lib.jucs.org/article/27125/</link>
		    <description><![CDATA[
					<p>JUCS - Journal of Universal Computer Science 1(5): 269-291</p>
					<p>DOI: 10.3217/jucs-001-05-0269</p>
					<p>Authors: Erik Duval, Henk Olivié, Piers Hanlon, David Jameson</p>
					<p>Abstract: In this paper, we present HOME, a new environment for distributed hypermedia. We mainly concentrate on the server side, and provide access to World-Wide Web clients through a gateway mechanism. Data and metadata are strictly separated in the distributed HOME server. The architecture is based on a layered approach with separate layers for raw data, multimedia characteristics and hypermedia structure. We briefly present some of the implementation aspects and emphasise distinctive characteristics of HOME. We conclude with a comparison with related research and our plans for the future.</p>
					<p><a href="https://lib.jucs.org/article/27125/">HTML</a></p>
					<p><a href="https://lib.jucs.org/article/27125/download/xml/">XML</a></p>
					<p><a href="https://lib.jucs.org/article/27125/download/pdf/">PDF</a></p>
			]]></description>
		    <category>Research Article</category>
		    <pubDate>Sun, 28 May 1995 00:00:00 +0000</pubDate>
		</item>
	
	</channel>
</rss>
	