JUCS - Journal of Universal Computer Science 17(1): 48-63, doi: 10.3217/jucs-017-01-0048
An OCR Free Method for Word Spotting in Printed Documents: the Evaluation of Different Feature Sets
Israel Rios, Alceu Britto Jr, Alessandro Lameiras Koerich, Luis Eduardo Soares Oliveira§
‡ Pontifical Catholic University of Parana, Curitiba, Brazil
§ Federal University of Parana, Curitiba, Brazil
Open Access
Abstract
An OCR-free word spotting method is developed and evaluated under a strong experimental protocol, in which different feature sets are compared under the same experimental conditions. In addition, a tuning process for the document segmentation step is proposed, which provides a significant reduction in processing time. For this purpose, a complete OCR-free method for word spotting in printed documents was implemented, and a database containing document images and their corresponding ground truth text files was created. The protocol, based on 800 document images, allows us to compare the results of the three feature sets used to represent the word image.
Keywords
word spotting, document retrieval, word recognition