JUCS - Journal of Universal Computer Science 3(2): 70-85, doi: 10.3217/jucs-003-02-0070
Symbol Ranking Text Compression with Shannon Recodings
Peter Fenwick
Department of Computer Science, The University of Auckland, Auckland, New Zealand
Open Access
Abstract
In his work on the information content of English text in 1951, Shannon described a method of recoding the input text, a technique which has apparently lain dormant for the ensuing 45 years. Whereas traditional compressors exploit symbol frequencies and symbol contexts, Shannon's method adds the concept of "symbol ranking", as in "the next symbol is the third most likely in the present context". While some other recent compressors can be explained in terms of symbol ranking, few make explicit reference to the concept. This report describes an implementation of Shannon's method and shows that it forms the basis of a good text compressor.
Keywords
text compression, Shannon, symbol ranking
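The idea of symbol ranking summarised in the abstract can be illustrated with a minimal sketch. It is not the implementation described in this report: the fixed context length `order`, the ranking by frequency count, and the escape convention (rank 0 followed by a literal for a symbol not yet seen in the context) are all assumptions made for illustration, as are the function names `rank_encode` and `rank_decode`.

```python
from collections import defaultdict


def rank_encode(text, order=2):
    """Recode text as (rank, literal) pairs.

    Assumed scheme: within each context (the previous `order` symbols),
    candidates are ranked by how often they have followed that context.
    Rank 1 means "the most likely symbol"; rank 0 is an escape that
    carries the literal symbol because the context has not seen it yet.
    """
    counts = defaultdict(dict)  # context -> {symbol: count}
    out = []
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]
        seen = counts[ctx]
        if ch in seen:
            # Most frequent first; ties broken by symbol value so that
            # the encoder and decoder agree on the ordering.
            ranked = sorted(seen, key=lambda s: (-seen[s], s))
            out.append((ranked.index(ch) + 1, None))
        else:
            out.append((0, ch))
        seen[ch] = seen.get(ch, 0) + 1
    return out


def rank_decode(codes, order=2):
    """Invert rank_encode by rebuilding the same adaptive model."""
    counts = defaultdict(dict)
    text = []
    for rank, literal in codes:
        ctx = "".join(text[max(0, len(text) - order):])
        seen = counts[ctx]
        if rank == 0:
            ch = literal
        else:
            ranked = sorted(seen, key=lambda s: (-seen[s], s))
            ch = ranked[rank - 1]
        seen[ch] = seen.get(ch, 0) + 1
        text.append(ch)
    return "".join(text)


if __name__ == "__main__":
    sample = "abracadabra abracadabra"
    codes = rank_encode(sample)
    print([rank for rank, _ in codes])
    assert rank_decode(codes) == sample
```

On predictable text most emitted ranks are small, so the recoded stream is highly skewed and a following entropy coder can represent it compactly. That skew is the property a symbol-ranking compressor relies on.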