Research Paper | Computer Science & Engineering | Kuwait | Volume 5 Issue 11, November 2016
Improving Compression Methods for Arabic Text Using Dedicated Character Mapping
Abstract: Natural language text compression has been discussed thoroughly in the literature in recent years, and many methodologies have been introduced and implemented. Most, however, focus on English and other European languages. Relatively few studies have addressed the Arabic language: some use statistical approaches, others use dictionary-based compression techniques, and some exploit features of the Arabic language and its derivation rules in an attempt to increase the compression ratio. In this paper, we introduce several statistical methods for natural language and apply them to Arabic text. We also provide an implementation of each method and compare them in terms of performance, compression ratio, the resources required to run the algorithms, and areas of application and usage. The Golomb, Elias Gamma, and Huffman codes are implemented and compared as sample statistical algorithms. We also introduce a dedicated Arabic character mapping technique for use with the Elias, Golomb, and Huffman algorithms. The results show a major improvement in compression ratio over the original methods, which are applied to binary data and ignore the underlying language. With this improvement, the methods can be superior even to LZW on small Arabic sample files. Two data sets are tested: the first uses random Arabic text, and the second uses real texts from complete Arabic stories and books.
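To make the idea of character mapping combined with a statistical code concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes a hypothetical mapping that ranks each character by its frequency in the input (rank 1 for the most frequent) and then writes each rank as an Elias gamma codeword. The function names `build_rank_map`, `encode`, and `decode` and the sample string are illustrative assumptions; the mapping actually used in the paper is not specified in this abstract.

```python
from collections import Counter


def elias_gamma(n: int) -> str:
    """Elias gamma codeword for n >= 1: floor(log2 n) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def build_rank_map(text: str) -> dict:
    """Hypothetical mapping: most frequent character gets rank 1, and so on."""
    counts = Counter(text)
    # Ties are broken by code point so the mapping is deterministic.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: rank for rank, ch in enumerate(ordered, start=1)}


def encode(text: str, rank_map: dict) -> str:
    """Concatenate the gamma codeword of each character's rank."""
    return "".join(elias_gamma(rank_map[ch]) for ch in text)


def decode(bits: str, rank_map: dict) -> str:
    """Invert encode(): count leading zeros, then read that many bits plus one."""
    inverse = {rank: ch for ch, rank in rank_map.items()}
    out = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        value = int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        out.append(inverse[value])
    return "".join(out)


sample = "السلام عليكم ورحمة الله"  # illustrative text, not from the paper's data sets
rank_map = build_rank_map(sample)
bits = encode(sample, rank_map)
utf8_bits = 8 * len(sample.encode("utf-8"))
print(len(bits), "bits after mapping vs", utf8_bits, "bits as UTF-8")
```

The sketch does not count the cost of transmitting the mapping itself, which a real compressor would have to store or fix in advance, for example as a static table over the Arabic alphabet.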
Keywords: Arabic Text Compression, Golomb Code, Elias Code, Huffman Code, Improved Arabic Character Mapping
Edition: Volume 5 Issue 11, November 2016
Pages: 1379 - 1387