
IJIET 2012 Vol.2(4): 348-353 ISSN: 2010-3689
DOI: 10.7763/IJIET.2012.V2.149

Tokenization as Preprocessing for Arabic Tagging System

Ahmed H. Aliwy

Abstract—Tokenization is an essential step in natural language processing and can be seen as a preparation stage for all other natural language processing tasks. In this paper we propose a hybrid unsupervised method for Arabic tokenization, treated as a stand-alone problem. After segmenting sentences into words, we use the author’s analyzer to produce all possible tokenizations of each word. Written rules and statistical methods are then applied to resolve the ambiguities, so that the output is a single tokenization for each word. The statistical method was trained on 29k manually tokenized words (data available from http://www.mimuw.edu.pl/~aliwy) taken from the Al-Watan 2004 corpus (available from http://sites.google.com/site/mouradabbas9/corpora). The final accuracy was 98.83%.

Index Terms—Arabic Tokenization, Arabic segmentation, Arabic tagging.
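
The pipeline outlined in the abstract (generate every candidate tokenization of a word, filter the candidates with rules, then pick one statistically) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the clitic lists, the simplified Latin transliteration, the single rule, the toy training data, and the length-normalized unigram score that stands in for the paper's statistical method. The author's analyzer is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical clitic inventories in a simplified Latin transliteration.
PROCLITICS = ("w", "f", "b", "l", "Al")
ENCLITICS = ("h", "hA", "hm")


def candidate_tokenizations(word):
    """Return every split of `word` into proclitics, a stem, and an optional enclitic."""
    results = []

    def strip_prefixes(rest, prefix_tokens):
        # Option 1: stop stripping; the remainder is a stem, possibly with an enclitic.
        results.append(prefix_tokens + [rest])
        for enc in ENCLITICS:
            if rest.endswith(enc) and len(rest) > len(enc):
                results.append(prefix_tokens + [rest[: -len(enc)], "+" + enc])
        # Option 2: strip one more proclitic and recurse on what is left.
        for pro in PROCLITICS:
            if rest.startswith(pro) and len(rest) > len(pro):
                strip_prefixes(rest[len(pro):], prefix_tokens + [pro + "+"])

    strip_prefixes(word, [])
    return results


def passes_rules(tokens):
    """Illustrative rule: the article 'Al+' and a pronominal enclitic do not co-occur."""
    has_article = "Al+" in tokens
    has_enclitic = any(t.startswith("+") for t in tokens)
    return not (has_article and has_enclitic)


def train(tokenized_words):
    """Count token frequencies in a list of manually tokenized words."""
    counts = Counter()
    for tokens in tokenized_words:
        counts.update(tokens)
    return counts


def score(tokens, counts):
    """Mean add-one-smoothed log-probability, so longer splits are not penalized."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    logs = [math.log((counts[t] + 1) / (total + vocab)) for t in tokens]
    return sum(logs) / len(logs)


def tokenize(word, counts):
    """Keep the rule-compliant candidate with the highest score."""
    candidates = [c for c in candidate_tokenizations(word) if passes_rules(c)]
    return max(candidates, key=lambda c: score(c, counts))


# Toy training data (hypothetical), standing in for the manually tokenized words.
TRAINING = [["w+", "ktAb"], ["Al+", "ktAb"], ["ktAb", "+h"], ["wld"]]
COUNTS = train(TRAINING)

for w in ("wktAb", "wld", "ktAbh"):
    print(w, "->", tokenize(w, COUNTS))
```

The unsplit word is always among the candidates, so `tokenize` never fails on an unknown word. A realistic system would need a much richer analyzer and rule set than this sketch suggests.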

A. H. Aliwy is with the Institute of Informatics, University of Warsaw, Warsaw, Poland (e-mail: ahmed_7425@yahoo.com; aliwy@mimuw.edu.pl).


Cite: Ahmed H. Aliwy, "Tokenization as Preprocessing for Arabic Tagging System," International Journal of Information and Education Technology, vol. 2, no. 4, pp. 348-353, 2012.
