Abstract—We have developed a framework for matching records from different data sources that refer to the same entities, using proper names as the matching keys. There are two challenges to overcome. First, the names may contain typographical errors. Second, some data sources store names in Thai characters while others store them in English characters. Thai-version keys are therefore romanized and compared with English-version keys, using string comparators and a rule-based decision function. We report our experimental results and the problems encountered, and suggest future research directions.
Index Terms—Entity names, record matching, romanization, Thai characters.
R. Marukatat is with the Department of Computer Engineering, Faculty of Engineering, Mahidol University, Thailand (phone: +662-889-2138 ext 6251-2; fax: +662-889-2138 ext 6259; e-mail: firstname.lastname@example.org).
Cite: Rangsipan Marukatat, "Matching Entities by Their Thai and English Proper Names," International Journal of Information and Education Technology, vol. 1, no. 5, pp. 384-388, 2011.