Abstract
In the Arab world, people use two varieties of Arabic fluently: classical Arabic and slang Arabic. Being able to use slang Arabic terms in Web search queries would be a significant advance in research on Arabic information retrieval, because it would be more convenient for Arab Web users to obtain the information they need using slang Arabic queries. This paper continues work that started a year ago as part of an ongoing project whose ultimate goal is to design and build a system capable of replacing Arabic slang-based queries with their equivalent classical terms. The work in this paper aims to simplify the task for Arab Web users by enabling them to make Web queries in slang Arabic. This work provides promising results and shows that people who do not know how to write classical queries can use their slang language directly.
1. Introduction
The work in this paper is part of ongoing research that started almost a year ago [1]. Everyday spoken Arabic differs from the written language. As a result, Arab users find it difficult to formulate written expressions, such as search-engine queries, and this prevents them from speaking their minds and writing their thoughts.
The Arabic language is divided into two varieties, slang and classical [2, 3]. Slang is the spoken and more widespread form, although Arabs use the classical language in their education and writing. Arabic documents have a strong presence on the Internet. People usually prefer to use slang terms when they search the Web because these terms are familiar to them. The Web also contains a large number of slang-based documents, such as the content of social forums. A system that maps slang terms to their equivalent classical ones is therefore needed to provide good results when users search the Web using slang-based queries.
Internet users can be divided by age into three categories: children, teenagers and adults. Choosing suitable query terms is difficult for children and teenagers because their experience and knowledge of the classical language are not yet mature. Adult users cannot all be expected to use accurate query terms either, since some do not like to use such terms and prefer their slang ones.
Slang Arabic is a correct and complete language: for example, masculine and feminine forms, number forms and verb tenses exist in slang as they do in the classical form. Nevertheless, little attention has been paid to studying it in terms of Arabic information retrieval. This paper proposes a new method of Arabic information retrieval that allows Arab Web users to query in slang Arabic.
2. Related work
2.1. Slang Arabic
Khaleefa [4] developed a rule-based algorithm to convert the Sana’ani dialect to Modern Standard Arabic (MSA). The method consists of the following 10 steps:
Remove diacritics.
Remove connectors.
Remove length two and length three prefixes.
Remove length two and length three suffixes.
Replace the removed dialect suffixes with their equivalent MSA ones.
Do the same for dialect prefixes.
Extract stem after removing affixes.
Check the remaining token. If it is a dialect stem, apply stem rules and obtain the alternative MSA stem.
Rebuild the token by adding the removed MSA affixes.
Unrecognized tokens remain unchanged.
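The steps above can be sketched in a few lines of Python. This is only a minimal illustration of the general strip, replace and rebuild pattern, not Khaleefa's implementation: the function names, the three toy rule tables and the minimum stem length of three letters are all assumptions made for the example, and step 2 (connector removal) is left out because the connector list is not reproduced here.

```python
import re

# Arabic diacritic marks (tashkeel), U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")

# Hypothetical toy rule tables (dialect -> MSA), for illustration only.
# Dialect present-tense marker mapped to the MSA imperfect prefix.
TOY_PREFIXES = {"بي": "ي"}
# A suffix that is stripped and re-attached unchanged.
TOY_SUFFIXES = {"وا": "وا"}
# A dialect stem ("went") mapped to an MSA stem.
TOY_STEMS = {"راح": "ذهب"}


def strip_affix(word, affix_map, at_start, min_stem=3):
    """Return (msa_affix, remainder), or ("", word) if no dialect affix matches.

    Only affixes of length two or three are considered, and an affix is not
    stripped if fewer than min_stem letters would remain (an assumption of
    this sketch, meant to protect short words).
    """
    for affix in sorted(affix_map, key=len, reverse=True):  # longest first
        if len(affix) not in (2, 3) or len(word) - len(affix) < min_stem:
            continue
        if at_start and word.startswith(affix):
            return affix_map[affix], word[len(affix):]
        if not at_start and word.endswith(affix):
            return affix_map[affix], word[:-len(affix)]
    return "", word


def convert_token(token, prefixes=TOY_PREFIXES, suffixes=TOY_SUFFIXES,
                  stems=TOY_STEMS):
    """Convert one dialect token to MSA using the rule tables."""
    bare = DIACRITICS.sub("", token)                      # remove diacritics
    msa_prefix, rest = strip_affix(bare, prefixes, at_start=True)
    msa_suffix, stem = strip_affix(rest, suffixes, at_start=False)
    msa_stem = stems.get(stem, stem)                      # apply stem rule if any
    # Rebuild; a token matching no rule comes back as its diacritic-free form.
    return msa_prefix + msa_stem + msa_suffix


def convert_text(text):
    """Convert every whitespace-separated token of a text."""
    return " ".join(convert_token(tok) for tok in text.split())
```

With these toy tables, `convert_token("راحوا")` strips the suffix, maps the stem and rebuilds the token as `"ذهبوا"`, while a short word such as `"بيت"` is left untouched because stripping the two-letter prefix would leave too short a stem.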
Another approach to slang Arabic has also been developed [5]. This method aims to find the differences between MSA and Jordanian Arabic in how the future tense is expressed. The details of this method can be found in Abdou [3] and Elsebai et al. [5].
Bashar Al-Rashdan [6] discusses a sample of spoken Jordanian time-expressions and provides ethno-linguistic explanations of their meanings in the contexts in which they occur. More information about this work can be found in Abdou [3] and Bashar Al-Rashdan [6].
2.2. Slang English
Standard English terms and structures differ from those used in slang English. Al Kharashi and Al Sughaiyer [7] define the difference between slang (modern) and classical English, and a non-referential definite is analysed in a slang context. It is noteworthy that slang English terms can be used in Web queries to retrieve critical information on the Web.
Kadri and Nie [8] developed a dictionary of slang drug terms to help drug counsellors and researchers obtain more accurate information on clients’ abuse histories and patterns. Further methods and techniques have been developed for slang English [9, 10].
3. Mapping slang to classical
The main objective of this paper is to find the nearest and most applicable classical term for each slang term in the user's query. Relying on lookup tables alone is costly: because there are many terms, a large data repository is needed to store them.
In general, the main difference between classical and slang Arabic is in the dialect [10, 11]. Slang is not a separate language from the classical one; many slang terms originated as erroneous forms of classical terms [12–16].
Various rules have been proposed to convert Arabic verbs and nouns from slang to classical form [3]. This paper adds rules for converting other components of Arabic slang (e.g. prepositions, circumstances of place and time, question tools, negation tools, pronouns and special dialectal cases).
3.1. Jar (prepositions) and Zarf (circumstances of place and time) names
Using Jar or Zarf names followed by nouns forms is known as (
Table 1. The list of slang Jar names and their equivalent classical names.
Table 2. The list of slang Zarf names and their equivalent classical names.
3.2. Negation tools
Negation tools are particles used to negate an action. There are many negation tools in both classical Arabic and slang Arabic. A single tool is sometimes sufficient to express the intended negation. Table 3 shows how negation tools are used in the classical and in the slang form. In other cases, a negation tool alone is not enough. For example, if someone does not want to study, he says ‘
Table 3. The list of slang negation tools and their equivalent classical ones.
Table 4. The list of slang negation formats and their equivalent classical ones.
3.3. Question tools
Question tools are used to enquire about something. The system in this context, however, is not a Question Answering System [17–20] and does not respond to such queries by giving answers. Instead, the system searches for topics relevant to the query, and the answer may be found in the retrieved results. Table 5 lists the slang question tools used and their equivalents in classical language.
Table 5. The list of slang question tools and their equivalent classical ones.
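For closed classes of words such as question tools, the conversion amounts to a token-level substitution applied to the query before it is passed to the search engine. The Python sketch below illustrates the idea under stated assumptions: the four entries are common Levantine question words chosen for the example and are not the contents of Table 5, and the names `QUESTION_TOOLS` and `rewrite_query` are hypothetical.

```python
# Illustrative slang -> classical question words (common Levantine forms).
# These entries are assumptions for the example, not the paper's table.
QUESTION_TOOLS = {
    # what
    "شو": "ماذا",
    # where
    "وين": "أين",
    # why
    "ليش": "لماذا",
    # who
    "مين": "من",
}


def rewrite_query(query, mapping=QUESTION_TOOLS):
    """Replace each whole token found in the mapping; keep other tokens as is."""
    return " ".join(mapping.get(token, token) for token in query.split())
```

For instance, `rewrite_query("وين الجامعة")` returns `"أين الجامعة"`: only the question word is replaced, and the rest of the query is passed through unchanged for retrieval. The same pattern could in principle serve the Jar, Zarf, negation and pronoun lists, each with its own mapping.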
3.4. Special dialectal cases
There are widespread special dialectal cases associated with the slang Arabic language [6, 9, 21, 22]. Some of these cases, and how slang-based terms are processed to obtain their classical equivalents, are as follows:
Use of the Arabic letter (dhad
Use of (hamza
Use of the letter (zeen
Use of the letter (ya
Use of the letter (taa’
Use of the letters (taa’ and sheen
Use of the letter (jeem
The common pronunciation in slang Arabic of words ending with (alf and hamza
Words that indicate a particular profession or vocation in the slang Arabic language are made up of two parts. The first part is a collection of letters to indicate the work type and the second one consists of the two letters (
There are some adjectives ending with (
Table 6. Special slang words that indicate a profession or a vocation.
3.5. Pronouns
Pronouns are used to refer to specific people rather than leaving them unnamed [23]. Table 7 lists the commonly used pronouns in slang Arabic and their equivalents in classical language.
Table 7. The list of slang pronouns and their equivalent classical forms.
4. Implementation and results
The system was built in C# using Microsoft Visual Studio 2008. It was tested on several paragraphs written in slang language, for example:
This must be converted to:
Our proposed converter converts it to:
Table 8 shows a sample of the statements that were used to test our converter. The percentage of correct results achieved by our converter was about 76%.
Table 8. Sample of 13 statements used in testing.
5. Slang to classical converter in information retrieval
The system was tested using slang-based Arabic queries that were used to retrieve both classical and slang documents.
5.1. Testing using slang-based and classical-based queries over a classical dataset
Table 9 shows the testing results using the classical- and slang-based queries in a classical-based dataset.
Table 9. Average values for precision, recall and F-measure on a classical dataset.
5.2. Testing using slang-based and classical-based queries over a slang dataset
Table 10 shows the testing results using the classical- and slang-based queries in a slang-based dataset.
Table 10. Average values for precision, recall and F-measure on a slang dataset.
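For reference, the three measures reported in Tables 9 and 10 can be computed per query and then averaged, as in the Python sketch below. It assumes the standard set-based definitions and the balanced F-measure (the harmonic mean of precision and recall), since the exact variant is not stated here; the function names and the document identifiers in the usage note are hypothetical.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-measure) for one query.

    retrieved: set of document ids returned by the system
    relevant:  set of document ids judged relevant to the query
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def average_scores(per_query_results):
    """Average each measure over (retrieved, relevant) pairs, one per query."""
    scores = [precision_recall_f(ret, rel) for ret, rel in per_query_results]
    n = len(scores)
    if n == 0:
        return 0.0, 0.0, 0.0
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

As a small worked case, a query that retrieves documents {1, 2, 3, 4} when documents {2, 4, 5} are relevant has two hits, giving a precision of 2/4, a recall of 2/3 and an F-measure of 4/7.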
6. Conclusion
This research proposed rules for converting slang Arabic terms into their classical Arabic equivalents. The proposed rules were developed and tested in the field of information retrieval. The precision, recall and F-measure values are very close in both cases (i.e. slang-based and classical-based queries), which we attribute to the proposed conversion mapping.
The results show that our proposed converter achieves an accuracy of about 76% in converting terms from slang into classical Arabic. Moreover, the results show that slang-based queries can replace classical-based ones in some cases, but the opposite is not true. In addition, the results show that, on average, slang-based queries retrieve documents from a slang dataset better than classical queries do, with a 19% improvement.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
