Abstract
Previous research has shown that distributional information in child-directed speech can help children learn grammatical categories. Frequent frames are distributional units proposed by Mintz and since explored in many typologically different languages. This study investigated two parent–child corpora from the CHILDES database to identify frequent frames in Persian child-directed speech. To do so, a number of frequent frames in the two corpora, and more specifically those containing complex verbs, were analyzed in detail. The results indicate that the word-level accuracy of frequent frames in Persian (0.54) is lower than that in English (0.91), which we attribute to the typological features of Persian, in particular the flexibility of its basic SOV order at the sentence level. It was also found that Persian frequent frames mostly included complex verbs. This evidence, along with the performance of the frames in categorizing words at this level, indicates that the accuracy of the frames is also affected by the fact that the subject position of verbs is mostly left vacant in Persian as a pro-drop language. For this reason, the non-finite forms of verbs were used whenever a verb was part of a frame. The results also revealed that the grammatical category that mostly appeared in the context of frames was the verb, while the target words were mostly nouns and adjectives.
Introduction
The speech addressed to young children contains distributional regularities that could support their discovery of grammatical categories, which is a vital part of language acquisition (Mintz, 2003; Monaghan et al., 2005; Redington et al., 1998). Mintz (2003) proposed a distributional unit built on a nonadjacent dependency, known as the frequent frame. A frequent frame is a sequence of three elements, AxB, in which A and B form the context of the frame and x is the target word. If the frame is accurate, A and B predict the grammatical category of x. Mintz (2003) extracted frames from six English longitudinal corpora of child-directed speech. He analyzed only child-directed utterances and tested the frequent frames on them to see whether the frames could predict the grammatical categories of target words. In his study, two quantitative variables, the accuracy and the completeness of the frequent frames, were measured. Accuracy measures the success rate of frame categorization, and completeness shows the overlap rate. A necessary condition for frames to determine grammatical categories is the non-adjacency of their first and third elements. Categories derived from frequent frames are much more reliable than those derived from two adjacent words. In parallel with the results of psychological studies and computational modeling, it has been shown that neural networks in both adults and children can be used to exploit these nonadjacent dependencies under certain conditions (Freudenthal et al., 2013; Gomez, 2002; Gomez & Maye, 2005; Mintz, 2006; Monaghan & Christiansen, 2004).
From a linguistic point of view, research on frequent frames in English has shown that these frames identify the grammatical categories of target words very accurately. The high accuracy of frequent frames in English showed that this model of categorization provides useful information. The important question that arises is whether the ability to predict and categorize words and morphemes is limited to English or whether it works just as well in other languages. Accordingly, after Mintz (2003), many studies were carried out on frequent frames in various languages, including Chinese (Cai, 2006; Xiao et al., 2006), Dutch (Erkelens, 2008), French (Chemla et al., 2009), German (Stumper et al., 2011), and Spanish (Weisleder & Waxman, 2010).
Erkelens (2008) found that frame-based categories in Dutch were not as accurate as those in English. Dutch children used the information in frequent frames early in their language acquisition, but they showed a greater delay in production than English-speaking children. Overall, Erkelens (2008) concluded that frequent frames could not be considered a universal cue for grammatical categorization and that, because of the freer word order and more complex morphological system of Dutch, they might be more accurate at the morpheme level.
Weisleder and Waxman (2010) investigated frequent frames in Spanish and compared them with English. They also identified another distributional environment called end-frames and considered sentence position in their analysis. They found that homophony in function words and noun dropping in Spanish reduced the accuracy of Spanish frequent frames relative to English. Frequent frames were nevertheless more accurate than end-frames, and among grammatical categories verbs were categorized most accurately, followed by nouns and then adjectives.
Stumper et al. (2011) claimed that frequent frames in German child-directed speech were limited cues to grammatical categories. They also concluded that, because German, like Dutch, has a freer word order and more complex morphology than English, accuracy would be high at the morpheme level.
So far, all reviewed studies were at the word level. The use of lexical frames may not always lead to high accuracy. For this reason, frequent frames have also been studied in some other languages at the morpheme level. Wang et al. (2011) investigated the categorization accuracy of frequent frames at this level for the first time, in German and Turkish. They examined Chemla et al.’s (2009) distributional environment and bigrams that Wang and Mintz (2009) introduced and found that frequent frames at both the word and the morpheme levels had better performance. In German, frequent frame accuracy at both levels was high. Turkish is an agglutinative language for which accuracy at the word level was low. Generally, the accuracy of Turkish and German frequent frames was considered higher at the morpheme level than at the word level (Wang et al., 2011).
Finally, Moran et al. (2018) investigated frequent frames at the word and morpheme levels for seven typologically different languages: Chintang, Inuktitut, Japanese, Russian, Sesotho, Turkish, and Yucatec. The accuracy of word frames ranged from 0.44 in Russian to 0.98 in Inuktitut. In contrast, accuracy at the morpheme level was higher, ranging from 0.88 in Turkish to 0.98 in Japanese. Moran et al. (2018) concluded that, cross-linguistically, morpheme frequent frames were more accurate than word frequent frames for predicting grammatical categories. The results also showed that predicting grammatical categories in frequent frames had high accuracy for two main categories: nouns and verbs.
The presence of particular features in certain languages makes the study of frequent frames difficult and even challenges the existence of such frames. For example, in French, there are many functional words that are homophones, including clitic object pronouns and determiners; in contrast, most of the English frequent frames consist of closed-class items which are lexical words. Another factor that influences the informativeness of frequent frames is the difference in gender categorization. Also, in pro-drop languages, verb frames may be more variant than those in a non-pro-drop language such as English (Wang et al., 2011). However, Weisleder and Waxman (2010) reported that being a pro-drop language did not appear to have a substantial negative effect on frequent frame categories in Spanish.
The high accuracy of frame analysis reported in studies of English may be due to the fixed word order and relatively simple morphological structure of words in this language.
The important question we ask in the present study is whether words can be predicted and categorized just as well in Persian (the language used in Iran). We focus on analyzing Persian frequent frames to see how accurately they identify the grammatical category of target words. Although the canonical word order in Persian is SOV, it is a non-configurational language with free word order at the sentence level. Verbs show rich agreement with the person and number of the noun in subject position, a position that can also be left vacant because Persian is a pro-drop (null-subject) language. Verbs also carry tense markers. The verb inflections, or person endings, attach to the verb to mark agreement. Table 1 shows subject–verb agreement for all verb person endings, covering the six possible combinations of person and number.
Table 1. Verb agreement with null subjects in Persian.
Although pro-drop languages allow the subject position to be left vacant, Persian subject positions are never empty when the subject is emphasized or contrasted with the subject of another sentence (Dabirmoghaddam, 2013). That is why the first person singular pronominal subject (I) is not dropped in two or three of the frequent frames listed in Table 4, even though the verb agrees with it in both number and person.
Dabirmoghaddam (2009) pointed out that Persian verbs are mostly complex and are formed by two word-formation processes, namely incorporation and compounding. In the latter, which is highly productive and widespread, an auxiliary verb (sometimes called a light verb, following Karimi (1997)) combines with a noun, adjective, preposition, adverb, or past participle to form a complex verb. The light verb kærdæn (to do) is the most frequent one. For example, baz kærdæn (open + to do) means ‘to open’.
The present study aims to analyze frequent frames at the word level in two Persian child-directed corpora. With this aim in mind, the accuracy and completeness of the frames will be analyzed. As the two existing Persian corpora are not morphologically segmented, and tagging them at the morpheme level would require dedicated software for Persian, the analysis is limited to the word level.
Materials and methods
Corpora
The analysis was carried out on the Family (2009) Persian corpus from the CHILDES database. Lilia (1;11.21–2;10.21) and Minu (4;1.12–5;2.25) are the target children. For analyzing the frequent frames, only child-directed speech was used. Lilia’s corpus consisted of 31 files and 12,794 utterances, and Minu’s corpus included 103 files and 15,818 utterances. The speakers addressing Lilia were her babysitter, mother, father, and brother. In Minu’s corpus, the speakers addressing her were her mother, father, uncle, and aunt. Before the distributional analysis, all punctuation marks, special CHILDES transcription codes, phonological fragments, pauses, and interjections were removed.
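The cleaning step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' C# program: the function name `child_directed_utterances`, the speaker code `CHI` for the target child, and the particular CHAT markers handled (bracketed annotations, `(.)` pauses, `&`-prefixed fragments) are all assumptions made for the example, and a real CHAT transcript contains further codes that this sketch ignores.

```python
import re

def child_directed_utterances(chat_lines, child_code="CHI"):
    """Yield cleaned word lists for main-tier utterances not spoken by the child.

    Hypothetical helper: assumes CHAT-style lines such as '*MOT:\tsome words .'
    """
    for line in chat_lines:
        match = re.match(r"\*([A-Z0-9]+):\s*(.*)", line)
        if not match or match.group(1) == child_code:
            continue  # skip headers (@), dependent tiers (%), and the child's speech
        text = match.group(2)
        text = re.sub(r"\[[^\]]*\]", " ", text)   # bracketed annotations, e.g. [//]
        text = re.sub(r"\(\.+\)", " ", text)      # pauses such as (.) or (..)
        text = re.sub(r"&\S+", " ", text)         # fragments and fillers marked with &
        text = re.sub(r'[.,!?;:"+/]', " ", text)  # remaining punctuation marks
        words = text.split()
        if words:
            yield words
```

Each yielded list is one adult utterance reduced to its words, which is the input a trigram-based frame analysis needs.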
Although the present study is limited to frequent frames at the word level, some common and accurate verb and noun frames at the morpheme level are given in the following examples to illustrate what frames look like at that level:
(1) mi-xa-m
    prog-want-1sg
    ‘I want’
In example (1), ‘mi’ is a morpheme marking the progressive and ‘am’ is a morpheme marking first person singular; together they form the context of this frame and can always predict that the intervening target word (xa) is a verb.
(2) na-mærd-i
    neg-man-indef
    ‘unmanly’
In example (2), the morphemes ‘na’ and ‘i’, marking negation and indefiniteness, respectively, form the context of this noun frame and can predict the grammatical category of the intervening target word (mærd), which is a noun here.
Distributional analysis procedure
Mintz’s (2003) framework was applied to analyze frequent frames in the present study. In order to extract the trigrams, the following steps were taken:
First, a program was written in C#, and the only two existing Persian corpora were taken from the CHILDES database. Only adult utterances, that is, child-directed speech, were imported into the program for analysis.
All punctuation marks and additional information were deleted by the software.
Each utterance was segmented into trigrams and the number of times each frame occurred in the corpora was recorded. All frames in each corpus were ranked by their frequency and the most frequent frames were selected. Two conditions were taken into account for the selection of frames: (a) boundaries of utterances were not considered as part of frames and (b) frames did not break boundaries of utterances.
As the corpora were limited, we adopted the segmentation procedure described by Weisleder and Waxman (2010). They noted that an utterance such as ‘Look at the doggie over there’ consists of four frames since frames do not cross an utterance boundary (p. 4). With this method of grouping words, more words could be analyzed.
Although most previous studies selected 45 frequent frames, we selected 46 in each of our two corpora because of the variety we observed in the intervening words of the last frame. This conforms to the second criterion proposed by Mintz (2003), who set out two main criteria for treating a frame as a frequent frame: (a) the frame must be sufficiently frequent in the whole corpus and (b) it must occur often enough to include a variety of intervening words to be categorized together. Both criteria were met in our selection of 46 frequent frames.
Then, each instance of a given frequent frame was located in the corpus and each target word was grouped with other target words of the frame to create a frame-based category.
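The extraction steps above can be sketched in Python. This is a minimal illustration, not the authors' C# software: the function names and the `n_frames` parameter are hypothetical, and frames are ranked by raw frequency only, so the second selection criterion (variety of intervening words) is not enforced here.

```python
from collections import Counter, defaultdict

def extract_frames(utterances):
    """Count every A_x_B frame and collect the target tokens seen in each frame."""
    frame_counts = Counter()
    frame_targets = defaultdict(list)
    for words in utterances:
        # The three-word window slides inside one utterance only, so frames
        # never cross an utterance boundary or include the boundary itself.
        for a, x, b in zip(words, words[1:], words[2:]):
            frame_counts[(a, b)] += 1
            frame_targets[(a, b)].append(x)
    return frame_counts, frame_targets

def frequent_frame_categories(utterances, n_frames=46):
    """Return {frame: [target tokens]} for the n_frames most frequent frames."""
    frame_counts, frame_targets = extract_frames(utterances)
    top_frames = [frame for frame, _ in frame_counts.most_common(n_frames)]
    return {frame: frame_targets[frame] for frame in top_frames}
```

Applied to the single utterance ‘Look at the doggie over there’, `extract_frames` yields four frames, in line with the example from Weisleder and Waxman (2010) cited above.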
The number of all word types and word tokens in each corpus was recorded, as was the number of word types and word tokens that had been categorized. Table 2 presents these data; its last column gives the percentage of target tokens relative to all tokens in the corpus, as calculated by our software.
Table 2. Descriptive statistics of the data.
All frames were counted, and the 46 most frequent frames were selected for each corpus. The sixth and seventh columns of Table 2 indicate the number of tokens and types that occur in the 46 most frequent frames of our corpora.
We disregarded inflectional verb endings and treated every inflected verb in a frame as its infinitive (non-finite) form, since the present study is not conducted at the morpheme level. Verb endings that mark agreement are suffixes rather than clitics in Persian (Rasekh Mahand, 2010), so morphological variants of the same verb that differ only in person and number were not distinguished in framing; the subject of the sentence is also dropped in these cases, which in turn can affect the forms of the frames. Unlike these agreement suffixes, clitic object pronouns attach to verbs as substitutes for the direct object. Because clitics have syntactic rather than morphological functions and differ from inflectional morphemes in this respect, they were kept as part of the frequent frames in our word-based analysis.
Quantitative variables
In the process of quantitative analysis of frequent frames, Mintz (2003) and some other researchers (Cai, 2006; Chemla et al., 2009; Erkelens, 2008; Stumper et al., 2011; Weisleder & Waxman, 2010; Xiao et al., 2006) utilized two variables called accuracy and completeness that range in value from 0 to 1. They calculated the accuracy for measuring the success rate of categorization of frame-based categories. Accuracy is a standard metric that was used by Cartwright and Brent (1997) and Redington et al. (1998) for the first time.
Accuracy measures whether the grammatical category of the target word can be correctly predicted. It is calculated by the following formula:

\[ \text{Accuracy} = \frac{\text{hits}}{\text{hits} + \text{false alarms}} \]

In this formula, all pairs of target words in each frame are compared. Each pair belonging to the same grammatical category is called a hit. If the two words of a pair are members of different grammatical categories, the pair is a false alarm. This process was carried out for all word tokens and word types in the corpora used in the present study.
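The pairwise comparison can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name `accuracy`, the `by_type` flag, and the representation of each target as a `(word, part_of_speech)` tuple are assumptions made for the example.

```python
from itertools import combinations

def accuracy(categories, by_type=False):
    """Pairwise accuracy = hits / (hits + false alarms).

    `categories` maps each frame to its list of (word, pos) target tokens.
    With by_type=True, each distinct (word, pos) is counted once per frame.
    """
    hits = false_alarms = 0
    for members in categories.values():
        items = sorted(set(members)) if by_type else list(members)
        for (_, pos1), (_, pos2) in combinations(items, 2):
            if pos1 == pos2:
                hits += 1          # both targets share a grammatical category
            else:
                false_alarms += 1  # grouped together but different categories
    total = hits + false_alarms
    return hits / total if total else 0.0
```

For a frame whose targets are two adjectives and one noun, one of the three pairs is a hit and two are false alarms, giving an accuracy of 1/3.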
Completeness measures the extent to which words of the same grammatical category are grouped together across all the selected frames. Completeness is calculated by the following formula:

\[ \text{Completeness} = \frac{\text{hits}}{\text{hits} + \text{misses}} \]

Misses in this evaluation are pairs of words that have the same grammatical category but were not grouped together by the analysis. Completeness was calculated at the token and type levels of the corpora used in the present study.
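A token-level sketch of this measure is given below. It is a hedged illustration, not the authors' code: the function name and the `(word, part_of_speech)` representation are assumptions, and a miss is taken to be a pair of categorized tokens with the same part of speech that ended up in different frame-based categories.

```python
from collections import Counter
from math import comb

def completeness(categories):
    """Token-level completeness = hits / (hits + misses).

    `categories` maps each frame to its list of (word, pos) target tokens.
    """
    hits = 0
    pos_totals = Counter()
    for members in categories.values():
        per_frame = Counter(pos for _, pos in members)
        # Same-category pairs inside one frame-based category are hits.
        hits += sum(comb(n, 2) for n in per_frame.values())
        pos_totals.update(per_frame)
    # All same-category pairs among categorized tokens = hits + misses.
    same_pos_pairs = sum(comb(n, 2) for n in pos_totals.values())
    return hits / same_pos_pairs if same_pos_pairs else 0.0
```

Because same-category tokens are usually spread over many frames, misses greatly outnumber hits, which is why completeness values are typically far lower than accuracy values.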
We followed Mintz (2003) in creating the baseline used for the analysis and evaluation of the accuracy and completeness of our frame-based categories. As a baseline control against which to compare the accuracy and completeness of the frame-based categories, chance categories were created for each corpus. The content of the chance categories was determined by selecting the word tokens in the frame-based categories and randomly distributing them among the chance categories. Token and type accuracy and completeness were computed on the chance categories for each corpus to yield baseline measures. The baseline indicates the accuracy and completeness that could be achieved given the category structure that resulted from analyzing the corpus, but without considering the distributional structure of the corpus.
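One way to build such chance categories is sketched below. The sketch assumes that each chance category keeps the size of the corresponding frame-based category, which is one reading of "given the category structure"; the function name and the `seed` parameter are hypothetical.

```python
import random

def chance_categories(categories, seed=0):
    """Randomly redistribute all categorized tokens over categories of the same sizes.

    `categories` maps each frame to its list of target tokens.
    """
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    pooled = [token for members in categories.values() for token in members]
    rng.shuffle(pooled)
    shuffled, start = {}, 0
    for frame, members in categories.items():
        # Each chance category receives as many tokens as the original had.
        shuffled[frame] = pooled[start:start + len(members)]
        start += len(members)
    return shuffled
```

Accuracy and completeness computed on the output then serve as the baseline values against which the frame-based scores are compared.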
Results
The findings are shown in Table 3. Minu’s corpus (62,910 word tokens and 7,191 word types) was much larger than Lilia’s (37,527 word tokens and 4,130 word types), which may be why the mean token accuracy and type accuracy were higher in Minu’s corpus than in Lilia’s. The mean token completeness and type completeness were almost equal in the two corpora.
Table 3. Descriptive statistics of the means in each category.
As shown in Table 3, we computed mean baselines for evaluating the accuracy and completeness of both tokens and types in the corpora. The mean token accuracy across corpora was 0.54, only slightly higher than the baseline of 0.51. Mean type accuracy was 0.51, higher than the baseline of 0.46.
Mean token completeness across corpora was 0.03, which is higher than the mean baseline score of 0.002. Moreover, mean type completeness for corpora was 0.04, and it was higher than the baseline which was 0.009.
Examination of the frames of each corpus reveals that the first and the last frequent frames were the same in both corpora. The first frequent frame was ptl|ro --- v|kærdæn (particle --- to do) and the last frequent frame was v|kærdæn --- v|kærdæn (to do --- to do). Furthermore, 19 frequent frames were also the same. These common frequent frames are listed in Table 4. The symbol | in Table 4 separates each word from the part of speech it belongs to.
Table 4. Common frequent frames.
There was a frequent frame in both corpora formed with the use of pronouns. In Minu’s corpus, it was formed as pro|mæn --- pro|to (I --- you), while in Lilia’s corpus it was pro|mæn --- pro|shoma (I --- you (polite)). In Persian, the pronouns shoma and to are both used for the second person singular (You), but shoma is more polite than to.
Examining the grammatical categories of the 46 frequent frames in Minu’s corpus showed that in 39 frames at least one context word (the first, the third, or both) was a verb. Target words in 32 frequent frames were grammatically categorized as nouns, while adjectives appeared in only 7 frequent frames. Of the 46 frequent frames in Lilia’s corpus, 35 had a verb in their context. Target words in 27 frequent frames were grammatically categorized as nouns, and in 7 frequent frames they were adjectives. Some of the highest-ranked frequent frames from one of our corpora are provided in Table 5. In this table, frequent frames are shown along with their frequency, and the number of tokens categorized for each type is shown in parentheses. English translations of the Persian target words in Table 5, along with their syntactic information (parts of speech), are provided in Appendix 1.
Table 5. Samples of some Persian frequent frames.
The English equivalents along with their part of speech are given in Appendix 1.
In Table 4, the symbol | separates the word and the part of speech to which it belongs.
Some examples of frequent frames from Minu’s corpus are given here for the sake of more familiarity with the word-based frames listed in Table 5. Example (3) is the most frequent frame in which the particle ‘ro’ as an object marker and ‘kærdæn’ (to do) are the context of the frame. They can predict the grammatical category of the target word ‘baz’ (open) which is an adjective here.
(3) ro baz kærdæn
    ptl adj v
    ‘to open something’
In example (4) ‘ye’ (one) and ‘dige’ (other) form the context of this frame and predict the grammatical category of the target word ‘ruz’ (day) which is a noun here.
(4) ye ruz dige
    qn n adv
    ‘another day’
Discussion
The present study on Persian child-directed speech aimed to investigate whether distributional information and frequent frames can accurately derive grammatical categories of words. The results showed that, unlike English child-directed speech, for which Mintz (2003) reported very high frequent-frame accuracy (0.91), Persian child-directed speech does not yield such high scores. The average accuracy in Minu’s corpus was 0.58 at the token level and 0.55 at the type level. In Lilia’s corpus, the average accuracy was 0.51 at the token level and 0.48 at the type level. Overall, comparison with the baseline shows that the mean accuracy of Persian frequent frames is 0.54, only slightly higher than the baseline of 0.51.
Table 6, which also includes the Persian data from the present study, provides information on the completeness and accuracy of frequent frames at the morpheme and word levels in the different languages reported in Moran et al. (2018). Like our study, some of the studies shown in Table 6 were limited to the word level.
Table 6. Mean accuracy and completeness of frequent frames in different languages, including Persian.
The accuracy of frequent frames across languages depends on many properties, including corpus size: in larger corpora, the accuracy and completeness of frequent frames tend to be higher. For example, Stumper et al. (2011) investigated frequent frames in a small German corpus, and the accuracy they reported was lower than that published by Wang et al. (2011) using a larger corpus. Because there are no Persian POS (parts of speech)-tagged corpora, the data used in this research were limited, which might explain the lower accuracy observed. Whereas Mintz (2003) reported completeness of 0.12 at the token level and 0.10 at the type level for English frequent frames, the corresponding values for Persian were 0.03 and 0.04, respectively.
In line with the findings of the present study, Moran et al. (2018) found that languages with a rich morphology have more word forms, and as a result, accuracy is lower at the word level. Persian has free word order (Davari & Naghzguy-Kohan, 2019) and is also morphologically rich (Ghatreh, 2007). The lower accuracy of the Persian word-based frames compared to English (Mintz, 2003) might result from the morphological properties of the two languages. Both languages mark inflection, English only in some limited cases (like do, doing, does, done), but inflection in Persian is much richer than in English, especially on verbs, which agree with the subject in person and number.
Consistent with the results of the present research, Stumper et al. (2011) reported a lower accuracy (around 0.11 and 0.07) in German than in English and French. This is attributed to the richer morphology, the smaller corpus, and the free word order of German. On the other hand, Wang et al. (2011) examined the accuracy of frequent frames in both German and Turkish and reported an accuracy of 0.86 at the word level in German. They believed that this was due to the size of the corpus, their corpus being larger than the one used by Stumper et al. (2011).
Furthermore, Erkelens (2008) concluded that accuracy is lower in Dutch because its word order is freer and its morphology richer than those of English. The accuracy reported at the token level and type level was 0.56 and 0.40, respectively. In French, however, which has a more consistent word order than Dutch, Chemla et al. (2009) reported an accuracy of 1. Weisleder and Waxman (2010) compared the accuracy of frames in Spanish and English and reported an accuracy of 0.75 at the token level and 0.76 at the type level. The results of our study in Persian are consistent with those published by Stumper et al. (2011), Erkelens (2008), and Chemla et al. (2009) and also show that typological features affect the level of accuracy in frequent frames.
As was seen, 67% of the extracted frames had verbs, and more specifically complex verbs, in their context. Dabirmoghaddam (2009) pointed out that many Persian complex verbs are formed by compounding, in which an auxiliary verb, called a light verb by Karimi (1997), is joined to a noun, adjective, preposition, adverb, or past participle. The auxiliary verb kærdæn (to do) was the most frequent verb in the corpora, and the target words were almost always nouns and adjectives; we can therefore conclude that most Persian frequent frames consist of these complex verbs.
A notable result of the analysis of both corpora in the present study is that, despite the difference in their size, the first and last frequent frames are the same in both. Corpus size has a direct effect on the frequency of frequent frames, and the more data are included, the more generalizable the results will be.
There are some suggestions for further studies. We analyzed frequent frames at the word level in Persian. Studying frequent frames at the morpheme level in this language may lead to more informative results. Because Persian is morphologically rich and has free word order, we suggest studying frequent frames at the morpheme level to see whether they are more accurate than the word-level frames examined in the present research. Large Persian corpora will need to be developed and used for this purpose. A frequent frames analysis could then operate on stems and affixes, rather than open- and closed-class words. That level of analysis would likely result in much more stable patterns than are available at the level of words. Adding a language like Persian, which is typologically different from the languages typically studied, can be an important contribution and may add interesting and different findings to the existing research.
Appendix
English translation of the Persian target words included in the frequent frames of Table 5.
| Target word | Translation | Target word | Translation | Target word | Translation |
|---|---|---|---|---|---|
| Abad (Adj) | rich | Hædse (N) | the guess | Qanun (N) | rule |
| Abbazi (N) | playing with water | Hæfte (N) | week | Qashoq (N) | spoon |
| Abrængi (N) | watercolor | Hæmam (N) | bath | Qorqor (N) | murmuring |
| Adæm (N) | the human | Hæmle (N) | attack | Ræha (Adj) | free |
| ængur (N) | grape | Hærf (N) | talk | Rænde (N) | grater |
| ærusæka (N) | dolls | Hærfa (N) | talkings | Ræng (N) | color |
| ærusæke (N) | the doll | Hæva (N) | air | Rængeshun (N) | their color |
| ærusi (N) | wedding | Hæyat (N) | courtyard | Ræxta (N) | clothings |
| ærz (N) | speech | Inja (Adv) | here | Rahnæmayi (N) | guide |
| ævæz (N) | substitute | Injuri (Adv) | like this | Rahnæmayi (N) | guidance |
| Ahænge (N) | the music | Ja (N) | place | Residegi (N) | investigation |
| Amade (Adj) | ready | Jadu (N) | magic | Ro (Ptl) | particle for object marker |
| Aqa (N) | sir | Jælb (Adj) | attractive | Roshæn (Adj) | bright |
| Aqaye (N) | the sir | Jæm (N) | pack | Roshænesh (Adj) | bright-3rd person |
| Arezu (N) | wish | Jaru (N) | sweeper | Rubahe (N) | the fox |
| Avizun (Adj) | hanging | Jaye (N) | the place | Ruz (N) | day |
| Azad (Adj) | free | Joda (Adj) | apart | Ruze (N) | the day |
| Bæhanegiri (N) | fussiness | Jure (Adj) | alike | Sæfær (N) | travel |
| Bæqæl (N) | arms | Kæm (Adj) | few | Sære (N) | the head |
| Bæqælesh (N) | her arms | Kæsif (Adj) | dirty | Særfejui (N) | saving |
| Bæqælet (N) | your arms | Kæsifesh (Adj) | dirty-3rd person | Saxtemune (N) | the building |
| Bar (N) | turn | Kæsifeshun (Adj) | dirty-3rd sing-pl | Seda (N) | voice |
| Bare (N) | the turn | Kaqæze (N) | the paper | Sedash (N) | its voice |
| Bavær (N) | belief | Kar (N) | work | Shæb (N) | night |
| Bayæd (V) | should | Kara (N) | works | Shæbe (N) | the night |
| Baz (Adv) | again | Karesh (N) | her/his work | Shæhre (N) | the city |
| Baz (Adj) | open | Kareshun (N) | their work | Sheitunia (N) | mischief |
| Bazesh (Adj) | open-3rd person | Kari (N) | a work | Shena (N) | swimming |
| Bazi (N) | game | Kartona (N) | packages | Shokolate (N) | chocolate |
| Baziyi (N) | a game | Ke (Conj) | that | Shol (Adj) | weak |
| Birun (Adv) | outside | Kelid (N) | key | Shoru (N) | start |
| Bolænd (Adj) | high | Kesi (N) | somebody | Sibe (N) | the apple |
| Bus (N) | kiss | Ketaba (N) | books | Sir (N) | full |
| Chiz (N) | something | Komæk (N) | help | Soal (N) | question |
| Chiza (N) | somethings | Konim (V) | we do | Soale (N) | the question |
| Chizaye (N) | the somethings | Kuchik (Adj) | small | Sohbæt (N) | talk |
| Chize (N) | the thing | Lebas (N) | clothing | Sos (N) | sauce |
| Dær (N) | door | Lebasat (N) | your clothing | Suræti (Adj) | pink |
| Dærd (N) | pain | Loqmæro (N) | bite-object marker | Surax (N) | hole |
| Dæst (N) | hand | Loqme (N) | bite | Tæhæmol (N) | tolerance |
| Dæste (N) | the hand | Lotf (N) | favor | Tæhdid (N) | threat |
| Dæstgir (N) | arrested | Lotfe (N) | the favor | Tæmasha (N) | look |
| Dæstmali (N) | handling | Mæqalæm (N) | my article | Tæmir (N) | repair |
| Dæva (N) | struggle | Mæqazeye (N) | the store | Tæmiz (Adj) | clean |
| Dævæt (N) | invitation | Mæsræf (N) | consumption | Tæmizesh (Adj) | clean-3rd person |
| Damæn (N) | skirt | Mishe (V) | possible | Tæmrin (N) | practice |
| Daqun (Adj) | shattered | Mishod (V) | possible-past | Tæmum (N) | finish |
| Daræn (V) | they have | Mitunim (V) | we can | Tærrahi (N) | designing |
| Dari (V) | you have | Mixai (V) | do you want | Tærif (N) | definition |
| Dasht (V) | he had | Mixast (V) | he/she wanted | Tærk (N) | leave |
| Dashte (V) | he has had | Moærefi (N) | introduction | Tæshækor (N) | thanks |
| Dastan (N) | story | Mohasebe (N) | calculate | Tæzahorat (N) | administration |
| Dastanesh (N) | her story | Mohreha (N) | marbles | Taip (N) | type |
| Deraz (Adj) | long | Momkene (V) | maybe | Tas (N) | dice |
| Dexalæt (N) | interference | Moqayesæsh (N) | its comparison | Tekrar (N) | repetition |
| Dorost (Adj) | true | Morætæb (Adj) | neat | Tikæsh (N) | its piece |
| Dota (Qn) | two | Moraqebæt (N) | care | Tupe (N) | the ball |
| Dune (N) | unit | Mozakere (N) | discussion | Væsayelet (N) | your equipment |
| Edare (N) | office | Næbayæd (V) | should not | Væsl (N) | joined |
| Ejra (N) | performance | Næfære (N) | the person | Vaz (Adj) | open |
| Elan (N) | announce | Næmayeshe (N) | show | Vel (Adj) | free |
| Emtehan (N) | test | Næmayeshnamæsh (N) | His/her show | Velesh (Adj) | free-3rd person |
| Entexab (N) | choice | Næqashi (N) | painting | Xærab (Adj) | ruined |
| Esemeskari (N) | sending messages | Narahæt (Adj) | sad | Xærid (N) | shopping |
| Esme (N) | the name | Naz (Adj) | cute | Xæta (N) | error |
| Estefade (N) | use | Nega (N) | look | Xali (Adj) | empty |
| Esterahat (N) | rest | O (Ptl) | particle for object marker | Xamush (N) | off |
| Ezdevaj (N) | marriage | Pæhn (Adj) | broad | Xamush (Adj) | off |
| Færar (N) | escape | Pærvaz (N) | flight | Xodæm (Pro) | myself |
| Fekr (N) | thought | Pak (Adj) | clean | Xodafezi (N) | goodbye |
| Fin (N) | sniff | Pakesh (Adj) | clear-3rd person | Xoshk (Adj) | dry |
| Foru (Adj) | down | Pat (N) | your foot | Yævash (Adv) | slow |
| Fut (N) | blow | Peida (Adj) | find | Yeki (Qn) | one |
| Gelimali (Adj) | muddy | Peidash (Adj) | find-3rd person | Yekish (Qn) | the one |
| Gir (N) | trap | Pishnæhad (N) | offer | Zæbt (N) | record |
| Gom (Adj) | lost | Qæ t (N) | disconnection | Zendegi (N) | life |
| Gush (N) | ear | Qæzaye (N) | the food | | |
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
