Abstract
Part–whole relations are pervasive throughout domain ontologies and also enjoy interest in, inter alia, NLP and manufacturing, and from philosophers in the scope of mereology. There exists a stable list of part–whole relations that are assumed to be common, yet for isiZulu, among other languages, there were at least linguistic differences. This raises the question whether there are ontological differences, which would imply that the ‘common’ list is not universal across languages and cultures. We investigated this for 18 part–whole terms in the Zulu language that we selected from an initial list of 81 terms collected. They were formalised and aligned to the well-known part–whole relations, and checked against a corpus. While there is a term for general parthood in Zulu, the main difference observed concerns relation proliferation due to very specific relata that are entities typically represented only in domain ontologies. This poses new questions for ontology engineering on how to manage the plurality of relations and for philosophy to possibly extend mereology.
Introduction
Part–whole relations are a recurring topic in ontology and computer science, thanks to both the many options for constructing mereological theories and their use in a multitude of information systems, ranging from the data management of manufacturing processes of devices, to food processing, to document summarisation, among many others. Digitisation is also taking place in many spheres in Africa, which likewise requires the use of part–whole relations in information systems. On top of that, those IT solutions may have to be delivered in African languages, which are in a different language family from English, and thus some sort of ‘localisation’ of part–whole relations is needed. A typical example of part–whole relations in health informatics is the localisation of SNOMED CT (2019) into a regional language, which can then be integrated with a localised version of an electronic health record system, notably OpenMRS (2018), and be used to automatically generate patient discharge notes in a local language (e.g., Byamugisha et al., 2017) so as to improve compliance with medical treatments by reducing the language barrier (Hussey, 2012/2013). To be able to realise that, one first has to find suitable translations or transliterations for the abundant part–whole relations, be it for the medical domain (e.g., Rosse and Mejino Jr, 2003; SNOMED CT, 2019; Rogers and Rector, 2000) or others, such as recording building architecture in the vernacular (Frescura and Myeza, 2016) and anatomy of crops and animals for agriculture. Such subject domain part–whole relations go beyond the
The software localisation approach of trying to translate names of part–whole relations into a local language – isiZulu in our case – showed that there were no 1:1 mappings at the language level at least (Keet and Khumalo, 2016; Keet, 2017b), and probably not ontologically either. An example is the
Conversely, the observation of candidate ontological differences gives rise to the hypothesis that not only will refinements be encountered, but that they may also be useful for ontology engineering by providing finer-grained distinctions of part–whole relations and new problems to investigate. Therefore, we seek to answer the following main research questions in this paper, which need to be answered before, among others, localising SNOMED CT and devising other domain ontologies for ontology-driven information systems, in South Africa at least:
Which named part–whole relations exist in isiZulu, and are they not only lexically but also semantically distinct from the commonly listed part–whole relations?
Which part–whole relations can be mapped with equivalence relations to the ‘common’ part–whole relations, and which one(s) are more or less precise?
For those that have no equivalent among the ‘common’ part–whole relations, what is (are) the underlying reason(s) for differentiation, if any?
Are there any characteristics brought into the analysis of part–whole relations in Zulu language and culture that may be useful for the ontological analysis of part–whole relations in general?
To answer these questions, we first devised a procedure that takes a combined approach of evidence collection and theoretical analysis, which can be used for any natural language. First, we harvested common isiZulu terms for ‘part’ and similar terms from the dictionary. The resulting 81 terms were analysed in detail and reduced over several iterations of assessment. The eventual selection of 18 terms/relations was then formally characterised and aligned with subsumption or equivalence to the common part–whole relations. The main results of the comparison are that there are only two equivalence alignments, 13 subsumption alignments, and 1 distinct one, yet certain distinctions are not made, resulting in two hypernyms. The refinements are largely due to more constrained domain and range axioms that take either different classes (e.g., the mouth) or different categories of elements (e.g., collectives only). The relations were then queried against the isiZulu National Corpus (INC; Khumalo, 2015) to examine their use, and none of the uses found violated the respective formal characterisations. Thus, while the general notion of parthood still seems universal, there are differences as to which ones are perceived to be the ‘common’ part–whole relations. This is, to the best of our knowledge, the first ontological and systematic investigation into the question whether there are part–whole relations with particular terms in languages other than English and the countries where it is spoken predominantly. An earlier version of this work was reported in our FOIS 2018 paper (Keet and Khumalo, 2018), which is extended here with novel scientific material, in particular: i) a generalisation of the experimental process; and ii) the analysis and addition of four new terms (umqobelo, isitho, and ilungu/ilunga), a refinement of isithako, and extended descriptions.
The remainder of this paper is structured as follows. We first consider related work in Section 2. The procedure of the analysis is described in Section 3. The results are described in Section 4, including term harvesting, analysis, formalisation, and evaluation against the INC. We discuss the results in Section 5 and describe practical examples of some of their consequences. We close in Section 6.
Related work
There exists ample literature on mereological theories with debate about inclusion of various axioms, such as antisymmetry and strong vs. weak supplementation (Varzi, 2004). For the current scope, we take them as given, and instead zoom in on the ‘multitude’ dimension of part–whole relations. That is, multiple part–whole relations have been proposed in the literature and used in ontologies, conceptual models, linguistics, and NLP (among many: Galton (2018); Guizzardi (2005); Keet and Artale (2008); Motschnig-Pitrik and Kaasboll (1999); Tandon et al. (2016); Vieu and Aurnague (2005); Winston et al. (1987)), which are also declared not only in domain ontologies (e.g., openGalen has 23 part–whole relations (Rogers and Rector, 2000)) but also, to some extent, in foundational ontologies (Keet, 2017a). This shows that modellers are convinced of their need. This ‘multitude’ approach has resulted in a stable list of common part–whole relations that are used for modelling and other tasks such as natural language processing (NLP). There are further refinements, such as mereotopology (Keet and Kutz, 2017; Varzi, 2007), material parthood (Galton, 2018), constitution (Hahmann and Brodaric, 2014), portions (Donnelly and Bittner, 2009; Keet, 2016), and essential and immutable parthood (Artale et al., 2008; Guizzardi, 2007). Those refinements have not yet had substantial uptake in ontologies for information systems, which is at least partially due to the required computationally expensive formal apparatus, and therefore are not included here.
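For reference, the axioms named at the start of this discussion can be written out in the usual first-order notation. The block below is only a sketch of their standard textbook form (cf. Varzi, 2004); the symbols P (parthood), PP (proper parthood), and O (overlap) are chosen here for illustration and are an assumption, not necessarily the notation used in the formalisation later in the paper.

```latex
% Ground mereology: parthood is a partial order
\begin{align*}
  &\forall x\, P(x,x)                                              && \text{(reflexivity)}\\
  &\forall x,y\, \bigl(P(x,y) \land P(y,x) \rightarrow x = y\bigr) && \text{(antisymmetry)}\\
  &\forall x,y,z\, \bigl(P(x,y) \land P(y,z) \rightarrow P(x,z)\bigr) && \text{(transitivity)}
\end{align*}
% Auxiliary definitions
\begin{align*}
  PP(x,y) &\equiv P(x,y) \land \lnot P(y,x)          && \text{(proper part)}\\
  O(x,y)  &\equiv \exists z\, \bigl(P(z,x) \land P(z,y)\bigr) && \text{(overlap)}
\end{align*}
% The two supplementation principles under debate
\begin{align*}
  &\forall x,y\, \bigl(PP(x,y) \rightarrow \exists z\, (P(z,y) \land \lnot O(z,x))\bigr)     && \text{(weak supplementation)}\\
  &\forall x,y\, \bigl(\lnot P(y,x) \rightarrow \exists z\, (P(z,y) \land \lnot O(z,x))\bigr) && \text{(strong supplementation)}
\end{align*}
```

Strong supplementation implies the weak variant in the presence of the partial-order axioms, which is why the choice between them is a point of debate rather than a matter of notation.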
Based on an extensive analysis of related work spanning computing (artificial intelligence, conceptual modelling, and software engineering), cognitive science, ontology, and language, the common part–whole relations have been structured in a hierarchy and formally characterised (Keet and Artale, 2008). Its hierarchy is summarised and shown informally in Fig. 1. The “part–whole relation” at its root is added for indicative intuitive structuring purposes only and is not intended to be used when modelling. The hierarchy divides first between parthood sensu mereology and part–whole relations in natural language utterances only (meronymy) (Keet and Artale, 2008). Here, mereology refers to the usual primitive

Part–whole relations taxonomy with indicative domain and range descriptions (extended from Keet and Artale (2008)).
The second main distinction rests on the notion that constraining a relation’s domain and range means it is a more precise representation of its intended meaning, where applicable (Keet and Artale, 2008; Poveda-Villalón et al., 2012; Vieu and Aurnague, 2005), which for engineering purposes needs to have a different name to avoid modelling mistakes and undesirable deductions. For instance,
When we take a closer look at the ontology and computing literature on part–whole relations that explicitly take into account languages other than English, there are a few papers with implicit hints at possible variants for mereology/meronymy, although indirect and sparse. These papers focus on NLP-specific relation extraction from text documents in Arabic, Chinese, and Turkish, and confine themselves to the aforementioned typical set of part–whole relations or a subset thereof (Al Zamil and Al-Radaideh, 2014; Cao et al., 2008; Yıldız et al., 2016), rather than an ontological analysis of the relations. For instance, Cao et al. (2008) state that they refined the constitution relation with an Element–Object relation – e.g., calcium as part of milk – where the element is an atomic element, with as sole reason “for convenient verification”. They do not offer a reason why it may hold only linguistically or would also be semantically distinguishable from the other common part–whole relations, and why, nor why the existing
Finally, in earlier work, we had also started from the typical set of part–whole relations and observed several commonalities as well as differences for isiZulu (Keet and Khumalo, 2016): there are refinements in some cases and the lack thereof for others, such as distinguishing participation for objects vs. collectives, which is discussed briefly by Keet (2017b).
Let us now turn to relevant literature from linguistics. Parts and part–whole relations in natural language have been investigated for several under-resourced spoken languages, including African languages (Chappell and McGregor, 1996). Hyman (1996) considered Haya, a Niger-Congo B (Bantu) language spoken in Tanzania, but he focussed on possessor deletion and possessor promotion in the sentence rather than any part–whole relation. The linguistic realisation of describing body parts in Ewe (spoken in Ghana) refers only implicitly to part–whole relations, such as ‘the cover of the book’ agbalẽa ϕe akpa (Ameka, 1996), instead of the explicit use of parthood in a sentence such as ‘the cover is part of the book’. To the best of our knowledge, there is no inventory of part–whole relations in any of the Sub-Saharan African indigenous languages, other than the informal analysis of the common relations (as in Fig. 1) by Keet and Khumalo (2016) and Keet (2017b), and certainly not an ontological analysis and logic-based characterisation thereof.
The paucity of ontological analyses of the interaction between language and ontology for part–whole relations in languages other than English leaves it unclear whether new insights may be obtained upon analysis. This may even be the case for languages that are relatively similar to English compared to African languages. For instance, there are at least seven entries in WordReference for ‘part’ in Spanish
We specified a procedure upfront, principally in order to reduce the possibility of bias, subjectivity, and generally shoe-horning isiZulu terms and conceptualisations into those reported in the literature. Secondly, it is expected to have the added benefit that the study may be replicated or reused for another language and culture. To this end, the procedure uses the variable
1. Create a
   a. Write down the term and a description of the meaning of the term;
   b. Determine whether it is a part–whole relation (at least broadly construed); if not, add it to the ‘discarded’ list with a reason for exclusion;
   c. Check the English entries under the identified candidates (e.g., ingxenye (for isiZulu) and similar terms) and revise the preliminary list, if applicable.
2. Categorise the candidate part–whole relations obtained in Step 1b by their similar informal meanings.
3. Refine the descriptions, if/where necessary, based on that categorisation and remove any term that is not a candidate part–whole relation on closer inspection.
4. For each part–whole relation, create a formal definition where possible, or else at least a logic-based characterisation of the main known characteristics.
5. Relate each formally specified part–whole relation to one described in ontology literature, where possible. For each of the relations where this fails, determine the reason(s) why and seek to identify an underlying pattern, if any.
6. Examine the relation’s use by querying a corpus for
The first step essentially is a ‘term and meaning harvesting’ stage that has as aim to make an inventory of the relevant terms by consulting authoritative resources, notably dictionaries, by means of a manual query expansion. In absence of a
There are multiple reasons why there is a test against a corpus as the last step of the procedure. Dictionaries are limited due to, among others, page limitations, and they lag behind in how words are used in modern speech, which is especially the case for an under-resourced language; that is, there are cases of concept drift, which are difficult to cater for at the harvesting stage but may be detected with a corpus analysis. Also, linguists tend to be ‘purists’ in language use relative to non-linguists, which may affect term definitions vs. usage in text by non-linguist authors. Further, decisions on refinement are made in the process, and the final formal characterisation may be too narrow or too broad after all. That said, for isiZulu specifically, Step 6 is limited to concordance search, as it is the only feasible option (theoretically and technologically) at present.
The procedure proposed is thus neither fully top-down nor solely bottom-up but is informed by both, and therewith bears some similarity to a middle-out approach in ontology development that was first proposed by Uschold and King (1995) and has since been expanded upon in different ways in several ontology development methodologies, tools, and techniques. The middle-out approach itself can be constructed as an iterative waterfall process, as visualised in Fig. 2. The iterations uphill are added, for it may be the case that 1) upon further analysis, some term(s) have to be removed after all, 2) in formalising a relation, one realises that what initially appeared detailed enough was not, and demands more analysis, and 3) seeing formal characterisations from related work may motivate changing the axioms when they are deemed ontologically equivalent, in order to improve the alignment.

Visualisation of the middle-out approach as an iterative waterfall process.
The language resources for this middle-out method for the current task are, principally, the Scholar’s Zulu Dictionary (Dent and Nyembezi, 2009), assisted occasionally in the first round by an old dictionary (Doke et al., 1958) to verify older and outmoded meanings and by isizulu.net to additionally cross-check translations in case of doubt. The step-wise reduction and term analyses were documented in a spreadsheet in successive sheets to foster traceability of motivations and decisions. The isiZulu National Corpus (INC) was used for Step 6. The INC is a living corpus of about 31 million tokens (Khumalo, 2015) that is stored in Wordsmith Tools. The main part of the corpus consists of isiZulu novels (9.6 million tokens) and news items from the Isolezwe, UmAfrika, Ilanga, and Izindaba zabantu newspapers (19.8 million tokens). The section for detailed analysis consists of 36 novels by female authors that is also used for another (ongoing) experiment and was therefore already loaded in Wordsmith. Due to certain technological limitations of the infrastructure, it was not feasible to change that at the time and those challenges still have not been overcome. The analysis was carried out by the authors, who have complementary expertise: LK is a specialist in isiZulu linguistics with some knowledge of ontologies, while CMK is an ontologist with some knowledge of isiZulu.
We describe the results obtained from Steps 1–3 of the procedure in Section 4.1. Eighteen terms were selected for a more detailed analysis to devise a formal characterisation (Section 4.2), where possible, therewith describing the outcomes of Steps 4–5. Subsequently, they are assessed against the INC (Step 6) in Section 4.3. The data and successive analysis stages are available as supplementary material accessible at
Harvesting and reduction of number of terms
The dictionary entries that were considered were taken from both sections of the bilingual dictionary. For instance, the entry for ‘part’ in the English→isiZulu section lists isinqamu (n.) as one of the isiZulu terms, so then it was looked up in the isiZulu→English section to check the back-translation, taking into account isiZulu morphology and consequent dictionary organisation. For instance, with afore-mentioned example isinqamu: its stem -nqamu was looked up and then of all the -nqamu entries, the entry with the applicable prefix (isi-, in this case) was selected. If there was no entry with an applicable prefix, then nothing was added to the list. Overall in aggregate:
English→isiZulu: ‘part’ has 18 entries in isiZulu; ‘portion’ has 11 isiZulu terms; ‘quantity’ has 8 isiZulu terms; ‘piece’ lists 19 isiZulu terms; ‘pinch’ lists 6; ‘contain’ and ‘component’ each lists 4 isiZulu terms.
isiZulu→English: the principal part–whole relation ingxenye, as well as, among others, umncunzo, isigamu, and other terms that were harvested when carrying out the previous step.
The resultant list consists of 81 unique isiZulu terms. They were annotated with a description and a tentative status of probably referring to a part–whole relation or to something else when it was immediately obvious. 41 terms were put on the ‘discarded’ list already. The discarded terms can be divided roughly into four categories due to the reason of elimination of the terms:
Terms that have to do with creating parts, rather than a part–whole relation: among others, -aba ‘share’, -ahlukanisa ‘separate’, and -vithiza ‘break to pieces’;
Terms that refer to standalone entities or size of quantities, portions, and pieces, rather than subquantities of something else: among many, ubungako is a quantity in the sense of hugeness and ubuningi refers not to some physical quantity or large quantities but to ‘abundance’ due to the ubu- prefix for so-called ‘abstract concepts’ that reside in noun class 14;
Terms that are artefacts of English compound nouns or idioms, which are linguistically related in English but not ontologically: among others, ‘piece of paper’ (ipheshana) is listed under ‘piece’ and ‘I for my part’ (mina ngokwami) is listed under ‘part’;
Terms that are clearly wrong or only very distantly related: among others, isibhamu ‘firearm’ in the ‘piece’ entry, lamula in the ‘part’ entry (meaning ‘pacify’ or ‘mediate’), and ifa ‘inheritance’ that is listed under ‘portion’ for it assumes several people each will receive a portion of what the deceased left behind.
This is not to say the terms that belong to the first and second category would not be interesting to investigate, but they are beyond the current scope of part–whole relations.
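The back-translation check described at the start of this subsection (look up the stem, then keep only the entry with the applicable prefix) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure as actually carried out, which was manual: the dictionary structure `ZU_EN`, the function names, the placeholder gloss, and the second candidate term are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical miniature of the isiZulu->English section, organised by stem.
# Each stem maps a noun prefix to a gloss; the gloss text is a placeholder.
ZU_EN: Dict[str, Dict[str, str]] = {
    "nqamu": {"isi": "part (placeholder gloss)"},
}


def back_translate(term: str, prefix: str) -> Optional[str]:
    """Look up the stem of `term` and return the gloss of the entry
    that carries `prefix`, or None if no such entry exists."""
    if not term.startswith(prefix):
        raise ValueError(f"{term!r} does not start with prefix {prefix!r}")
    stem = term[len(prefix):]
    # All entries for the stem, then only the one with the applicable prefix.
    return ZU_EN.get(stem, {}).get(prefix)


def harvest(candidates: List[Tuple[str, str]]) -> List[str]:
    """Keep only the candidates whose back-translation finds an entry;
    if there is no entry with the applicable prefix, nothing is added."""
    return [term for term, prefix in candidates
            if back_translate(term, prefix) is not None]


if __name__ == "__main__":
    # 'isinqamu' is the example from the text; 'umnqamu' is a made-up
    # candidate used only to show the "no applicable prefix" case.
    print(harvest([("isinqamu", "isi"), ("umnqamu", "um")]))
```

The prefix is passed in explicitly rather than segmented automatically, since prefix segmentation depends on isiZulu morphology that this sketch does not attempt to model.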
The remaining 40 terms were annotated with an indicative scope of the type of part–whole relation, such as whether the term refers to relating stuffs or relating regions, whether it concerns how the part comes about after all, whether there is a temporal aspect to it, and their part-of-speech category (noun or verb) and noun class (nc) if it is a noun because they have some indications of ontology,3
IsiZulu has 17 noun classes, such as noun classes for nouns that refer to humans (nc1), long thin objects (nc11), or abstract concepts (nc14); e.g., umuntu ‘human’ (nc1) and ubuntu ‘humanity’ (nc14).
The last round of selection was guided by two considerations, selecting:
terms that are deemed important and expected to return many instances in the INC, such as ingxenye, and
terms that, at first impression at least, seem perhaps too specific, such as iqatha that seems to apply to portions of meat only and -mumatha that seems to apply only to objects properly contained in the mouth.
This reduced the list to 18 terms that are used for part–whole relations. We structured the informal characterisation of this final selection of the terms into a tentative taxonomy of part–whole relations to visualise the selection and facilitate further analysis. This taxonomy is depicted in Fig. 3, and its relations will be formalised and assessed against the INC.

Tentative and partial taxonomy of linguistically-motivated part–whole relations. The isiZulu terms are denoted in bold italics, and informal keywords are added as shortcuts to indicate domain and range; LOC+LOC and SC+CONJ: the surface realisation has no single term for them, because it is constructed on-the-fly depending on the noun class of the noun that participates in the axiom (for SC) or the noun’s orthography and phonological conditioning (CONJ and LOC).
The main aim of this investigation is to determine commonalities and differences in part–whole relations between extant literature and isiZulu and the Zulu culture. Therefore, we want to rely as much as possible on existing formalisations and theories, which therewith then facilitate comparison and alignment. To this end, we first describe the minimal necessary preliminaries of parthood and relevant part–whole relations in Section 4.2.1 to keep the paper sufficiently self-contained and then proceed to the formal characterisation of the putative part–whole relations in Section 4.2.2.
Preliminaries
Most terms – putative relations – require constraints on their domain or range (relata), such as
The putative relations shown in Fig. 3 suggest that any full formalisation will require second-order logic, because the stuff-parts and portions need them to assert that the stuffs involved are either different kinds of stuff (
We present relevant definitions and axioms from related works that the formalisation for part–whole relations in isiZulu require directly. For mereological
Finally, in order to distinguish the non-transitive part–whole relation from parthood, we use a placeholder name/relation for the purpose of structuring the relations (hence, it is not intended to be used for modelling), called
Formal characterisation
We proceed to the formalisation of the putative part–whole relations, down and from left to right in the hierarchy, following Fig. 3. Each term/relation is discussed in a separate paragraph in the remainder of this section. Each paragraph first contains a description with examples and considerations for formalisation, which provide a summary of the analysis and documentation for the justification of the way it was formalised, and subsequently the axioms and any alignments are described. For linguistic reference, it also lists the part-of-speech – “n.” for noun and “v.” for verb – and the noun class “nc” if it is a noun, because that determines the possessive concord (PC) that realises the preposition ‘of’ in ‘part of’ in a sentence in isiZulu.
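To make concrete how the noun class of the ‘part’ term determines the surface form of ‘of’, the following sketch builds the possessive phrase for a few of the nouns discussed in this paper. It is an illustration under assumptions: the concord table covers only four noun classes and is taken from standard isiZulu grammar rather than from this paper, the vowel coalescence rule (a+a→a, a+i→e, a+u→o) is the basic one only, and the function name is hypothetical. It ignores exceptions such as possessor nouns that take other forms.

```python
# Assumed possessive concords (PC) for the noun classes of some 'part' nouns
# discussed here (e.g., umunxa nc3, iqatha nc5, isiqephu nc7, ingxenye nc9).
POSSESSIVE_CONCORD = {3: "wa", 5: "la", 7: "sa", 9: "ya"}

# Basic vowel coalescence between the PC's final -a and the noun's initial vowel.
COALESCENCE = {"a": "a", "i": "e", "u": "o"}


def possessive_phrase(head: str, head_nc: int, possessor: str) -> str:
    """Return 'head PC+possessor', i.e. 'head of possessor'.

    The PC is selected by the noun class of the head noun (the 'part' term),
    and its vowel merges with the initial vowel of the possessor noun.
    """
    pc = POSSESSIVE_CONCORD[head_nc]
    initial = possessor[0].lower()
    if initial not in COALESCENCE:
        raise ValueError(f"expected a vowel-initial noun, got {possessor!r}")
    merged = pc[:-1] + COALESCENCE[initial] + possessor[1:]
    return f"{head} {merged}"


if __name__ == "__main__":
    print(possessive_phrase("isiqephu", 7, "isinkwa"))  # portion of bread
    print(possessive_phrase("umunxa", 3, "ixhiba"))     # portion of the hut
```

The two calls reproduce phrases that occur in the examples of this section (isiqephu sesinkwa and umunxa wexhiba); the same head noun with a different possessor, or the same possessor with a head noun from another class, yields a different string for ‘of’.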
Ingxenye (n.; nc9) is the generic ‘catch all’ part that can be used both for mereological
The formalism can be illustrated as follows in natural language use in isiZulu, with “SC” the subject concord (≈conjugation) and “PC” the possessive concord for preposition ‘of’ in ‘part of’:
Isitho (n.; nc7) is used to denote mereological parthood not only for physical objects, but more specifically for identifiable, whole, body parts. That is, not, say, the ‘left side of the body’, but objects with some identity and unity, such as eyes and arms. Isitho is thus a subtype of ingxenye and also a subtype of the part–whole taxonomy’s s-
Its usage in a sentence follows the same pattern as with ingxenye; e.g.:
Umunxa (n.; nc3) is used for a contiguous ‘portion’ of a meaningful area where some object(s) reside, such as the portion of the kitchen where the kitchen utensils are, the portion where the operating theatre is in a hospital, and the area where the fireplace is in the hut; e.g., iziko lingumunxa wexhiba ‘the fireplace is a portion of the hut’ (iziko ‘fireplace’, ixhiba ‘hut’, and lingumunxa as part, i.e., li-ngu-munxa: SC-is-part).
This contradicts the ontological notion of

Visualisation of the situation with umunxa.
LOC+LOC/‘containment’. The
While it is ontologically clear, formalising it with the current first or second order logic or even plain OWL necessarily has to be somewhat ad hoc and convention-based, because the logic expects either a fixed string for each vocabulary element or an identifier with an immutable label, which does not work in this instance. As an engineering solution, we proposed labelling the relation with an arbitrary sequence of letters (in casu,
Fumbatha and Mumatha (v.) also denote
Isigaba (n.; nc7) is used for part–whole relations between geographical entities, such as provinces and districts, and thus is equivalent ontologically to the
Isithako (n.; nc7) is used for subquantities in the sense of an ingredient that is an input in making another stuff, such as food, medicine, and paint, which implies that the whole stuff is a mixture and thus
Umqobelo (n.; nc3) refers to portions or pieces and is a reified form of the root -qob- ‘cut into small pieces’ that is extended with the applicative -el and terminated with a noun derivative -o. Mbatha (2006) contains the entry umqobelo, defining it as a mixture of food cut into small pieces. This includes the traditional tripe foodstuffs such as umgxabhiso (made up of cut up tripe and intestines) and foodstuffs like salads that are made up of cut tomatoes, carrots, lettuce, etc. It would be used in a sentence construction such as:
A core difference between umqobelo and the aforementioned isithako is that umqobelo is used explicitly only for pieces in heterogeneous stuffs, from which follows a basic characterisation of:
Isiqephu (n.; nc7) is used for a specific type of portion (Eq. 5), such that the kind of stuff that the portion is made of is solid or solid-like stuff. A straightforward example is that each slice (ucezu) of bread (isinkwa) is a portion of some bread: Zonke izicezu zesinkwa ziyisiqephu sesinkwa esisodwa. Intuitively, this is ontologically clear. When analysing examples, however, it appeared that the notion of ‘solid’ is ambiguous – or at least ontologically not obvious – hence, the addition of ‘solid-like’ stuff. For instance, blood is solid(-like) enough to be used with isiqephu, as in ‘a sample of blood is a portion of the blood’ of the patient it is taken from (in isiZulu: Onke amasampula egazi ayisiqephu segazi elilodwa) and it can be used with the lick of an ice cream as a portion of the ice cream as well (Keet and Khumalo, 2016). Yet, blood is a viscous liquid in its natural state and becomes solid due to coagulation only once it is in contact with the air, and, conversely, the lick of the ice cream arguably may have melted into a liquid state when licked.
We have not been able to ascertain what exactly the ontological status of ‘solid’ is with respect to the language use for isiqephu and how that corresponds with physics of an object’s state. The theoretical analysis is augmented with a query against the corpus (see below) in the hope to elucidate its usage and we return to this in the next section. Either way, this portion for solid(-like) entities may or may not be used to refer to a scattered portion and the stuff may or may not be a mixture. The minimum that can be said is that it denotes a sub-property of
Isichibi (n.; nc7) and iqatha (n.; nc5) are straightforward refinements of scattered portion. Isichibi is restricted to
Ilunga (n.; nc5) and its linguistic variant ilungu (n.; nc5) have similar meanings according to Mbatha (2006) and Dent and Nyembezi (2009) and have multiple senses, in particular as i) member of a human family, community, institution or organisation; ii) part of a plant, such as sugar cane and maize cone; and iii) part of a specific body part that has joints (e.g., arm or leg). Two examples of their senses are as follows:
It may appear that ilunga and ilungu are a variant of some generic, not necessarily mereological, part–whole relation. Examining this further, we also consulted an experienced translator and interpreter working as a senior Language Practitioner at UKZN (Mr. Manyoni), who observed that he actually uses the terms in two specific ways with two independent meanings: ilungu refers to member of [human family, community, institution or organisation] whereas ilunga refers to part of [arm, sugar cane, or maize cone]. In addition, the ‘member of’ sense is, to the best of the experts’ knowledge, used only for humans and the collectives they are members of, not, say, a ship being member of a fleet or a sheep being member of a flock. It could not be traced why both terms would be used for both senses; it may be due to regional or, more likely, temporal variants with an increase in specialisation of the term’s use. Analyses of corpora may assist with the latter.
Either way, at present, the differentiation is not official.4
Through the provisions of the Pan South African Language Board (PanSALB) Act 59 of 1995 (amended by Act 10 of 1999), there exists an IsiZulu National Language Body called UMZUKAZWE, which is short for UMkhandlu WesiZulu Kuzwelonke. The mandate of UMZUKAZWE is to develop isiZulu so that it becomes a modern scientific language to achieve parity with English and Afrikaans (Khumalo, 2017). While there has been some development of specialised terminology through this effort, PanSALB has been criticised for its failure to effectively develop African languages at the level of similar bodies, such as the Real Academia Española and Académie Française.
To formalise them, we need the entities
This ad hoc formalisation strays even further into domain ontology territory than some of the previously analysed part–whole relations, which becomes even more elaborate for ilunga, for it also requires a definition of joints of extremities (legs, arms, hands, feet, fingers, toes) and of their analogue for the nodes of stems of sugar cane, reed and similar types of plant, which may be gleaned from the Plant Ontology or Crop Ontology (The Plant Ontology Consortium, 2002; Shrestha et al., 2010). This is not the place to develop that domain ontology; therefore, we leave ilunga with the informal description only.
Ukuhlanganyela (v.) denotes participation of a collective in an event where the members of the collective act in unison; e.g., the electorate participates in an election (Keet and Khumalo, 2016) and an operating team participates in an operation. In natural language text, the verb is used in inflected form, as illustrated in (6) below, which also illustrates how the ‘in’ of ‘participates in’ is realised using the same rule for locative affixes as we have seen earlier for containment and whose details about the morphology have been described earlier by Keet and Khumalo (2016).
From the viewpoint of ontology, this means we have to constrain the domain of
Ingqikithi (n.; nc9) means ‘essence’ and it is used for both essential and immutable parts, such as the brain being an essential part of a human and a hand an immutable part of a boxer (while the boxer is a boxer), respectively. This is an orthogonal constraint that may be added to a part–whole relation, as it either concerns necessities (as attempted by Guizzardi (2007)) or may be formalised in a temporal logic (Artale et al., 2008). Given that the formal characterisation is known and that it is somewhat elaborate, we omit it here, noting that ingqikithi is simply the union of the two, as formalised by Artale et al. (2008).
Finally, the
The whole corpus yielded the following numbers of hits for the whole words; i.e., ingxenye ‘part’ was queried but not also siyingxenye ‘is part’ for nouns in noun class 9, ziyingxenye ‘are part’ for nouns in noun class 10, etc., thus yielding the lowest possible number of mentions: ingxenye: 132; ukuhlanganyela: 95; isiqephu: 269; iqatha: 194; isichibi: 3; isithako: 27; isigaba: 3002; mumatha: 0; fumbatha: 0; umunxa: 105; ingqikithi: 239; akhiwe: 153; enziwe: 267; ilungu: 2012; ilunga: 386; umqobelo: 0 (though the verb qoba yields 56 hits); isitho: 309. Containment (LOC+LOC) was not queried because there is no single term for it (recall Section 5), so too many options would have to be tested, since it can apply to all nouns except for those in nc14 (abstract nouns) and nc15 (infinitives). As expected, there was some difference but no clear-cut case with ilungu vs. ilunga: although ilungu is used predominantly to refer to a ‘member of’, ilunga is used in both the ‘member of’ and ‘part of’ senses. The absence of mentions of mumatha and fumbatha is to be expected, because only the bare strings were queried in the concordance; the underlying reasons are discussed in Section 5.
Concordance results from the section of the INC. n: total hits, relevant: term used in sense of part; match: used as indicated in the formalisation; “–”: not queried
Retrieving data for the selected section of the INC thus yielded further lower values. The aggregates are shown in Table 1 and the raw results are available in the online supplementary material. The principal observations to note are:
The (from the outside) seemingly overly restrictive iqatha is indeed restricted in its use to meat only, as is isitho indeed restricted to body parts only.
The essential part–whole relation ingqikithi is indeed always used as such and thus illustrates many uses that will be useful for future research on this topic.
The notions of ‘region’ in the definitions of umunxa and isigaba have been used in a much broader sense than the examples of physical (respectively geographical) regions given in the previous section, but may still satisfy their respective formal definitions.
The mentions of ingxenye demonstrate it is indeed used for part in the broadest sense.
For ilungu, it is worth noting that the corpus supports the praxis of a clear shift from its general use as a part-of to a more specialised use in the sense of membership of humans and their collectives. This semantic drift has not yet been captured in the lexicon/dictionaries.
Some terms have more than one sense (e.g., isiqephu also means ‘section’, akhiwe) and their use was mainly with the other sense, such as akhiwe yimisindo yenkulumo ‘built by speech sounds’, which is due to the broader meaning of the verb (-akha).
Discarding the false positives due to different sense usage as inapplicable, it can be concluded that no relevant concordance result violated the definitions of Section 4.2, except for the two ilungu mentions, for which violations had already been anticipated owing to the restrictive formalisation chosen.
We first discuss the outcomes in general and then illustrate, with several examples, both the use of the relations and where the differences observed will matter in possible applications. Last, we turn to several outstanding issues for further ontological analysis and to computing challenges to assist with that.
General discussion of the outcomes
Having followed the procedure as described in Section 3 and concluding from the results presented in Section 4, the ontological analysis showed that the different terms held up as denoting distinct part–whole relations, even though some domain or range has been underspecified. Moreover, their characterisation showed that, besides a few equivalences (e.g., the generic
The Zulu language and culture may not be alone in this approach of highly specific terminology denoting entities at the level of domain ontologies rather than top-level ones. The underlying idea is similar to that of the specific terms for parts in the German language, such as Bauteil, which were discussed in Section 2, and it may also relate to YAMATO’s approach and terminology. YAMATO was first created with Japanese vocabulary and then translated into English. It has 96 sub-relations for ‘has part’ and most of them are as specific as, among others, isiZulu’s mumatha, such as
Next, we illustrate potential use of the part–whole relations for isiZulu and how this may also affect ontologies more broadly. Afterward, we discuss several avenues for future work.
Motivating examples with possible applications
In Section 1, we mentioned relevance of the results to the medical and healthcare domain, notably for improving patient-doctor communication and generating discharge notes, among other possible scenarios that also may include after-care instructions and generating multilingual medicine information leaflets. Relevant terminology in isiZulu is being collected, developed, and standardised (Engelbrecht et al., 2010; Khumalo, 2017), which thus would facilitate the localisation. However, the terms have to be related, like in SNOMED CT. Given the differences in part–whole relations as expounded in the previous sections, a simple translation will be inadequate. Let us illustrate this claim with a few examples. Consider informing a patient about a blood sample having been taken for a hemogram (a complete blood count test), and the following relevant axioms in SNOMED CT for Take the identifiable body parts, such as Reconsidering the earlier mentioned example of operating theatre, this is in SNOMED CT with ID 225738002 and the assertion
In this example it is not only the case of higher precision for translation, but, arguably, also a revision for SNOMED CT’s knowledge. The additional scrutiny needed for an isiZulu localisation therefore may assist in improving the quality of the original. Different examples that illustrate the same two underlying aspects are as follows:
Looking beyond the healthcare domain, consider, e.g., architecture, which has been investigated extensively also for Southern African architecture and the specific terminology used for those building structures (Frescura and Myeza, 2016). A BioPortal search on ‘roof’ as a part of a house returned the Environment Ontology (EnvO) (Buttigieg et al., 2013). First, EnvO has as high-level statement
Ontological issues yet to be addressed
Several terms are yet to be analysed in depth, and they tend to draw in other ontological issues in addition to the aforementioned essential part (ingqikithi). For instance, isijuqu refers to a part that remains, as in ilunga yisijuqu semfe, which means that of the amalunga ‘subsections’ (of the stem) of the imfe ‘sugar cane’, the top ilunga gets torn off and the larger remainder of the cane remains, and therefore isijuqu applies, not ingxenye. Isihlephu, -yimvithimvithi, and udengezi bring up the notion of identity: the scattered part isihlephu has an identity of its own, such as the ear of a cup that has broken off (but it does not apply to a chip of the cup); with -yimvithimvithi, the parts/pieces are such that the whole is no longer recognisable, such as the pieces of a glass that has shattered; but for udengezi, the part broken off from the whole has assumed another function. While identity has been investigated for parthood for some time (Thomson, 1983) and this work is ongoing (Bennett, 2017), and the discussion forms part of how to construct mereological theories with respect to the extensionality principle, the focus is mostly on the identity of the whole rather than the part. To the best of our knowledge, it has not been investigated whether the parthood relation itself requires specialisation based on whether the part has an identity (and, arguably, unity) or not. Udengezi adds a further complication, as the part not only has some identity but also obtains a new function, rather than the more common notion of a functional part whilst being a part (Vieu and Aurnague, 2005). These extra features over and above merely being a part in the sense of standard mereological theories require further scrutiny, not only with respect to their use but also for mereology.
Knowing now that the part–whole taxonomies differ at least to some extent, it raises the question why this is so, which we cannot hope to answer here. An obvious direction of explanation is that it may have to do with the fact that isiZulu belongs to a different linguistic classification (Nguni) from English (Germanic), that may presuppose different cultural-linguistic groups. There are some preliminary results on culture and personality. Velichko et al. (2011) showed that terminology and word clusters used were linked to culture, and in particular regarding social and relational aspects, which were shown to be more pronounced in Africa than in Western conceptualisations. This, of course, does not demonstrate a causal relation to have an additional
Computing challenges for corpus-based testing
The intention of a mixed-methods approach of ontological analysis and testing against the corpus could not be realised to the authors’ satisfaction, although some useful results have been obtained. This is partially due to technological issues and isiZulu being an underresourced language. The concordance test was carried out with simple string matching of the single, whole word. However, isiZulu is an agglutinating language and a word typically has one or more of the slots in its prefix positions filled, minimally the so-called subject concord (roughly: conjugation) and the deep prepositions. As could be observed from the examples already, an ‘is part’ with a noun from noun class 5 (that plays the part) results in iyingxenye in the text, but with a noun of noun class 10 it is ziyingxenye. There are 17 noun classes with 10 distinct subject concords overall, which adds to the challenge of formulating regular expressions that find all possible permutations.
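To make the difference between the whole-word query and a prefix-aware query concrete, the following is a minimal sketch in Python. It is not the procedure used with the corpus tooling; the function names are hypothetical, and the list of subject concords contains only those that occur in the ‘is/are part’ forms quoted in this paper (iyingxenye, ziyingxenye, siyingxenye), so it is deliberately incomplete.

```python
import re

# Subject concords taken only from the forms quoted in the text
# (iyingxenye, ziyingxenye, siyingxenye). Illustrative and incomplete:
# a full inventory would have to come from a grammar of isiZulu.
ATTESTED_SUBJECT_CONCORDS = ["i", "zi", "si"]


def whole_word_pattern(term):
    """Match the bare term only, as in a whole-word concordance query."""
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


def copulative_pattern(noun, concords):
    """Match <subject concord> + y + <noun>, e.g. zi + y + ingxenye."""
    # Longest alternatives first, so that 'zi' is tried before 'i'.
    ordered = sorted(map(re.escape, concords), key=len, reverse=True)
    alternatives = "|".join(ordered)
    return re.compile(rf"\b(?:{alternatives})y{re.escape(noun)}\b", re.IGNORECASE)


def count_hits(pattern, text):
    """Number of non-overlapping matches of pattern in text."""
    return len(pattern.findall(text))


# A toy string of word forms, not a real sentence or corpus excerpt.
toy_text = "ingxenye iyingxenye ziyingxenye siyingxenye"
print(count_hits(whole_word_pattern("ingxenye"), toy_text))
print(count_hits(copulative_pattern("ingxenye", ATTESTED_SUBJECT_CONCORDS), toy_text))
```

On the toy string, the whole-word pattern finds only the bare noun, whereas the prefix-aware pattern finds the three inflected forms. Covering the remaining concords and the other prefix slots would enlarge the alternation accordingly.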
The query also would need to factor in the ‘of’ of ‘part of’, which is realised with the possessive concord that is added to the noun of the object that plays the whole and involves phonological conditioning; e.g., -ingxenye ‘of [a/the/at least one/some] human’ is ya + umuntu = yomuntu and ‘of dinner party’ is ya + idili = yedili. Each noun class has a possessive concord and three phonological conditioning rules for the vowels. This is more complex for the part–whole relations that are realised with verbs, which is due to both agglutination and inflection; e.g., mumatha may be used in a sentence such as uswidi umumethwe emlomeni ‘the sweet is contained in the mouth’, where the u- is the subject concord for uswidi and the -we is the passive extension that also induced the vowel change in the root.
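The phonological conditioning of the possessive concord can be sketched as a small lookup. The following Python fragment is illustrative only: the function name is hypothetical, the rules a + u → o and a + i → e follow the two examples given above, and the third rule (a + a → a) is an assumption based on how vowel coalescence is commonly described for isiZulu.

```python
# Vowel coalescence between a concord ending in -a and the noun's initial vowel.
# a+u -> o and a+i -> e follow the examples in the text (yomuntu, yedili);
# a+a -> a is an assumption added here to complete the three rules.
COALESCENCE = {"a": "a", "i": "e", "u": "o"}


def attach_possessive(concord, noun):
    """Join a possessive concord (e.g. 'ya') to a noun (e.g. 'umuntu')."""
    if not concord.endswith("a"):
        raise ValueError("expected a possessive concord ending in -a")
    if not noun or noun[0] not in COALESCENCE:
        raise ValueError("expected a noun starting with a, i or u")
    # Drop the concord's final -a and replace the noun's initial vowel
    # with the coalesced vowel.
    return concord[:-1] + COALESCENCE[noun[0]] + noun[1:]


print(attach_possessive("ya", "umuntu"))  # yomuntu
print(attach_possessive("ya", "idili"))   # yedili
```

A query generator would apply such a function for the possessive concord of each noun class. That inventory is not listed here and would have to be supplied separately.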
Including all permutations for all relations is not possible with the current limited technologies for isiZulu NLP, even for those variants that we know of: there is no grammar book that lists all permutations, which would have reduced the task to constructing the regular expression. Yet, as the results showed, searching for only the noun or infinitive returns too many false positives, such as idioms, unrelated compound nouns, and unrelated sentence constructions, just as it would in English. A part-of-speech (POS) tagger may assist in filtering out such false positive phrases, as one then could detect and select noun(phrase)-verb-noun(phrase) patterns. However, at the time of writing, there is only a limited isiZulu POS tagger that cannot be integrated with the WordSmith Tools that the INC is locked into. Thus, the corpus-driven approach for part–whole relations in isiZulu, be this for harvesting or for text analysis akin to that of Ittoo and Bouma (2013) for Wikipedia, will require more resource development first.
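If POS-tagged text were available, the pattern-based filtering could look roughly like the following Python sketch. It assumes hypothetical input in the form of (token, tag) pairs with the made-up tags "N" and "V", handles single tokens rather than full noun phrases, and uses hand-assigned tags for the example sentence from the previous paragraph; treating the locativised noun as "N" is a simplification for illustration only.

```python
def find_noun_verb_noun(tagged, noun_tags=frozenset({"N"}), verb_tags=frozenset({"V"})):
    """Return all (noun, verb, noun) token triples found in sequence.

    tagged is a list of (token, tag) pairs. The tag names are hypothetical
    and would have to be mapped to the tagset of an actual POS tagger.
    """
    hits = []
    for i in range(len(tagged) - 2):
        (w1, t1), (w2, t2), (w3, t3) = tagged[i:i + 3]
        if t1 in noun_tags and t2 in verb_tags and t3 in noun_tags:
            hits.append((w1, w2, w3))
    return hits


# Hand-tagged for illustration; not the output of an existing tagger.
example = [("uswidi", "N"), ("umumethwe", "V"), ("emlomeni", "N")]
print(find_noun_verb_noun(example))
```

Candidate triples found this way could then be checked against the inflected forms of the part–whole terms, rather than matching the bare noun or infinitive anywhere in the text.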
Conclusion
The procedure of lexicon harvesting of terms for part–whole relations and their subsequent analysis yielded 31 candidate part–whole relations in Zulu language and culture. Eighteen of them were analysed and formalised in this paper and checked against the IsiZulu National Corpus. The main insight obtained is that, while the general notion of a part–whole relation does exist in the Zulu language and culture, there are numerous differences with respect to the part–whole relations hitherto assumed to be common world-wide.
The large number of words for parts in isiZulu also did result in more ontological distinctions compared to those in the well-known lists of part–whole relations, as well as some underspecification where the general
This paper is the first systematic ontological assessment of part–whole relations outside the Western hemisphere. The process we have proposed here may be useful also for other languages and cultures, so as to gain a deeper understanding of part–whole relations in general and that may be used in domain ontologies. Further, we have started looking into creating more algorithms to be able to obtain better results from a larger portion of the IsiZulu National Corpus.
Of the yet to be analysed candidate part–whole relations, the main challenge to address concerns identity. For instance, where the part is a ‘whole body part’ (e.g., eye) or an identifiable piece (e.g., ear of a cup) – that is, identity of not only the whole but also the part and differentiating somehow between those entities and ‘others’, such as bone and the chip of a cup, respectively. This may be of interest to mereology to investigate. For ontology engineering, it is still unclear how to systematically handle this sort of plurality of relations when there are no neat 1:1 mappings, for declaring the axioms, their maintenance, and modelling guidance, and, more broadly, multilingual ontologies and localisation and globalisation of ontologies (research question 4).
Acknowledgements
We thank the corpus intern Ms. Neo Putini for extracting isiZulu novels from the entire IsiZulu National Corpus in order to facilitate their separate analysis and querying using WordSmith Tools, and Mr. Njabulo Manyoni for providing his professional insights into the use of some of the specialised terms. We also thank the reviewers for suggestions that helped improve the paper.
