Abstract
This article explores the representation of the content knowledge of historical Malay manuscripts by extracting event features with an event ontology framework. The manuscript used for testing is Sulalatus Salatin (Sejarah Melayu) by Abdul Ahmad Samad, published in the University of Malaya Digital Library database. To align with a domain-specific ontology, the Simple Event Model (SEM) is adopted and an event-based ontology for historical Malay manuscripts is designed. Information extraction is done manually to extract events from the manuscript, which are then mapped into the Protégé editor. Competency questions were constructed and submitted to the Protégé editor as SPARQL queries to check the ontology's capability to provide answers and to examine its correctness. The event-based ontology model assists in discovering and representing the content knowledge of historical Malay manuscripts and supports the organisation of knowledge. All the main concepts were extracted from the selected Malay manuscript, and 17 concepts were used to develop the event-based ontology model. The knowledge was verified by three domain experts in Malay manuscripts. The interrater reliability for Event and Actor instances is 84%, which means that 16% of the instances and their types are incorrect and need amendment. For Place, interrater reliability is 95%, and for Role it is 99%. For Time, the experts achieved 100% agreement. In addition, the experts agreed that the concepts, properties and instances of the Malay Manuscript Ontology complied with the criteria of consistency, completeness, conciseness, expandability and ease of use. The event-based ontology model, with its high level of semantic granularity, reflects the cultural richness and intellectual aspects stored in Malay manuscripts. This will enable systematic research on the knowledge embedded in the manuscripts and make it widely and easily accessible to everyone.
Keywords
Introduction
Manuscripts are considered unique primary sources of historical, cultural and informational value. Malay manuscripts are valuable handwritten documents in the Malay language; the manuscript tradition emerged at the beginning of the fourteenth century and ended in the early twentieth century with the coming of the West and the introduction of printing machines (Omar, 2001). The main features of Malay manuscripts are: (i) they are written in Jawi script, (ii) early-period manuscripts were written on palm leaves from the lontar palm and the Nipah palm, bamboo, vellum and tree bark, and (iii) after the introduction of paper during the Islamic period, the majority of Malay manuscripts were written on European paper such as Dutch paper (Omar, 2001).
Manuscripts are fragile and valuable; access to originals is usually restricted to a few pages. Handling the original manuscripts exposes them to damage even when they are well protected. Therefore, many organisations digitise manuscripts and put them on the web to make them available to a wider audience. Users can study history through remote access to images of the original materials. Digitisation of manuscripts is not new in Malaysia; different types of institutions have embarked on manuscript digitisation projects. Digitisation makes manuscripts accessible to all in a single format and contributes to the preservation, access and appreciation of local heritage and culture. It also allows the public to search the resources rapidly and comprehensively, anywhere and at any time (Manaf, 2008).
Content is the most valuable element of historical material, especially manuscripts. A manuscript cannot be considered fully available if its content has not been discovered. According to Toledo and Carbonell (2018), even though converting a digitised document image into machine-readable text is clearly a worthwhile step forward, the ultimate objective is to extract the information it contains to permit access and search by content. This is consistent with the findings of Aljalbout and Falquet (2018), who identify a need for an approach that classifies the knowledge contained in manuscripts into meaningful thematic categories and makes it available and accessible. However, manuscripts are rare collections with special features and attributes.
The knowledge contained in manuscripts is not easy to understand, and it is sometimes incomplete or not well organised (Aljalbout and Falquet, 2018). Manuscripts were also produced in different periods, reflecting several views and perspectives. Some words or entire phrases in the original material may be lost or impossible for humans or computers to read or recognise. In historical Malay manuscripts, the challenges usually concern these attributes. According to Jones (1999), Malay manuscripts first appeared in the fourteenth century A.D., when Islam took root in the region. Most of the manuscripts are written on paper, in ink, using the Arabic (Jawi) script. Some physical manuscripts have poor visibility, and the language used (Old Malay written in Jawi script) also makes them difficult to read and understand (Rifin and Zainab, 2007).
Another major problem in many cultural collections is that they contain noisy text with inconsistent references to, and spellings of, the same entity, e.g., a person, place or event (Hampson et al., 2012); this is also known as named-entity ambiguity. For example, different persons in a text may have the same name and sometimes carry the same honorary title, and the same event mentioned in manuscript A and manuscript B may be interpreted differently. There is therefore a need to disambiguate between persons with the same name. Studies on the information-seeking behaviour of humanities scholars emphasise the importance of being able to search for particular people, places and other proper nouns (Crane and Jones, 2006).
It is therefore difficult to discover, organise and access the content knowledge embedded in manuscripts, and an appropriate and adequate concept or model is needed to meet the requirements of a knowledge base for manuscripts. It is claimed that events and roles are key concepts for achieving an adequate representation of historical facts (Goy et al., 2015). Event information is essential for cultural heritage because it is central to understanding heritage information (Doerr, 2009). Junnila and Hyvönen (2006) chose processes and actions (events) as a basis for describing many kinds of cultural resources, making them searchable, logically unified and semantically linked in insightful ways. According to them, human culture is largely about action, about people doing things, and actions and processes tell us more about the life around cultural objects than other points of view.
Today, digital manuscripts are described using metadata such as Dublin Core. Such metadata represents only a limited number of parameters, such as author, title, date and subject, while most of the data remain unsearchable and undiscovered (Zhitomirsky-Geffet and Prebor, 2016). These limitations of traditional knowledge representation and organisation methods are a significant impediment to the discovery of, and access to, digital collections (Pattuelli, 2011). Such metadata cannot answer complex research questions such as “How many events happened during the reign of Sultan Mansur Syah in Malacca?” Answering this question requires a series of field-based queries, because the information is often heterogeneous and distributed and may have to be collected from several locations rather than found in one place.
Aljalbout and Falquet (2018) stated that a knowledge base for manuscripts should contain: (i) a representation of the manuscript contents; (ii) contextual knowledge that describes persons, places, events and other entities; (iii) terminological knowledge that describes the meaning the author associates with terms at a given point in time; (iv) terminological knowledge that describes the concepts of a scientific (sub)domain; and (v) bibliographic knowledge about the works cited in the manuscripts.
The issues mentioned above also apply to Malay manuscripts. Improving content access is essential as these sources, Malay manuscripts in particular, become increasingly important in education and research. Ontologies are said to be the best way to represent, organise and share knowledge (Zou and Park, 2018). According to Mäkelä et al. (2012), event ontologies are useful for indexing historical cultural heritage content: events are the semantic glue that associates actors, objects, places and time. Junnila and Hyvönen (2006) found that the event-based approach can be used in annotations for interoperability, as a search object in its own right, and to provide end-users with insightful semantic recommendations with explanations.
Moreover, events are among the most critical information for historians (Ramli et al., 2016). Events create a complete semantic framework by interlinking entities such as people, places, objects and time. The relationships between entities such as persons, works and locations are very important because they constitute a large part of the context, which gives meaning to the object of study (Benjamins et al., 2004).
To discover, organise and access the content knowledge embedded in historical Malay manuscripts, it is essential to investigate events as entities that can define the hierarchical structure of documents and the relationships between events with respect to persons, things, time and place and their interactions. Hence, the objectives of this research are:
To extract the content knowledge of historical Malay Manuscript based on an event ontology framework.
To evaluate the content knowledge of historical Malay Manuscript.
This article is organised as follows: the next section reviews related research on ontology models, and the section after that explains the methods used for ontology development. The fourth section describes the resulting ontology and discusses relevant issues. The fifth section provides a conclusion, including plans for further studies.
Related research
The development of ontology models for the information produced and stored by cultural heritage institutions has been on the agenda since the 1990s (Le Boeuf, 2012). Such models are important for structuring and representing data or knowledge (Meroño-Peñuela et al., 2014). One ontology model that has received a lot of attention is the event-based ontology model, which represents knowledge with higher granularity (Zhong et al., 2012) and at different levels of abstraction (Van Hage et al., 2012).
Events can be characterised by six aspects: time, space, participation, relations between events (in terms of mereology, causality and correlation), documentation, and interpretation (Scherp and Mezaris, 2014). Events provide a natural way to explicate complicated relations between people, places, actions and objects (Van Hage et al., 2011). Ruotsalo and Hyvönen (2007) presented an event-based representation of knowledge about the real world.
In the context of historical texts, several works have represented and visualised the knowledge of primary sources. The Finnish Culture Sampo project is pioneering work that presented a prototype system for integrating cultural content at the national level in Finland (Mäkelä et al., 2012). In this project, an ontology of historical events focusing on Finnish history was created, based on the timeline compiled by the Agricola network of Finnish historians. The ontology classifies events temporally along this timeline. The project team manually annotated 220 events from the years 1850–1920 using the SAHA annotation tool coupled with ONKI ontology library servers, so that shared domain ontologies could be used (Hyvönen et al., 2007). The project proposed a dynamic content approach, in which different contents are organised and linked together by selecting available content according to specific criteria (Locatelli et al., 2012). The ontology design allows users to visualise numerous heterogeneous, mutually linked data and then derive knowledge about the observable entities (Messaoudi et al., 2018).
Modelling historical data and events has also been the focus of the Irish Record Linkage 1864–1913 project, which started in 2014 (Debruyne et al., 2015, 2016). The objective of this project is to create a platform for analysing events captured in historical birth, marriage and death records by applying semantic technologies for annotating, storing and inferring information from the data contained in those records. The project used the Historical Events Ontology to interpret the register pages.
The Clavius on the Web project used existing ontologies such as FOAF, CIDOC CRM and FRBRoo to build a semantic infrastructure for describing documents through events. It aimed to restore and enrich the manuscripts written by Christophorus Clavius (1538–1612) and to help users access and explore the manuscript content in depth: finding new information about cited people, places and events, learning new words and concepts, and better understanding the language and sentence structure (Abrate et al., 2014).
Several well-known event ontologies that could serve for managing common events in documents have already been developed. Each model differs from the others because they were created for different purposes. They can be categorised according to four criteria: domain (in)dependence, focus on classes or properties, scope (minimal or complex), and level of formalisation (Van Hage et al., 2011). Among these models are the Event Ontology (Raimond and Abdallah, 2007), LODE (Shaw et al., 2009), CIDOC CRM, the Simple Event Model (Van Hage et al., 2011) and Event Model-F (Scherp et al., 2012). These ontologies are similar in their conceptual modelling of events, even though they come from different domains.
These works and projects are reasonable efforts to reveal the relationships between data within a digital collection by building ontologies. In this way, the relationships between selected concepts and their semantic meaning in the documents can be re-formed, represented and rediscovered in a digital collection. Hence, to address the research problem, this study aims to build an event-based model to reconnect and represent the content knowledge and relationships of historical Malay manuscripts.
Methodology
The main objective of this research is to extract the content knowledge of historical Malay manuscript using an event ontology. Thus, a research framework on event ontology was constructed as a vehicle to achieve the objective.
Figure 1 represents the graphical overview of the research methodology framework of event ontology for historical Malay manuscripts. It highlights the four main phases, and methods implemented throughout the research. The primary research phases include knowledge development (Noy and McGuinness, 2001), knowledge representation (Horridge et al., 2011), knowledge integration (Yanti Idaya Aspura and Shahrul Azman, 2010), and knowledge evaluation (Gómez-Pérez, 1999).

Research methodology framework of event ontology development for historical Malay manuscripts.
As mentioned earlier, events can be characterised by time, space, participation, relations between events, documentation and interpretation (Scherp et al., 2009). These characteristics will be discovered using the event ontology framework. The knowledge development phase is a manual process that starts with determining the domain, scope and purpose (Noy and McGuinness, 2001), followed by reusing existing ontologies, enumerating the terms in the ontology, defining the classes, class hierarchy, properties and inverse properties, and lastly creating instances. The next phase is knowledge representation, whose purpose is to implement the formalised ontology in a formal knowledge representation language (Beck and Pinto, 2002). Implementation also means writing the formal ontology in a machine-processable ontology language, for example OWL (Thunkijjanukij, 2009). Ontology languages differ in expressiveness and computational properties and are applied through ontology editor tools (Jain and Mishra, 2014). The knowledge integration phase is an enhancement process that makes the event ontology more comprehensive by including external sources. The last phase is knowledge evaluation of the contents extracted with the event ontology, in which domain experts assess and validate the extracted contents. According to Gómez-Pérez (2001), the correctness of an ontology requires proving its consistency, completeness and conciseness, whereas Kreider (2013) proposed four evaluation criteria: comprehensiveness, consistency, extensibility and ease of use. In this research, we apply the criteria of both Gómez-Pérez and Kreider for ontology evaluation.
This study focuses on one historical Malay manuscript, Sulalatus Salatin (Genealogy of Kings), composed sometime between the 15th and 16th centuries. It is a history of the origin, evolution and demise of the great Malay maritime empire, the Malacca Sultanate. The events in the manuscript are discovered and arranged in a structured format using an ontology-based approach.
The first objective of the study is to extract knowledge content from a historical Malay manuscript. The event ontology framework is used to extract the content from the Sulalatus Salatin manuscript. In the framework, the first phase of content extraction is the knowledge development process, which was conducted manually and involved eight steps.
Knowledge development
The first phase is knowledge development. There is no single correct ontology design methodology (Noy and McGuinness, 2001). This study followed the Ontology Development 101 methodology proposed by Noy and McGuinness (2001), with reference to Boyce and Pahl (2007), as the general methodology framework for developing the Malay manuscripts ontology model. This method is relatively mature, widely used, and provides specific details on operations and related technical support. The eight steps of the knowledge development phase are described below:
Step 1: Determine the domain, purpose, and scope of the ontology
The first step in developing an ontology is to determine its domain, purpose and scope, also known as ontology specification. The scope and purpose can be defined by constructing and answering basic questions and competency questions, as suggested by Noy and McGuinness (2001) and Gruninger et al. (1995). Both sets of questions help define the scope and purpose of the developed ontology.
In this research, the basic questions and competency questions clarify the purpose of the ontology and limit the scope of the historical Malay manuscript ontology. The answers to these questions may be updated and revised during the ontology design process. According to Noy and McGuinness (2001), the first basic question to be answered concerns the domain. In this research, historical Malay manuscripts were chosen as the domain, as they are rich in historical events, places, persons and other entities. The purpose of the ontology is to provide a knowledge base containing information about Malay history, culture and civilisation for students and other users. This will enable systematic research on the knowledge embedded in the manuscripts and make it widely and easily accessible to everyone. The ontology should be able to answer both direct and complex questions from users, and it would directly benefit students and researchers who study Malay manuscripts and culture.
Competency questions are a list of questions that a knowledge base built on the ontology should be able to answer. These questions are only a sketch and need not be exhaustive. Preparing a set of competency questions in natural language helps identify the coverage and scope of the ontology. The questions also help the designer judge when the ontology contains enough information and when it has reached the right level of detail or representation (Cristani and Cuel, 2005). Below are some of the informal competency questions based on the Sulalatus Salatin manuscript:
How many events are mentioned in the text?
What types of events are mentioned in the text?
Why did the event happen?
Who was involved in that event?
List the events that took place during the reign of Sultan Mansur Syah and involved Hang Tuah.
Who was involved when Malacca established an international relationship with Majapahit, and what was their role in the event?
All the competency questions should be kept in mind while developing the ontology, and the ontology should be capable of answering all the questions after the development process.
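The authors checked competency questions by submitting SPARQL queries in Protégé. As a rough, stdlib-only stand-in for that step, the sketch below stores a few triples and answers the first and fifth questions above by pattern matching. The `ev:`, `actor:`, `place:` and `time:` identifiers are hypothetical; only the two events and their time labels come from this article.

```python
# Hypothetical triples (subject, predicate, object). The sem: property names follow
# the Simple Event Model; the ev:/actor:/place:/time: identifiers are illustrative.
TRIPLES = {
    ("ev:MenjalinMuhibahMelakaMajapahit", "rdf:type", "sem:Event"),
    ("ev:MenjalinMuhibahMelakaMajapahit", "sem:hasActor", "actor:SultanMansurSyah"),
    ("ev:MenjalinMuhibahMelakaMajapahit", "sem:hasActor", "actor:HangTuah"),
    ("ev:MenjalinMuhibahMelakaMajapahit", "sem:hasPlace", "place:Majapahit"),
    ("ev:MenjalinMuhibahMelakaMajapahit", "sem:hasTime", "time:PemerintahanSultanMansurSyah"),
    ("ev:TunPerpatihPutihMenjadiBendahara", "rdf:type", "sem:Event"),
    ("ev:TunPerpatihPutihMenjadiBendahara", "sem:hasTime", "time:SelepasTunPerakMati"),
}


def events(triples):
    """All subjects typed as sem:Event (basis for "How many events...?")."""
    return {s for (s, p, o) in triples if p == "rdf:type" and o == "sem:Event"}


def values(triples, subject, predicate):
    """Objects of (subject, predicate, ?o), like a one-pattern SPARQL query."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def events_in_reign_with_actor(triples, reign, actor):
    """Events whose time is the given reign and that involve the given actor."""
    return sorted(
        e for e in events(triples)
        if reign in values(triples, e, "sem:hasTime")
        and actor in values(triples, e, "sem:hasActor")
    )


print(len(events(TRIPLES)))  # 2
print(events_in_reign_with_actor(
    TRIPLES, "time:PemerintahanSultanMansurSyah", "actor:HangTuah"))
# ['ev:MenjalinMuhibahMelakaMajapahit']
```

In SPARQL, the second query would be a basic graph pattern that joins `sem:hasTime` and `sem:hasActor` on the same event variable.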
Step 2: Determine the knowledge source
The knowledge source for the event ontology of historical Malay manuscripts was selected from the MyManuskrip Digital Library at the University of Malaya. In this research, we selected one Malay manuscript, Sulalatus Salatin (Sejarah Melayu) by Abdul Ahmad Samad (editor), as the source. This manuscript was selected because it has been digitised, published and transliterated into Romanised script.
The original text has undergone numerous changes; the oldest known version is dated May 1612 and resulted from a rewriting effort commissioned by the then regent of Johor, Yang di-Pertuan Di Hilir Raja Abdullah. It was originally written in Classical Malay on traditional paper in old Jawi script, but today exists in 32 different manuscripts, including versions in Rumi script (Fig. 2). Notwithstanding some of its mystical contents, historians have regarded the text as a primary source of information on past events in the Malay world that can be verified by other historical sources. In 2001, the Malay Annals was listed on UNESCO’s Memory of the World Programme International Register. Two factors influenced the selection of this data source: (i) the manuscript is already digitised and published, and (ii) it has been transliterated into Romanised script. It is therefore available in digital format and easier to read and understand than the original manuscript in old Jawi script.

Original text of Sulalatus Salatin and transliteration text version of Sulalatus Salatin.
In this step, we analysed numerous existing ontologies relevant to an event ontology for the content knowledge of historical Malay manuscripts. A summary of the findings is given in Table 1.
Existing event model
These ontologies are similar in their conceptual modelling of events, even though they come from different domains and define events differently. They share similar concepts in defining an event, namely person, time, date and place. We selected the Simple Event Model (SEM) as the main reference model for representing the knowledge extracted from the historical Malay manuscript. According to SEM, “Events encompass everything that happen, even fictional events. Whether there is specific place or time or whether these are known is optional. It does not matter whether there are specific actors involved. Neither does it matter whether there is consensus about the characteristic of events” (Van Hage et al., 2011). Hence, SEM imposes no specific definition of an event: anything that happens is a valid event in SEM.
SEM can express event-event relationships (van den Akker et al., 2011), which can be used to generate prototypical historical narratives. There are three basic relationship types, corresponding to three of the event properties: (i) topological, (ii) conceptual and (iii) biographical relationships. In a topological (location-based) relationship, two events are related because they involve the same place. In a conceptual (type-based) relationship, two events are related because they are of the same event type. In a biographical (actor-based) relationship, two events are related because they involve the same actor. SEM is therefore reused as the base ontology for extracting the content of the Sulalatus Salatin manuscript.
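The three relationship types can be derived mechanically from shared places, event types and actors. The sketch below does this pairwise over a toy set of events. The identifiers E1 to E3 and their type labels are hypothetical, not instances extracted from Sulalatus Salatin, and the field names are not SEM property names.

```python
from itertools import combinations

# Toy events (hypothetical). Each field is a set, so shared values are an intersection.
EVENTS = {
    "E1": {"places": {"Majapahit"}, "types": {"Diplomatic mission"},
           "actors": {"Hang Tuah", "Sultan Mansur Syah"}},
    "E2": {"places": {"Melaka"}, "types": {"Diplomatic mission"},
           "actors": {"Sultan Mansur Syah"}},
    "E3": {"places": {"Majapahit"}, "types": {"Royal ceremony"},
           "actors": {"Hang Tuah"}},
}

# Relationship kind -> the event field whose overlap defines it.
RELATION_KEYS = {
    "topological": "places",   # same place
    "conceptual": "types",     # same event type
    "biographical": "actors",  # same actor
}


def related_events(events):
    """Return (event_a, event_b, kind, shared_values) for every related pair."""
    links = []
    for a, b in combinations(sorted(events), 2):
        for kind, key in RELATION_KEYS.items():
            shared = events[a][key] & events[b][key]
            if shared:
                links.append((a, b, kind, sorted(shared)))
    return links


for link in related_events(EVENTS):
    print(link)
# ('E1', 'E2', 'conceptual', ['Diplomatic mission'])
# ('E1', 'E2', 'biographical', ['Sultan Mansur Syah'])
# ('E1', 'E3', 'topological', ['Majapahit'])
# ('E1', 'E3', 'biographical', ['Hang Tuah'])
```

Following only the biographical links for one actor, for example, chains that actor's events together, which is the kind of prototypical narrative the relationships are meant to support.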
After selecting the existing ontology, the next step is to list the ontology content, comprising classes, subclasses, relationships and instances. The SEM classes were therefore analysed in order to instantiate them. The purpose of this activity is to determine what information should be extracted from the text.

The classes of the Simple Event Model (SEM) (Van Hage et al., 2011).
In SEM, the classes are divided into three groups: core classes, types and constraints, as shown in Fig. 3. The core classes are sem:Event, sem:Actor, sem:Place and sem:Time. These four core classes outline the basic idea underlying SEM, summarised by the motto “Who did what where and when?” Each core class has an associated sem:Type, which contains resources indicating the type of a core individual. The Constraint group contains three classes: sem:Role, sem:Temporary and sem:View. Based on these classes, we started the information extraction process to extract instances for each class.
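To make the grouping concrete, the sketch below encodes the three SEM groups and a minimal record for the “Who did what where and when?” motto. The `CoreEvent` class is a hypothetical helper, not part of SEM; it leaves every slot except the event itself optional, in line with SEM's permissive definition of an event. The example values are those of the Majapahit event discussed in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The three SEM class groups as listed in the text; sem:Type stands for the
# type classes attached to the core classes.
SEM_GROUPS = {
    "Core": ("sem:Event", "sem:Actor", "sem:Place", "sem:Time"),
    "Types": ("sem:Type",),
    "Constraint": ("sem:Role", "sem:Temporary", "sem:View"),
}


def group_of(sem_class):
    """Return the SEM group a class belongs to, or raise KeyError."""
    for group, members in SEM_GROUPS.items():
        if sem_class in members:
            return group
    raise KeyError(sem_class)


@dataclass
class CoreEvent:
    """One event seen through the SEM motto; every slot except 'what' is optional."""
    what: str
    who: List[str] = field(default_factory=list)
    where: Optional[str] = None
    when: Optional[str] = None

    def motto(self):
        unknown = "?"
        return (f"Who: {', '.join(self.who) or unknown} | What: {self.what} | "
                f"Where: {self.where or unknown} | When: {self.when or unknown}")


ev = CoreEvent("Menjalin muhibah Melaka-Majapahit",
               ["Sultan Mansur Syah", "Hang Tuah"],
               "Majapahit", "Pemerintahan Sultan Mansur Syah")
print(group_of("sem:Role"))  # Constraint
print(ev.motto())
# Who: Sultan Mansur Syah, Hang Tuah | What: Menjalin muhibah Melaka-Majapahit | Where: Majapahit | When: Pemerintahan Sultan Mansur Syah
```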
Information extraction (IE) is an essential task in building an ontology; it is defined as a text analysis task aimed at extracting targeted information from the context of a document (Cowie and Lehnert, 1996). The Malay language is rich in colloquial and idiomatic expressions and literary allusions and, like other languages, has its own unique structure and grammar (Zamin and Ghani, 2010). According to these authors, Malay has consequently become one of the less-resourced languages in the world, with limited computational linguistics research. Therefore, information extraction in this research is done semi-automatically. The activities involved in information extraction are shown below.

Information extraction for historical Malay manuscript.
We divide our approach into four steps, as shown in Fig. 4:
Pre-processing: the text is divided by chapter and by sub-topic within each chapter, and then split into sentences.
Entity extraction: relevant entities are identified in each sentence.
Relation extraction: relationships between the different entities in the text are identified in the form of triples, by asking “Who?”, “What?”, “When?”, “Where?” and “Why?”.
Entity and relation information: finally, all the entities and their relations are listed accordingly.
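The four steps can be sketched as a small pipeline. This is a hypothetical illustration under strong assumptions: a hand-made gazetteer stands in for the entity identification the researchers performed, sentence splitting is a naive punctuation rule, the input snippet is invented English text rather than a quotation from the manuscript, and `mm:hasObject` is a made-up property name.

```python
import re

# Hypothetical gazetteer: surface form -> ontology class.
GAZETTEER = {
    "Sultan Mansur Syah": "Actor",
    "Hang Tuah": "Actor",
    "Majapahit": "Place",
    "Melaka": "Place",
    "perahu": "Object",
    "lancaran": "Object",
}

# Class -> predicate linking the event to the entity ("Who?", "Where?", "With what?").
PREDICATE = {"Actor": "sem:hasActor", "Place": "sem:hasPlace", "Object": "mm:hasObject"}


def split_sentences(text):
    """Step 1 (pre-processing): split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_entities(sentence):
    """Step 2: exact, whole-word gazetteer matches in one sentence."""
    return [(name, cls) for name, cls in GAZETTEER.items()
            if re.search(r"\b" + re.escape(name) + r"\b", sentence)]


def extract_triples(event, text):
    """Steps 3 and 4: relate every entity to the event and list the triples."""
    triples = set()
    for sentence in split_sentences(text):
        for name, cls in extract_entities(sentence):
            triples.add((event, PREDICATE[cls], name))
    return sorted(triples)


snippet = "Sultan Mansur Syah sailed to Majapahit. Hang Tuah followed in a lancaran."
for triple in extract_triples("ev:E1", snippet):
    print(triple)
```

A dictionary lookup cannot answer “Why?” or resolve ambiguous names, which is one reason manual checking remains necessary for a less-resourced language such as Classical Malay.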
Figure 5 shows a sample page from Sulalatus Salatin, where the event mentioned is “Menjalin muhibah Melaka-Majapahit” (making an international relationship between Malacca and Majapahit). The next step is to identify the actors who participated in this event. From the text, we identified several actors (persons) who participated in the event: Sultan Mansur Syah, Bendahara Paduka Raja, Seri Nara Diraja, Seri Bija Diraja, hulubalang, Hang Tuah, Hang Jebat, Hang Kasturi and others. We also identified the objects mentioned in the text as used in the event, namely perahu and lancaran (types of vessels). Finally, we identified when and where the event happened: it took place during the reign of Sultan Mansur Syah, and the place of the event is Majapahit.
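Because the extracted instances were entered into Protégé, it may help to see this event as RDF. The sketch below serialises a record of it to Turtle. The `mm:` namespace and the `mm:hasObject` property are hypothetical; the `sem:` namespace IRI is that of the published SEM vocabulary; only a subset of the listed actors is included; and labels are assumed to contain no double quotes.

```python
import re

PREFIXES = {
    "sem": "http://semanticweb.cs.vu.nl/2009/11/sem/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "mm": "http://example.org/malay-manuscript#",  # hypothetical namespace
}


def local_name(label):
    """'Hang Tuah' -> 'Hang_Tuah': keep letters/digits, join the rest with '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")


def to_turtle(event):
    """Serialise one event record (a dict of lists) as a Turtle snippet."""
    header = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    statements = [
        f"mm:{local_name(event['label'])} a sem:Event",
        f'    rdfs:label "{event["label"]}"@ms',  # Malay-language label
    ]
    for predicate, key in [("sem:hasActor", "actors"), ("sem:hasPlace", "places"),
                           ("sem:hasTime", "times"), ("mm:hasObject", "objects")]:
        values = event.get(key, [])
        if values:  # SEM properties are optional, so empty slots are skipped
            statements.append(f"    {predicate} "
                              + ", ".join("mm:" + local_name(v) for v in values))
    return "\n".join(header) + "\n\n" + " ;\n".join(statements) + " .\n"


majapahit = {
    "label": "Menjalin muhibah Melaka-Majapahit",
    "actors": ["Sultan Mansur Syah", "Bendahara Paduka Raja", "Hang Tuah"],
    "places": ["Majapahit"],
    "times": ["Pemerintahan Sultan Mansur Syah"],
    "objects": ["perahu", "lancaran"],
}
print(to_turtle(majapahit))
```

The output uses Turtle's `;` (same subject) and `,` (same predicate) abbreviations, so the three actors appear as one `sem:hasActor` statement with several objects.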

Information extraction activities.
Time is one of the most important factors, because without temporal elements people cannot properly locate an event in the process of cognition (Yeh, 2017). In Sulalatus Salatin, no exact dates are given for when events happened. According to Yeh (2017) and Pattuelli (2011), there are several modelling choices for the temporal aspect, such as measures of duration, calendar dates, frequencies and concatenations of temporal intervals. For Sulalatus Salatin, the domain experts suggested that the temporal aspect can be effectively represented by a governmental period or by another event. For example, the event “Menjalin muhibah Melaka-Majapahit” (Good relationship of Malacca-Majapahit) happened during the reign of Sultan Mansur Syah, so its time is “Pemerintahan Sultan Mansur Syah” (The reign of Sultan Mansur Syah). An event can also serve as the time of other events: for example, the event “Tun Perpatih Putih menjadi Bendahara” (Tun Perpatih Putih becomes Bendahara) happened after the death of Tun Perak, so the time assigned is “Selepas Tun Perak Mati” (After Tun Perak died). All the instances for events, actors, time, place and objects were listed and compiled manually.
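The two temporal modelling choices can be sketched as tagged values, with “after” links turned into an ordering constraint. The encoding below is a hypothetical illustration: modelling Tun Perak's death as an event of its own is an assumption, reign-based times impose no order in this sketch, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical encoding of the two temporal modelling choices in the text:
#   ("reign", ruler) -> a governmental period, e.g. "Pemerintahan Sultan Mansur Syah"
#   ("after", event) -> a time defined by another event, e.g. "Selepas Tun Perak mati"
EVENT_TIME = {
    "Menjalin muhibah Melaka-Majapahit": ("reign", "Sultan Mansur Syah"),
    "Tun Perak mati": None,  # modelled here as an event of its own; time not stated
    "Tun Perpatih Putih menjadi Bendahara": ("after", "Tun Perak mati"),
}


def time_label(time):
    """Render a time value as the Malay label used for the sem:Time instance."""
    if time is None:
        return None
    kind, ref = time
    return {"reign": "Pemerintahan " + ref, "after": "Selepas " + ref}[kind]


def chronological_order(event_time):
    """Order events so every 'after X' event follows X (raises CycleError on loops)."""
    sorter = TopologicalSorter()
    for event, time in event_time.items():
        sorter.add(event)
        if time is not None and time[0] == "after":
            sorter.add(event, time[1])  # time[1] is a predecessor of event
    return list(sorter.static_order())


order = chronological_order(EVENT_TIME)
print(time_label(EVENT_TIME["Tun Perpatih Putih menjadi Bendahara"]))
# Selepas Tun Perak mati
```

A topological sort only yields a partial order, which fits a source without calendar dates: events are placed relative to one another rather than on a timeline.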
Seven main classes were discovered: sem:Event, sem:Actor, sem:Place, sem:Time, sem:Object, sem:Role and sem:EventType. Besides these seven classes, we identified another important piece of information, namely the cause or factor of an event. The “Cause” is important for knowing why or how the event happened. Moreover, it is necessary to capture biographical information about the actors. This biographical information was extracted based on the Relationship Vocabulary, a vocabulary for describing relationships between people, such as an actor’s parent, child, sibling or spouse. For example, Sang Guna and Hang Tuah have a biographical relationship in which Sang Guna is a child of Hang Tuah. We also found that actors in the text have their own positions in society and honorary titles awarded by kings or rulers. Therefore, we associated actors with their positions and honorary titles. For example, Tun Perak’s post is Bendahara and his honorary title is Bendahara Paduka Raja. All this additional information is essential for creating a historically meaningful narrative.
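A minimal sketch of how these actor facts could be stored and queried follows. `rel:childOf` and `rel:parentOf` are terms of the Relationship Vocabulary; `mm:hasPosition` and `mm:hasHonoraryTitle` are hypothetical names for the Position and Honorary Title links; and deriving the inverse `rel:parentOf` fact is an assumption of this sketch, not a step reported in the study.

```python
# Relationship Vocabulary terms (rel:) and their inverses; mm: terms are hypothetical.
INVERSE = {
    "rel:childOf": "rel:parentOf",
    "rel:parentOf": "rel:childOf",
    "rel:siblingOf": "rel:siblingOf",  # symmetric
    "rel:spouseOf": "rel:spouseOf",    # symmetric
}

FACTS = {
    ("Sang Guna", "rel:childOf", "Hang Tuah"),
    ("Tun Perak", "mm:hasPosition", "Bendahara"),
    ("Tun Perak", "mm:hasHonoraryTitle", "Bendahara Paduka Raja"),
}


def with_inverses(facts):
    """Add the inverse of every kinship fact, e.g. childOf -> parentOf."""
    closed = set(facts)
    for s, p, o in facts:
        if p in INVERSE:
            closed.add((o, INVERSE[p], s))
    return closed


def profile(actor, facts):
    """Collect everything known about one actor, grouped by property."""
    result = {}
    for s, p, o in sorted(with_inverses(facts)):
        if s == actor:
            result.setdefault(p, []).append(o)
    return result


print(profile("Hang Tuah", FACTS))  # {'rel:parentOf': ['Sang Guna']}
print(profile("Tun Perak", FACTS))
# {'mm:hasHonoraryTitle': ['Bendahara Paduka Raja'], 'mm:hasPosition': ['Bendahara']}
```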
There are three approaches to developing a class hierarchy: a top-down process, a bottom-up process and a combination of the two. In this research, we used the combination process to define a few top-level classes and a few specific concepts. The concepts within the hierarchy are connected through relationships. Two types of relationships were developed in this research: “HasSubtype” and “HasRelationship”. “HasSubtype” is the most common relationship; it links a general concept to each of its specialisations that are themselves simple concepts. This relation is the inverse of the “is-A” relationship type: for example, if class A is a subclass of B, then every instance of A is also an instance of B. “HasRelationship” links concepts within the same hierarchy or between different hierarchies.
To define the classes and class hierarchy, we first listed the SEM classes: sem:Event, sem:Place, sem:Actor, sem:Object, sem:Time, sem:Role and sem:Type. Based on the information extraction activities, we defined new classes related to Actor, namely Position and Honorary Title. Next, we defined the top-level classes: sem:Event, sem:Place, sem:Actor, sem:Object, sem:Time, sem:Role, Cause, Position and Honorary Title. Cause, Position and Honorary Title are newly added classes extracted from the contents of the Malay manuscript, so they do not carry the sem: prefix (for example, the class is Cause rather than sem:Cause). Eight of these classes are subclasses of sem:Core, while sem:Role and sem:View are the two subclasses of sem:Constraint. In SEM, sem:Role is used to represent the ternary relationship among an event, an actor and the actual role the actor plays in the event. In this research, however, the Role class has a direct relationship to Actor: a Role represents the function played by an Actor in the context of a given event. The type classes corresponding to the core classes are subclasses of sem:Type.
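The “HasSubtype” hierarchy, together with the rule that every instance of a subclass is also an instance of its superclass, can be sketched as follows. The hierarchy mirrors the classes listed above, reading the eight sem:Core subclasses as the top-level classes other than sem:Role; writing Honorary Title as `HonoraryTitle` and assuming a single parent per class are simplifications of this sketch.

```python
# "HasSubtype" edges: general concept -> its specialisations (the inverse of is-A).
HAS_SUBTYPE = {
    "sem:Core": ["sem:Event", "sem:Actor", "sem:Place", "sem:Time", "sem:Object",
                 "Cause", "Position", "HonoraryTitle"],
    "sem:Constraint": ["sem:Role", "sem:View"],
}


def is_a_index(has_subtype):
    """Invert HasSubtype into child -> parent (is-A), assuming one parent per class."""
    return {child: parent
            for parent, children in has_subtype.items() for child in children}


def inferred_types(asserted_class, has_subtype):
    """The asserted class plus all its ancestors: an instance belongs to each one."""
    parent = is_a_index(has_subtype)
    types = [asserted_class]
    while types[-1] in parent:
        types.append(parent[types[-1]])
    return types


print(inferred_types("sem:Actor", HAS_SUBTYPE))  # ['sem:Actor', 'sem:Core']
print(inferred_types("sem:Role", HAS_SUBTYPE))   # ['sem:Role', 'sem:Constraint']
```

An OWL reasoner in Protégé performs this inference over `rdfs:subClassOf` axioms; the sketch only shows the idea for a single-inheritance tree.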

Class hierarchy for event ontology of historical Malay manuscript.
Figure 6 illustrates the classes that represent the content of the historical Malay manuscript collection, together with their relationships. Green ovals represent existing SEM classes and red ovals represent newly added classes. Arrows represent the relationships. These classes are the main components of the event ontology for historical Malay manuscripts. They provide a framework for modelling, representing and better understanding the content knowledge of historical Malay manuscripts. Table 2 gives the definitions of all classes in the event ontology.
Classes of event ontology for historical Malay manuscript
Each class has its own instances. The relationships between instances are called properties (Horridge et al., 2011). These properties represent relationships and define the characteristics of the class. There are two types of properties: object properties and datatype properties.
Object properties for existing event ontology models
Object properties were developed to provide detailed information for each class and to answer the competency questions. The competency questions were analysed to identify the concepts and relations needed in the ontology. Some of them have been discussed in Step 1. We then analysed the object properties of several relevant event ontology models to learn whether and how they could be reused in, and matched with, the event ontology for historical Malay manuscripts. These models are BIO, CIDOC-CRM, SEM, LODE and the Event Ontology. Table 3 shows the object properties of each model.
Table 3 provides a comparative analysis of ontology model properties, which we used to find properties relevant to the Malay manuscript ontology. As in SEM and CIDOC-CRM, the Event class describes “something that happens”. In SEM, object properties can be omitted or duplicated as needed. They are all optional, and more than one can be stated where necessary. Optionality is useful when knowledge is incomplete: for example, if the type of an Actor is unknown, the ActorType property can be omitted. Duplicability is useful when more than one actor is associated with an event, which can be expressed by stating multiple hasActor properties about the same event.
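SEM's optionality and duplicability can be mimicked with a multi-valued property map. In the Python sketch below, an omitted property simply yields no values, and repeated `hasActor` statements accumulate. The `Event` class and the placeholder names `ExampleEvent`, `ActorA` and `ActorB` are hypothetical and do not come from the manuscript.

```python
from collections import defaultdict


class Event:
    """Event description in which every property is optional and may repeat."""

    def __init__(self, name):
        self.name = name
        self.props = defaultdict(list)  # property name -> list of values

    def add(self, prop, value):
        """Record one value; adding the same prop twice keeps both values."""
        self.props[prop].append(value)
        return self

    def get(self, prop):
        """Return all values of prop, or [] if the property was omitted."""
        return list(self.props.get(prop, []))


# Placeholder names; not taken from the manuscript.
e = Event("ExampleEvent").add("hasActor", "ActorA").add("hasActor", "ActorB")
print(e.get("hasActor"))   # ['ActorA', 'ActorB']
print(e.get("actorType"))  # [] because the property was omitted
```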
The Event class is the central concept of this model. SEM defines an event as “everything that happens, even fictional events, whether there is a specific place or time or whether these are known is optional, and whether there are specific actors involved or not” (Van Hage et al., 2011). The Event class contains information about the events mentioned in the manuscript, and the Event Type class contains the information used to classify them. The Event class has nine properties, including hasActor, tookPlaceAt, hasTime and eventType. Figure 7 shows the classes and the ranges of the object properties. Each event can be described with multiple actors, places and times. SEM implements a constraint class named Role that is used to qualify the actor(s) in an event. This allows the same actor to appear in multiple events. For example, Tun Perak is an actor in two events: “Membuat hubungan Melaka Majapahit” (establishing relations between Melaka and Majapahit) and “Langgaran Siam” (the Siamese attack).

Event class and object properties range.
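One way to picture role-qualified participation is to store each actor-in-event link as a separate record. This lets the same actor appear in several events, each with its own role. The `Participation` record and the helper functions in the Python sketch below are hypothetical. The role labels `RoleA` and `RoleB` are placeholders, not roles recorded in the Sulalatus Salatin.

```python
from typing import NamedTuple, Optional


class Participation(NamedTuple):
    """One actor taking part in one event, optionally qualified by a role."""
    event: str
    actor: str
    role: Optional[str] = None


# Events from the text; role labels are placeholders only.
PARTICIPATIONS = [
    Participation("Membuat hubungan Melaka Majapahit", "Tun Perak", "RoleA"),
    Participation("Langgaran Siam", "Tun Perak", "RoleB"),
]


def events_of(actor, participations):
    """All events in which the actor participates, in alphabetical order."""
    return sorted({p.event for p in participations if p.actor == actor})


def roles_of(actor, event, participations):
    """Roles the actor plays in one event (an empty list if no role is known)."""
    return [p.role for p in participations
            if p.actor == actor and p.event == event and p.role is not None]


print(events_of("Tun Perak", PARTICIPATIONS))
```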
We also added properties that do not exist in these models, to make the event ontology for historical Malay manuscripts more detailed and comprehensive. For the Actor class, we used properties from FOAF (Friend of a Friend) and the Relationship vocabulary.
Properties, or slots, have facets that describe the value type, the allowed values, the number of values (cardinality) and other features of the values the slot can take. In this research, most slot values are strings. Table 4 shows the properties assigned in the event ontology model for historical Malay manuscripts, together with the range of each property.
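Value-type and cardinality facets can be checked mechanically. The Python sketch below validates a `{slot: [values]}` mapping against declared facets. The slot names `name` and `honoraryTitle` and their limits are hypothetical; the paper's actual facets are listed in Table 4. In Protégé, the equivalent constraints are expressed as property ranges and cardinality restrictions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Facet:
    value_type: type         # expected Python type of each value (e.g. str)
    max_card: Optional[int]  # maximum number of values; None means unbounded


# Hypothetical slot definitions; names and limits are illustrative only.
FACETS = {
    "name": Facet(str, 1),
    "honoraryTitle": Facet(str, None),
}


def check(slot_values):
    """Return a list of facet violations for a {slot: [values]} mapping."""
    errors = []
    for slot, values in slot_values.items():
        facet = FACETS.get(slot)
        if facet is None:
            errors.append(f"unknown slot: {slot}")
            continue
        if facet.max_card is not None and len(values) > facet.max_card:
            errors.append(f"{slot}: {len(values)} values exceed max {facet.max_card}")
        for v in values:
            if not isinstance(v, facet.value_type):
                errors.append(f"{slot}: {v!r} is not {facet.value_type.__name__}")
    return errors


print(check({"name": ["Tun Perak"]}))  # []
```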
Example of properties and the descriptions
Besides object properties, datatype properties were developed to link instances to an XML Schema datatype value or an RDF literal. Table 5 describes the datatype properties of the Malay Manuscript Ontology.
Datatype properties
Knowledge inference derives new, previously unknown knowledge from the existing entity-relationship triples in the knowledge base (Dou et al., 2018). Here, this was done by giving several properties the inverse property characteristic. Eight properties were set up in this way to support knowledge inference, as described in Table 6.
Object properties and type for Malay manuscript ontology
Figure 8 shows an example of knowledge inference with the property childOf and its inverse, parentOf. If Sang Guna childOf Hang Tuah, then, because of the inverse property, it is inferred that Hang Tuah parentOf Sang Guna.

An example of inverse property childOf and parentOf.
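Inverse inference over plain triples can be sketched in a few lines of Python. For every asserted `(s, p, o)` whose property has a declared inverse, the function adds `(o, inverse(p), s)`. The function name and the tuple representation of triples are assumptions made for illustration. In the paper, Protégé's reasoner does this work.

```python
INVERSE_OF = {"childOf": "parentOf"}  # property pair taken from the text


def with_inverses(triples, inverse_of):
    """Add (o, inv(p), s) for every (s, p, o) whose p has a declared inverse."""
    pairs = dict(inverse_of)
    pairs.update({v: k for k, v in inverse_of.items()})  # the inverse relation works both ways
    inferred = set(triples)
    for s, p, o in triples:
        if p in pairs:
            inferred.add((o, pairs[p], s))
    return inferred


facts = {("Sang Guna", "childOf", "Hang Tuah")}
print(sorted(with_inverses(facts, INVERSE_OF)))
```

A single pass is enough, because taking the inverse of an inferred triple only reproduces the triple it came from.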
If a transitive object property P relates individual A to individual B, and individual B to individual C, then we can infer that individual A is related to individual C via property P. successorOf is an example of a transitive property. Sultan Mansur Syah is a successor to Sultan Muzaffar Syah, and Sultan Muzaffar Syah is a successor to Sultan Abu Syahid. It can therefore be inferred that Sultan Mansur Syah is a successor to Sultan Abu Syahid, as shown in Fig. 9.

Sample of transitive object property successorOf.
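The transitivity rule can be applied repeatedly until no new pairs appear, which yields the transitive closure. The Python sketch below does this for `successorOf` using the succession from the text. The function name and triple layout are illustrative assumptions, and the naive loop is adequate only for small graphs.

```python
def transitive_closure(triples, prop):
    """Apply (a P b) and (b P c) => (a P c) until nothing new appears.

    Returns only the triples that use prop.
    """
    edges = {(s, o) for s, p, o in triples if p == prop}
    while True:
        new = {(a, d) for a, b in edges for c, d in edges if b == c} - edges
        if not new:
            return {(s, prop, o) for s, o in edges}
        edges |= new


facts = {
    ("Sultan Mansur Syah", "successorOf", "Sultan Muzaffar Syah"),
    ("Sultan Muzaffar Syah", "successorOf", "Sultan Abu Syahid"),
}
print(sorted(transitive_closure(facts, "successorOf")))
```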
If a property P is symmetric and relates individual A to individual B, then individual B is also related to individual A via property P. For example, if the individual Tun Perak is related to the individual Tun Kudu via the siblingOf property, then it is inferred that Tun Kudu is also related to Tun Perak via siblingOf, as shown in Fig. 10.

Sample of symmetric object property siblingOf.
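A symmetric property needs only one rule: mirror every matching triple. The Python sketch below applies it to the siblingOf example from the text. The function name and the set-of-tuples representation are illustrative assumptions.

```python
def with_symmetric(triples, symmetric_props):
    """For each (s, p, o) with p symmetric, also assert (o, p, s)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in symmetric_props:
            inferred.add((o, p, s))
    return inferred


facts = {("Tun Perak", "siblingOf", "Tun Kudu")}
print(sorted(with_symmetric(facts, {"siblingOf"})))
```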
Inference tools are essential because they discover new relationships in the data. “Inference” means that automatic procedures can generate new relationships from the data together with additional information in the form of a vocabulary, e.g., a set of rules. The new relationships may either be explicitly added to the data set or be returned at query time. Inference-based techniques are also important for discovering possible inconsistencies in the data.
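Inference can also expose inconsistencies. As one hypothetical check, not a constraint stated in the paper, the Python sketch below computes the transitive closure of `successorOf` and flags any ruler who becomes his own successor. That outcome would mean the asserted succession contains a cycle, for example a mis-entered pair. The function name is illustrative.

```python
def successor_cycles(pairs):
    """Return rulers who become their own successor once transitivity is applied.

    pairs: set of (successor, predecessor) tuples for a transitive successorOf.
    A non-empty result signals contradictory data.
    """
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            break
        closure |= new
    return sorted({a for a, b in closure if a == b})


good = {
    ("Sultan Mansur Syah", "Sultan Muzaffar Syah"),
    ("Sultan Muzaffar Syah", "Sultan Abu Syahid"),
}
bad = good | {("Sultan Abu Syahid", "Sultan Mansur Syah")}  # deliberately wrong entry
print(successor_cycles(good))  # []
print(successor_cycles(bad))
```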
The event ontology for historical Malay manuscripts was instantiated during the information extraction process. During this process, we identified instances of Event, Actor, Object, Place, Time, Cause, Role and Position, as well as the actors’ biographical information. The text of the Sulalatus Salatin contains no information about event type, actor type, object type, place type, time type or role type. This information can be defined by a third party, and the use of types is not enforced. In this research, event types were defined based on the event types in the BIO ontology. For PlaceType, we used the Getty Thesaurus of Geographic Names (TGN). For example, the Place “Beruas” has the placeType “inhabited place”.
A total of 268 events were extracted from the Sulalatus Salatin manuscript. These events were then grouped by type. The most frequently mentioned events in the Sulalatus Salatin are Coronation and Death (34), followed by Marriage (31), Appointment (26) and Murder (24). Table 7 shows a sample list of the individuals identified in the Sulalatus Salatin.
Sample list of identified instances from Sulalatus Salatin
After the manual development, the information was mapped into a structured knowledge representation using the Protégé ontology editor, and the URI for the Malay Manuscripts Ontology is
In this stage, the ontology was represented formally using the Web Ontology Language (OWL), supported by the Resource Description Framework (RDF). RDF is a standard machine-processable format for storing data, vocabularies and data-handling rules, and it is recommended by the W3C. Data are stored as single statements known as triples, which support rule-based processing. In this research, we followed Horridge et al. (2011), A Practical Guide to Building OWL Ontologies Using Protégé 4 and CO-ODE Tools, Edition 1.3.
According to Horridge et al. (2011), class names should start with a capital letter and should not contain spaces or use underscores to join words. In this research, we gave classes names with an initial capital letter and no spaces, for example Person, Event and Place. We then entered the individuals, or instances, of each class. An instance may belong to more than one class. For example, “Seri Teri Buana” belongs to both “Actor” and “HonoraryTitle”. For duplicate names, such as person names, we appended a number to distinguish them, e.g. Raja Ahmad1, Raja Ahmad2, Raja Ahmad3.
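The two naming conventions described above can be expressed as small helpers. The Python sketch below has two parts. `class_name` converts a label into a capitalised class name with no spaces or underscores. `disambiguate` appends 1, 2, 3, … to repeated individual names, following the pattern of the Raja Ahmad example. Both helpers are hypothetical and do not reproduce Protégé functionality.

```python
import re
from collections import Counter


def class_name(label):
    """Turn a label such as 'Honorary Title' into 'HonoraryTitle'."""
    words = re.split(r"[\s_]+", label.strip())
    return "".join(w[:1].upper() + w[1:] for w in words if w)


def disambiguate(names):
    """Append 1, 2, 3, ... to names that occur more than once; leave unique names as-is."""
    totals = Counter(names)
    seen = Counter()
    out = []
    for n in names:
        if totals[n] > 1:
            seen[n] += 1
            out.append(f"{n}{seen[n]}")
        else:
            out.append(n)
    return out


print(class_name("Honorary Title"))                            # HonoraryTitle
print(disambiguate(["Raja Ahmad", "Tun Perak", "Raja Ahmad"]))  # ['Raja Ahmad1', 'Tun Perak', 'Raja Ahmad2']
```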
Once the ontology classes and the hierarchy had been established, they were assigned properties and characteristics. Fifty-nine object properties were constructed in the Malay Manuscript Ontology to link pairs of individuals. The other type of property is the datatype property. Datatype properties link an individual to an XML Schema datatype value or an RDF literal, so they describe relationships between an individual and data values. Thirty-four datatype properties were assigned in this research. Properties are linked to form relations between the individuals in the classes.
We developed the event ontology in OWL and verified the accuracy and correctness of its information using SPARQL. The correctness of the SPARQL query results was validated by comparing them with the data source. The competency questions were translated from natural language into the query language so that the system could process them. The competency questions and the answers retrieved by SPARQL queries are given below:
Competency Question 1: List the events that took place during the reign of Sultan Mansur Syah and involved Hang Tuah
This query requires the system to find all events in which Hang Tuah participated during the reign of Sultan Mansur Syah. The SPARQL query returned eight results and successfully answered the competency question, as shown in Table 8. The result was validated against the list of events extracted from the manuscript: the events in which Hang Tuah participated match the SPARQL query result.
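The core of such a SPARQL query is a basic graph pattern: a set of triple patterns with variables that must all match the data under one consistent binding. The Python sketch below emulates that matching over an in-memory triple list. It is not the authors' query. `sem:hasActor` is a real SEM property, but `ex:duringReignOf`, the `ex:` event identifiers and the toy facts are hypothetical placeholders.

```python
def match(patterns, triples, binding=None):
    """Yield bindings that satisfy every triple pattern (terms starting with '?' are variables)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:  # clashes with an earlier binding
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


# Toy data with hypothetical identifiers; not extracted from the manuscript.
TRIPLES = [
    ("ex:e1", "sem:hasActor", "Hang Tuah"),
    ("ex:e1", "ex:duringReignOf", "Sultan Mansur Syah"),
    ("ex:e2", "sem:hasActor", "Hang Tuah"),
    ("ex:e2", "ex:duringReignOf", "Sultan Mahmud Syah"),
    ("ex:e3", "sem:hasActor", "Tun Perak"),
    ("ex:e3", "ex:duringReignOf", "Sultan Mansur Syah"),
]

# Roughly: SELECT ?event WHERE { ?event sem:hasActor "Hang Tuah" .
#                                ?event ex:duringReignOf "Sultan Mansur Syah" }
QUERY = [("?event", "sem:hasActor", "Hang Tuah"),
         ("?event", "ex:duringReignOf", "Sultan Mansur Syah")]
print(sorted(b["?event"] for b in match(QUERY, TRIPLES)))  # ['ex:e1']
```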
Result for competency question 1
Competency Question 2: Who was involved when Malacca established an international relationship with Majapahit, and what was their role in the event?
This query asks the system to list all actors who participated in the event in which Malacca established an international relationship with Majapahit (Melaka membuat hubungan antarabangsa dengan Majapahit), together with the role of each actor. The SPARQL query returned 22 results and successfully answered the competency question, as shown in Table 9. The result was validated by comparing it with the list of actors and their roles in the event, as extracted from the manuscript. The actors and roles for the event Membuat hubungan Melaka Majapahit match the SPARQL query result.
Result for competency question 2
In this stage, the Malay manuscript ontology was integrated with DBpedia to enrich the domain ontology. The DBpedia dataset is interlinked with various other data sources on the Web through RDF links. RDF links enable web users to navigate from data in one source to related data in other sources using a Semantic Web browser. Semantic Web search engine crawlers can also follow RDF links, which may allow sophisticated search and query capabilities over the crawled data (Bizer et al., 2007).
The process of linking named entities in a text to their corresponding entities in a knowledge base is called entity linking (Shen et al., 2015). According to Shen et al. (2015), entity linking can facilitate knowledge base population, question answering and information integration. In this research, several instances were retrieved from DBpedia and loaded into the Malay Manuscript Ontology, such as dbr:Tun_Perak, dbr:Mahmud_Shah_of_Malacca, dbr:Kampar and dbr:Malacca. Each of these instances was linked to the corresponding instance in the Malay Manuscript Ontology using owl:sameAs. For example, Tun Perak is an instance of Actor in the Malay Manuscript Ontology, but the ontology holds only limited information about him. To provide more detailed information about him, a link to DBpedia was created by adding an owl:sameAs statement.

Sample information about Tun Perak from DBpedia.
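An owl:sameAs link lets facts stated about either IRI be treated as facts about one entity. The Python sketch below groups IRIs connected by sameAs links using union-find and then collects the facts for the whole group. `dbo:abstract` is a real DBpedia ontology property. The `mmo:` prefix, the helper names and the placeholder abstract text are hypothetical.

```python
def same_as_groups(links):
    """Union-find over owl:sameAs pairs; map each IRI to its group representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return {x: find(x) for x in list(parent)}


def merged_facts(entity, triples, links):
    """Collect (property, value) facts about entity or any IRI owl:sameAs it."""
    groups = same_as_groups(links)
    root = groups.get(entity, entity)
    return sorted({(p, o) for s, p, o in triples if groups.get(s, s) == root})


LINKS = [("mmo:Tun_Perak", "dbr:Tun_Perak")]  # "mmo:" is a hypothetical local prefix
TRIPLES = [
    ("mmo:Tun_Perak", "rdf:type", "sem:Actor"),
    ("dbr:Tun_Perak", "dbo:abstract", "(abstract text)"),  # placeholder value
]
print(merged_facts("mmo:Tun_Perak", TRIPLES, LINKS))
```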
Knowledge evaluation is essential to ensure that the knowledge is reliable. In this study, domain experts verified the knowledge. The experts were selected for their expertise and authority in Malay manuscripts, and all of them are directly involved in Malay manuscript research and teaching. Three experts were appointed for the knowledge evaluation process. Each expert was given a verification form as the instrument for the verification task. The form was accompanied by a list of concepts for use during verification. The verification process was based on the criteria mentioned above. Knowledge evaluation by domain experts comprised two types: verification and validation.
Verification is the act of proving or disproving the correctness of the ontology, that is, the technical activity that guarantees the correctness of an ontology. Its purpose is to detect anomalies that can arise from a combination of ontology definitions and rules (Pak and Zhou, 2010). The experts carried out the verification process. To ensure that the ontology was well verified, they were required to verify its architecture, its definitions and its content. The experts verified the accuracy of the concepts constructed from the selected manuscript, the Sulalatus Salatin, as well as their properties. They were also required to review the appropriateness of the individuals (instances) of each concept. The experts were given a verification form listing the items to be verified.
The experts agreed on and verified all concepts in this model as concepts representing the content knowledge of Malay manuscripts. They also agreed on all properties assigned to the concepts in the Malay manuscript ontology, except two properties that one expert judged to be inaccurate. These two properties are shown in Table 10.
Expert remark for object properties of class “Actor”
The experts agreed on and verified all properties assigned in this model as properties that define the concepts of Malay manuscripts.
The experts were asked to verify the concept (class) hierarchy of the event-based ontology model. They confirmed that the concept hierarchy for Malay manuscripts is correct and needs no adjustment. They were also asked to examine all instances in the event-based ontology model and thoroughly examined the instances and their types. Table 11 shows the interrater reliability among the three experts, measured by percent agreement.
Interrater reliability among three experts
Interrater reliability for Event and Actor instances is 84%, which means that 16% of these instances and their types are incorrect and need amendment. For Place, interrater reliability is 95%, and for Role it is 99%. The raters achieved 100% agreement for Time. Some instances are incomplete and need to be revised or added. Table 12 shows a sample of the modifications recommended by the experts.
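The paper reports percent agreement without giving the formula. The Python sketch below assumes one common definition: the share of items on which all three raters give the same judgement. Averaging pairwise agreement is another definition in use. The ratings are toy data and do not reproduce the figures in Table 11.

```python
def percent_agreement(ratings):
    """ratings: one tuple per item, holding each rater's judgement.

    An item counts as agreed only if every rater gave the same judgement.
    """
    if not ratings:
        raise ValueError("no items to compare")
    agreed = sum(1 for item in ratings if len(set(item)) == 1)
    return 100.0 * agreed / len(ratings)


# Toy ratings from three raters on four instances (hypothetical).
RATINGS = [
    ("correct", "correct", "correct"),
    ("correct", "correct", "correct"),
    ("correct", "amend", "correct"),
    ("correct", "correct", "correct"),
]
print(percent_agreement(RATINGS))  # 75.0
```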
Sample of recommended modification
Validation is the process of checking whether something satisfies a particular criterion, or whether the ontology meets the requirements listed in the ontology requirements document (Brusa et al., 2008). According to Gómez-Pérez (2004), ontology validation concerns whether the ontology represents the real world for which it was created, and whether the meaning of the definitions matches the conceptualisation the ontology is meant to specify. The goal is to prove that the world model (if it exists and is known) is compliant with the world modelled formally.
The two criteria that we did not consider are sensitiveness and comprehensiveness. The sensitiveness of an ontology refers to how minor changes in the definitions can modify other well-defined properties that have already been guaranteed (Gómez-Pérez, 1999). This criterion requires a standard measuring tool that can determine the level of sensitivity to changes (Delir Haghighi et al., 2013). Comprehensiveness, according to Kreider (2013), is the ability to cover the main concepts within the domain. This criterion can be evaluated through completeness.
Criteria for ontology evaluation
C = Compliance, PC = Partial Compliance, NC = Non-Compliance.
The experts were asked to give their overall evaluation of the concepts, properties and instances of the Malay Manuscript Ontology against the above criteria. Table 13 shows that the experts rated all the concepts, properties and instances as compliant with the criteria of Consistency, Completeness, Conciseness, Expandability and Ease of use.
The main purpose of this paper was addressed by developing an event-based ontology for historical manuscripts, here Malay manuscripts, through which the content knowledge of the manuscript is represented. This involved extracting events and their relations from the manuscript content and using the SEM ontology to populate events, named entities such as persons, places and times, relations and properties. The event-based ontology model provides a clear framework for extracting, modelling and representing the knowledge embedded in the manuscripts, and for understanding it better. It enables unambiguous links to be made between events and entities, so that one event can be connected to another through people, places and dates, for example within similar documents. The people, places and objects in the text were described in a high level of detail. For example, a person is described not only by name but also by honorary title, role and relations with other individuals. Through the use of FOAF, the discovery and retrieval of genealogical information were made possible by setting several properties with the inverse functional property characteristic. This is due to the strength of RDF and its ability to manipulate meaning flexibly (Ciula et al., 2008).
The Malay manuscripts ontology was developed following the methodology of Noy and McGuinness (2001), which includes information extraction from the text. Information extraction is widely used for knowledge acquisition. It acquires knowledge from structured, semi-structured and unstructured data to identify named entities, relations and events, and then converts them into a structured representation. The knowledge was represented in an ontology language so that both users and machines can readily understand it. The SPARQL Query Language (SPARQL) was used to retrieve relevant information through queries. The knowledge base can also be enhanced further by linking it with external datasets such as DBpedia. The ontology offers a better way of organising and representing the content knowledge of Malay manuscripts and represents information in a more structured, linked and interconnected way. In this study, the ontology is a knowledge representation of a specific domain, historical Malay manuscripts, so the judgement of experts in this domain was needed to build a well-structured ontology that represents the knowledge accurately. The domain experts played an essential role in verifying the ontology, ensuring that its structure is accurate and free from anomalies. The experts confirmed that the developed ontology is consistent, complete, concise, expandable and easy to use.
Conclusion
This paper reports an event extraction process used to build an event-based ontology for Malay manuscripts. The Simple Event Model (SEM) is used to facilitate knowledge discovery and knowledge representation. The study revealed semantic relationships within the content knowledge of the Malay manuscripts and between the sources.
The content knowledge in this knowledge base may be incomplete, because several events and other entities from the selected manuscript may not be explicitly represented. This incompleteness may arise because we used only a single transcription of the Sulalatus Salatin, by A. Samad Ahmad, whereas several copies of the manuscript have been transcribed by other authors, such as Prof Muhammad Hj Salleh. The ontology model is not claimed to be complete. Instead, it is flexible enough to be extended and updated as more Malay manuscripts are analysed and added to the ontology.
The significance of this research is that it contributes to the creation of a knowledge base for the Malay manuscript collection, containing heterogeneous information about Malay history, culture, civilisation and other valuable knowledge. It also provides a methodology for developing an ontology and enriching its content and detailed knowledge with the concepts and properties of a domain ontology. The event-based ontology is capable of retrieving content knowledge based on semantic annotation. The analysis and use of non-historical genres of Malay manuscripts, such as medicine, Islamic studies and the arts, could be the subject of future research.
In the future, end users will be able to access this ontology through a link from the digital library, making it widely and easily available to everyone. The system will be able to formulate precise queries from natural language and to meet users’ needs through simple and advanced query modes.
Acknowledgement
This research has been funded by Fundamental Research Grant Scheme (FRGS) FRGS/1/2018/ICT04/UM/02/8.
