Abstract
The purpose of the paper was to highlight the digitization of Indigenous Knowledge Systems (IKS) in institutional repositories in South Africa with a view to developing a framework for Web archiving IKS-related websites in South Africa. Anchored on the interpretivist paradigm, the qualitative research method was adopted for this research. The multiple case study research strategy was considered appropriate for the study. Data was gathered through face-to-face in-depth interviews and content analysis. Interviews were conducted with eight IKS staff at the IKS Documentation Centres across four provinces in South Africa. The study revealed that although there are efforts to digitize IKS and make them accessible through some channels online, there are no specific digital preservation policies guiding the project. Apart from the fact that there are no policies in place to support any Web archiving initiative, the concept of Web archiving was generally unfamiliar to the respondents. The respondents who admitted to the lack of a standard policy guiding the digitization project also admitted to a lack of knowledge or in-depth understanding of Web archiving and its prospects as a digital preservation measure. The research, therefore, proposes a Web archiving framework that should be incorporated in the digital preservation policy framework. This research will be useful to policymakers and all stakeholders in South Africa and other parts of Africa.
Introduction
In recent times, there have been developments in technology leading to the digitisation and digital preservation of Indigenous Knowledge (IK). Digitisation projects implemented worldwide, in which IK has been digitised and made available online, include the Traditional Knowledge Digital Library (TKDL) in India, Cultural Preservation, Native Web, and the IK homepage (Chikonzo, 2006), among others. In Africa, IK digitisation projects are evident in countries such as South Africa, Uganda, Nigeria, and Ghana, among others (Plockey, 2014; Biyela, Oyelude and Haumba, 2016; Jain and Jibril, 2016). For instance, South Africa engages in the digitisation and hosting of national heritage resources available through a web-based portal known as the ‘National Heritage Repository’, which is accessible at http://digi.nrf.ac.za/ (Biyela, Oyelude and Haumba, 2016).
To promote the preservation of IK, the South African government adopted the National Indigenous Knowledge Systems Policy in 2004. This policy framework laid the platform for the recognition, affirmation, development, and protection of IKS in South Africa (Pretorius and Bezuidenhout, 2011; DST, 2004). The 2004 policy framework led to the National Recordal System (NRS) project, which is an initiative of the Department of Science and Technology (DST) to coordinate and standardise the capturing, storing, maintenance, and dissemination of science and technology-related data on IKS in South Africa. The NRS project paved the way for the establishment of the IKS Documentation Centres (IKSDCs) for the collection, documentation, storage, and dissemination of IK and related activities. The IKSDCs are in academic institutions across all the provinces in South Africa. The KwaZulu-Natal Indigenous Knowledge Systems Documentation Centre was established in 2012. It is hosted by the University of KwaZulu-Natal and is considered the major hub in South Africa. Through the NRS project, some of these digitized IKS are made available on http://iks.ukzn.ac.za/metadata and there are plans to also make them available in the national digital library known as the National Indigenous Knowledge Management System (NIKMAS), available at https://nrs.dst.gov.za/. The Digital Innovation South Africa (DISA) project, supported by the Carnegie Corporation and South Africa’s National Research Foundation (NRF), digitises archival, heritage, historical, and institutional materials across several academic institutions in South Africa. However, the NRS by the DST is an initiative specifically for the capture and digitization of IKS in South Africa. In 2014, the DST promulgated an additional IKS Bill which aims at further improving the management of IK in South Africa.
Issues relating to the obsolescence of older technologies and formats, human error, and copyright render digital archiving static and just as imprecise and flawed as non-digital preservation (Perry, 2014). The need to create digital copies of materials online to ensure their long-term preservation makes it necessary to consider web archiving to preserve the content. Web archiving helps to avoid alteration or loss of IK contents on the web over time. Web archiving is the process of collecting data recorded on the web for storage and ensuring that data is preserved in an archive and made available for future research (Niu, 2012).
Statement of the problem
The main problem that prompted this study is that, while there is consensus on the importance of digitising IK in South Africa, the long-term preservation of digital materials is not well studied (Zuraidah and Aliza, 2010; Kalusopa, 2018; Ngoepe, 2017). Most studies (e.g., Perry, 2014; Mutula, 2014; Le Roux, 2015; Adu and Ngulube, 2017) cite digital preservation challenges, including obsolescence of older technology and formats, human error, copyright issues, authenticity and reliability of materials, and errors or data loss, but the underlying threats to the archival provenance, integrity, authenticity, and reliability of IK digital materials are not well established.
Also, many other studies focused on the use of technology to manage IK, such as digitisation of IKS (e.g., Bishop, 2019; Bourget, 2020; Kapepiso, 2018; Durcan, 2016; Nakarta et al., 2014); digital preservation of IK (e.g., Amunkete, 2020; Zuraidah and Aliza, 2010; Mutula, 2014; Biyela, Oyelude and Haumba 2016); and web archiving initiatives in academic institutions (Bailey et al., 2017; Hendry and Stock, 2014; Khumalo and Nkala, 2015; Niu, 2012). Very few studies were found on the digital preservation of IKS in Africa. No study was found on web archiving of IKS in Africa.
This paper, therefore, sought to develop a framework for web archiving of IKS in institutional repositories in South Africa. This framework will enable institutions to adopt and implement practical plans for the long-term digital preservation of IKS materials online, so that South Africa has sustainable digital preservation plans and policies and heritage materials remain available and accessible well into the future. This paper also fills the gap in the literature on web archiving of Indigenous Knowledge Systems and heritage materials in Africa.
Literature review
It is important to take urgent measures to preserve oral knowledge due to great risks of misappropriation and loss (Poorna et al., 2014). According to Plockey (2014: 21), “African nations are going through many changes resulting in a loss of traditional, cultural and customary knowledge as a result of a lack of the preservation and digitization of African Indigenous Knowledge (AIK)”. Several studies have been carried out on the digitisation of IK in Africa and the rest of the world (Nakarta et al., 2014; Plockey, 2014; Akinwale, 2013; Owiny, Mehta and Maretzki, 2014; Bishop, 2019; Bourget, 2020). These studies have all affirmed the importance of using technology to manage IK, whether for access (Christian, 2009), preservation (Sraku-Lartey, Acquah and Djagbletey, 2016), or both (Swanepoel, 2008). Chisenga (2002) specifically emphasized why this is very important for African countries. Owiny, Mehta and Maretzki (2014) also proposed the use of social media technologies to create, preserve, and disseminate IK and skills to communities in East Africa. Digitization of IK is also an effective tool for defensive protection from bio-piracy as well as for reducing the misappropriation of IK without compensation by multinational entities (Christian, 2009: 11; Nakata and Langton, 2006: 48). Several examples exist of the bio-piracy of IK by multinational entities in India, South America, and South Africa (Avantika et al., 2015: 80; Bhattacharya, 2014: 50). Documentation and digitization of medicinal knowledge have also gained prominence in several countries in the fight against bio-piracy (Poorna, Mymoon and Hariharan, 2014). However, IK also involves various sectors such as agriculture, environment, architecture, culture, and heritage that are interlinked and applied to daily living. All this knowledge needs to be preserved to achieve inclusive development (Poorna et al., 2014).
There are digitization initiatives within and outside Africa, including the Traditional Knowledge Digital Library (http://www.tkdl.res.in), Korean Traditional Knowledge Portal (http://www.koreantk.com), Chinese Traditional Medicine Database System (http://www.megabionet.org), Seni Tradisi Indonesia (https://www.piknikdong.com), Smithsonian Centre for Folklife and Cultural Heritage (https://folklife.si.edu/), African Indigenous Science and Knowledge Systems (http://africahistory.net/), Elimu Asilia (Kenya’s Indigenous Knowledge Online) (http://www.elimuasilia.org/), the Ulwazi Program of Durban (www.ulwazi.org), and Digital Innovation South Africa (DISA) (http://iks.ukzn.ac.za/metadata), among others.
Chisenga (2002) argues that harnessing, repackaging, and providing access to Africa’s IKS using the World Wide Web infrastructure will present the people of Africa with an opportunity to make a major contribution to the development of the information contained on the web. This will also ensure that information consumers in Africa will have access to information content produced on the continent. In Australia, the provision of online engagement with Indigenous people’s knowledge is being attempted, but it is happening haphazardly due to a range of constraints (Nakata et al. 2008). These online engagements range from restricted online databases to stand-alone databases with community-only access, to public webpages of varying standards (Scott 2004).
Digitization is sometimes presented as a panacea for problems of preservation and access. However, access to digitized collections and their preservation, especially in the longer term, may be problematic. The problems are not only technological but also economic, political, legal, and moral (Britz and Lor, 2004; Pickover, 2008). According to Pickover (2008), the digitization of heritage materials in South Africa to make them available on the Internet is considered a big challenge due to social and political factors. Therefore, despite the advantages of digitising materials for preservation and access, the long-term preservation of these materials comes with issues such as technological obsolescence, lack of awareness, financial sustainability, policies, legislation, politics, security, and privacy (Adu and Ngulube 2017; Biyela, Oyelude and Haumba, 2016). According to Hockx-Yu (2006: 4), digital preservation is considered as a “complex process and there are many unsolved organizational, managerial and technical issues that make digital preservation a challenging task for those managing institutional repositories”. Digital preservation is essential in ensuring that cultural heritage and democratic history are safeguarded for the longest time possible or even forever.
Lack of policy and digital preservation frameworks is a major issue with digital preservation, especially in Africa (Kalusopa and Zulu, 2009; Mutula, 2014; Gbaje and Mohammed, 2013). According to Mutula (2014), several challenges such as weak policy and regulatory frameworks, limited capacity, inadequate government support, and limited connectivity and bandwidth are hampering digital heritage preservation management in Africa. The lack of standards and policies is seen at both national and institutional levels (Kalusopa and Zulu, 2009; Keokapa, 2010; Ngulube, 2012; Masenya and Ngulube, 2019). Pennock (2008) posits that issues regarding the development of a sustainable digital preservation policy are unexplored to a large extent. In cases where policies do exist, implementation is also seen as a major challenge (Adu, 2016).
Data management and storage are also very important aspects of the preservation of digital materials. It is important to have proper storage facilities to ensure that there is a reliable backup facility in place. In recent times, there has been an increase in the use of cloud computing and other digital solutions for data storage (Inglesias, 2011; Prom, O’Meara and Stratton, 2016). Cloud computing is considered an important tool that can meet the storage and data management challenges in digital preservation, the maintenance of reliable records, and the preservation of their authenticity over time (Duranti, 2010; Hellmer, 2015; Rogers, 2015). It is common to consider the records, documents, and information that we create and disseminate over the Internet as being equivalent to documentary forms in the physical world. Under this assumption of functional equivalence between digital and analogue documents and data, the authenticity and trustworthiness of these new digital creations are often judged by the same standards (Rogers, 2015). The current era, in which information is born and preserved digitally, requires a high standard for guaranteeing the authenticity and integrity of digital information (Hellmer, 2015).
The relatively short time frame within which web contents can be retained has been a driving force for web archiving initiatives (Pennock, 2013; Bailey et al., 2017; Hendry and Stock, 2014). Web archiving is “any form of deliberate and purposive preserving of web material” (Brügger, 2011: 25). According to Kalusopa (2018: 155), Web archiving “is the process of collecting websites and the information that they contain from the World Wide Web, and preserving these in an archive”. Web archiving technology “enables the capture, preservation, and reproduction of valuable content from the live web in an archival setting, so that it can be independently managed and preserved for future generations” (Pennock, 2013: 1).
Studies have reported the involvement of academic institutions in web archiving initiatives such as the Queens University Canada (Heil and Jin, 2017), Wake Forest University (Fansler et al., 2014), University of Toronto (Milligan, 2016), State and University Library Denmark (Sutton, 2004), Drexel University, Slippery Rock University, and the University of Scranton (Antracoli et al., 2014). This is contrary to the findings of Khumalo and Nkala (2015), which revealed that, apart from the lack of awareness of the respondents about web archiving, African institutions are not involved in web archiving initiatives to harvest their institutional repositories for future use. Some Web archiving initiatives around the world are as follows: Internet Archive (https://archive.org/), Library of Congress Web Archives (https://www.loc.gov/webarchiving/collections.html), Netarkivet (Danish web archiving project) (http://netarkivet.dk/), Pandora Archive – National Library of Australia (https://pandora.nla.gov.au/), the UK Web Archive (https://www.webarchive.org.uk/ukwa/), Croatian Web Archive (Hrvatski arhiv weba - HAW) (http://haw.nsk.hr/en), Central State Electronic Archives of Ukraine (http://www.archives.gov.ua), Bibliotheca Alexandrina (Egypt) (https://www.bibalex.org/isis/frontend/archive/archive_web.aspx).
Research objectives
The main aim of the paper is to highlight the digitization of IKS in institutional repositories in South Africa with a view to developing a framework for Web archiving IKS-related websites in South Africa. The following objectives shaped the study:
To highlight the digitization of IKS in South Africa
To identify the digital preservation policies of IKS
To highlight the data management of digitized IKS
To assess the Web archiving initiatives in the repositories
Methodology
The qualitative research approach was used to explore the objectives of the study. The Multiple Case Study design was employed to assess the digital preservation practices of IKS in institutional repositories in different academic institutions and to understand the differences and similarities in their digital preservation practices and policies. The IKSDCs covered in this study are the DST-NRF Centre in Indigenous Knowledge Systems (CIKS) at the University of KwaZulu-Natal, AgriFood Technology Station at the Cape Peninsula University of Technology, DST-NRF Digitization Centre at the University of Witwatersrand, and the IKS Centre at the University of Zululand. The study population consisted of IKS Managers/Coordinators, Digitisation Officers, and Online Collection Administrators in IKSDCs in different academic institutions across four (4) provinces in South Africa. The qualitative data for the study was collected through comprehensive face-to-face interviews and content analysis. The interviews were tape-recorded with the consent of the participants and transcribed into text. The transcripts were integrated with the data collected through content analysis, which involved the analysis of policy documents and web contents of the selected institutions. Thematic analysis shaped the presentation of empirical data and aided the process of developing a framework.
Findings and discussions
The findings and discussions are presented below. The findings are based on interviews conducted with eight (8) key staff at the IKSDCs in four (4) academic institutions spread across different provinces in South Africa. The IKSDCs covered in this study are the DST-NRF Centre in Indigenous Knowledge Systems (CIKS) at the University of KwaZulu-Natal, AgriFood Technology Station at the Cape Peninsula University of Technology, DST-NRF Digitization Centre at the University of Witwatersrand, and the IKS Centre at the University of Zululand. The interviewees were named RS12, RS22, RS32, RS42, RS52, RS62, RS72, and RS82 to ensure anonymity. The institutions were also named IN01, IN02, IN03, and IN04 for anonymity in the data analysis.
Digitisation of IKS
To understand the digitization efforts at the IKSDCs, the respondents were asked how the IK data are captured. Several authors have stressed the importance of IK data capture for preservation (Plockey, 2014; Nakata and Langton, 2005). Digitization has been considered the major way of using technology to preserve IK (Christian, 2009; Sraku-Lartey et al., 2016; Charkravarty and Mahajan, 2010). The result of this study is consistent with the findings mentioned above. For example, RS42 said that “Intangible and tangible cultural artifacts are converted into computer-readable format using technological hardware and software. They are described so that they will be meaningful to the user community (metadata added) and thereafter uploaded in an organized manner in the institutional repository. Basically, analogue materials are scanned and integrated with the born-digital materials.”
Some of the techniques for the preservation of IK include documentation, digitization, video/tape recording, microfilming, purchase of indigenous materials, and the provision of Internet access (Okore et al., 2009; Anyira et al., 2010). Nakata and Langton (2005) also noted that there have been major efforts to preserve IK in accessible forms by recording and documenting traditional knowledge, enabling the retrieval of knowledge in memory and current practice, and identifying and retrieving previously documented knowledge. Consistent with the studies cited above, this study revealed that data is captured by IK Recorders (field workers) through audio/video recorders, digital cameras, data mining/purchase of grey IK materials, and scanning of print IK materials.
However, the study also revealed that certain procedures are followed before data is captured at the community level from the knowledge holders. The Non-Disclosure Agreement (NDA), Prior Informed Consent (PIC), Catalogue, and Harvest forms are completed. The respondents asserted that these forms are important to avoid Intellectual Property rights and copyright issues in the future. The IK holders give their permission for the capture, preservation, and dissemination of the IK within specific agreements. The harvesting form is one of the most important in the data capture stage because it contains in-depth information about the IK data collected from each IK holder. The captured data are then transferred to the IK recorder laptops by the IK Coordinators through synchronization. It is therefore evident that IK is captured with the use of modern technology through documentation and digitization in the IKSDCs in academic institutions.
Digital preservation policies
The study revealed that the institutions do not have policies in place which guide the digital preservation of IKS. The respondents were also unaware of any policy that deals with the digital preservation of IKS. RS32 and RS42, however, admitted to being aware of the National Policy on Digitization.
According to RS32: “Policies are being drafted at the university level for digital preservation. The policy is being drafted and is at the early stage so they are not currently being implemented. Apart from the policies being drafted, there is a national policy that concerns digital preservation, but I am not sure if it is actively implemented.”
The findings are consistent with other research on the lack of required standards and policies in managing digital materials in Africa (Gbaje and Mohammed, 2013; Kalusopa and Zulu, 2009; Mutula, 2014; Masenya and Ngulube, 2019). The lack of standards and policies is seen both at the national and institutional levels (Kalusopa and Zulu, 2009; Ngulube, 2012; Masenya and Ngulube, 2019). Kalusopa and Zulu (2009) and Gbaje and Mohammed (2013) revealed that there was a lack of policies and specific guidelines on the preservation of electronic records in most African institutions and organizations. This finding also reaffirms the assertion of Pennock (2008) that issues regarding developing a sustainable digital preservation policy are unexplored to a large extent. The absence of established standards, policies, and procedures is one of the main issues faced in the digital preservation of digital resources in some academic institutions in South Africa (Masenya and Ngulube, 2019).
However, there is a National Policy on Digitization of Heritage Resources which most of the respondents were unaware of, and this is not being adopted or implemented in the management of the digitized IK. This agrees with the findings that although few people might be seen as knowledgeable or aware of certain national policies regarding digital preservation, the implementation of these policies is seen as a major challenge (Adu, 2016).
The study also revealed that since there were no specific policies in place for the digital preservation of IK, the respondents agreed that nothing was being implemented. RS82 and RS32, however, felt that since the IK collection was part of the national project, there was no need to have a digital preservation policy at the institutional level. Only RS42 acknowledged the importance of having a digital preservation policy at the institutional level and mentioned that the institution was working on its own policy, which would be in line with the national policy on digitization. RS62, who claimed they had adopted the national policy on digitization, was asked further questions about the policy and its implementation but was not able to give specific details.
Data management and storage
The study revealed that when IK data are collected by the field IK recorders, they are checked by the Community Coordinators before being transferred to the IK Coordinators, who upload them on the National Indigenous Knowledge Management System (NIKMAS), accessible through https://nrs.dst.gov.za/. The Department of Science and Technology (DST), which controls the NIKMAS system, is widely believed to bear the responsibility for managing the IK digital materials, rather than the academic institutions. RS12, RS32, RS72, and RS82 all noted that the field workers are equipped with laptops, digital cameras, and recorders which are used to capture data that are later synchronized with the IK Coordinator’s laptop before being uploaded on the NIKMAS system. RS32 noted that they do not use hard drives or other forms of storage devices to back up the data. Data is transferred through direct synchronization from computer to computer. One reason given for this is to safeguard the data against interference. RS72 affirmed this and added that they do not have any form of backup or archival measures in place. Apart from the IK data collected from the primary source, which are uploaded on the NIKMAS system, IN01 collects other types of IK that are stored on hard drives before being uploaded online. The study revealed that not much has been done to ensure proper storage of the IK collection. This reinforces the findings of Adu (2016) that certain organizations failed to take advantage of proper storage devices to extend the life of digital materials. Having proper storage facilities in place helps to ensure that there is a reliable backup facility in place. However, one of the respondents, RS42, affirmed that their data is stored on an open-source digital preservation platform known as Archivematica. This is similar to other studies that reveal the use of cloud computing and other digital solutions for data storage (Inglesias, 2011; Prom, O’Meara and Stratton, 2016).
Cloud computing is seen as a vital tool that can be used in meeting storage and data management challenges in digital preservation.
Web archiving in the repositories
There is a relatively short timeframe within which the contents on a website can be retained and this has been a driving force for web archiving initiatives, particularly in the cultural heritage community (Pennock, 2013). Studies have reported the involvement of academic institutions in Web archiving initiatives such as Queens University Canada (Heil and Jin, 2017), Wake Forest University (Fansler et al., 2014), University of Toronto (Milligan, 2016), State and University Library Denmark (Sutton, 2004), Drexel University, Slippery Rock University and the University of Scranton (Antracoli et al., 2014). The findings of this study, however, contrast with these reports because none of the institutions is actively involved in Web archiving initiatives. All the respondents were unaware of the concept of Web archiving or its importance to the preservation of their collections. For instance, RS12 said that “for now we are not in that space yet because we are fairly new.” The respondent also noted that, ideally, they should have a plan for that, but it is not there. RS82, in a similar response, said that they do not have anything like that yet because they are still in the planning phase. Also, RS32 said that “it is archived but that is internal archiving and not in terms of web archiving. That might be happening, but I do not know. I am not aware of anything like that.” Respondents RS22, RS42, RS52, and RS62 said they do not have any idea of such an effort and do not know much about what Web archiving means or what it entails.
This is in line with the findings of Khumalo and Nkala (2015) who revealed that apart from the lack of awareness of the respondents to their study on the concept of Web archiving, institutions in Africa are not involved in Web archiving initiatives to harvest their institutional repositories for future use.
Although the respondents were unaware of Web archiving and even asserted that their institutions were not actively involved in Web archiving initiatives, an assessment of the institutional websites and institutional repositories on Internet Archive’s Wayback Machine revealed that the contents of these institutions’ websites and portions of the institutional repositories are being harvested and stored on the Wayback Machine which is accessible for free. It is important to understand that a website can be harvested and archived on the Wayback Machine without the knowledge or prior approval of the web owners. These methods are known as the opt-in (Glanville, 2010) and opt-out (Slania, 2013; Grotke, 2011) styles of harvesting and have generated concerns and debates over the copyright and permission issues (Davis, 2014) related to the harvesting of the contents of a website without proper permission.
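An assessment of this kind can be partly automated. The following is a minimal sketch, not the procedure used in this study, of how an institution might check whether a page has a snapshot on the Wayback Machine through the Internet Archive's public availability endpoint. The host name repository.example.ac.za and the sample response are hypothetical placeholders, and the response layout is assumed to follow the endpoint's documented JSON structure.

```python
import json
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(site_url, timestamp=None):
    """Build the query URL asking whether site_url has an archived snapshot.

    timestamp is optional (digits in YYYYMMDDhhmmss order, may be truncated);
    when given, the endpoint is asked for the snapshot closest to that date.
    """
    params = {"url": site_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


# Hypothetical response body, shaped like the endpoint's JSON output.
SAMPLE_RESPONSE = json.dumps({
    "url": "repository.example.ac.za",
    "archived_snapshots": {
        "closest": {
            "status": "200",
            "available": True,
            "url": "http://web.archive.org/web/20190101000000/"
                   "http://repository.example.ac.za/",
            "timestamp": "20190101000000",
        }
    },
})

if __name__ == "__main__":
    print(build_availability_query("repository.example.ac.za", "2019"))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

In practice the query URL would be fetched over HTTP and the response body passed to closest_snapshot. A result of None indicates that no snapshot was reported, which does not by itself show that the links inside an archived page still resolve.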
The opt-in style is popularly used in large research institutions in Australia (Glanville, 2010) and this method requires permission before a website is harvested. Any institution that does not give permission or respond to the request to harvest is either excluded or harvested without being made public. The other style, which is the opt-out style used in the Netherlands, requires that web archivists send the rights holders a message to explain their intention to harvest, which takes place after a specific deadline (Slania, 2013). The Internet Archive uses the opt-out style, and this explains why a website can be harvested and made available on the Wayback Machine without the approval or knowledge of the website owners. More specifically, the Internet Archive adopts the full opt-out style, in which websites are harvested without any form of notice, while the rights owners retain the option of contacting the archive to request the removal of their sites from the collection.
While some websites are easy for the Wayback Machine crawlers to harvest, websites that are password protected, that use certain elements of JavaScript, or that are blocked by robots.txt exclusions are usually more difficult for the Wayback Machine to crawl and harvest. The study revealed that institutional repositories are less frequently crawled than the main websites. The number of times a website is crawled also does not reflect the actual number of times that the website was updated, and this poses another problem if an institution is not directly involved in the harvesting of its website. This is another major issue apart from the copyright problem. An assessment of the institutional repositories’ access websites on the Wayback Machine revealed that the content of the IN01 IK repository is available on the Wayback Machine but most of the links are either broken or inaccessible. Repositories IN02 and IN04 are also available on the Wayback Machine but with limited features due to broken links. The institutions need to make a conscious effort to learn about the importance of Web archiving and take steps to harvest the contents of their repositories to integrate them into the institutional collection for future use. An in-depth understanding of the technicalities, methods, and tools involved in the process of archiving a website will be important in understanding the aim of any Web archiving project or initiative and will also help put it in context.
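To illustrate how robots.txt exclusions can keep a crawler away from a repository, the following minimal sketch uses Python's standard urllib.robotparser module to test whether a given crawler may fetch a given page. The robots.txt content, the host name, and the paths are hypothetical, and the user-agent token ia_archiver is used purely for illustration; whether a particular archive's crawler honours such rules depends on that archive's own policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an institutional website: one named crawler
# is kept out of the repository pages, all other crawlers out of /admin/.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /repository/

User-agent: *
Disallow: /admin/
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is needed for this check.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://repository.example.ac.za"
    for agent, path in [
        ("ia_archiver", "/repository/item1"),
        ("ia_archiver", "/about"),
        ("othercrawler", "/admin/settings"),
    ]:
        print(agent, path, can_crawl(SAMPLE_ROBOTS_TXT, agent, base + path))
```

Running a check of this kind against an institution's own robots.txt file is one simple way to find out whether the repository pages are excluded from crawling, which may help explain sparse or incomplete captures.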
Proposed framework
This article, based on empirical research on the digitization and digital preservation of IKS in repositories in South Africa, provides a framework for Web archiving of IKS in institutional repositories in South Africa. Figure 1 is a depiction of the proposed Web archiving framework.
The proposed framework for the Web archiving of IKS provides the basic structure that can help in the preservation of the digitized IKS on the web. The proposed framework also enhances the collaboration between the academic institutions and the DST in the digitization and the digital preservation of IKS in the institutional repositories to support the long-term preservation of digitized IKS in South Africa. The rationale for proposing this framework is based on the findings that no digital preservation policies are guiding the IKS digitization project in the institutions and that there are currently no solid plans or policies that support Web archiving. The proposed framework, therefore, stands as a blueprint for Web archiving in African institutions.
Explanation of the framework
The proposed framework is built upon an existing body of knowledge on the digital preservation of IKS. It is proposed as part of a well-detailed digital preservation policy framework for IKS, and its application should be guided by the digital preservation policy.
The proposed framework draws heavily from the Web Archiving Life Cycle Model and the Open Archival Information System (OAIS) Reference Model. The proposed framework also covers the integration of the digitized IKS and the archived website into one single trusted digital repository where these materials can be accessed.
The components in the framework are Web Archiving Policy, Web Archive, IKS National Digital Library, and Data Integration.

Figure 1. Proposed framework for Web archiving.
Web archiving policy
The study revealed that there are no policies in place guiding the digitization of IKS in these institutions. Policy documents such as the National Policy on Digitisation of Heritage Resources (2013), Managing Digital Collections: A Collaborative Initiative on the South African Framework (2010), and the National Archives and Records Service Digitisation Strategy (2013) were reviewed through content analysis. However, none of these policy documents specifically covers the digitization of IKS or preservation of the digitized IKS. This is contrary to the International Records Management Trust (IRMT) 1999 information management framework which emphasizes the importance of policy procedures, structures, systems, and the need for a long-term strategic plan for digital preservation. The IKS digitization policy should be built upon the National Policy on Digitisation of Heritage Resources framework, but it has to include detailed instructions on certain issues such as the metadata standards that should be adopted in all the institutions involved in the project.
The policy at the national level should also be adopted at the institutional level, and it should make provisions for issues related to format, storage and backup, refreshing and emulation, access, archival description, responsibilities, disaster planning, long-term technology usage, contingency plans, funding, ownership, intellectual property rights and copyright, and implementation of the policy, among other relevant subjects. As part of the IK Digitization Policy, access is a key issue that should be clearly outlined.
The IKS Management should include a framework on Web archiving of the digitized IKS which is accessible online. The Web archiving policy part of the proposed framework represented in Figure 1 should cover the mission statement and scope, selection, acquisition, descriptive metadata, presentation and access, maintenance, staffing and training, and collaboration.
Mission statement and scope
The Web archiving policy should clearly outline the mission statement of the Web archiving project and its scope so that it can guide the vision of the project. The target user groups and the collection theme should also be clearly outlined.
Selection
The Web archiving policy should clearly outline what the archivists should select for the Web archive. The criteria for selection should also be very clear. Selection is a key component of archiving and this should also apply to Web archiving. The selection strategy for Web archiving should be clearly outlined. The dedicated IKS websites should adopt the bulk archiving method while the institutional repositories with other academic materials can adopt the selective approach to archiving the digitized IKS in the institutional repositories.
Acquisition
This part of the Web archiving policy should cover the capture scope, frequency of capture, material types and formats, and issues related to rights. The capture scope should cover what needs to be captured, and at this stage decisions on the capture of sensitive and non-sensitive data should be made. The frequency of capture is also very important, and it is recommended that the websites be captured as frequently as they are updated. With regard to material types and formats, the format of the captured websites should be retained as much as possible.
On the issue of rights management, there is a need to determine the group(s) of people who have permission to use the Web archive, and to clarify copyright and ownership. Due to the legal problems associated with Web archiving, it is recommended that the Department of Science and Technology (DST) train its staff and other people working with the NRS project in the IKSDCs in Web archiving skills so that they can develop the Web archiving system and harvest the websites within South Africa. This would help to control and mitigate some of the unforeseen copyright and ownership issues that might arise in the future. If organizations like the Internet Archive are consulted for Web archiving, copyright and ownership should likewise be agreed upon beforehand. Also, if the Web archiving project is funded by international organizations, multinationals, individuals, or corporations, these issues need to be clearly defined in order not to lose ownership of valuable IKS materials to the western world in the future.
Descriptive metadata
The descriptive metadata proposed by OCLC's Research Library Partnership Web Archiving Metadata Working Group should be adopted. The Dublin Core metadata standard has 15 core elements split into three (3) main groups: Content, Intellectual Property, and Instantiation. The Content group includes title, description, subject (keywords), type, relation, source, and coverage. The Intellectual Property group covers creator, contributor, publisher, and rights. The Instantiation group covers date, format, identifier, and language.
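To make the grouping of the 15 Dublin Core elements concrete, the following minimal Python sketch represents the three groups as a simple data structure and checks a record for unfilled elements. The function name and the sample record, including its title, identifier, and other values, are hypothetical and serve only as an illustration; an actual implementation would follow the metadata schema adopted in the policy.

```python
# The 15 Dublin Core elements, arranged in the three groups described above.
DUBLIN_CORE_GROUPS = {
    "content": ["title", "description", "subject", "type",
                "relation", "source", "coverage"],
    "intellectual_property": ["creator", "contributor", "publisher", "rights"],
    "instantiation": ["date", "format", "identifier", "language"],
}

ALL_ELEMENTS = [
    element
    for group in DUBLIN_CORE_GROUPS.values()
    for element in group
]


def missing_elements(record):
    """Return the Dublin Core elements that are absent or empty in record."""
    return [element for element in ALL_ELEMENTS if not record.get(element)]


# Hypothetical description of one archived IKS web page.
SAMPLE_RECORD = {
    "title": "Medicinal plants collection (archived page)",
    "description": "Archived copy of a repository landing page.",
    "subject": "indigenous knowledge; medicinal plants",
    "type": "InteractiveResource",
    "source": "http://repository.example.org/iks",
    "creator": "Example IKS Documentation Centre",
    "rights": "Access restricted to registered researchers",
    "date": "2019-02-11",
    "format": "text/html",
    "identifier": "iks-web-0001",
    "language": "en",
}

if __name__ == "__main__":
    print(missing_elements(SAMPLE_RECORD))
```

A check of this kind could be run before ingest so that incomplete descriptions are flagged to the archivist rather than stored silently.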
Presentation and access
The Web archiving policy should ensure that the crawled website is as close to identical to the original website as possible, except for materials that might be excluded due to copyright concerns. For example, the Web archive might disable a link or use robots.txt to exclude certain information from being captured in the archived website due to copyright protection or privacy concerns. This part of the policy should also include measures to ensure the authenticity of the crawled websites in the Web archives. Access is a major component represented in both the OAIS and the Web Archiving Life Cycle Models, mainly because access is a key part of any digital preservation initiative. In the Web Archiving Life Cycle Model, access/use/reuse falls under the Policy circle, and this involves decision-making about whether and how access to the collection will be provided and the monitoring of the use of the content by the patrons (Bragg and Hanna, 2013). The study established that access to digital materials is currently limited, to a large extent.
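The robots.txt exclusion mentioned above can be illustrated with a minimal Python sketch that uses the standard library's urllib.robotparser module to test whether a crawler is permitted to capture a given page. The robots.txt content, the crawler name "iks-archiver", the function name, and the repository addresses are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of a restricted section.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /restricted/
"""


def allowed_for_capture(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for page in (
        "https://repository.example.org/collections/item2",
        "https://repository.example.org/restricted/item1",
    ):
        print(page, allowed_for_capture(SAMPLE_ROBOTS_TXT, "iks-archiver", page))
```

Under these assumptions, pages under the restricted path would be skipped by a compliant crawler, so the policy should state explicitly which sections are excluded in this way and why.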
Maintenance
The maintenance section of the Web archiving policy should outline issues relating to the maintenance of the Web archives. If the digitization project is ongoing and the institutions offer access online, there should be a policy covering the sourcing of new content to ingest into the Web archive.
There is also a need to revisit the harvested websites to ensure continuous and periodic maintenance of the harvested contents. This should be done periodically, preferably quarterly. This periodic maintenance would offer an opportunity to evaluate the Web archiving project and improve on certain areas as required.
Staffing and training
The Web archiving policy should cover the issue of staffing and training. There is a need to employ Digital Archivists dedicated to the Web archiving project who would be further trained on the job to acquire Web archiving skills and be up to date on technological advancements or current trends in the field of digital preservation and Web archiving. These Digital Archivists should work closely with the selected ICT staff across these institutions to achieve the national Web archiving goal.
Collaboration
Collaboration is very important in Web archiving. Studies have demonstrated the success of Web archiving initiatives built on collaboration. For example, the International Internet Preservation Consortium (IIPC) and the National Digital Stewardship Alliance (NDSA) have played key roles in international collaboration. Several Web archiving tools and formats such as Heritrix, WARC, BAT, WERA, NutchWAX, Xinq, and DeepArc are products of collaborative Web archiving efforts.
The collaborative effort between Drexel University, Slippery Rock University, and the University of Scranton is also a good example of how academic institutions have been able to embark on successful Web archiving initiatives through collaboration (Antracoli et al., 2014). The members of the New York Art Resources Consortium (NYARC) also collaborate to collect, preserve and provide access to “art ephemera born in digital formats native to the web” (Duncan and Blumenthal, 2016: 116).
It is therefore important for the academic institutions and the DST to collaborate on the Web archiving project to ensure they achieve a national goal.
Web archive
The Web archive in Figure 1 represents the repository where the archived websites should be stored. This is the end product of a process that is built on the Web archiving policy. The processes recommended in the Web archiving policy outlined would lead to the data capture and storage of the crawled web pages in the Web archive. The repository for the archived web would be the main storage for the harvested web contents and it should be informed by the Web archiving policy.
The Data Capture component of the Web Archiving Life Cycle Model includes decisions on how to capture or crawl data, the frequency of capture, and the types of files that should be archived. The study revealed that the institutions are not aware that their websites are being harvested by the Internet Archive’s Wayback Machine, and that the frequency of capture by the Internet Archive does not reflect how often the websites were actually updated. Involving the institutions in their own Web archiving initiatives would enable them to decide what to archive on the IKS websites and how frequently to capture it. Based on this understanding, it is recommended that data be captured in step with the website’s updates to maintain an up-to-date digital archive. The websites should be crawled multiple times a day if important data are uploaded or updated multiple times in a day. The decisions regarding data capture should be included in the Web archiving policy.
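The gap between capture frequency and update frequency can be checked with a simple comparison of dates. The following Python sketch, which is an illustration rather than part of the proposed framework, lists the website updates that were replaced by a later update before any capture took place. The function name and all dates are hypothetical; in practice the update dates would come from the repository's own logs and the capture dates from the Web archive.

```python
from bisect import bisect_left
from datetime import datetime


def missed_updates(update_times, capture_times):
    """Return the updates that were overwritten before any capture occurred.

    An update counts as captured if at least one capture falls at or after
    it and before the next update (or at any later time for the last update).
    """
    updates = sorted(update_times)
    captures = sorted(capture_times)
    missed = []
    for i, updated_at in enumerate(updates):
        next_update = updates[i + 1] if i + 1 < len(updates) else None
        # Index of the first capture taken at or after this update.
        pos = bisect_left(captures, updated_at)
        captured = pos < len(captures) and (
            next_update is None or captures[pos] < next_update
        )
        if not captured:
            missed.append(updated_at)
    return missed


# Hypothetical update and capture dates for one IKS website.
SAMPLE_UPDATES = [datetime(2019, 1, 1), datetime(2019, 1, 10), datetime(2019, 1, 20)]
SAMPLE_CAPTURES = [datetime(2019, 1, 15), datetime(2019, 2, 1)]

if __name__ == "__main__":
    for lost in missed_updates(SAMPLE_UPDATES, SAMPLE_CAPTURES):
        print("Not captured before being replaced:", lost.date())
```

A recurring list of missed updates would indicate that the capture schedule in the policy needs to be tightened for that website.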
Data integration
Figure 1 also shows the integration of the IKS Web archive with the IKS national digital library to enable access from a single access point. This integration is a very important component of the framework because interested researchers need the archived websites to be available alongside the other collections in the IKS national digital library.
As represented in Figure 1, the Web archiving policy helps define the mission and scope, selection, acquisition, descriptive metadata, presentation and access, maintenance, and other important components that would make it possible to have a standardized IKS Web archive which is further integrated with the IKS national digital library. The Web archiving process should start with the policy, which clarifies the selection of materials and the frequency of the crawl. The Access Point, which consists of the database-backed websites, is then crawled according to the policy, and the crawled content is stored and organized in the digital archive, where metadata and other data management challenges are handled. The content is then sent to the archival storage, where it is stored and integrated with other digital collections available for research use.
IKS National Digital Library
To ensure that this IKS national digital library is OAIS compliant, there is a need for some OAIS reference model components such as Preservation/Policy, Archival storage, Storage and organization, and Data management to be applied.
The study established that the institutions do not have digital preservation policies in place to guide the digital preservation of IKS. This extends to the lack of a policy that considers Web archiving as a preservation option for web contents. This component, therefore, recommends a policy for the preservation of digital materials which also incorporates a policy on the Web archiving of IKS websites. The issues related to access, use, and reuse of the digital materials and the archived websites should be identified in such a policy, and Web archiving methods should be outlined. A policy specific to the needs of the digitized IKS is required.
The IKS national digital library in the figure represents part of the Archival Storage function. The data collected from the Source and ingested in the repositories should be stored in the IKS database. This database should also include the harvested/archived web which should be integrated with the digital collection in the database for long-term preservation and research use in the future. This process might not be as simple as it sounds because a lot of work has to be done to ensure that Web archives and digital collections are well integrated for them to be accessible for future use.
It is therefore recommended that the institutions make long-term plans for archived data as part of their preservation activities, and that these plans be spelt out in the preservation policy.
Conclusions
The study revealed that the South African government has invested a lot in the digitization of IKS through IKSDCs across different provinces in South Africa. The digitized IKS are mostly managed by the National Indigenous Knowledge Systems Office (NIKSO), and the NIKMAS was built to store and provide access to the digitized IKS, while the IKSDC hub at the University of KwaZulu-Natal (UKZN) also offers a form of access through its dedicated IKS website hosted on the UKZN server. However, the study revealed that there are currently no specific digital preservation policies guiding the digitization projects at the IKSDCs. Web archiving of the websites is also not currently being considered. Therefore, apart from recommending a digital preservation policy, this study proposed a Web archiving framework which is recommended as a part of the digital preservation policy in the IKS repositories in South Africa. Since the South African government has invested in the digitization of IKS and in ensuring access online through the NIKMAS, the framework is important in ensuring that the IKS-related websites and their contents are preserved for posterity.
