Abstract
The program “Digitization of Old Chinese Bibles,” likely the largest digitization program for Chinese Bibles ever undertaken, began in August 2014 under the auspices of the Digital Bible Library (DBL), an initiative of the United Bible Societies with the aim of gathering, validating, and safeguarding Scripture texts and publication assets (https://thedigitalbiblelibrary.org/home/). The completion of Phase I in April 2016 also marked the launch of Phase II of the program. By the time the present article is published, a majority of twenty-two Chinese Bibles (full or New Testament) will have been full-text digitized and uploaded to DBL for wider distribution. The final goal of the digitization program is to digitize all thirty-three extant complete Chinese New Testaments or full Bibles—whether in Wenli (classical) Chinese or Mandarin Chinese—published prior to the 1950s. The purpose of the article is to report on this program, what it entails, and the challenges it faces.
General introduction
The history of Chinese Bible translation can be traced back as early as the eighth century (mentioned in the Nestorian Stone, 781 C.E.). The oldest extant Chinese Bible is the incomplete New Testament of Fr. Jean Basset (1707), which is more than a century older than the first complete Protestant Chinese Bible by Johannes Lassar (Hovhannes Ghazarian) and Joshua Marshman (1822). As many as eighty Chinese Bibles or parts of Bibles, whether in Wenli (literary/classical) Chinese or in Mandarin Chinese and from all Christian confessions, have been published in the last three centuries. Of this collection, about forty-two are complete (or nearly so) New Testaments or Bibles, with the majority (about thirty-three) published prior to the 1950s. Unfortunately, most of these texts are known only by name and remain difficult for the general public to access. Even for a reputable text such as the translation by William Milne and Robert Morrison (1823), there is no electronic text available.
Thanks to the funding made available by the Digital Bible Library (DBL, https://thedigitalbiblelibrary.org/home/), the current program was launched in 2014 with the aim of full-text digitization of these old Chinese Bible texts. An advisory group of experts in the history of Chinese Bible translation was formed for this project, consisting of Dr. George Mak (Research Assistant Professor, Hong Kong Baptist University), Dr. Chun Li (Academic Dean, Bethel Theological College, Hong Kong), and Dr. Kenny Wang (Lecturer in Linguistics and Translation Studies, University of Western Sydney). These scholars provided invaluable help in prioritizing the texts to be digitized.
Six Chinese texts have been completed and were submitted to the DBL in April 2016, which also marked the beginning of Phase II of the project. The list of Bibles (twenty-two in total) included in these two phases is as follows (Protestant unless otherwise indicated):
Phase I (a total of 3,733,519 characters):
Peking Version—Mandarin (1878)
Orthodox—Wenli (NT-Psalms, Russian Orthodox; 1910)
Chinese Union High Wenli (1919)
Chinese Union Easy Wenli (NT; 1902)
Chinese Union Mandarin (1919)
WANG Yuen-det—Mandarin (NT; 1933)
Phase II (an estimated 8,000,000 characters):
Jean Basset—Wenli (NT incomplete, Catholic; 1707)
Louis de Poirot—Mandarin (Bible incomplete, with Deuterocanonical books, Catholic; 1803)—pending (subject to funding)
Johannes Lassar and Joshua Marshman Version—Wenli (1822)
William Milne and Robert Morrison Version—Wenli (1823)
Delegates’ Version—Wenli (1852)
Elijah C. Bridgman and Michael S. Culbertson Version—Wenli (NT, 1859; part of OT, 1862)
Nanking Version—Mandarin (NT; 1857)
Griffith John Version—Wenli (NT-Psalms; 1886)
Griffith John Version—Mandarin (NT, 1892; Psalms, 1907)
Samuel I. J. Schereschewsky—Wenli (1902)
HSIAO Ching-shan—Mandarin (NT with study notes, Catholic; 1922)
Absalom Sydenstricker and ZHU Baohui—Mandarin (NT; 1929)
ZHU Baohui—Mandarin (NT with study notes; 1936)
LI Shan-fu—Mandarin (NT, Catholic; 1949)
Heinrich Ruck and Zheng Shoulin—Mandarin (NT; 1958)
Theodore HSIAO—Mandarin (NT; 1959/1967)
For almost all these texts, this is the first time they appear in electronic format. For convenience, most of the titles used in the list above are based on the names of the translators; they are not necessarily the official titles of the publications (which in most cases are simply “Bible,” as in English). Every text being digitized is provided with an introduction (in Chinese) with a brief history of the translation and the digitization process involved. A few interesting highlights may suffice here:
Apart from the NT-Psalms translation (1949) by John Wu Ching-hsiung (1899–1986), the copyright of which is still in dispute, all Catholic Chinese translations prior to the 1950s are included in the digitization program, namely, Basset, Poirot, Hsiao Ching-shan, and Li Shan-fu. Jean Basset’s New Testament (though completed only up to the book of Hebrews) is the oldest Chinese translation extant. It is a well-known fact that Basset’s translation served as the basis for the Milne–Morrison translation of the New Testament (first published in 1810), and as a major reference for the Lassar–Marshman translation as well. With these texts digitized, we now can tell the actual extent to which Basset was being used.
The translation by the Jesuit priest Louis de Poirot (1803) is worthy of special attention. Poirot’s translation, though incomplete, constitutes the largest Bible corpus (short of a few prophetic books) prior to the publication of the first complete Catholic Chinese Bible in 1968 (the so-called “Si-Gao” Bible). The value of Poirot’s translation, apart from its age, is that it was written in the “Peking vernacular” (Mandarin Chinese, more or less) instead of classical Chinese. For this reason, the text is still understandable to modern Chinese readers. It is of immense value from the perspectives both of the history of Chinese translation and of language research. Poirot’s translation was lost for many years and rediscovered only in 2011 at the Zikawei Library in Shanghai, People’s Republic of China (PRC).
For many people, the term “Union Version” refers to the authorized Mandarin Chinese Union Version published in 1919. Originally, however, the term referred to the entire publication endeavour, endorsed by the General Conference of the Protestant Missionaries of China (Shanghai, May 7–20, 1890), to produce three Chinese translations (thus the motto “One Bible, three versions”), one for each form of the Chinese language in use in China at the turn of the twentieth century: High-Wenli Chinese, Easy/Low-Wenli Chinese, and Mandarin Chinese. All three texts are included in the digitization program. It is surprising that, given the widespread use of the Chinese Union Mandarin (CU-Mandarin, 1919) since its appearance and the many alleged “CU-Mandarin 1919” texts made available on the Internet by enthusiasts, there is not yet an accurate electronic text of this translation. The current project involves the further proofreading of one archive copy of the United Bible Societies (UBS) against the current CUV 1919 edition (printed edition 1983) and the images of the original CU-Mandarin 1919 autographed by Rev. Chauncey Goodrich, one of the translation committee members.
No translator in the history of Chinese Bible translation is more legendary, or better remembered with epithets (“Prince of Bible translators,” “probably the greatest Bible translator China ever had”), than Bishop Samuel Isaac Joseph Schereschewsky (1831–1906). Born a Jew and trained as a rabbi but later converting to Christianity, Schereschewsky migrated to the USA, received Christian theological training, and was ordained there. Later, he also became the US Anglican bishop of Shanghai. Schereschewsky’s OT translation in Mandarin Chinese (included as part of the Peking Version) is the first OT translation drafted directly from the Hebrew language. Schereschewsky suffered a major stroke while working on the Easy Wenli translation; all but one finger had stopped moving. From his wheelchair, he eventually typed 2,000 pages of his draft in Romanized script with a single finger, for his assistant to transcribe back into Chinese characters. Apart from Chinese, Schereschewsky also contributed to Bible translation work in Mongolian.
The Rev. Wang Yuen-det (known as Wang Xuan-chen) from Shandong province was the Chinese scholar who advised Calvin Wilson Mateer, one of the translation committee members for the CU-Mandarin. In spite of the significant contribution of the Chinese literati in the making of this important version, they did not have the right to vote and could only whisper their opinions to their Western associates during the meeting. Upon the publication of CU-Mandarin, Wang started working on his own Chinese translation of the Bible. His New Testament was published in 1933. Although he claimed this to be his original translation, it was in essence a revision based on CU-Mandarin.
Zhu Baohui, a student of the Rev. Absalom Sydenstricker (1852–1931), produced the first Chinese study Bible (NT) with copious notes and cross-references. In addition to many tables and explanations of the meaning of many biblical names, Zhu also included 136 pages of thorough exegetical information on NT Greek keywords. It was indeed state-of-the-art original language scholarship in the Chinese context in those days. This study Bible was published in 1936 under the sponsorship of Sydenstricker’s daughter, Pearl S. Buck, the Nobel-winning author.
Characteristics of the translations
The adoption by the digitization project of the 1950s as a cut-off date is legitimate for good reasons. First, the intellectual property law in the PRC states that, in general, a publication copyright expires fifty years after the death of the author or fifty years after the first publication (information supplied by Inés Galliani, UBS Intellectual Property Coordinator). Apart from the New Testaments by Theodore Hsiao (1959) and H. Ruck–Zheng Shoulin (1958), all the translations included in our digitization program were published prior to 1950. We are on safe ground in assuming that all these translations are in the public domain. Most of the keying-in work was based on the scanned images generously provided by the rare Bible collection of Faith-Hope-Love (https://bible.fhl.net/ob/index.html).
Second, the 1950s saw the first language policy issued by the PRC government. This aimed at reducing illiteracy in the country through the promulgation of Mandarin Chinese as the national language and the standardization of simplified-script Chinese characters. Two documents, the first in 1956 and the second in 1964, standardized the simplification of the characters. All publications of the PRC published after the 1950s were therefore in simplified-script Chinese, in distinction to the traditional-script Chinese characters (or more appropriately called “standard script Chinese”) used prior to the 1950s. It is, however, a common misunderstanding that simplified-script Chinese came into existence only after the 1950s. In fact, character simplification certainly predates this period (see below) and is regularly found in these pre-1950 Bibles.
Most if not all publications prior to the 1950s were printed with vertical text direction, as is the case in all the old Chinese Bibles. It has often been said that the first printed Chinese text in horizontal alignment was Robert Morrison’s Dictionary of the Chinese Language, published in 1815–1823 in Macau. After 1949, the PRC decided to use horizontal writing. All newspapers in China changed from vertical to horizontal alignment on January 1, 1956.
Of the twenty-two Bibles included in the current program, twelve were written in Mandarin Chinese; the rest are in Wenli Chinese. In Wenli (classical) Chinese, most words are monosyllabic and there is a close correspondence between characters and words, whereas in Mandarin Chinese or modern Chinese, most words consist of two or more characters; a character often corresponds to a single syllable that may be considered a morpheme. The largest corpus of modern Chinese words, as listed in the most comprehensive (online) Chinese dictionary Hanyucidian (汉语辞典), has 370,000 words derived from around 80,000 characters, but most dictionaries have about 56,000 characters (the PRC government defines functional literacy as a knowledge of 2,000 characters). As might be expected, the full Bible in Wenli has comparatively fewer characters than the Mandarin Bible. Compare, for example, the one million characters in CU-Mandarin 1919 with the just over 820,000 characters in CU-High Wenli 1919. Wenli Chinese has been the literary standard of the Chinese language for centuries, although novels in written vernacular Chinese already existed in the seventeenth century. However, it was not until after the May Fourth Movement in 1919 and the promotion of vernacular Chinese by scholars and intellectuals, such as pragmatist reformer Hu Shih and leftist Lu Xun, that this Bai hua (plain vernacular) style gained widespread importance. The overwhelming success of CU-Mandarin 1919 was no accident, as it came out at the kairos of the literary situation in the country.
Procedures of digitization
The entire process of digitization involves inputting, proofreading, and ParaTExt checking. These three steps go hand in hand, but proofreading is by far the most painstaking and unending job.
Proofreading is often perceived as unskilled labour and is thus underpaid. It is a sedentary task and can easily drain one’s concentration within forty-five minutes. One proofreader reports that she almost passed out after working for one and a half hours. Apart from lapses of concentration and physical fatigue, proofreading involves excessive head movement, which can itself be harmful. One proofreader had the practice of folding the draft printout at each vertical line to match it with the corresponding line in the image text. This is likely the best approach to proofreading, although inevitably a good part of the time is spent on paper-folding. The project (Phase II) is grateful to have a group of young people dedicated to the job and also to benefit from a group of volunteers. However, because of the monotonous nature of the work, it is not always easy to keep the proofreaders on the job for an extended period of time. Initially the text was input into the computer once, followed by two successive rounds of proofreading, followed in turn by ParaTExt checking. This model was later changed so that the text was input independently twice and the two inputs were compared using ParaTExt, before a single round of proofreading and subsequent ParaTExt checking. Inputting is less prone to error than proofreading, and the latter model has proved to be more cost-effective. In order to provide a better-quality text for the proofreaders, some preliminary Basic Checks in ParaTExt are run beforehand.
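The double-input model can be illustrated with a short sketch. The project itself makes this comparison in ParaTExt; the Python code below is only an illustration of the principle, using the standard difflib module, and the function name compare_keyings and the sample sentence are assumptions made for the example. It reports every place where two independent keyings of the same passage disagree, so that a proofreader need check only those places against the image.

```python
import difflib


def compare_keyings(first, second):
    """List the places where two independent keyings of a passage disagree.

    Each entry is (offset in first keying, text in first, text in second).
    """
    # autojunk=False so that frequent characters are never treated as noise.
    matcher = difflib.SequenceMatcher(None, first, second, autojunk=False)
    discrepancies = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            discrepancies.append((i1, first[i1:i2], second[j1:j2]))
    return discrepancies


# Illustrative sentence keyed twice; the second keying has a simplified
# character where the first has the traditional form.
keying_a = "太初有道，道與神同在，道就是神。"
keying_b = "太初有道，道与神同在，道就是神。"

for offset, a, b in compare_keyings(keying_a, keying_b):
    print(f"offset {offset}: first keying {a!r}, second keying {b!r}")
```

Where the two keyings agree, an error survives only if both typists made the same mistake at the same place, which is why a single round of proofreading can then suffice.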
Optical Character Recognition (OCR) technology offers only a limited contribution to the current project. Most OCR software now can manage vertical printing reasonably well, but page layout features such as in-text verse numbers (between vertical lines), proper name underlining, and section headings on the above-text margin become “background noise” in the OCR processing. The time spent on cleaning up offsets the contribution of the technology.
Skopos of digitization and challenges
Full-text digitization can in no way reproduce the exact form and layout of the printed text. Such reproduction can only be accomplished by image-scanning. The primary aim of full-text digitization is to capture the content. While the project involves the mainly straightforward labour of inputting and proofreading, the process is never simple. Adjustment or even compromise is inevitable. The skopos of the digitized text does play a role in carrying out the program.
In the first place, Chinese is notorious for its complex orthography. Even the vast thesaurus of 74,605 Unicode CJK (Chinese–Japanese–Korean) characters is by no means sufficient to digitize these old Chinese Bibles. This is especially the case for Hebrew or Greek loanwords. In order to represent these foreign words, translators found the need to create neologisms that are based on transliterations. For this kind of neologism, which is non-Unicode, we use the notation [AB] to indicate the different parts of the character. As is often the case, these new characters are combined with the Chinese character 口 as the left part (A), indicating its foreign origin, and another character as the right part (B), providing a pronunciation clue. For example, “hosanna” is represented as 啝[口撒]哪 hé sā nǎ, and “Eli, Eli, lema sabachthani” as 咿唎、咿唎、啦嗎[口撒][口駁][口大]呢 Yī lì, yī lì, la ma sā bó dà ne. Apparently, even for the Union Versions, different translators have their different approaches in developing neologisms for the same foreign name; thus for “cherub/cherubim,” we have 基路伯 (CU-High Wenli), [口氷] (CU-Low Wenli), and [口伯] (CU-Mandarin). To date for the project, about twenty neologisms have been documented.
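The [AB] notation lends itself to simple machine processing. The following Python sketch is offered only as an illustration and is not part of the project's documented workflow: it lists the bracketed composites found in a passage and, as one possible convention, rewrites each as a Unicode Ideographic Description Sequence using U+2FF0 (left-to-right composition). The function names find_composites and to_ids are hypothetical, and the sketch assumes that every composite has exactly two parts.

```python
import re

# U+2FF0 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT
IDC_LEFT_TO_RIGHT = "\u2ff0"

# Assumption: a composite is exactly two characters in square brackets,
# e.g. [口撒], with the left part first and the right part second.
COMPOSITE = re.compile(r"\[(.)(.)\]")


def find_composites(text):
    """Return the parts of every [AB] composite, in order of appearance."""
    return [left + right for left, right in COMPOSITE.findall(text)]


def to_ids(text):
    """Rewrite each [AB] as an ideographic description sequence."""
    return COMPOSITE.sub(
        lambda m: IDC_LEFT_TO_RIGHT + m.group(1) + m.group(2), text
    )


sample = "啝[口撒]哪"
print(find_composites(sample))  # ['口撒']
print(to_ids(sample))           # 啝⿰口撒哪
```

A listing of this kind also makes it easy to keep a running inventory of the neologisms documented across the texts.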
Second, some degree of standardization (or modernization) of the Chinese orthography is deemed necessary in conforming to the purpose of our intended users. As noted earlier, Chinese character simplification took place well before the 1950s. Character simplification is common in cursive written texts and can be dated back to as early as the Qin Dynasty in the third century B.C.E. As a result, there are many different variants (or glyphs) for the same character. Some variants may be considered obsolete while some still find their place in modern publications. Among the old Chinese Bibles we are dealing with, it is not uncommon to have different variants of the same character found in the same publication, as in these pairs: 為/爲; 羣/群; 隣/鄰; 産/產; or the alternating glyphs 冫/氵on the left side of the characters (e.g., 决、减、凈、况、冲、凑 versus 決、減、淨、況、沖、湊). There are a couple of hundred characters belonging to this category. Some of these were later considered as simplified-script characters, such as 却、脚 versus their traditional-script character forms 卻、腳. Western readers of this article must be reminded that the keyboarding of Chinese characters depends on the input method. Even for a Chinese character which is Unicode-compliant, not all input methods can accommodate the typing or display of the character. In order to facilitate the use of these digitized texts by modern readers, we decided to standardize these variants by adopting the more common variant—very often based on the hit rate in an Internet search. This standardization assumes that the primary skopos for the average users of these texts is to read, to search, and perhaps to compare with other (Chinese) texts. The majority of users are not particularly concerned with the different glyphs of the characters. 
This principle applies to all texts with the exception of two—the Russian Orthodox NT-Psalms (1910) and CU-Mandarin 1919—because these two traditional texts are still currently used by their respective faith communities (namely, the Orthodox Chinese and the Protestant Chinese) and therefore the audience would expect as little change as possible. Other standardization involves minor improvements to the copyediting quality of the layout, such as providing missing verse numbers, adjusting punctuation, and improving consistency with regard to the proper name underline marking.
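The standardization of variants described above, together with the exemption of the two texts still in use, can be sketched as a simple character mapping. The Python code below is a minimal illustration under stated assumptions: the direction of each mapping (which variant counts as the more common one), the text identifiers, and the function name standardize are all hypothetical, and the table shows only a handful of the couple of hundred characters involved.

```python
# Hypothetical table: each key is a variant found in the old printings and
# each value is the form assumed here to be the more common one. The actual
# choices made by the project are not documented in this article.
VARIANT_MAP = str.maketrans({
    "爲": "為",
    "羣": "群",
    "隣": "鄰",
    "産": "產",
    "决": "決",
    "减": "減",
    "况": "況",
    "却": "卻",
    "脚": "腳",
})

# Hypothetical identifiers for the two texts that are left unchanged.
EXEMPT_TEXTS = {"orthodox-nt-psalms-1910", "cu-mandarin-1919"}


def standardize(text, text_id):
    """Replace variant characters unless the text is exempt."""
    if text_id in EXEMPT_TEXTS:
        return text
    return text.translate(VARIANT_MAP)


print(standardize("爲羣", "delegates-1852"))    # variants replaced
print(standardize("爲羣", "cu-mandarin-1919"))  # left as printed
```

Keeping the table separate from the texts means that a different choice of preferred variant requires changing only one entry.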
Conclusion
When the Chinese language meets with computer technology, there are always challenges. This may explain why digitization of old Chinese Bibles was rarely contemplated. The translation of the twenty-two Bibles amounts to a total of over 200 years of brain-draining work that took place during the prime of the lives of the translators, under very harsh working conditions in Mainland China. The current program can be seen as a tribute to the unique contribution that these translators made to their communities, which is now made accessible to the entire world through all digital media. As Dr. Kenny Wang, one of the advisors of the project, puts it: “In addition to the spiritual dimension, these old Chinese Bible translations also offer valuable diachronic insights into spoken and written Chinese, as well as cultural, linguistic and religious negotiations between the West, Christianity and China. Lying dormant and hidden in these old Chinese Bibles for the large part of the last 200 years are a great deal of cultural heritage and Godly wisdom, and for the first time ever, people will be able to gain unprecedented access to them, the access made possible by the United Bible Societies’ digitization project.”
By the time the present article is published, most of the twenty-two Chinese Bibles (full or NT) will have been full-text digitized and uploaded to DBL for wider distribution. Users should be able to access these texts via YouVersion and UBS applications. The final goal of the digitization program is to digitize all thirty-three extant complete Chinese New Testaments or full Bibles published prior to the 1950s.
