Abstract
This study investigates the prevalence and causes of inconsistency in translation memories (TM) using a sequential explanatory mixed methods design. The initial quantitative phase introduces a novel method and typology for measuring and categorizing inconsistencies in TM data. The data are the product of professional computer-aided translation of software documentation. In the follow-on qualitative phase, interviewees compare the quantitative results with their professional experience of TMs. Their confirmation of the quantitative results improves the validity of the study. The interview data also increase the utility of the research, suggesting possible causes and solutions for inconsistency. Results are presented interactively, followed by a short discussion of the findings and their consequences.
Situated within the field of translation studies, translation technology focuses on computer-aided human translation and machine translation, usually of texts that are repetitive and functional. Much of the research in the area is in the form of quantitative studies. This study instead uses a sequential explanatory mixed methods approach to analyze consistency in translation memory (TM), a widely used computer-based aid for human translation that was commercially introduced in the early 1990s.
A TM is a repository of previously translated text that has been divided into segments. By default, a segment is usually a sentence, a heading, or a list element. Segments in the source language are aligned with those in the target language so that they can be recycled within a TM tool. A TM tool manages the translation process, providing a user interface for the translator to see both source and target texts and automatically creating a TM during translation by saving a segment of source and target text together as a translation unit (TU). In the case of reappearance of a previously translated segment the TM software will propose the previous translation to the translator. Depending on the parameters set by the translator, the TM system will also suggest partial or “fuzzy” matches, based on a percentage of similarity between a new source text (ST) segment and a source language segment (or source-language segments) already in memory.
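The fuzzy-match lookup described above can be sketched in a few lines. This is a minimal illustration only: commercial TM tools use their own similarity measures, so the character-based ratio from Python's `difflib`, the function names `match_percentage` and `best_match`, and the 75% default threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def match_percentage(new_segment, stored_segment):
    """Similarity of two source segments as a percentage (0-100)."""
    return 100 * SequenceMatcher(None, new_segment, stored_segment).ratio()


def best_match(new_segment, memory, threshold=75):
    """Return (percentage, source, target) for the best stored TU
    scoring at or above the threshold, or None if there is none."""
    best = None
    for source, target in memory:
        score = match_percentage(new_segment, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


# A one-unit memory built from a segment pair quoted in this study.
memory = [
    ("Click an empty part of the drawing area",
     "Klicken Sie auf einen freien Bereich der Zeichenfläche."),
]

# A new segment that differs only by a final full stop is a fuzzy match.
proposal = best_match("Click an empty part of the drawing area.", memory)
```

An identical source segment scores 100 and would be offered as an exact match; lowering the threshold makes the tool propose more, but less similar, legacy translations.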
The axioms behind the use of TM tools are that they reduce the cost of translation, save time, increase productivity, and remove inconsistency by allowing the user to leverage legacy materials (Austermühl, 2001; Olohan, 2011). Costs are further reduced as translators are often paid based on TM match analyses, with full payment offered for translation from scratch, partial payment offered for editing fuzzy matches, and a small (or sometimes no) fee paid for reviewing 100% matches. In theory, TM tools should produce consistent translations as previously translated work is recycled. This study aims to discover whether this is true in practice. The presence of inconsistency in a TM has an associated cost in maintenance of the database and in lowering the match percentage (with which the translation price is associated), particularly in the case of inconsistent source text. Inconsistency may also reduce clarity and ease of use, and may impact on safety if the content is sufficiently confusing (Cronin, 2013).
Some prior research on quality and consistency in TM-aided translations identified a tendency toward inconsistency (Merkel, 1998; Rieche, 2004) and error propagation (Bowker, 2005; Ribas López, 2007). These were either studies using a small number of translators or pilot studies involving inexperienced translation students. Merkel (1998) surveyed 13 translators and said that, as those translators were gradually beginning to use TM, the inconsistencies were caused by a “clash between an established translation culture” and the recently introduced technology (p. 146). Bowker (2005), following a quantitative study with a small number of student translators, noted that “although it is frequently claimed that TMs improve consistency, this is not always the case” (p. 18).
In endeavoring to analyze translation consistency in TM data, the current research began with a pilot study, which found inconsistencies in the target text (TT) of two TMs but was limited by a focus on the TT and by a lack of explanatory data. The pilot study did, however, help identify the types of inconsistency that may occur in a TT. The dearth of full studies of consistency in the prior research and the limitations identified in the pilot study led to a choice of mixed methods for the main study. Rather than just conducting a quantitative study, as in the pilot, it was considered that a follow-up qualitative study would both “shore up weaknesses” and “provide confirmation and elaboration” of quantitative results (Parmelee, Perkins, & Sayre, 2007, p. 195). Frey, Botan, Friedman, and Kreps (1991) write that, although quantitative studies are understood to have high measurement reliability, a combination of qualitative and quantitative studies of the same concept enhances validity and reliability of measurements along with enhancing the “credibility of the conclusions they draw” (p. 124).
The pilot study led to the choice of a biphasic approach for the main study. In the first phase, we analyzed TM data to create a typology of commonly occurring inconsistencies in source and target texts. As a single case study is considered to be a poor basis for generalization, a decision was made to carry out analyses of multiple TMs using the typology, as a replicated study gives “considerable advantages over single-case studies in terms of the rigor of the conclusions which can be derived from them” (Susam-Sarajeva, 2009, p. 37). In the second phase, we carried out a series of interviews with localization professionals to ask whether they considered consistency a problem and to suggest possible causes of inconsistency. Credible and applicable conclusions were a focus within our research center, and the interviews added validity to our measures of inconsistency in TM data and to our recommendations for the minimization of inconsistency in the translation process.
Method
The purpose of this study was to analyze and document in detail the type of inconsistencies found in TM data, then to explain how common these inconsistencies are and how they might come about. This led to the choice of a sequential, explanatory mixed methods design for this study, in which the qualitative data are intended to expand on the quantitative results (Creswell & Plano Clark, 2007). As the study focused particularly on the quantitative analysis, and as the quantitative phase provided results from which the interview questions were drawn, the follow-up explanations model of the explanatory design was used. In the first, quantitative phase of the study, four sets of TM data (two English-to-German TMs and two English-to-Japanese TMs presented in the TMX or Translation Memory eXchange standard interchange format) from software documentation, donated by two world-renowned software companies, were measured for consistency. In the second phase, qualitative interviews with translators and other translation professionals who work with TMs were conducted to explore their experiences of consistency issues in TM. It was also anticipated that these interviews would allow explanation of some of the findings from the quantitative stage of the study. An independent level of interaction between the quantitative and qualitative research data was chosen, so that data from the two phases were initially analyzed separately.
Access to data for use in this study was subject to protracted discussion and, in one case, the signing of a nondisclosure agreement. Sharing of data is a hot topic generally but is particularly so in the localization industry. This is due to financial value being attributed to TMs and the lack of a clear precedent with regard to ownership of TM data (see Gow, 2007; Smith, 2009). In addition, no language services provider wants to be identified as having poor quality TM data. As a result, companies involved in localization are protective of the TM data they possess and accessing a large TM corpus for research purposes presents a challenge. The data used for this study were made available anonymously thanks to connections between the Centre for Next Generation Localisation at Dublin City University and the companies that donated data. This factor has an effect on the reliability and validity of the research—although this study uses real-world data, it was necessarily limited by what could be accessed.
For two of the TMs, a sample of 50,000 TUs was studied. To confirm homogeneity between the sample and the full TM corpus in each case, a chi-square test was carried out using Excel. The test was based on comparative measurements of corpus statistics, obtained using WordSmith WordList software, and on comparative measurement of the frequency of Category 3 inconsistencies (see Table 1). The corpus statistics used were types (distinct words), standardized type-token ratio, and mean word length (in characters). The chi-square test found no significant difference between the sample and the full TM.
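As an illustration of this kind of homogeneity check, the sketch below computes a Pearson chi-square statistic for a 2 × 2 table in plain Python. It is not the Excel procedure used in the study: the table layout and the counts are hypothetical, chosen only to show the arithmetic, and the critical value of 3.841 applies to one degree of freedom at the .05 level.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts, NOT taken from the study:
# [TUs with a Category 3 inconsistency, TUs without one]
sample = [1000, 49000]
full_tm = [4100, 195900]

CRITICAL_VALUE = 3.841  # chi-square, df = 1, alpha = .05

statistic = chi_square([sample, full_tm])
# A statistic below the critical value gives no evidence of a difference.
homogeneous = statistic < CRITICAL_VALUE
```

With these invented counts the statistic is roughly 0.5, well below the critical value, which is the pattern one would expect when a sample is representative of the full TM.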
Table 1. Example of Translation Unit Categories.
Quantitative Phase
The quantitative phase of the study measures inconsistency at the segment and subsegment levels. Segment-level inconsistencies are observed where two segments that one could reasonably expect to be formally identical differ from each other in some way. Source segments are viewed as being formally different if their meanings differ, but the term inconsistent source segments is used to refer to cases where there are very minor formal differences between two source segments and such differences do not reflect any semantic differences between the segments in question. Such minor formal differences include differences in capitalization, tags, punctuation, spaces, character formatting, and spelling (where a segment may be inconsistent with another segment simply because of a misspelling, inconsistent use of British or U.S. English spelling, or a typographical error in one of the segments).
In the case of target segments, it appears reasonable to expect segments that are translations of “the same” source segment (i.e., segments that are translations of different tokens of the same source type) to be formally identical, especially in a translation memory scenario where the goal is to reuse existing translations for already encountered source segments.
Where there are two different translations (and thus two different target segments) for a single source segment type, this is considered a target segment-level inconsistency. As there may be more than one inconsistency within these segments, each discrete inconsistency is counted and categorized. The differences between the target segments in question can be very minor formal differences (as defined above), but they can also be more substantial, in extreme cases even leading to semantic differences between the two segments.
At segment level, the following four categories are possible:
1. Inconsistent source segments are translated as inconsistent target segments
2. Inconsistent source segments are translated as consistent target segments
3. Consistent source segments are translated as inconsistent target segments
4. Consistent source segments are translated as consistent target segments
This study is primarily interested in Categories 1 and 3, but also in the possibility of consistency being introduced during the process of computer-assisted translation (Category 2). Category 4 may be seen as the ideal in specialized translation, whereby the TM has provided the best possible leverage and thus saved the maximum possible amount of time and money. An example of each category from our TM data is given in Table 1.
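The four-way categorization can be sketched programmatically. The study itself classified TUs by inspecting sorted spreadsheets, so the code below is only an assumed automation: `normalize` and `categorize` are hypothetical names, and the normalization covers tags, letter case, punctuation, and spaces but not spelling variants or typographical errors. Removing every space is a simplification that could merge a few genuinely different segments.

```python
import re
from collections import defaultdict

TAG = re.compile(r"<[^>]+>")
PUNCT_OR_SPACE = re.compile(r"[\W_]+")


def normalize(segment):
    """Collapse minor formal differences: tags, case, punctuation, spaces."""
    return PUNCT_OR_SPACE.sub("", TAG.sub("", segment)).casefold()


def categorize(units):
    """Map each repeated (normalized) source segment to Category 1-4.

    `units` is an iterable of (source, target) pairs.
    """
    groups = defaultdict(list)
    for source, target in units:
        groups[normalize(source)].append((source, target))

    categories = {}
    for key, pairs in groups.items():
        if len(pairs) < 2:
            continue  # segment is not repeated, so no category applies
        source_consistent = len({s for s, _ in pairs}) == 1
        target_consistent = len({t for _, t in pairs}) == 1
        if source_consistent:
            categories[key] = 4 if target_consistent else 3
        else:
            categories[key] = 2 if target_consistent else 1
    return categories


# One source type with two different translations falls into Category 3.
units = [
    ("Click an empty part of the drawing area",
     "Klicken Sie auf der freien Zeichenfläche."),
    ("Click an empty part of the drawing area",
     "Klicken Sie auf einen freien Bereich der Zeichenfläche."),
]
result = categorize(units)
```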
Inconsistent segments are counted by identifying the number of types n. The number of segment-level inconsistencies is the type count minus one (n − 1). Thus, in the case of a single source segment (type) that has 4 tokens, if there are 3 separate translations (three types; one of which appears twice), then the number of target segment inconsistencies is 2 (or 3 − 1). We give a special status (of “master” or “reference” segment) to one of the target segments, and treat the other two segments as inconsistent with that reference segment. The reference segment is the one which appears first in our sorted list, and which a translator could have, but did not reuse in unchanged form. For example, the following four translations for “Click an empty part of the drawing area” appear in the TM data:
a. Klicken Sie auf der freien Zeichenfläche.
b. Klicken Sie auf einen freien Bereich der Zeichenfläche.
c. Klicken Sie auf einen freien Bereich der Zeichenfläche.
d. Klicken Sie auf einen beliebigen freien Bereich auf der Zeichenfläche.
Although there are four tokens, there are only three types: a, b, and d. If segment a is assigned the status of reference segment, the segments that are inconsistent with the reference segment are b (repeated as c) and d. With three types, n = 3, and the count is n − 1 = 2 inconsistencies.
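The type-minus-one count for this example can be reproduced with a short sketch. The function name `count_segment_inconsistencies` is hypothetical, and the input is assumed to be already ordered so that the reference segment comes first.

```python
def count_segment_inconsistencies(translations):
    """Return (reference, count) for the translations of one source type.

    The reference is the first translation in the list; the count is the
    number of distinct types minus one (n - 1).
    """
    types = list(dict.fromkeys(translations))  # distinct, first-seen order
    if not types:
        return None, 0
    return types[0], len(types) - 1


translations = [
    "Klicken Sie auf der freien Zeichenfläche.",
    "Klicken Sie auf einen freien Bereich der Zeichenfläche.",
    "Klicken Sie auf einen freien Bereich der Zeichenfläche.",
    "Klicken Sie auf einen beliebigen freien Bereich auf der Zeichenfläche.",
]

reference, count = count_segment_inconsistencies(translations)
# Four tokens, three types, so count is 2.
```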
At the segment level, source or target segments are either consistent or formally differ and are thus inconsistent. However, there may be more than one inconsistency within these segments. For this reason, subsegment inconsistencies found within inconsistent target text segments (those found in Categories 1 and 3 above) are also counted and categorized.
These inconsistencies are categorized mostly per part of speech aside from those with inconsistent punctuation or where word order has been changed. If there are more than three inconsistencies within a target segment, that segment is considered to have been wholly retranslated. These categorized inconsistencies may be further subcategorized, for example, nominal inconsistencies that differ lexically, or in number (singular/plural). Other typical subsegment inconsistency categories are verb, space, punctuation, preposition (for German), and postposition (for Japanese).
These subsegment-level inconsistencies are counted in the same way as segment-level inconsistencies: We identify the number of types n, assign one the status of master or reference segment, then count the types that are inconsistent with the part-of-speech or word order in the reference segment. Thus, the count is n minus the reference segment (n − 1). Again, the reference segment is the one which appears first chronologically, and which a translator could have reused, but chose to edit in some way.
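A rough way to locate candidate subsegment inconsistencies is to compare the tokens of each variant with those of the reference segment. The sketch below does this with Python's `difflib`; it only finds the stretches that differ, and assigning each stretch to a part-of-speech or word-order category remains a manual step, as in the study. Whitespace tokenization (unsuitable for Japanese) and the names `subsegment_differences` and `is_retranslation` are assumptions of this example.

```python
from difflib import SequenceMatcher

# More than three differences counts as a complete retranslation.
RETRANSLATION_THRESHOLD = 3


def subsegment_differences(reference, variant):
    """List (reference_tokens, variant_tokens) for each differing stretch."""
    ref_tokens, var_tokens = reference.split(), variant.split()
    matcher = SequenceMatcher(None, ref_tokens, var_tokens, autojunk=False)
    return [
        (ref_tokens[i1:i2], var_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def is_retranslation(reference, variant):
    """True if the variant differs from the reference in more than three places."""
    differences = subsegment_differences(reference, variant)
    return len(differences) > RETRANSLATION_THRESHOLD


# The phrase alternation reported for TM C yields two differing stretches:
# one for the preposition and one for the noun.
differences = subsegment_differences(
    "in der Befehlszeile", "an der Eingabeaufforderung"
)
```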
The aim of the quantitative study is to identify translation units that fall into the four categories as specified in Table 1. This was done by using a Python script to extract the ST and TT segments from the <seg> tags. The extracted, aligned segments were pasted into a spreadsheet using LibreOffice software and sorted alphabetically so that repeated segments would appear consecutively. When repeated ST segments were found (or those containing the minor inconsistencies as specified), the corresponding TT was checked for consistency. The TT was also checked for repeated segments and, where they were found, the corresponding aligned ST segments were checked for consistency, to see if consistency was introduced via translation using TM as per Category 2. TUs extracted in this way were copied to a new spreadsheet and classified according to whether they belonged to Categories 1, 2, or 3.
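The original extraction script is not reproduced in this article. The following is a minimal sketch of how such an extraction might look using Python's standard `xml.etree.ElementTree` module, assuming TMX files in which each `<tu>` holds one `<tuv>` per language. The function name `extract_pairs` is hypothetical, and the sort is done in Python here rather than in a spreadsheet. Joining the text of each `<seg>` keeps the content of inline elements but discards their markup, so a study of tag inconsistencies would need to preserve more than this sketch does.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_pairs(tmx_text, source_lang, target_lang):
    """Return alphabetically sorted (source, target) pairs from TMX text."""
    source_lang, target_lang = source_lang.lower(), target_lang.lower()
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # Older TMX versions use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segments[lang] = "".join(seg.itertext())
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    # Sorting by source text places repeated segments next to each other.
    return sorted(pairs)
```

Calling `extract_pairs(tmx_text, "en-US", "de-DE")` on the contents of a TMX file would return the aligned pairs ready for inspection; the language codes must match those used in the file.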
TUs in each of these spreadsheets were then categorized using the categories of subsegment inconsistency from the pilot study: noun, verb, adverb, punctuation, preposition, word order, tag inconsistency, typographical error, or complete retranslation. In the case of Category 1, TUs were further categorized by ST inconsistency: capitalization, tags, punctuation, or typographical error. The topics chosen for the follow-up qualitative study were based on these results.
Qualitative Phase
The second phase of this study is a series of qualitative interviews with translators and others in the localization industry with experience of using TM tools in order to find the causes of inconsistency and methods of minimizing inconsistency. These are in the form of face-to-face personal interviews or, where this was not possible, telephone interviews, seeking opinions on results and conclusions reached in the quantitative phase of the study. Interviewees who are translators are usually native speakers of the target language who may also review and edit draft translations done by others for quality assurance purposes.
Semistructured interviews, usually allowing probes once the interviewee has begun to answer, are the “most common qualitative strategy used in mixed method design” (Morse & Niehaus, 2009, p. 127). Questions were scripted and standardized as this means the interview is highly focused and makes responses easy to compare (Quinn Patton, 2002). It was hoped that this would minimize the effect of the interviewer, while allowing for the interviewer to remain active, that is, to prompt or ask for further explanation if necessary, without pursuing unanticipated topics (Quinn Patton, 2002). Even a tightly scripted interview cannot be devoid of input from both parties. As the discourse is “shaped by prior exchanges between interviewer and respondent” (Mishler, 1991, p. 53), all participants are “inevitably involved in making meaning” (Gubrium & Holstein, 2003, p. 78).
Ethics approval for the interviews was sought and received from the university research ethics committee several months prior to beginning the qualitative phase. The committee confirmed that this research qualified as a low-risk social research project and granted approval. In doing so, they stipulated that interviewees should receive a Plain Language Statement, explaining the background to the research, its intended purpose, the voluntary nature of interviewees’ participation, and informing the participants that they may withdraw at any time. Thereafter, interviewees signed an Informed Consent Form to confirm that they understood the information contained in the Plain Language Statement, that they were aware that interviews were to be recorded, and that any further questions had been answered satisfactorily.
For interviewee selection, purposive sampling was used. This means that subjects who would provide the most detailed information about the research questions were chosen, emphasizing their depth of knowledge rather than seeking a large sample of respondents. Purposive sampling is associated with qualitative research and provides narrative data. Researchers using purposive techniques tend to minimize the sample size, selecting only cases that might “best illuminate and test the hypothesis of the research team” (Kemper, Stringfield, & Teddlie, 2003, p. 279). The subtype of purposive sampling in this case was homogeneous cases sampling, which aims to gather opinions from people who are “demographically, educationally, or professionally similar” (Kemper et al., 2003, p. 282). In this case, those sampled had worked as translators or with TM data for at least 5 years and were considered professionally similar. Selection of subjects may influence the validity of research, but in this case it was felt that translators (or those who work with TM) would be most accustomed to searching for inconsistencies in translations that had been overlooked by other translators, and best able to describe phenomena found in the TM data.
Thirteen interviewees took part, having responded to calls for potential interviewees circulated via email and Twitter, or having been approached at translation industry events. Face-to-face interviews were recorded using the Voice Recorder application on a smartphone. Where this was not possible, interviews were carried out remotely via Skype (version 2.2 for Linux) and recorded using Skype-Recorder version 0.8.
Each interview was limited to 1 hour because of constraints on respondents’ time. Interviewees were told of potential uses of the research findings, how their identities and that of their company would be anonymized, and assured of secure storage of the data. The interviewees were asked questions based on the findings from the quantitative study.
The interviewer practiced active listening throughout the interviews (Kvale & Brinkmann, 2009), consciously analyzing replies and offering affirmation to create a rapport with the interviewee. In the case of an ambiguous response from the interviewee, the interviewer endeavored to interpret the statement to the interviewee’s satisfaction, so that incorrect interpretations could be ruled out. Remaining ambiguities were cleared up by email contact with the interviewee at the analysis stage. At the end of the interviews, interviewees were asked if they had any other comments or suggestions, then offered transcripts of the interview, not only as a matter of courtesy but also in case they wanted to change or add to an answer. Only one interviewee asked for a transcript, and no interviewees requested changes. Interviewees were later sent a copy of the final study to ensure that their input had been fairly represented.
Interviews took place between December 2011 and February 2012, and recordings were transferred to a password-protected PC. Interview data were then transcribed to a document following playback with VLC. Transcription is a significant stage in processing interview data, transforming the narrative mode from oral to written discourse and decontextualizing the interview conversation (Kvale & Brinkmann, 2009). This step necessarily involves interpretation of meaning and associated choices, such as where to place punctuation, which can substantially change the content.
Following transcription, top-down or concept-driven coding was applied, in that the responses were categorized by questions which were based on the five prescribed themes identified prior to the interviews: general opinions on TM, opinions of inconsistency, ST inconsistency, TT inconsistency, and the future of TM. For sections that digressed from the initial themes, bottom-up or data-driven coding was applied, allowing the interview data to set the theme. In such cases, sections were labeled according to their topics (Richards, 2005).
Coding was done using NVivo 9 qualitative analysis software in several steps: Transcripts were first imported into NVivo and were then coded by interviewee, by question, and finally by themes that emerged over the course of the interviews. Each interview was assigned attributes signifying the interviewee’s job, gender, first language, and main TM tool so that queries could be refined using these attributes.
In attempting to glean data-driven themes from the interview material, the method that Kvale and Brinkmann (2009) term bricolage was used. This involved reading through the interviews to get an overall impression, to generate meaning, and to capture key understanding. As is typical when coding with NVivo, emergent themes were gathered as free or open codes. These open codes were then sorted into a hierarchy of branching “tree nodes” to reflect the “structure of the data” (Bazeley, 2007, p. 100). Aside from adding organization to the open codes, the sorting stage is also said to prompt the user to code thoroughly, to improve conceptual clarity, and to help identify patterns and connections (Bazeley, 2007); see Figure 1.

Figure 1. Relationship of interview questions and emerging themes.
The first two questions were about the interviewee’s background. Questions 3 and 4 were intended to be quite broad, seeking the interviewee’s opinion of the benefits and disadvantages of TM, leading to the effect of TM on consistency and whether they felt consistency was important, in addition to their own experiences of consistency issues. Broad introductory questions may yield spontaneous and rich descriptions of the interviewee’s experience of the phenomena investigated (Kvale & Brinkmann, 2009). Question 5 related to ST inconsistencies such as those found in Categories 1 and 2. Question 6 concerned specific types of inconsistency found in the data (see the following section for details of these findings):
TT noun or term inconsistency—the largest category of subsegment inconsistency found in the study, also the prevalence of Anglicization in the target language, and suggestions to improve terminological consistency
Inconsistencies of ST format retention in the TT
Alternation of whole phrases throughout the TM
Explicitation in Japanese TT
Interviewees’ responses to the specific phenomena referred to in Questions 5 and 6 should confirm that the Phase 1 results were accurately represented, further justifying the choice of a mixed methods approach. Finally, Question 7 gave the interviewee an opportunity to suggest ways of improving how TM tools deal with consistency issues in future. Table 2 contains a profile of the interviewees.
Table 2. Interviewees From the Qualitative Phase.
Results
Among the four TM corpora studied, between 2% and 5% of the TUs contained ST segments that were repeated with minor inconsistencies, falling into Categories 1 and 2. Figure 2 shows the ST inconsistencies found in the data.

Figure 2. Categories of source text inconsistency in all TMs.
The number of Category 1 and 2 TUs found in the four sets of TM data differed, but in all four TMs the most prevalent type of ST inconsistency was letter case or capitalization of words. All but one of the interviewees (E) said that they had seen many instances of ST inconsistencies as shown in Figure 2. Interviewee G, a QA specialist, said that every time he checks a TM “I have the same translation for different source texts that should have been the same.” He continued, “The TM technology . . . the quality issue of technology is there, but the source text is the problem.” Eight interviewees suggested the attitude of some of their clients as a cause of ST inconsistency. They felt that the focus of clients is usually on time and cost savings, which leads to source text that is hurriedly written, often by a nonnative source language speaker, and is thus ambiguous and inconsistent. They said that educating their customers about the effect of inconsistent ST on the translation process is one way in which they attempt to minimize inconsistency. Several interviewees said that this education usually proves difficult. For example, K, a COO, commented that this is “one of the biggest fights that everyone in the localization industry has to fight with clients” to make them understand that “if you don’t control your English . . . you can’t possibly expect to have savings and leverage over time with the TMs.” Four interviewees (G, H, L, and M) have returned to customers with an assessment of the financial repercussions of ST inconsistency in order to “show just how much money they’re wasting,” and to tell clients that “they could do things a hell of a lot cheaper and a lot better” (H; language technology consultant). Five interviewees said that inconsistent ST segments were, in their opinion, a cause of further TT inconsistency.
Category 1: Inconsistent Source and Target Text Segments
Table 3 shows the most commonly occurring types of Category 1 ST inconsistency. Category 1 TUs contain minor inconsistencies (as specified in our typology) in the source segment and other kinds of inconsistencies in the aligned target segment. In Category 1 TUs, none of the TT segments aligned with ST segments that contain minor inconsistencies themselves contain instances of the same kind of inconsistency; rather the TT segments in question evince other kinds of inconsistencies, as in Example 1 (with differences highlighted in bold) from TM A:
Table 3. Inconsistencies Found in Category 1 Source Text Segments.
Note. TM = translation memory.
Segments 1.1s and 1.2s contain inconsistent letter case. Although there has been no semantic change in the ST, the word order has been changed in the TT from “add the macro from Exercise 1 to a toolbar” in Segment 1.1t, adding the genitive form in Segment 1.2t: “add the macro that was created in exercise 1 to a toolbar.”
We found a high incidence of inconsistent placing of the space character, especially in TM D. These spaces were initially noticed by automatically comparing the ST segment and the following, seemingly identical, ST segment, as 54 of the 68 space inconsistencies in TM D were at the end of the segment following a full stop. Again, the aligned TT segments contain other kinds of inconsistencies, as in the following example from TM A:
The ST segment contains a space inconsistency between the tags numbered 3 and 4, whereas the aligned TT segments contain an inconsistent noun and verb. The use of “Help Center” in 2.1t shows some influence from the source language, which can be seen throughout the TMs.
Table 4 shows the most commonly occurring types of Category 1 TT inconsistency. In Table 4, the number of subsegment inconsistencies may be seen to be larger than that in Table 3. This is because a segment may contain up to three subsegment inconsistencies before we consider it completely retranslated. Among Category 1 TUs, a large proportion of noun inconsistencies were found, comprising between 36% and 48% of the total number. For example, in TM C there were 323 noun inconsistencies of which 12 showed influence of the source language in one instance but not in another, and 87 contained inconsistencies of capitalization or case, as per the following example:
Table 4. Inconsistencies Found in Category 1 Target Text Segments.
Note. TM = translation memory; N/A = not applicable.
Example 3 also displays a phenomenon that accounts for the high prevalence of preposition inconsistencies in TM C. A total of 138 preposition inconsistencies were found in Category 1, just under 20% of the total, and 126 of these (18% of the total) are secondary changes required by the change of noun. Thus we see an alternation between the phrases “in der Befehlszeile” (in the command line) and “an der Eingabeaufforderung” (at the command prompt).
In the qualitative phase, 11 of the 13 interviewees said that they often find similar inconsistent phrases propagated in TMs. K (COO) said that she sees these kinds of inconsistencies “day in, day out,” where there is “a new member of the translation team who will reckon he or she has a better solution.” If this new team member ignores the suggested match and rewrites the TT segment, “it’s just so easy to bring in or upload a new version onto the TM and there’s nothing that stops it.” F (QA specialist) also said that he sees this “all the time” and suggested that it may be the result of translators working independently without adequate terminological guidance. Also, with the software that his company uses, for fuzzy matches “it’s up to the translator to identify where that fuzziness is and correct it, and sometimes they just over-correct.” Despite the importance of consistency in his domain of medical device translation, he says “we wind up with this all the time.”
Category 2 TUs (Inconsistent ST Segments With Consistent TT Segments)
Table 5 shows the most commonly occurring types of Category 2 ST inconsistency.
Table 5. Inconsistencies Found in Category 2 Segments.
Note. TM = translation memory.
In Category 2 TUs, the ST segments are inconsistent but the aligned TT segments are consistent, so consistency is introduced in the TT during translation. The majority of these ST inconsistencies in all TMs analyzed were inconsistent letter case. The following example from TM B is typical of this ST inconsistency.
As Japanese characters do not vary by letter case, this ST inconsistency has been removed in the TT. To convey the concept “case sensitive” in Japanese, the translator has chosen “the distinction between upper and lower case letters” in the TT. Although strict rules on capitalization in German mean that introduced consistency would also be expected, there are instances of the ST letter case being retained in the TT in all of these TMs (Roman lettering is sometimes used in the Japanese TT), particularly if the ST segment is in upper case. This means we have a mix of transposed ST grammar, punctuation, or formatting and native TT formatting in TT segments. In the following example from the same TM, containing a ST space inconsistency similar to those found in all four sets of TM data, we see a German noun written (incorrectly) in lower case:
Twelve of 13 interviewees had seen this sort of inappropriate target language formatting frequently. I (Project Manager) suggested that a formatting issue such as the one in Segment 4t occurs when no match is suggested, in which case the translator might copy the ST to the target segment and overwrite it without considering whether the formatting is “actually compliant with German rules.” She said that she would deal with this by compiling a style guide specifying what formatting to use, and that “that’s something you have to clarify up front, even if you have the best TM ever.”
Inconsistent punctuation is usually to do with the presence or absence of commas or full stops in the ST, which may or may not be retained in the TT. The following example from TM D contains a punctuation inconsistency but also contains an example of a section that has been marked out in the TT, followed by a comment by the translator, explaining that he chose the term
There are a number of reasons why ST inconsistencies may be ignored by a translator who chooses instead to accept a fuzzy match. Foremost among these in Japanese is the presence of plurals in the ST. In our study of English to German TMs, plurals did not register in our categories, as we considered that the ST had formally changed and thus accepted that the TT would be inconsistent. However, as there is no distinction between singular and plural in Japanese—numbers are given explicitly or are implicit in context—we can expect to see plural and singular nouns translated consistently in the Japanese TT, and this is indeed the case. Of 76 cases of inconsistent nouns in the ST segments of Category 2 TUs in TM B, 42 differ in number: singular in one case, plural in another, as in Example 6:
All 12 of the interviewees with experience of ST inconsistency said that they revert to clients with problems and one even tends to suggest a third party, a technical writing consultancy firm, to assist clients with their ST consistency. F, a QA specialist, agreed that ST consistency is important: “If they (clients) can’t control their source text, then we can’t be expected to control the target text for them.”
Category 3 TUs (Consistent ST Segments With Inconsistent TT Segments)
All four TMs in this study contained introduced TT inconsistency at a rate of approximately 5% where the ST segment was repeated exactly. The types of TT inconsistency found in Categories 1 and 3 may be seen in Figure 3.

Categories of target text inconsistency in all TMs.
Category 3 contains TUs with inconsistent TT segments, where inconsistency has been introduced in the TM data. Again, the most prevalent category of TT inconsistency was noun inconsistency, as may be seen in Table 6 (and Figure 3). In TM A, we found 81 inconsistently translated nouns (47% of the inconsistencies) of which 18 showed influence of the English source language in one instance as in Example 7:
Inconsistencies Found in Category 3 Segments.
Note. TM = translation memories; N/A = not applicable.
This alternation between “Border” and “Rand” occurred three times in the TT and was one of several patterns that emerged within the data.
All 13 interviewees agreed that this was a phenomenon that they saw often. E (translator) suggested that these inconsistencies may be caused by having several translators working on the same material without a follow-up consistency check to “catch this before it is propagated to the TM.” C (translator) suggested that this problem may be caused by merging TMs from different sources. She often receives TMs that contain inconsistent terminology, so that “when you look for a term in a translation memory, you do a concordance search,” in which case she often finds “two, three, or four different translations” for the same term.
L (workflow manager) suggested that inconsistencies such as in Example 7 could emerge even with terminology databases as a result of the conflict between approved terminology and search engine optimization (SEO). In this situation, a German translation for “border” may have been “approved and reviewed.” Despite this, the client may have realized that “in German, people don’t actually go to google.de and look for the German translation, but they actually look for the English words,” so when they start optimizing their German website, they might decide that “the approved term is actually not what’s going to get them the hits.” This is one explanation for the prevalence of noun inconsistencies such as in Example 7, featuring a native German word in one instance and a borrowed English word in another.
Nontranslator interviewees felt that new or inexperienced translators on a job tend to add new translations to the memory. K (COO) said that translators may also choose to accept inappropriate matches. If inconsistencies exist in the TM, she continued, one cannot expect consistency to be increased in the TT via “human decisions.”
Nine of the 13 interviewees highlighted the need for terminology management as a method of minimizing noun inconsistencies. Terminology management is a function of most current TM tools, yet TT term inconsistencies were still found in all four TMs, and all of our interviewees said that such inconsistencies are commonplace in their experience of TMs. M (software localization engineer) warned that the use of TM without glossaries means “you’re going to end up with inconsistencies, mistranslations, and some [other] issues.” F (QA specialist) uses glossaries as his company “can’t afford to care about what happens to the syntax and the grammar and everything else,” rather they just “have to focus on terminology being consistent because that’s what the clients really care about.” H (language technology consultant) said that her clients, on the other hand, pay insufficient heed to terminology, and she considers it a “gift from heaven” if she can get “20, 50 words that . . . you guarantee in your documentation.” According to her, “Customers are not interested, don’t appreciate, do not understand; they’re not willing to pay for terminology work.”
The Japanese TT in TM B again showed detail being added in translation that was not in the ST. Ten inconsistencies in Example 8 were translations of the word “selecting.”
In the example above, 8.1t is taken as the reference translation as it appeared first in the TM data. Each TT segment contains the noun
Nine interviewees said that they saw this phenomenon frequently, not just in Japanese, but in German (B and I), Brazilian Portuguese (A), Romanian (H), Spanish (K), and Malay (K). K (COO) felt that the ST in segment 8s is unclear. She suggested that the translator needs more details as “it’s critical that they [ST segments] are taken in context.” She also introduced the topic of client expectation, saying that customers need to be told that “they might be 100% match, but they’re not perfect matches all the time.” Her policy is not to lock 100% matches to prevent new translations, but she tells translators not to touch 100% matches unless absolutely necessary. In the case of incorrect 100% matches it’s left to the assiduousness of her translators as to whether they revert to her with problems or leave the 100% matches untouched. If the context has changed, then the introduction of some inconsistency in TT may be necessary.
After noun inconsistency, the next most prevalent type of inconsistency in TMs A and B is verb inconsistency. Of the 40 verb inconsistencies contained in TM B, 18 of them contained another repeated pattern, alternating between using the verb
Segment 9.1t translates as “You can bind {1} XML elements by their corresponding ISO elements,” Segment 9.2t as “You can relate {1} XML elements by their corresponding ISO elements.” Looking through the metadata, each verb choice is not attributable to a single user ID, but the translations using
The other Japanese data, TM D, also contain a repeated pattern, alternating between the borrowed English word
Three interviewees said that they had experienced problems with inconsistencies such as in Example 10 in Japanese TMs, whereby in one case a kanji compound was used and the phonetic katakana loan word was used in another. J (PM), a native Japanese speaker, felt that “the trend is now phonetical translation, so that’s inconsistency just depending on the translator preference, so we have to give them guidelines, style guidelines.”
TM D contains a large number of punctuation inconsistencies. Many of these (23) are marked out using the # symbol, others have inconsistently placed quotation marks, and many show indecision as to whether or not to retain ST formatting for commas or full stops as in the following example:
The high rate of preposition inconsistency in TM C is again a secondary effect of noun inconsistency as shown previously in Example 3. The inconsistencies of particle in Japanese are also often secondary to a change in verb or verb form from active to passive or, as in the following example, required by verb choice with the
Category 4 TUs (Consistent ST Segments With Consistent TT Segments)
These TUs are those that we consider to have been translated consistently. By looking at the number of repeated TUs that fall into this category, we can see the overall rate of introduced TT inconsistency within a TM as per Table 7.
Introduced Inconsistency in all TMs.
Note. TM = translation memories; TU = translation unit; ST = source text.
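The idea of measuring introduced TT inconsistency among exactly repeated ST segments can be illustrated with a minimal sketch. It is not the method used in this study: the function name, the representation of a TM as (source, target) pairs, the choice of denominator (the number of distinct repeated ST segments), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def introduced_inconsistency(tus):
    """Find exactly repeated ST segments whose TT segments differ.

    `tus` is an iterable of (source, target) string pairs. This is a
    hypothetical, simplified stand-in for a real TM export.
    Returns (inconsistent, rate), where `inconsistent` maps each repeated
    source segment to its sorted distinct targets, and `rate` is the share
    of repeated source segments with more than one distinct target.
    """
    targets_by_source = defaultdict(list)
    for source, target in tus:
        targets_by_source[source].append(target)

    # Only source segments that occur more than once can reveal
    # introduced inconsistency.
    repeated = {s: t for s, t in targets_by_source.items() if len(t) > 1}

    # Consistent ST with inconsistent TT: same source, several targets.
    inconsistent = {
        s: sorted(set(t)) for s, t in repeated.items() if len(set(t)) > 1
    }

    rate = len(inconsistent) / len(repeated) if repeated else 0.0
    return inconsistent, rate


# Invented toy data, not taken from the TMs analyzed in the study.
sample_tus = [
    ("Border", "Rand"),
    ("Border", "Border"),
    ("Save", "Speichern"),
    ("Save", "Speichern"),
    ("Open", "Öffnen"),
]

inconsistent, rate = introduced_inconsistency(sample_tus)
print(inconsistent, rate)
```

Exact string matching on the source side keeps the sketch close to the notion of a repeated ST segment. Detecting inconsistent ST segments, such as letter case or spacing differences, would require a separate normalization step that is not shown here.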
While interviewees in the qualitative phase of this study agree that the inconsistencies exemplified here are common, in their experience, there are also instances where inconsistency is required. The interviewees would like “maximum consistency” (K), yet have problems with clients’ assumptions that all 100% matches can be automatically accepted. Eight interviewees said that 100% matches may be erroneous, a point previously made by Reinke (2004). Several interviewees (particularly nontranslators) felt that some TT inconsistency may be necessary. F is “guided by my translators” as to whether it might be better to introduce inconsistency. He said that, for him, it is more important for a translation to be “accurate and natural and fluent than it is for the resulting translation unit to be recyclable infinitely in all other documents,” adding that this loss of leverage is “a sacrifice we have to make.”
Conclusion
Following a simple workflow for translation using TM (see Figure 4), the study found inconsistencies in TM that interviewees blamed largely on the client for failing to control ST. At the end of the translation process, the level of quality or consistency is again dependent on the client and whether they feel it more important to make short term savings, as translation providers may find it necessary to “skip quality steps to offer lower rates, especially if the client is not measuring quality” (Kelly, 2012, p. 2). If insufficient heed is paid to quality, then inconsistency will be propagated in the TM.

Computer-aided translation workflow.
At the translation stage, the quantitative study showed that inconsistency is introduced. Interviewees felt that this was due to insufficient terminological assistance within the tools but also that translators may make different decisions, especially when inconsistent ST leaves them the option of editing the TT. Conversely, not all 100% matches may be appropriate for reuse, although the interviewees said that this is again related to ambiguous ST. If the ST is clear, the TT should be further recyclable. The inconsistencies of grammar and formatting in the TT showed the importance of a style guide to tell the translator whether to follow ST or, more usually, TT-appropriate grammar rules. At the end of the translation process, translations are usually reviewed, but the interviewees claimed that these final edits are rarely added to the TM, leaving inconsistency to be further propagated in the next translation project using the same TM.
In the introduction, we discussed the cost associated with inconsistency. We estimate that, for an average-sized translation project of 50,000 segments, each translated into 20 languages, the introduction of Category 3 inconsistencies at a rate of 5% will add a cost of over $21,000 to the subsequent iteration of the project. In addition, inconsistent source text will result in low match results, incurring an extra cost, and inconsistent content means more time to be spent by engineers and reviewers in QA at the end of a project. Many companies, particularly in the fields of life sciences or medical device translation, spend a lot of time on measuring quality post-translation. Inconsistent content will add time and expense to this QA process.
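The arithmetic behind an estimate of this kind can be sketched as follows. The per-word rates and segment lengths underlying the $21,000 figure are not given in this section, so the sketch only counts the affected segment translations and back-calculates the per-segment cost that the figure would imply. It also assumes that the 5% rate applies uniformly to every segment in every language. The function and parameter names are hypothetical.

```python
def affected_segment_translations(segments, languages, inconsistency_rate):
    """Number of segment translations affected by introduced inconsistency.

    Assumes (for illustration only) that the rate applies uniformly to
    every segment in every target language.
    """
    return round(segments * languages * inconsistency_rate)


def extra_cost(segments, languages, inconsistency_rate, cost_per_affected):
    """Added cost, given a hypothetical cost per affected segment translation."""
    affected = affected_segment_translations(
        segments, languages, inconsistency_rate
    )
    return affected * cost_per_affected


# Figures stated in the text: 50,000 segments, 20 languages, 5% rate.
affected = affected_segment_translations(50_000, 20, 0.05)

# Back-calculated, not a parameter reported by the study: the per-segment
# cost that a total of $21,000 would imply.
implied_cost_per_affected = 21_000 / affected

print(affected, implied_cost_per_affected)
```

Under these assumptions, 50,000 segment translations are affected, so a total of over $21,000 corresponds to at least about $0.42 per affected segment translation.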
As there is no one correct way to write a translated text or segment, new target segments are continually added by translators. This is referred to by the Business Development Officer of a prominent translation software company (SDL) as “accidental content” (Cronin, 2013, p. 37). Efforts to maximize consistency thus constrain the translator so as to minimize this accidental content. Our suggestions in this study involve standardization of source texts, which will maximize leverage if and when those source texts are repeated. We suggest employing style guides, which have been shown to be time-consuming to create, but are beneficial tools to limit target texts. As most of the introduced inconsistencies in this study were noun inconsistencies, maintaining a tight control on terminology should improve overall consistency a great deal. The interview data also suggest that the use of small, bespoke TMs, prioritizing precision over recall, will remove the chances of inappropriate matches being suggested by the TM tool.
Although the focus in this study was on the quantitative phase, the guidelines for improved translation processes are drawn largely from the qualitative phase, based on interviewees’ knowledge and experience. The descriptive analysis in Phase 1 may be presented as a stand-alone work, but has little impact without prescriptive measures, and as Phase 2 relies on the quantitative data, the potential effectiveness of this study stems from the choice of mixed methods. The validity of research conducted within the pragmatic paradigm is judged on the effectiveness of the work and the actions it engenders once completed and disseminated (Kvale & Brinkmann, 2009). A challenge lies in dissemination of key findings, which will need to reach the localization industry in order to be effective.
This study contributes to the mixed methods literature by applying mixed methods in a field where studies have tended to be quantitative (such as those focused on incremental improvements to machine translation systems) or qualitative (such as those exploring the reception of translation tools). Although some prior work on TMs has yielded quantitative and qualitative results from within a single study (Lagoudaki, 2008), this study connected both, adding substantially to the external validity of the study and showing that quantitative analysis of corpora may be successfully combined with qualitative interviews in the field of translation studies. Subsequent mixed methods research (Doherty, 2012; Guerberof Arenas, 2012) has appeared as the focus in translation studies has moved toward technology and applied research. This increase in published mixed methods studies is in line with the increased prevalence rates for applied disciplines as reported by Alise and Teddlie (2010) and reflects the need for user input as translation technology is more widely applied commercially.
Strengths and Limitations of This Research
This study began in 2008, since when there has been a great deal of change in TM tool development. As technology is in a state of constant change and development, the current research is limited by focusing on an area that dates quickly. The study was carried out by one person; it was limited by the researcher’s language competence, restricting the study to English, German, and Japanese. As a result, it may be useful to replicate the methodology herein with different language pairs to see whether the results are repeated. The people interviewed in the second phase work predominantly in translation of IT documentation, which may have impacted on their views. The sample was limited to those who received and responded to our call for participants via email, social media, and at industry events.
The choice of a mixed methods design introduced a threat to validity at the stage between the two main phases of research, when the researcher must decide what results from the first phase to pursue in the second phase. In this study, topics chosen for the qualitative phase were based not only on the types of inconsistency that were most prevalent in the quantitative results but also on phenomena that were seen to have occurred across categories, such as whole phrase inconsistencies, in order to accurately represent the data.
Future Work
Findings from this study have been used to envisage realistic scenarios in which leverage is lost because of inconsistencies in the TM applied in a hypothetical job. This necessitates making a number of assumptions about the project, but the creation of such scenarios can help make the financial case for ensuring consistency (where possible) in TMs, despite the cost associated with doing so. If cost–benefit analyses can demonstrate that it is financially worthwhile to implement measures that promote consistency, then the major obstacle to doing so (clients’ perception of cost) may be removed.
As more machine translation is integrated into the localization process, this study also forms part of a larger discussion of consistency in translation technology, suggesting that consistency of bilingual resources will continue to be an issue into the future and that the “quality control of TMs needs to become much more sophisticated” (Zetzsche, 2012, p. 51). One follow-on project attempted to automate the removal of inconsistency with mixed results (Moorkens, Doherty, Kenny, & O’Brien, 2013). Future work may also involve further interviews to include other stakeholders in the translation process, such as clients and end-users of products, and translators of content from domains other than computing.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
