Abstract
A new thresholding algorithm is introduced for images of historical documents containing graphical elements such as drawings. The main idea is to improve the binarization process so that the drawings remain understandable in the final bi-level image while the text areas are fully preserved for further recognition. As quality is a subjective matter, the algorithm allows intervention by non-specialist users in order to achieve the expected quality of the final image. The proposed method achieved satisfactory visual results when compared to other state-of-the-art thresholding algorithms. As a consequence of the algorithm, black ink is also saved when printing the images, compared with halftone printing of a grayscale image. This algorithm ranked fourth at H-DIBCO 2016 (the handwritten documents version of the annual Document Image Binarization COntest).
Introduction
There are many applications for image processing and computer vision: segmentation [5], recognition [7], registration [36], tracking [37], visual indexing [13], and contrast analysis [30]. Among them, there are the applications related to document image processing [18, 20], where thresholding (or binarization) [10] plays a major role. Although extensively studied, it is still a challenge in several domains. Basically, a thresholding algorithm converts a color image into a bi-level image; for document images, this is usually a black and white image. Given a threshold value, the colors above it are converted into white, while the colors below it are converted into black. For digital document images, for example, it is expected that an effective algorithm converts the colors of the paper (the background) into white and the colors of the ink (the foreground) into black. Although this conversion is quite simple, the main problem is to find the optimum threshold value that produces the best bi-level image, correctly separating background and foreground. Bi-level images are the basic object for several mathematical morphology algorithms, for example [26].
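The conversion rule described above can be illustrated with a minimal sketch. The function name `threshold_image`, the sample values and the choice to map a pixel equal to the threshold to black are assumptions made only for this illustration; the hard part discussed in the text, choosing the threshold itself, is not addressed here.

```python
def threshold_image(gray, t):
    """Return a bi-level image: 255 (white) above the threshold t, 0 (black) otherwise.

    Pixels exactly equal to t are mapped to black; this tie-breaking rule is an
    assumption of the sketch, not something fixed by the text.
    """
    return [[255 if value > t else 0 for value in row] for row in gray]


# Hypothetical 2x4 grayscale patch: bright paper with a few dark ink pixels.
page = [
    [230, 228, 40, 225],
    [231, 35, 38, 229],
]
binary = threshold_image(page, 128)
```

With a threshold of 128, the bright paper pixels become white and the three dark ink pixels become black.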
There is a great variety of thresholding algorithms [31], based on different approaches, such as artificial neural networks [1, 2, 3, 12, 27]. However, to the best of the authors’ knowledge, none of them takes into account the presence of graphical elements in the document image, aiming to preserve their quality just as they preserve the textual contents of the document. In general, binarization of documents with figures is done by first segmenting the document to identify which regions are text and which are figures, and then using different algorithms (or, at least, different settings) in each part [19, 32]; this approach is usually time consuming.
Thus, a new thresholding algorithm is proposed, aiming to produce a high-quality bi-level image with the correct conversion of ink elements into black and paper elements into white, preserving the details not only of text but also of graphical elements such as stamps, ideograms, logos, hand drawings, post cards, etc. With this in mind, our objectives are twofold: (1) to create a more pleasant image for visualization purposes, even in just black and white, and (2) to reduce the amount of ink used in the printing process, which is especially important for print houses that print large circulations/editions.
First, some constraints must be defined about what we call “preservation of visual quality”. Although there may be other issues to be considered, our hypotheses are:
(1) The text remains as good as possible, with a quality comparable to other effective thresholding algorithms, to allow further automatic recognition: the most important part of a document is usually the text.
(2) The stroke width of the text must be preserved as much as possible to avoid increasing the recognition difficulty: this complements the previous constraint, with the same goal of preserving and enhancing the text for further recognition (in this case by human or machine).
(3) Artifacts due to the aging process must be removed, in order to avoid noise in the final image.
(4) The edges must be preserved wherever they are strong: this is important to avoid loss of connectivity in the drawings.
(5) Large dark areas can be represented as contours, according to the needs of the user: as the definition of quality is subjective and strongly depends on the user’s final application, it is reasonable not to have a rigid result in some aspects.
(6) Quality is a subjective definition that varies from person to person. Thus, the method must be able to create different types of images according to the user’s expectation, but with minimum user intervention.
Although, as hypotheses, they cannot be proved, each of them seems a reasonable goal for an effective thresholding algorithm. The first two requirements relate to the text, the third affects both text and drawings, and the other three relate to the drawings.
The next section summarizes some thresholding algorithms. Section 3 introduces the new algorithm in detail. Experiments are reported in Section 4, with comparison to other state-of-the-art algorithms such as Valizadeh-Kabir [35] (based on contrast images), Mesquita et al. [21] (based on visual perception),
The algorithm introduced herein ranked fourth among 12 algorithms in the Handwritten Document Image Binarization Contest 2016 (H-DIBCO 2016) [25, 42], which ran during the International Conference on Frontiers in Handwriting Recognition (ICFHR 2016), held in China [43].
In this section, some thresholding algorithms are summarized. Some new approaches are highlighted for their novelty in addressing the problem. The choice of these specific algorithms can be easily justified: Valizadeh and Kabir [35] presented an algorithm that improves on techniques previously defined in [33]; this algorithm was later improved by Arruda-Mello in [4] and compared to 14 other algorithms on the datasets of DIBCO 2011 and H-DIBCO 2012, ranking first. In [11], Howe introduced the winner of H-DIBCO 2012 (against 23 other algorithms), which was used as the basis for a variation of Mesquita-Mello-Almeida [22] that was the winner of H-DIBCO 2014 (against 7 other algorithms). The last algorithm is Roe-Mello [29], which was applied to documents from the same dataset used here and achieved better results than 22 classical algorithms.
For more information about classical algorithms, we refer the reader to the survey by Sezgin and Sankur [31].
Valizadeh-Kabir
Valizadeh and Kabir [35] proposed a new algorithm which uses Niblack’s [23] local thresholding algorithm as a major step. The grayscale image is mapped into a 2D feature space which is partitioned into small regions. These regions are classified as text or background based on the result of applying Niblack’s algorithm to the original image. The first feature is a structural contrast image that is created based on an estimation of the stroke width. The structural contrast image is a variation of the contrast image first proposed by Su et al. [33]. While the contrast image is defined by the local image maximum and minimum, the structural contrast image is a combination of the maximum value of pairs of pixels inside a window with size defined by the stroke width. The 2D feature space is then built by mapping the gray values to the corresponding structural contrast value. The space is partitioned using mode association clustering, and the bi-level version of the original image (created using Niblack) is used to classify each partition as text or background. The drawback of the method is its high computational cost, but the use of the structural contrast image proved to be an interesting approach. The method was improved by Arruda and Mello [4].
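As a rough illustration of two building blocks mentioned above, the sketch below computes a Niblack-style local threshold (local mean plus k times the local standard deviation) and a contrast value from the local maximum and minimum. It is not the structural contrast image of Valizadeh and Kabir, and it omits the feature-space clustering. The window size, the value k = -0.2, the normalization by the sum of maximum and minimum, and all function names are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def neighborhood(gray, r, c, half):
    """Collect the pixel values in a square window centered at (r, c), clipped at the borders."""
    rows, cols = len(gray), len(gray[0])
    return [gray[i][j]
            for i in range(max(0, r - half), min(rows, r + half + 1))
            for j in range(max(0, c - half), min(cols, c + half + 1))]


def niblack(gray, half=1, k=-0.2):
    """Niblack-style binarization: a pixel is white if it exceeds mean + k * std of its window."""
    result = []
    for r, row in enumerate(gray):
        out_row = []
        for c, value in enumerate(row):
            window = neighborhood(gray, r, c, half)
            threshold = mean(window) + k * pstdev(window)
            out_row.append(255 if value > threshold else 0)
        result.append(out_row)
    return result


def local_contrast(gray, half=1, eps=1e-6):
    """Contrast from the local maximum and minimum, normalized by their sum (assumed form)."""
    result = []
    for r in range(len(gray)):
        out_row = []
        for c in range(len(gray[0])):
            window = neighborhood(gray, r, c, half)
            high, low = max(window), min(window)
            out_row.append((high - low) / (high + low + eps))
        result.append(out_row)
    return result


# Hypothetical 3x3 patch: one dark ink pixel surrounded by paper.
patch = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
niblack_result = niblack(patch)
contrast = local_contrast(patch)
```

In this patch the dark center pixel falls below its local threshold and becomes black, while the contrast value at the center is high because the window contains both ink and paper.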
Arruda-Mello
An improvement on the method defined by Valizadeh and Kabir was presented in [4]. In this new version, two problems of the original algorithm were addressed: (1) the structural contrast image sometimes does not enhance the separability between background and foreground. This is particularly true in cases where the ink has faded and has colors very close to the background, or where the document has a smear whose color is close to the ink tones. (2) Valizadeh and Kabir’s algorithm worked with fixed parameter values for Niblack’s algorithm, which did not reach the best solution for every image. Thus, Arruda and Mello proposed a new strategy combining two structural contrast images (one of them a normalized version) and two binarization steps by Niblack. The algorithm creates two bi-level images (called the strong image and the weak image) that are combined into a single final image. The method improved the results of Valizadeh and Kabir, although it still has a high computational cost.
Howe
Howe [11] introduced a binarization algorithm based on the minimization of an energy function inspired by a Markov random field. The energy function includes matching costs and irregularity costs. The matching costs establish how well the pixel matches its appearance (they must be invariant to local illumination and are therefore taken from the Laplacian of the image), and the irregularity costs are evaluated for each pixel that differs from its horizontal and vertical neighbors, acting as penalties (Canny edge detection is used for this). The major novelty of the method is the use of a tuning strategy to find the best parameter setting for each image. However, this strategy presented low variability in the choice of parameters, resulting in some low-quality images.
Mesquita-Mello-Almeida
Based on the concept of the perception of objects by distance, Mesquita et al. [22] presented a new approach to binarization. In several cases, the thresholding of documents is done by trying to enhance the ink colors. However, if we have information about the colors of the background, it is also possible to separate both classes. This is the idea behind the method. The enhancement of the background is done by simulating a person moving away from the document. As the distance increases, the perception of details (the ink) decreases. The distance, however, does not remove the perception of the major background colors. Through the evaluation of the stroke width, the distance at which the ink would no longer be seen in a real situation is defined. This separation is simulated by downsizing the image and using morphological operations (as the corners tend to get rounded with distance). After this, the image is resized back to its original dimensions, recovering the background of the document. Subtracting the original image from this background returns almost just the ink, but still as a grayscale image. A strategy that combines both the k-means clustering algorithm and Otsu’s thresholding method [24] is used to create the final bi-level image. This method was further improved with an analysis of the best parameter settings in [21], and this version was the winner of H-DIBCO (Handwritten Document Image Binarization COntest) 2014 [39]. The method fails when dealing with document images containing text with different stroke widths.
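The last two stages described above, subtracting the original from an estimated background and thresholding the difference, can be sketched as follows. This is only an illustration under simplifying assumptions: the background is supplied directly rather than estimated by downsizing and morphology, only Otsu’s method is used (the k-means step is omitted), and the names `otsu_threshold` and `binarize_from_background` are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the gray level that maximizes the between-class variance (Otsu's criterion)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight0, sum0 = 0, 0
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        if weight0 == 0:
            continue
        weight1 = total - weight0
        if weight1 == 0:
            break
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        between = weight0 * weight1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize_from_background(original, background):
    """Subtract the original from the background estimate and threshold the difference."""
    diff = [[max(0, b - o) for o, b in zip(o_row, b_row)]
            for o_row, b_row in zip(original, background)]
    t = otsu_threshold([v for row in diff for v in row])
    # A large difference means the pixel is much darker than the paper: treat it as ink.
    return [[0 if v > t else 255 for v in row] for row in diff]


# Hypothetical data: a dark vertical stroke on paper, plus an assumed background estimate.
original = [[220, 60, 215], [218, 55, 221]]
background = [[221, 219, 217], [220, 218, 222]]
ink_binary = binarize_from_background(original, background)
```

Because the difference image is close to zero on paper and large on ink, its histogram is strongly bimodal, which is the situation in which Otsu’s criterion tends to work well.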
Roe-Mello
A local equalization strategy is applied as the first step of the method introduced in [29]. The application of Otsu’s thresholding algorithm completes the first step, producing the bi-level image IM
Proposed algorithm
The major steps of the algorithm are described next.
(Left) original image with faded colors, and (right) result after applying gray world algorithm.
The colors of vintage documents usually change over time, becoming an overall brown-yellow. Such change can be more or less evident depending on several factors, such as the document’s age, the kind of paper and pigments used, how it was stored and how carefully it was handled. As the paper tends to become browner, the distinction between the background (the paper) and the foreground (the ink) becomes less evident. In order to improve the distinction between background and foreground, an algorithm to enhance faded colors is applied. Many algorithms have been proposed for color equalization based on color constancy [6], such as retinex [14, 15, 16], gray world [8] and ACE [28]. Gray world, as described by Ebner [8], was chosen because it produced better final results. Figure 1 exhibits the result of applying it to an image with faded colors. The goal here is not to recover the document’s original colors, but to enhance the distinction between the paper and the ink (found both in text and drawings).
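For readers unfamiliar with gray world, the sketch below shows one common formulation: each channel is rescaled so that its mean matches the mean of the three channel averages. This may differ in detail from the variant described by Ebner [8] and used by the authors; the function name and the sample pixels are assumptions for illustration only.

```python
def gray_world(pixels):
    """Rescale each RGB channel so that all three channel means become equal.

    pixels is a list of (r, g, b) tuples with values in 0..255.
    """
    n = len(pixels)
    means = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    gray = sum(means) / 3
    # A channel with zero mean is left unchanged to avoid division by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(255, round(p[ch] * gains[ch])) for ch in range(3))
            for p in pixels]


# Hypothetical yellowish-brown pixels, as found on aged paper.
aged = [(200, 160, 100), (100, 80, 50)]
balanced = gray_world(aged)
```

In this example the brownish cast is removed entirely, since both pixels share the same color ratio; on a real document the paper cast is reduced while ink and paper keep different intensities.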
The image resulting from color enhancement phase has its brightness and contrast adjusted, to allow a control on the final binarization result, using Eq. (1) where
Small contrast and brightness absolute values, for example bri
Detail of a noisy document showing the effect of varying the distance cd. Note that text edges remain sharp.
The application of color enhancement algorithms, like gray world, enhances all colors, including undesirable elements like noisy spots, or foxing. Then, it is important to apply a filter on the image to reduce noise, but without blurring edges. For this task, a filter is proposed where both space and range distances are taken into account in a way similar to the bilateral filter [34].
The bilateral filter combines both range (or pixel intensity) and domain (or spatial distance), filtering using the neighbor pixels in a vicinity that are close to the pixel being filtered. Two pixels can be close to one another, not only if the spatial distance between them is small; they can also be close in a perceptual sense, i.e., if they have nearby color values. In our filter, the spatial vicinity is determined by a square window, centered at the pixel being filtered. The range distance
where
The filter works as follows: for each pixel
Example of local equalization algorithm application: (left) original image and (right) the equalized result.
Example of how the edges get thicker with greater local equalization window size (
Different types of degradation found in historical documents: (top-left) foxing, (center-left) glue from adhesive tapes, (bottom-left) uneven illumination and (right) how these problems are reduced by local equalization.
The main goal of local equalization is to prepare the image for the final binarization. The idea is to change the intensity differences between pixels, emphasizing them on opposite sides of a sharp edge and minimizing them for pixels on soft edges.
The local equalization, presented in [29], is performed through the following steps: first the image is converted into values in (0, 1] interval (0 value is converted into 0.01 to avoid division by 0) and it is scanned using a
Figure 3 shows the result of applying the local equalization directly on a sample image (not a bi-level image). The edge thickness, after local equalization, depends on the window size
Effect of first brightness and contrast adjustment on local equalized image: (left) bri

Effect of second brightness and contrast adjustment: (left) result of local equalization and (right) after adjusting bri to 
Binarization results using different values for the second brightness and contrast. Values were set to bri

Local equalization is useful not only to enhance edges, but also to remove aging degradations such as foxing (the brown spots that form on the paper surface), illumination artifacts (like shadows, due to the acquisition process) and adhesive tape marks (samples can be seen in Fig. 5).
After local equalization, the image brightness and contrast are adjusted again. The differences observed in Fig. 6 are due to the first adjustment in brightness and contrast, as explained before.
Now the adjustments are done to transform the local equalization result closer to a bi-level image by setting bri to
Although Figs. 6 and 7 exhibit results that look like bi-level images, they still have gray tones that must be binarized. Because the images have only a few gray tones other than black and white, applying a fixed threshold value gives satisfactory results. All results are obtained using th
The values used in the second brightness and contrast adjustment are not parameters to be set by the user; they are fixed values. They are set to avoid results with a washed-out effect that loses details (Fig. 8-left), or results that are too noisy (Fig. 8-right).
Comparison between dithering (center) and the proposed algorithm (right) applied to an image with text and drawing (left).

It is important to note the difference between the proposed technique and dithering or halftoning. Although both aim to generate a bi-level image, the final results are very different. Dithering, used to simulate gray scale, was very popular in the printing industry, but it is not appropriate for OCR (Optical Character Recognition) purposes. Figure 9 shows a comparison between the proposed algorithm and Floyd–Steinberg dithering [9].
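To make the contrast with dithering concrete, the following is a minimal sketch of Floyd–Steinberg error diffusion with the standard 7/16, 3/16, 5/16 and 1/16 weights. The quantization level of 128 and the function name are assumptions; this is not the implementation used to produce Fig. 9.

```python
def floyd_steinberg(gray):
    """Dither a grayscale image (values 0..255) to 0/255 by diffusing the quantization error."""
    rows, cols = len(gray), len(gray[0])
    buf = [[float(v) for v in row] for row in gray]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            old = buf[r][c]
            new = 255 if old >= 128 else 0
            out[r][c] = new
            err = old - new
            # Push the error to the unvisited neighbors: right, then the row below.
            if c + 1 < cols:
                buf[r][c + 1] += err * 7 / 16
            if r + 1 < rows:
                if c > 0:
                    buf[r + 1][c - 1] += err * 3 / 16
                buf[r + 1][c] += err * 5 / 16
                if c + 1 < cols:
                    buf[r + 1][c + 1] += err * 1 / 16
    return out


# A flat mid-gray patch: a fixed threshold of 128 would turn it entirely black,
# whereas dithering scatters black and white pixels to imitate the gray level.
mid_gray = [[127] * 8 for _ in range(8)]
dithered = floyd_steinberg(mid_gray)
white_count = sum(v == 255 for row in dithered for v in row)
```

The scattered dot pattern is what simulates gray in print, and it is also why dithered output is unsuitable for OCR: character strokes are broken into isolated pixels instead of solid connected shapes.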
Our algorithm was implemented in Java and all the results were obtained with the following fixed parameter values: cd
Coefficient of variation of the four best ranked methods submitted to H-DIBCO 2016 for the analyzed measures
Mean and standard deviation for the measures evaluated for the four best ranked methods submitted to H-DIBCO 2016
The first experiment was conducted on images of old documents from the end of the 19th century and the beginning of the 20th century from the ProHist project [40]. The original documents were digitized in 16 million colors, in the RGB color system, at 200 dpi resolution. Although this can be considered low resolution nowadays, the documents were digitized in 1994. Due to their nature, there is no ground truth for these images; thus, the analysis of the results is only qualitative at this point. This is also coherent with our proposal, as we expect the user to define what the desired final result is. Old documents were chosen as they are more susceptible to a binarization process. Figure 10 shows how some degraded documents of this dataset (previously presented in Fig. 5) can be binarized with good results by our algorithm.
Sample image from H-DIBCO 2016 [42]: (a) original image, (b) ground truth image and the results from (c) the best ranked algorithm, (d) the second place, (e) the third place and (f) the fourth place that is the algorithm presented herein. The visual inspection confirms what was evaluated by ANOVA: all four algorithms have very similar results.
Results varying brightness and contrast: (a) original image, (b) result with bri

Due to the quality of its results, our algorithm was submitted to H-DIBCO 2016 (Handwritten Document Image Binarization COntest) [25] and ranked fourth out of a total of 12 new methods. Based on ground truth images, the contest analyzes the resulting images from each submitted algorithm, comparing F-Measure (FM), pseudo F-Measure (pFM), Peak Signal to Noise Ratio (PSNR) and Distance Reciprocal Distortion (DRD). The mean, standard deviation and final score of each method can be found in [25]. Based on the mean and standard deviation, Table 1 presents the coefficient of variation of the four top-ranked methods (ours is the fourth, as said before). The coefficient of variation is a measure of dispersion, evaluated as the ratio of the standard deviation to the mean value. In this sense, it can be seen that our method presented lower dispersion than the winner of the contest. The absolute values of mean and standard deviation show that the four methods are very close to each other. They are presented in Table 2, reporting exactly what was published in [25]. This could be confirmed based on the values of the metrics for each image; we have used Analysis of Variance (ANOVA) to check whether the differences are statistically significant. According to ANOVA, and considering the metrics used by H-DIBCO 2016 (FM, pFM, PSNR and DRD), there are no statistically significant differences between the four best ranked algorithms:
Results varying brightness and contrast: (a) original image, (b) result with bri

Results varying brightness and contrast: (a) original image, (b) result with bri

In the third experiment, a commercial tool called APFill [41] is used to estimate whether the requirement of saving ink when printing the binarized version, compared with the original image, was met. Ink or toner consumption is usually described by printer vendors as the number of A4 pages with 5% coverage (normal text) that can be printed using one cartridge. APFill calculates how many A4 pages the input image is equivalent to. The average result on 35 images can be seen in Table 3, which shows the average number of A4 pages the input images are equivalent to. We have not made any comparison to the results of other binarization methods, as some results have large blank areas which, of course, need less ink (see Fig. 15d and f for examples). An Analysis of Variance (ANOVA) was run comparing the results of the gray images and the dithered images with our results for two sets of settings: bri
Results comparing ink saving in printing
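The notion of "equivalent A4 pages at 5% coverage" can be illustrated with a simple estimate. The sketch below is not how APFill works internally, which is not described here; it assumes that ink use is proportional to pixel darkness and that the image fills exactly one A4 page. The function name and sample data are hypothetical.

```python
def equivalent_pages(gray, reference_coverage=0.05):
    """Estimate how many reference pages (5% coverage by default) one image is worth.

    Coverage is taken as the mean darkness of the image, where 0 is white paper
    and 1 is solid black; this proportionality is an assumption of the sketch.
    """
    total = sum(len(row) for row in gray)
    ink = sum((255 - v) / 255 for row in gray for v in row)
    return (ink / total) / reference_coverage


# Hypothetical 10x10 bi-level image with 10 black pixels, i.e. 10% coverage.
bilevel = [[0] + [255] * 9 for _ in range(10)]
pages = equivalent_pages(bilevel)
```

Under these assumptions, an image with 10% black coverage counts as two reference pages, so lowering the coverage of the binarized output lowers the estimated ink use proportionally.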
Results of some state-of-the-art thresholding algorithms: (a) original image, (b) Su-Lu-Tan [33], (c) Valizadeh and Kabir [35], (d) Howe [11], (e) Arruda-Mello [4], (f) Mesquita et al. [21], (g) Roe-Mello [29] and the results of our method with variations of brightness and contrast: (h) no variation (settings used in the experiments section of this paper); (i) bri

Figures 12 to 14 present some results of applying the algorithm to images from the ProHist dataset. Figure 15 presents the results of some state-of-the-art algorithms [4, 11, 21, 33, 35], including the winner of H-DIBCO 2014 [21, 39], in comparison to ours.
Another set of document pages from Googlebooks [45] is available at the following link:
In this paper, a new binarization method for images of historical documents containing graphical elements such as drawings was introduced. The main goal of this method is to binarize a document image while preserving both the text and the graphical elements and saving ink in printing. The great challenge is to preserve the visual quality of the graphics while also maintaining the expected quality of the text (which is more easily evaluated). The algorithm is guided by the constraints defined in the Introduction. All these constraints, as well as ink saving in the printing process, were met by the proposal: the quality aspects are handled by the brightness/contrast control, fulfilling constraints 4 to 6; constraints 2 and 3 are met by local image equalization; and constraint 1 is met by the algorithm as a whole. Both local equalization and brightness/contrast control ensure that the text keeps its original stroke width (or a width very close to it).
Although the presented algorithm has only two parameters to be set, the authors intend, as future work, to optimize the parameter definition to guarantee that the most suitable parameters are always selected. Two approaches are being considered. One follows the same strategy adopted by Mesquita et al. [21], in which parameters are chosen by applying a racing procedure based on a statistical approach, named I/F-Race.
The parameter tuning finds the best general parameter configuration for a given set of input images (the settings are applied to all input images). Another approach is to analyze each input image to find the best parameter set adapted for each image.
It is worth noting that, in general, both the tuning and the adaptation to the best case for each image depend on some kind of training based on ground truth images, which are not available for the datasets used. Due to the subjective nature of the results in the drawings of the document, for now we can only make adjustments considering the text areas. For this reason, this problem is still under research.
Acknowledgments
The authors would like to thank Dr. Basilis Gatos and Dr. Konstantinos Zagoris for providing the images and data from H-DIBCO 2016.
