Abstract
This paper details the examination of a particular case of data compression, where the compression algorithm removes the redundancy from data, which occurs when edge-based compression algorithms compress (previously compressed) pixelated images. The newly created redundancy can be removed using another round of compression. This work utilized the JPEG-LS as an example of an edge-based compression algorithm for compressing pixelated images. The output of this process was subjected to another round of compression using a more robust but slower compressor (PAQ8f). The compression ratio of the second compression was, on average, 18%, which is high for random data. The results of the second compression were superior to the lossy JPEG. Under the used data set, lossy JPEG needs to sacrifice 10% on average to realize nearly total lossless compression ratios of the two-successive compressions. To generalize the results, fast general-purpose compression algorithms (7z, bz2, and Gzip) were used too.
Introduction
Data compression is the science of decreasing the size of data files. Compressed data is the final product of the compression process, and it requires less space and can be transmitted quicker. The first compression algorithm removes redundancies in the data [18], and theoretically, the data cannot be compressed further after being compressed by the first compression algorithm; therefore, instead of subjecting the compressed data to the second round of compression, scientists usually enhance the existing compression algorithms or invent other methods of compression. Enhancing the compression ratio of an algorithm can be done via pre-processing by changing the format of the data to make it more amenable for compression, or in-processing by modifying the algorithm from within to increase the compression ratio. Dealing with data in compressed format post-processing is a challenging task, and in most cases, the result of a second compression is an expansion as opposed to a shrinkage. Failing to compress compressed data results in the data being classified as random data [21], which is consistent with the Kolmogorov algorithm’s definition of randomness [23]. There is research that investigated the randomness of compressed data, but most reported the absence of meaningful correlations [3, 11], reflecting the complexity of the nature of such data. Despite the complexity of the compressed data, the overall behaviour of compressed data is typical across the board and can be captured when analysed based on reference patterns such as the Fibonacci code [10]. However, in the case of certain types of data, compression algorithms could fail to remove (all) redundancies within it, or it creates another redundancy when compressing the data. This paper discusses the edge-based compression algorithm. The algorithm, when removing redundancies from specific data (pixelated data), creates another redundancy. Understanding the algorithm helps in the selection of a suitable algorithm for specific types of data, as certain types of data might require more than one level of compression. The concept developed in this work applies across the board for other algorithms such as PNG, which utilizes the same prediction mechanism for the next pixel value. The rest of this paper is organized in the following order: Section 2 focuses on the background, Section 3 describes the methodologies and the results, and Section 4 concludes this work.
Theoretical Background
This section details the foremost example of an edge-based compression algorithm called JPEG-LS. The definition of a pixelated image and how edge-based compression utilizes its pixilation will also be elucidated.
Edge-Based compressors (and JPEG-LS)
[11] detailed a famous lossless algorithm, called CALIC. The core of this algorithm is a prediction scheme called GAP (gradient- adjusted predictor). Its early work was introduced in [26] and refined in [15]. The idea behind GAP is quite simple; it predicts the current value of the pixel based on the seven neighbors on the top and the left of the current pixel. Another famous algorithm in the category of lossless compression is JPEG-LS. It is registered as a US patent No. 5,680,129 and US patent No. 5,764,374, and owned by HP Co. JPEG-LS was introduced in [22–24] and it uses the prediction technique called Median Edge Detectors (MED). It predicts the value of the current pixel based on three neighbors (unlike GAP, which require seven neighbors). MED works in a more straightforward way than GAP, as it detects the horizontal and the vertical edge and predicts accordingly. In the case of edge-based model, a combination of the three pixels can be used to venture a guess. Two essential steps need to be carried out before encoding: prediction and error modeling. For the former, the horizontal and vertical edges detection can be done by examining the neighboring pixels of the current pixels, as shown in Figure 1.

The neighboring pixels around the current pixel.
Pixel B is used when a vertical edge is detected, while pixel A is used when a horizontal edge is detected. When there are no edges, a combination of pixels A, B and C are used. This simple prediction approach is called the MED prediction. Pixel X is predicted as follows:
Pixilation is the process of making pixels easily visible. The pixelated images come either from low resolution in artworks, for example, as a deliberate attempt to mask imperfections of details in the images. Pixelated images are heavily used in computer games due to its high-speed requirements. A perfect pixelated image is when the similar local intensity forms a small square of four pixels (2x2) that continually varies as per Figure 2, because if pixels beyond the four pixels (the pixel D specifically) share similar intensities, the algorithm will shift to the run mode. Image processing software such as Photoshop can be used to create such pixelated images.

An example of the image ideal recompression.
As the prediction mode is the dominant working mode in the pixelated image and due to the similarity in intensities in the pixelated image, the prediction error is mostly zero. One possible way to explain why the JPEG-LS produce compressible output is: When the error is zero, it causes the parameter of Colomb-Rice coding in JPEG-LS (which is called a code parameter k as in [26]) to become mostly 0, which encodes the error in a unary code. The word "mostly" is essential because the parameter of Colomb-Rice in JPEG-LS is affected by the contexts of where the error occurs, which would change the parameter from being zero. When the parameter is zero whereas the the error is not zero,Colomb-Rice is coding the error in a unary code which creates a long sequence of ones (or zeros) that could play an essential role in the creation of a compressible code. For large squares of similar pixels (more significant than 2x2), JPEG-LS will work in a predicted mode on the edges of the square, while in the heart of the square, the run mode dominates. Figure 3 shows a visualization of this concept.

The regular mode and the run mode.
In the figure, the square with the "X" sign represents the pixel working under the regular (prediction) mode, where JPEG-LS will detect the horizontal (top row) and vertical (left and right column) edges. In the middle, the run mode will occur.
The datasets used in our experiments were divided into two groups; the first contained images without pixilation and is called group 1, while the second is pixelated images called group 2. Both groups were compressed using JPEG-LS for its first compression phase. PAQ8f was used for the second compression due to it being a robust, lossless compression algorithm and its high compression ratio. The general-purpose compression algorithm was then applied due to its (high) speed, even at a lower compression ratio. The PNG was used alongside the JPEG-LS as a first compressor to ensure that any algorithm that follows the paradigm of edge-based compression algorithm will also produce compressible data. The edge-prediction mechanism and JPEG-2000 uses a different algorithm based on integer wavelet transform. The resulting compressed data was then fed into the second compression algorithm (PAQ8f). The following is a brief description of the algorithm and data sets used in this work.
Datasets used
The first group (group 1) comprised of 11 non-pixelated images, while the second group (group2) consist of pixelated images (group 2)[1, 2]. The pixelated images received other pixelated images to produce two by two pixels of similar intensities, as described earlier. Both groups were in the raw format (PGM) of 8 bits per pixel. Figure 4 and Figure 5 depict groups 1 and 2, respectively.

Grey images without pixilation (group 1).

Grey images without pixilation (group 1).
The data sets were first compressed using the JPEG-LS compression algorithm via IrfanView. PNG and JPEG-2000 were used for comparison between JPEG-LS and other image compression algorithms. JPEG-2000 is a wavelet-based lossy-to-lossless transform coder [20]. IrfanView can be used to apply JPEG-2000. PNG uses a 2-stage compression process: pre-compression filtering (prediction) and the DEFLATE algorithm. Bzip2 (Bz2 for short) concatenates RLE, Burrows-Wheeler transform, and Huffman coding. It also uses the 7-ZIP software. Gzip is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding. It also uses the 7-ZIP software.
For the second compression, a reliable compressor from the family of PAQ was used, which was the PAQ8f, described in [13] and available at [12]. However, PAQ is a slow compression algorithm due to the context modeling it uses, which is a time-consuming process, and, therefore, for the second compressor, fast general-purpose compressors, such as 7z, Bz2, and Gzip were used for comparisons and generalization.
As the second compression produces a high compression ratio, the double compression was compared with the well-known lossy jpeg [19].
Second compression for non-pixelated images
The first compression for the compressed data in JPEG-LS format was compressed for the second time using PAQ8F. The measurements used for evaluating the quality of the Compression are as follows:
The Bpp(bit per pixel) measurement was used due to its status as a global measurement able to show improvements via the two-phases of compression, while the only show improvements from the raw to the first compression or from the first to the second compression. It is also essential to quantify space-saving, and it is shown in percentage form in this work to make it easier to visualize and understand. The results of the group 1 experiment are shown in Table 1.
The result of first and second compression for group 1
The average compression ratio of the first compression via JPEG-LS for this group is high (2.219). The average saving is 50.126 %, with a minimum of 35.465 % and a maximum of 72.523 %. The saving is half of the image size, confirmed by its bpp value of 0.499. For the second compression via PAQ8f, the results show almost no improvement in the compression ratio, and the average bpp is nearly the same as that of the first Compression. There is only a marginal improvement in its bpp (0.002) and saving size (0.440 %) in the second compression ratio, both of which are almost negligible. The result of the second compression agrees with the common intuition in data compression, where it is assumed that the second compression is complicated, and if it does take place, it is purely by chance.
Since the images in this group were pixelated similarly, it is expected that the second compression would produce a close compression ratio. Table 2 shows the result of the application of the second compression using PAQ8F.
The result of first and second compression for group 2
The result of first and second compression for group 2
The compression ratio of the first compression, on average, is 4.463, with an average saving of 76.5 %, which is very high for lossless compression. The real saving of the files does not differ from the average, and its minimum is 69.215 %, and its maximum is 86.792 %. Relative to the first group, the compression ratio for the first compressor of this group is high. The second compression using PAQ8f resulted in a compression ratio of 1.228, with an average saving of 18.472 %, which is also a very high percentage for the data that is supposed to be random. The overall reduction in size can be obtained from the global bpp, reduced from 0.235 to 0.192, with an overall saving average of ~ 95 %. Such an effective compression needs to be compared with the lossy algorithm, which will be detailed in later sub-sections. It can be concluded from the results of the second compression that edge-based compression algorithms best compress pixelated images due to the output being compressible again at a percentage of saving.
The PAQ8F compressor used in the experiments is powerful but slow. Three commonly used general-purpose compressors described earlier to ensure that the second compression can be performed using any general-purpose compressor. These compressors are typically used extensively to decrease resources, and the comparison has a practical impact. Table 3 shows the result of applying the second round of compression using fast general-purpose compressors.
Second compression using general-purpose algorithms
Second compression using general-purpose algorithms
As can be seen in Table 3, the files for group 2 are still compressible with an excellent ratio using fast general-purpose algorithms, which can create a replacement for the lossy Compression for these images. This seems to suggest that it is better to use PAQ8f and similar compressors for offline processes, or when there is a need to decrease sizes at any cost. If speed is vital, as is the case with online compression or in a situation where time is a critical factor, a faster general-purpose algorithm should be used instead.
Despite using JPEG-LS as a first compressor, it should be pointed out that any algorithm under edged-based also behaves similarly. For this reason, one algorithm from this category, and another algorithm from a different data compression category were examined.
One of the most widely used algorithms is PNG. It utilizes the same prediction scheme as JPEG-LS. It is therefore expected that the PNG produces a compressible code for the same set of pixelated data samples. A lossless JPEG-2000, which uses an integer-based wavelet transformation, was used to eliminate the idea that any lossless image compression could behave similarly with the edge-based algorithms. Its mechanism is far too dissimilar to the edge-based algorithms. The second compression was performed using PAQ8F, and the results shown in Table 4.
The first compression using JPEG-LS, PNG, AND JPEG-2000
The first compression using JPEG-LS, PNG, AND JPEG-2000
PNG, similar to JPEG-LS, produces compressible code even though the second compression of JPEG-LS produces a higher size reduction over PNG. This could be because both algorithms handle post-prediction differently.JPEG-2000, on the other hand, does not produce any compressible code because it is not edge-based algorithm; hence the result of its second compression is marginal.
The previous results from the second compression lead to the comparison with the lossy compression algorithm. The lossy JPEG was selected due to its ubiquity in usage and the literature. The parameter of quality was adjusted to be as close as possible to the results of the second compression, as shown in Table 5.
Comparing the second compression with the lossy JPEG
Comparing the second compression with the lossy JPEG
The quality parameter of lossy JPEG compression was adjusted manually for each file, and the last column represents the best quality percentage. The second compression, which is lossless, will retain the details of the original images, while the lossy JPEG will sacrifice a tremendous amount of detail to decrease the original file into approximately the same size. Figure 6a compares an example of an original and restored image after lossy JPEG to show the details that were removed by lossy JPEG from the image,despite it is visually challenging for the eye to differentiate.The size of the "Old_woman" becomes 59,528 bytes after the second compression. In order to lower the original size of the image using a lossy JPEG, the quality of the compression was set to ~ 90 % with ~ 10 % loss of information, the latter of which is depicted in Figure 6c to showcase the effects of loss, where grey represents the pixel values changed by lossy JPEG.

The details loss by JPEG compression compared to the original.
This study addresses a rare case in data compression, where the compressed data are re-compressed. It shows how JPEG-LS and edge-based compression algorithms, in general, produce compressible data that can be compressed (again) using any general compression algorithms. This case occurs when the data under the first compression algorithm are pixelated images, where similar intensities composed of squares start from 2 by 2 pixels. The compression ratio of the second compression is very high (~ 18 % under PAQ8F) relative to that of random data. Two successive compression were compared to lossy JPEG, and it was seen that the two successive compression for this type of data are favorable relative to the lossy compressor. The quality parameter of the lossy JPEG, on average, needs to be adjusted to ~ 90 % of the quality, which translates into ~ 10 % missing details. The details removed by lossy JPEG decreased the overall quality of the reconstructed image, as shown in previous sub-sections, suggesting that the use of double compression of pixelated images gives better result than the lossy compression.However it is challenging to identify the pixelated image so the edge-based compression algorithms can be apply. This problem can be solved if the we know in advance what kind of images usually used in a particular field.For example, in computer games where pixelated images are frequently used, or when pixelated images are deliberately used as a kind of art.
