Abstract
Taking Uyghur character recognition as an example, this paper uses an image segmentation method that combines traditional methods with a CNN, and ports and implements Uyghur character recognition on intelligent devices. First, by analyzing the application of several general segmentation methods, the paper identifies their shortcomings as well as ideas that carry over to Uyghur segmentation. Then, starting from the characteristics of the Uyghur language, such as structure, word formation and input habits, the author studies the segmentation of connected Uyghur script from the perspective of language characteristics, puts forward a basic algorithm for Uyghur symbol segmentation, and applies connected-character segmentation based on a minimum spanning tree and a multi-queue primitive merging model to improve segmentation efficiency. In addition, in order to solve the limitations of the traditional handwriting recognition framework of “preprocessing
Introduction
Traditional handwriting recognition usually relies on a basic handwriting database together with segmentation and recognition technology, and mainly consists of three parts: preprocessing, feature extraction, and classification and recognition. There are two traditional approaches to recognizing Uyghur characters. One is to recognize the whole word directly, which offers rich features but requires building a large word database. The other is to segment words (or sequences of overlapping letters) first, and then recognize the single characters obtained by segmentation. Over the past ten years, researchers at home and abroad have extracted features along two directions, statistical recognition and structural recognition, and have used artificial intelligence, neural networks, the Hidden Markov Model (HMM), the Support Vector Machine (SVM), quadratic linear decision functions and so on, either as the core classifier or combined in series and parallel, achieving high recognition rates.
Algorithms for handwritten Uyghur recognition are mainly based on features such as main-body direction codes, stroke structure, stroke relationships and direction line elements. At present, most of the classification methods in use are borrowed from Chinese character recognition, such as Euclidean distance classification, SVM, HMM and neural networks, applied either singly or as several classifiers combined in series and parallel. Although these methods draw on Chinese recognition methods, their recognition rates are limited both in theory and in practice by the characteristics of the language itself. One of the current key points in research on recognizing Uyghur and other connected scripts is therefore to find a multi-feature, multi-classifier fusion method suited to the features of the language.
In fact, traditional Uyghur recognition methods are limited not only by the characteristics of the characters themselves; research on recognition technology in general has also hit a bottleneck. Many languages for which handwriting recognition is widely used have likewise stalled in improving the recognition rate. For example, after more than 40 years of effort by researchers, handwritten Chinese character recognition (HCCR) has achieved great success. Taking the literature [1] as an example, a discriminative feature extraction method and a discriminatively learned quadratic decision function classifier are used. On the challenging online and offline handwritten Chinese character data sets CASIA-OLHWDB and CASIA-HWDB, the best recognition rates for online handwritten single characters are 95.28% (DB1.0, 4037 Chinese characters), 94.85% (DB1.1, 3926 Chinese characters) and 95.31% (ICDAR 2013 Competition DB, 3755 Chinese characters), and the best recognition rates for offline handwritten single characters are 94.20% (DB1.0), 92.08% (DB1.1) and 92.72% (ICDAR 2013 Competition DB). However, handwriting recognition, whether online or offline, still does not achieve high enough accuracy. Recent research shows that mainstream handwriting recognition software on the market has not reached high precision: the recognition rate of many well-known handwriting input products is below 90%, the best system reaches only about 95%, and the character sets supported by many systems are very incomplete. In text-line/single-word or overlapping/single-word mixed writing modes, the recognition rate of many systems drops sharply, and the recognition performance of related software products urgently needs to be improved.
Therefore, we can draw the conclusion that handwritten Chinese character recognition is still an unsolved and challenging research topic [2].
With the rise of deep learning, handwritten Chinese character recognition has gained new vitality and highly effective solutions. For example, since 2011, the winners of two consecutive ICDAR (International Conference on Document Analysis and Recognition) handwritten Chinese character recognition competitions have all adopted methods based on deep learning or neural networks. It is worth mentioning that in the ICDAR Handwritten Chinese Character Competition in 2013, Graham from the University of Warwick won first place in online handwritten Chinese character recognition with a deep sparse convolutional neural network, raising the recognition rate to 97.39%, while the team from Fujitsu won first place in offline handwritten Chinese character recognition using an improved CNN, with a recognition rate of 94.77%. For both online and offline HCCR, the results obtained by these deep learning methods are far ahead of the traditional methods, showing the great potential of deep learning in character recognition. Compared with traditional methods, deep learning can achieve a higher recognition rate, but it also has problems, such as longer training and testing time and excessively large dictionary storage [3]. Therefore, applying deep convolutional neural networks to Uyghur text recognition is a very challenging research topic, and research on the recognition technology needs to be further expanded and deepened.
Application analysis of several general segmentation methods
Recursive method of fast segmentation
Proposed by Okamoto et al., this method has achieved good results in the segmentation of printed English letters. Its basic idea is to first perform horizontal projection to separate vertically adjacent characters or character strings, and then apply vertical projection to each piece and segment recursively, for example:
Hi Poly How are you
The segmentation process is as follows (Fig. 1).
Recursive segmentation.
The greatest strength of this method is its simple algorithm and fast segmentation speed. However, if there is no obvious blank in the horizontal and vertical directions, this method is difficult to apply. In view of Uyghur handwriting input habits and other reasons, there are few cases that actually meet this requirement, so this method has only theoretical significance for Uyghur segmentation, and the actual segmentation efficiency is poor [5].
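The recursive projection idea can be sketched as follows. This is a minimal illustration, not the implementation of Okamoto et al.: the binary image is assumed to be a list of rows of 0/1 values, and the helper names `split_runs` and `recursive_cut` are hypothetical. A region is cut at blank rows, each piece is then cut at blank columns, and the process repeats until neither projection finds a gap.

```python
def split_runs(profile):
    """Return (start, end) pairs, end-exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def recursive_cut(img, box=None, horizontal=True, stalled=False):
    """Split a binary image at blank rows and columns, alternating directions.

    Boxes are (top, left, bottom, right), end-exclusive. `stalled` records
    that the previous direction found no gap in the same box.
    """
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    top, left, bottom, right = box
    if horizontal:  # horizontal projection: ink count per row
        profile = [sum(img[r][left:right]) for r in range(top, bottom)]
        pieces = [(top + a, left, top + b, right) for a, b in split_runs(profile)]
    else:           # vertical projection: ink count per column
        profile = [sum(img[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
        pieces = [(top, left + a, bottom, left + b) for a, b in split_runs(profile)]
    if not pieces:
        return []                    # region contains no ink
    if pieces == [box]:              # no gap in this direction
        if stalled:                  # and none in the other direction: leaf
            return [box]
        return recursive_cut(img, box, not horizontal, stalled=True)
    result = []
    for piece in pieces:
        result.extend(recursive_cut(img, piece, not horizontal))
    return result


image = [
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
boxes = recursive_cut(image)
```

When no blank row or column exists, as is common in connected handwriting, the function returns the whole region as a single box, which is exactly the limitation discussed above.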
The basic idea of the window method is to find a window suited to the stroke size by calculating the bounding boxes of strokes, and then use the window to segment symbols. The specific process is as follows:
Compute the width and height of the bounding box of each stroke in the sample set. Determine the window size: average the widths and heights of the bounding boxes of all strokes, and add a constant to each average to obtain the width and height of the window. Use the window to judge whether strokes belong to one character: strokes that fall within the same window are judged to be a group of strokes belonging to one character. The window is represented by the dotted line in Fig. 2 below.
Window segmentation.
Analysis shows that the window method does not segment Uyghur symbols accurately. In fact, the greater the difference in size among the symbols to be recognized, the worse the segmentation, because the window method uses windows of uniform size. One of the important distinguishing features of Uyghur symbols and letters is their difference in aspect ratio, and in actual writing one Uyghur letter often falls inside the window of another letter, which easily leads to segmentation errors [6].
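A minimal sketch of the window method follows. The text does not specify how the window is positioned, so this sketch assumes that the window is centred on the first ungrouped stroke and that a stroke joins the group when the centre of its bounding box lies inside the window. The function names and the `margin` constant are hypothetical.

```python
def bbox(stroke):
    """Bounding box (x_min, y_min, x_max, y_max) of a stroke given as (x, y) points."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)


def window_size(strokes, margin):
    """Average bounding-box width and height over all strokes, plus a constant."""
    boxes = [bbox(s) for s in strokes]
    width = sum(b[2] - b[0] for b in boxes) / len(boxes) + margin
    height = sum(b[3] - b[1] for b in boxes) / len(boxes) + margin
    return width, height


def center(box):
    return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2


def window_segment(strokes, margin=2.0):
    """Group stroke indices; assumed rule: box centre inside the current window."""
    win_w, win_h = window_size(strokes, margin)
    used = [False] * len(strokes)
    groups = []
    for i, stroke in enumerate(strokes):
        if used[i]:
            continue
        cx, cy = center(bbox(stroke))   # window centred on the first free stroke
        group = []
        for j in range(i, len(strokes)):
            if used[j]:
                continue
            x, y = center(bbox(strokes[j]))
            if abs(x - cx) <= win_w / 2 and abs(y - cy) <= win_h / 2:
                used[j] = True
                group.append(j)
        groups.append(group)
    return groups


strokes = [
    [(0, 0), (4, 4)],     # main body of the first symbol
    [(1, 3), (2, 4)],     # small additional stroke close to it
    [(20, 0), (24, 4)],   # a second, distant symbol
]
groups = window_segment(strokes)
```

Because a single window size is derived from the average stroke, symbols much wider or narrower than the average are either split or merged with their neighbours, which matches the weakness described above.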
G. Martin once proposed a similar segmentation method based on a sliding window, which is also a typical recognition-based segmentation method in handwriting recognition; it is not described further here.
The bounding box method takes into account that ordinary characters are mostly composed of more than one stroke, and therefore holds that the bounding box of a group of strokes is more effective than the bounding boxes of individual strokes. As shown in Fig. 3, bounding boxes of strokes are used to determine stroke grouping.
Bounding box of strokes
For example, the Uyghur symbol “ئ٠” has two strokes, and two main points are specified for each stroke, which are marked with circles. They are the upper left point and the lower right point of the bounding box of each stroke, and the coordinates are
Example of bounding box of stroke group.
The attributes of bounding box are determined below, where Width represents the bounding box width and Height represents the bounding box height, so the calculation method of the bounding box range and attributes is as follows [7]:
Using the bounding box information of characters, the segmentation process mainly depends on two characteristics: the positional relationship of strokes and the time between two strokes. The positional relationship can be of two types, the cross relationship and the relative relationship. If two strokes have one or more intersections (including the case where one bounding box is contained in the other), the two strokes are cross-related and are judged to belong to the same stroke group, as shown in Fig. 5 below; otherwise, they are judged by their relative relationship.
Cross-relationship.
Three cases of relative relationship.
There are three cases of relative relationship. In the first case, there is no intersection between the two strokes, but the distance between them is within a given threshold, so the two strokes belong to one stroke group. In the second case, the two strokes contain two horizontal lines of similar size. In the third case, one of the two strokes consists of one or several points and the other stroke lies above (or below) it; the two then belong to the same stroke group. These cases are shown in Fig. 6 below.
Compared with the window method, the bounding box method is an improvement, mainly because it does not use a uniform window size but relies on the intersection of stroke bounding boxes, so it copes better with differences in character size. Some Uyghur handwritten symbols meet the conditions of this method, but quite a few symbols are intersected by too many bounding boxes, so this method alone cannot meet the requirements, although it provides a useful starting point [8].
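The grouping logic of the bounding box method can be sketched as below. This is an illustration under stated assumptions: the cross relationship is approximated by intersection of bounding boxes (which also covers containment), only the first case of the relative relationship (distance within a threshold) is implemented, and the time between strokes is ignored. The names `group_strokes` and `gap_threshold` are hypothetical; Width and Height are taken as the extent of the box along each axis.

```python
import math


def bbox(stroke):
    """Bounding box (x_min, y_min, x_max, y_max) of a stroke given as (x, y) points."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)


def box_width_height(box):
    """Width and Height attributes of a bounding box."""
    return box[2] - box[0], box[3] - box[1]


def boxes_intersect(a, b):
    """True if two boxes overlap or one contains the other."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def box_gap(a, b):
    """Shortest distance between two boxes (0 if they intersect)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)


def group_strokes(strokes, gap_threshold):
    """Union strokes whose boxes intersect or lie within gap_threshold."""
    boxes = [bbox(s) for s in strokes]
    parent = list(range(len(strokes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_intersect(boxes[i], boxes[j]) or \
                    box_gap(boxes[i], boxes[j]) <= gap_threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


strokes = [
    [(0, 0), (4, 4)],       # A
    [(3, 3), (6, 6)],       # B: box intersects A
    [(7, 0), (8, 1)],       # C: close to B but not intersecting
    [(20, 20), (21, 21)],   # D: far from everything
]
loose = group_strokes(strokes, gap_threshold=2.5)
strict = group_strokes(strokes, gap_threshold=1.0)
```

With the looser threshold, stroke C is attached to the A/B group through the relative relationship; with the stricter one it stays separate, which shows how sensitive the result is to the chosen threshold.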
The basic process of segmentation using dynamic programming is to find a path in the graph that separates the characters in it. The path found is the segmentation boundary; in other words, segmentation becomes a search for the best path. Dynamic programming is currently one of the most effective methods for solving best-path problems. It is a multi-step decision-making process that decomposes an N-step problem into N single steps, which greatly reduces the computational complexity. A dynamic programming method needs to define its search space, its cost function and its search strategy [9].
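As an illustration of these three ingredients, the sketch below searches for a top-to-bottom cut through a binary image. The choices are assumptions made for the example, not the formulation of [9]: the search space is the set of pixel columns in a given range, the cost function is the number of ink pixels the cut crosses, and the search moves one row at a time to the same or an adjacent column. The name `best_cut_path` is hypothetical.

```python
def best_cut_path(img, col_lo, col_hi):
    """Return (cost, path); path[r] is the column of the cut in row r.

    img is a list of rows of 0/1 values; columns col_lo..col_hi-1 are searched.
    """
    cols = range(col_lo, col_hi)
    cost = {c: img[0][c] for c in cols}      # cost of reaching each column in row 0
    back = []                                # back[r-1][c] = column chosen in row r-1
    for r in range(1, len(img)):
        new_cost, choice = {}, {}
        for c in cols:
            # cheapest predecessor among c-1, c, c+1; ties prefer going straight
            prev = min((p for p in (c - 1, c, c + 1) if p in cost),
                       key=lambda p: (cost[p], abs(p - c)))
            new_cost[c] = cost[prev] + img[r][c]
            choice[c] = prev
        back.append(choice)
        cost = new_cost
    end = min(cols, key=lambda c: cost[c])
    path = [end]
    for choice in reversed(back):            # trace the decisions backwards
        path.append(choice[path[-1]])
    path.reverse()
    return cost[end], path


image = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
]
cut_cost, cut_path = best_cut_path(image, 0, 5)
```

Each row is one decision step, so the work grows linearly with the number of rows instead of exponentially with the number of possible paths.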
Method based on soft decision
This method constructs a virtual network, and calculates the probability that a stroke group belongs to or constitutes a character on the basis of defining the type and frequency of each stroke.
From the above analysis, all of these segmentation methods have certain advantages, especially for characters of similar size and relatively independent symbols, but they also have problems. Building on the characteristics of Uyghur symbols, this paper puts forward a technical method to improve segmentation performance [9].
Segmentation method based on CNN and multi-queue primitives
Application of multi-queue primitive merging model
In this paper, combined with primitive processing and based on contour features, a multi-queue primitive merging model for Uyghur symbol segmentation is proposed, so as to improve the segmentation accuracy.
The basic steps of over-segmentation based on contour features are as follows:
For a Uyghur handwritten sequence, the object of over-segmentation is the main stroke (that is, the symbol main body within a certain threshold range of the baseline). Then the main stroke is obtained and its connected domain is segmented to make it a primitive. Generally, additional strokes may not be segmented, because additional strokes are already primitives in general. Over-segmentation is a method of segmentation using the related features of contour, which can find all suitable segmentation points with high anti-interference. The main purpose of finding contour features is to find the feature points of the contour, and then apply structural element rules to filter the segmentation points. The structural elements of Uyghur symbols mainly include holes, additional symbols, symbols above the baseline, symbols below the baseline, symbols across the baseline, turning points, etc. The process of searching for segmentation points in Uyghur is as follows:
Rule 1. Select the local lowest points of the upper contour as candidate segmentation points.
Rule 2. Select as candidate segmentation points the points where the straight-line distance from the upper contour to the lower contour is within a certain threshold.
Rule 3. Similarly to Rule 2, select as candidate segmentation points all points whose distance from the lower contour to the upper contour is within a certain threshold.
Rule 4. Select as candidate segmentation points the intersections of the baseline with the upper contour; the Hough transform is used for baseline detection.
Finally, filter the candidate points obtained above: the straight-line distance between any two retained points should be no less than a threshold; otherwise, one of the two candidate points is deleted.
These segmentation rules are based on the characteristics of Uyghur symbols, that is, adjacent letters in a Uyghur word are usually connected with each other on the baseline (within a certain range). Based on this phenomenon, the above rules select the points on the baseline as the initial segmentation points. Theoretically, these three different characteristics can find almost all segmentation points, which include local minima found according to rule 1, bottleneck points of upper and lower contours found according to rule 2 and rule 3, and baseline intersection points found according to rule 4. Although the candidate segmentation points searched by using these rules as a whole will have redundancy, its advantage is to minimize the missing points [10].
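The search for candidate points can be sketched on a binary image as follows. This is a simplified illustration under several assumptions: contours are taken per pixel column, the bottleneck rules are merged into a single stroke-thickness test, the baseline rule and its Hough-based detection are omitted, and the final filter keeps points that are at least `min_gap` columns apart. All function and parameter names are hypothetical.

```python
def contours(img):
    """Per-column upper and lower contour rows (None where the column is empty)."""
    upper, lower = [], []
    for c in range(len(img[0])):
        ink = [r for r in range(len(img)) if img[r][c]]
        upper.append(ink[0] if ink else None)
        lower.append(ink[-1] if ink else None)
    return upper, lower


def candidate_points(img, max_thickness=1, min_gap=2):
    """Return columns proposed as over-segmentation points."""
    upper, lower = contours(img)
    candidates = set()
    for c in range(1, len(upper) - 1):
        u, left, right = upper[c], upper[c - 1], upper[c + 1]
        if u is None:
            continue
        # Local lowest point of the upper contour. Row indices grow downwards,
        # so "lowest" means the largest row index among the neighbours.
        if left is not None and right is not None \
                and u >= left and u >= right and (u > left or u > right):
            candidates.add(c)
        # Bottleneck: upper and lower contour are close together.
        if lower[c] - u + 1 <= max_thickness:
            candidates.add(c)
    kept = []
    for c in sorted(candidates):             # drop points that are too close
        if not kept or c - kept[-1] >= min_gap:
            kept.append(c)
    return kept


# Two blobs joined by a thin connecting stroke on row 3.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
points = candidate_points(image)
```

The thin connecting stroke yields several candidates, which is the intended redundancy of over-segmentation; the later merging stage decides which cuts to keep.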
Considering the difficulty of merging Uyghur primitives, we use the multi-queue primitive merging model. The model handles three kinds of merging: merging main-body strokes with each other, merging additional strokes with each other, and merging additional strokes with main-body strokes. Throughout the merging process, geometric information is combined with recognition and logical information to assess the plausibility of the final merging scheme.
For the queue model of Uyghur symbols, we divide primitives into three parallel queues, namely, the main body, the queue above the baseline and the queue below the baseline, thus forming a three-dimensional state space. All segmentation schemes of primitives are described by state path. Thus the merging of Uyghur primitives becomes a mathematical problem of path optimization, and the symbolic confidence is calculated by various feature information. In this paper, the dynamic programming method is used to optimize the path segmentation.
The basic idea is to simplify the complex layout of primitives into multiple parallel primitive queues, and then describe and optimize the primitive merging scheme [11, 12].
Set the sequence
Among them,
For vector sets
Among them,
There are restrictions on the first and last states of the split path
The purpose of segmentation is to optimize the objective function:
This is the confidence weighted average of independent symbols. In which
Then there is the process of using dynamic programming to solve the best path. First, the dynamic programming of one dimensional element merging is solved as follows:
In this primitive merging model, any legal Uyghur symbol is formed by merging a continuous group of primitives in the primitive sequence. It is assumed that there are N pieces of Uyghur symbol primitives
Among them,
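A one-dimensional merging of this kind can be sketched with dynamic programming as follows. The objective used here is an assumption made for illustration: each candidate symbol formed from primitives i..j-1 has a confidence, and the score of a segmentation is the average of these confidences weighted by the number of primitives in each symbol. The `confidence` callback, the `max_len` limit and the toy scores are hypothetical.

```python
def merge_primitives(n, confidence, max_len=3):
    """Best partition of primitives 0..n-1 into consecutive groups.

    confidence(i, j) scores the symbol made of primitives i..j-1.
    Returns (weighted average confidence, list of (i, j) groups).
    """
    best = [float("-inf")] * (n + 1)   # best[j]: best score for the first j primitives
    back = [0] * (n + 1)               # back[j]: start of the last group in that solution
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            score = best[i] + (j - i) * confidence(i, j)
            if score > best[j]:
                best[j], back[j] = score, i
    groups, j = [], n
    while j > 0:                       # recover the groups from the back pointers
        groups.append((back[j], j))
        j = back[j]
    groups.reverse()
    return best[n] / n, groups


# Toy confidences standing in for a recognizer's output.
toy_scores = {(0, 1): 0.4, (0, 2): 0.9, (1, 2): 0.3,
              (2, 3): 0.8, (2, 4): 0.5, (3, 4): 0.7}


def toy_confidence(i, j):
    return toy_scores.get((i, j), 0.1)


score, groups = merge_primitives(4, toy_confidence)
```

In the toy example the first two primitives are merged into one symbol and the last two are kept separate, because that partition has the highest weighted confidence.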
According to the primitive model of Uyghur segmentation, the one-dimensional dynamic programming is extended to the three-dimensional dynamic programming of Uyghur symbol primitives, namely:
For the 3D Uyghur symbol primitive segmentation sequence
In which
The initial state and the final state of this dynamic system are limited to
Among them,
And
The process of optimization is to find the maximum objective function
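The extension to three queues can be sketched in the same spirit. The state is a triple counting how many primitives have been consumed from the main-body queue, the queue above the baseline and the queue below the baseline; the search starts at (0, 0, 0) and must end when all three queues are exhausted. The transition rule (each symbol takes at least one main-body primitive and optionally some additional primitives), the weighting and the toy confidence function are assumptions for illustration only.

```python
from functools import lru_cache


def merge_three_queues(n_main, n_above, n_below, confidence,
                       max_main=2, max_extra=2):
    """Best path through the (main, above, below) state space.

    confidence(main, above, below) scores one symbol; each argument is an
    (start, end) slice of the corresponding queue, end-exclusive.
    """
    goal = (n_main, n_above, n_below)

    @lru_cache(maxsize=None)
    def best(i, j, k):
        if (i, j, k) == goal:
            return 0.0, ()
        top = (float("-inf"), ())
        for a in range(1, min(max_main, n_main - i) + 1):
            for b in range(0, min(max_extra, n_above - j) + 1):
                for c in range(0, min(max_extra, n_below - k) + 1):
                    tail_score, tail = best(i + a, j + b, k + c)
                    if tail_score == float("-inf"):
                        continue           # dead end: goal not reachable
                    symbol = ((i, i + a), (j, j + b), (k, k + c))
                    score = (a + b + c) * confidence(*symbol) + tail_score
                    if score > top[0]:
                        top = (score, (symbol,) + tail)
        return top

    total, symbols = best(0, 0, 0)
    return total / (n_main + n_above + n_below), list(symbols)


def toy_confidence(main, above, below):
    """Toy scores: the dot above the baseline belongs to the second main primitive."""
    if main == (0, 1) and above == (0, 0):
        return 0.9
    if main == (1, 2) and above == (0, 1):
        return 0.8
    return 0.2


score, symbols = merge_three_queues(2, 1, 0, toy_confidence)
```

States from which the final state cannot be reached (for example, main-body primitives exhausted while additional strokes remain) are discarded, which enforces the restriction on the final state.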
Uyghur handwriting recognition model based on CNN and random elastic deformation
A CNN is composed of an input layer, hidden layers and an output layer. The hidden layers fall into three categories: 1. convolution layers for feature extraction; 2. sampling layers for feature optimization and selection; 3. the hidden layers of a traditional multilayer perceptron. In accordance with the characteristics of Uyghur, especially the large number of similar characters, the input images are convolved with 8 trained filters (K
The sampling layer improves robustness to noise by reducing the resolution of the feature map, that is, by sampling the feature map of the convolution layer to extract important classification features while ignoring irrelevant details. At present, the sampling layer of a CNN is mainly built in one of two ways: subsampling (taking the average value) and max pooling (taking the maximum value). Each neuron in the sampling layer is connected to the neighboring neurons in the 2
Where
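The two layer types can be illustrated with a small pure-Python sketch. It is not the network used in this paper: the kernel is a toy 2×2 filter rather than one of the trained filters, no activation function is applied, and a 2×2 sampling neighbourhood without overlap is assumed. The function names are hypothetical.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            total = bias
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


def pool2x2(fmap, mode="max"):
    """Reduce each non-overlapping 2x2 block to its maximum or its mean."""
    reduce = max if mode == "max" else (lambda values: sum(values) / len(values))
    out = []
    for r in range(0, len(fmap) - 1, 2):
        out.append([
            reduce([fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1]])
            for c in range(0, len(fmap[0]) - 1, 2)
        ])
    return out


image = [[5 * r + c for c in range(5)] for r in range(5)]   # 5x5 ramp image
kernel = [[0, 0], [0, 1]]                                   # toy filter
feature_map = conv2d_valid(image, kernel)                   # 4x4 feature map
max_pooled = pool2x2(feature_map, "max")                    # 2x2 after max pooling
mean_pooled = pool2x2(feature_map, "mean")                  # 2x2 after subsampling
```

Both sampling variants halve the resolution of the feature map; max pooling keeps the strongest response in each block, while subsampling averages the block.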
Uyghur handwriting recognition method combining domain knowledge with CNN
To avoid over-fitting during training and improve the recognition performance of the CNN model, classical methods such as Dropout help, but sufficient training samples remain the prerequisite for high performance. In many cases, however, the available training samples are very limited. Therefore, data augmentation should be applied to improve the robustness and generalization ability of the CNN system. Building on affine transformation, this paper further discusses elastic deformation for generating data for online Uyghur recognition. In theory, these methods enlarge the sample space without changing the internal structure of the original character sequence, and the degree of transformation can be controlled so that the generated samples conform to the actual sample distribution.
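A minimal sketch of both kinds of augmentation on online stroke data is given below. The details are assumptions for illustration, not the exact procedure of this paper: the affine step applies a horizontal shear followed by rotation and uniform scaling, and the elastic step adds a bounded random displacement to every point, smoothed along the stroke with a moving average so that neighbouring points move together. Function and parameter names are hypothetical.

```python
import math
import random


def affine_transform(strokes, angle=0.0, scale=1.0, shear=0.0):
    """Shear horizontally, then rotate by `angle` (radians) and scale."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for stroke in strokes:
        new_stroke = []
        for x, y in stroke:
            x = x + shear * y
            new_stroke.append((scale * (cos_a * x - sin_a * y),
                               scale * (sin_a * x + cos_a * y)))
        out.append(new_stroke)
    return out


def elastic_deform(strokes, amplitude=1.0, smooth=2, rng=None):
    """Add smoothed random displacements, each bounded by `amplitude`."""
    rng = rng or random.Random()
    out = []
    for stroke in strokes:
        n = len(stroke)
        dx = [rng.uniform(-amplitude, amplitude) for _ in range(n)]
        dy = [rng.uniform(-amplitude, amplitude) for _ in range(n)]

        def smoothed(values, i):
            lo, hi = max(0, i - smooth), min(n, i + smooth + 1)
            return sum(values[lo:hi]) / (hi - lo)   # moving average

        out.append([(x + smoothed(dx, i), y + smoothed(dy, i))
                    for i, (x, y) in enumerate(stroke)])
    return out


sample = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
rotated = affine_transform(sample, angle=math.pi / 12, scale=1.1)
deformed = elastic_deform(sample, amplitude=0.5, rng=random.Random(7))
```

The number of strokes and points is unchanged, so the stroke order and structure of the sample are preserved, and `amplitude`, `angle`, `scale` and `shear` control how far the generated sample departs from the original.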
Based on the sparse CNN model, this paper further introduces the path integral feature at the input layer of the CNN as an online feature extraction method for time-series information. In theory, introducing the path integral feature map can make the recognition result on the data set clearly superior to that of the multi-column deep neural network (MCDNN). The design further considers adding relevant domain knowledge as prior knowledge, such as nonlinear normalization, virtual strokes, the eight-direction feature and the path integral feature map. For example, virtual strokes are mainly used to balance the weights of real strokes and virtual strokes, while the eight-direction feature and the path integral feature complement each other to improve recognition performance.
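How virtual strokes and the eight-direction feature interact can be illustrated with a simplified sketch. It assumes that a virtual stroke is the pen-up movement from the end of one stroke to the start of the next, that each segment is assigned to the nearest of eight directions, and that virtual strokes contribute with a reduced weight. The global histogram below is a simplification of a full feature map, and the names and the weight of 0.5 are hypothetical.

```python
import math


def direction_histogram(strokes, virtual_weight=0.5):
    """Length-weighted histogram over 8 directions (bin 0 = +x, bins every 45 degrees)."""
    hist = [0.0] * 8

    def add(p, q, weight):
        dx, dy = q[0] - p[0], q[1] - p[1]
        length = math.hypot(dx, dy)
        if length == 0:
            return
        angle = math.atan2(dy, dx) % (2 * math.pi)
        bin_index = int((angle + math.pi / 8) // (math.pi / 4)) % 8
        hist[bin_index] += weight * length

    for s, stroke in enumerate(strokes):
        for p, q in zip(stroke, stroke[1:]):
            add(p, q, 1.0)                                    # real pen-down segment
        if s + 1 < len(strokes):
            add(stroke[-1], strokes[s + 1][0], virtual_weight)  # virtual pen-up segment
    total = sum(hist)
    return [h / total for h in hist] if total else hist


strokes = [[(0, 0), (2, 0)], [(2, 2), (0, 2)]]
features = direction_histogram(strokes)
```

Raising `virtual_weight` makes the pen-up movements between strokes more influential, while setting it to zero ignores them entirely.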
System prototype and implementation technology based on intelligent equipment
The related techniques of Uyghur symbol segmentation are proposed in this paper, and the implementation techniques of the system mainly include MFC application and Windows Mobile (WM) development technology. This paper will introduce the prototype system and its development and running environment, and explain the Visual Studio IDE using MFC class library. In addition, in order to apply some related algorithms based on windows platform to smart devices, this paper also studies and analyzes the event-driven and message response mechanism of WM, as well as the key steps of porting Windows programs to WM programs, and summarizes and analyzes many differences between WM programs and Windows programs.
Visual Studio IDE
Visual Studio can create applications and network services for Windows, as well as applications, network services and Office plug-ins for smart mobile devices. Visual C++ can be integrated with Visual Studio or, for convenience, installed separately.
Visual Studio contains a variety of enhancements, such as language enhancements that can handle almost all data types, visual designers, and a large number of improvements in Web tools. Visual Studio packages a wealth of tools and frameworks for developers to use, so as to facilitate the creation of applications.
Visual Studio integrates MFC. The MFC class library encapsulates part of the Windows API in the form of C++ classes, and also contains a complete application framework that reduces developers' workload. MFC includes many wrapper classes for Windows handles and for built-in Windows components.
Event and message mechanism of WM
WM programming is also event-driven. A WM program presents users with many visual objects; when a user selects a function, a corresponding event is generated. The event delivers a message to the specific object in the program, and the object performs the corresponding function by calling its message handler. The visual interface provided by a WM application is also regarded as an object, and operations on visual objects trigger the corresponding functions through events. Running a WM application is thus a process in which user operations form events and the events are processed accordingly.
The event mechanism of WM originates from Windows. Sending messages and responding to messages are core mechanisms of Windows. In Windows applications, events form messages, and messages are handled in response to events. This mechanism operates between users and the system, between the system and programs, and between programs, as shown in Fig. 7 below.
Message response mechanism.
It has been shown that most Windows applications can be ported to WM, and that porting takes much less work than rewriting these programs, especially for those based on MFC. Compared with Windows applications, developing WM applications differs in many ways. Based on the Uyghur handwriting recognition system, the differences between WM and Windows programs are analyzed as follows [16, 17, 18]:
The difference between Microsoft Win32 API and Windows CE API
The Windows CE API is a subset of the Win32 API, and many of its functions have been simplified. For example, because font support in the Windows CE API is limited, the Uyghur handwriting recognition system needs a dedicated file that correctly displays and joins Uyghur glyphs on mobile devices.
In addition, some extended functions of Windows CE API, such as touch screen and notification, need hardware support.
Exception handling in Windows CE is limited. Although it supports structured exception handling, it does not support C++ exception handling.
When porting Win32 programs to the target Windows CE platform, the most time-consuming task is redesigning functionality for which Windows CE offers no API to replace the Win32 API.
The difference between Microsoft MFC and Windows CE MFC standards
Although Windows CE MFC is in line with the specification of Microsoft MFC, it is quite different in the structure and function of providing classes. In addition, some features in Windows CE MFC are only for Windows CE. In this paper, to transplant Windows applications, it is necessary to systematically check the classes and their methods, attributes and scopes used by the programs and determine their compatibility with Windows CE MFC.
The scope and critical value of variables
WM mobile devices have small memory, CPU is slower than PC, and the scope and critical value of variables are different to some extent, so special attention is required in programming.
Memory limitation
Generally, the memory of mobile devices is smaller and slower than that of a PC. In most cases, the application must be reduced in size to migrate to WM. In this paper, when the relevant code is ported to WM, necessary functions such as sample data collection and recognition of Uyghur letters and connected segments are modified, while relatively complex functions such as segmentation are no longer considered.
Treatment of testing and debugging process
The programming work is carried out in WM simulation environment, which is different from the practical application in intelligent devices. Generally, before an application is officially used, it must be tested on all the devices that are expected to run the application, not just relying on the simulation environment.
Compact Framework tools
Compact Framework simplifies the development of intelligent terminal applications, and provides virtual devices such as Pocket PC, WM and Pocket PC Phone in the Windows CE .NET environment.
In this paper, the .NET Compact Framework program is created in Visual Studio .NET, and the Windows program is created in Visual C++ .NET.
Key factors for porting windows programs to WM
We do not describe the specific porting steps in detail here, but highlight the key points, which in practice must be handled according to the requirements of the program.
Find the corresponding Windows CE API
Although Win32 is usually backward compatible with Win16 functions, the simplified Windows CE is not. Check the API references in the program (functions, messages, data types and so on), and replace or modify references that are not supported by the Windows CE API. For example:
Some Win32 functions are not supported. If a replacement function exists, use it; if not, a function must be written to provide the same functionality. For example, the LineTo and MoveTo functions are no longer supported by Windows CE, and the Polyline function is used to replace them.
A Windows CE function with equivalent functionality replaces the original Win32 function, for example the CommandBar functions.
Win32 functions with some limitations. For example, the parameters of some functions have changed or the scope of the parameters has changed, such as CreateWindow function and CreateWindowEx function, whose window style has changed.
Some data types need to be modified.
Many messages are no longer supported. For example, many WM_* messages and EM_* messages are no longer supported, some changes have taken place in wParam, and some new messages have been added to Windows CE, such as WM_HIBERNATE.
Windows CE memory
For Windows CE applications, memory use should be minimized: simplify or even eliminate functions or files that consume large amounts of memory, and avoid creating or keeping temporary files as far as possible. Sometimes the code can be rewritten to trade speed for lower memory use, finding a suitable compromise.
If there aren’t enough system resources, the WM_HIBERNATE message of Windows CE will inform the corresponding program. If the program receives this message, it will release resources and return the memory to normal level as much as possible. The key of this process is WM_HIBERNATE. A robust program should handle WM_HIBERNATE well and respond by releasing temporary files when resources are scarce.
Power consumption management
Mobile devices mostly run on batteries, so programs should avoid unnecessary use of the CPU. The PeekMessage function should be used with caution, because polling with it keeps the CPU running almost continuously.
Graphical user interface
Many Win32 GDI functions are no longer supported in Windows CE, so they need to be changed before transplantation.
Adjust bitmap and icon
Windows CE devices have smaller touch screens of varying sizes, so the program must be modified to fit these restrictions. Static layouts should be avoided as far as possible; instead, the layout should be defined after obtaining the screen size through the GetSystemMetrics function.
Unicode encoding
Because Windows CE is itself a Unicode environment and also supports calling ASCII functions, text and files can be exchanged easily, which also facilitates testing of the algorithm in this paper. The conversion from an ASCII program to a Unicode program is as follows:
Include tchar.h, which provides all the necessary mappings.
Use Win32 string processing functions, such as lstrlen, instead of the C runtime library.
Use TCHAR, LPTSTR and other generic types explicitly in declarations, so that the code compiles as either Unicode or ASCII.
Use the TEXT or _T macro for string literals, such as TEXT("Uygur Pen").
Do not assume that a character is 1 byte long; a Unicode string ends with two zero bytes.
To ensure the validity of Unicode and ASCII, sizeof(TCHAR) should be used when adding array pointers and counting characters.
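The difference between counting characters and counting bytes can be illustrated outside the Win32 API. The sketch below only reproduces, in Python, the byte layout of a zero-terminated UTF-16 little-endian string, which is assumed here to correspond to a wide-character string on Windows CE; the helper name `utf16_buffer` is hypothetical and no Windows function is called.

```python
def utf16_buffer(text):
    """Encode text as UTF-16LE followed by a two-byte zero terminator."""
    return text.encode("utf-16-le") + b"\x00\x00"


BYTES_PER_UNIT = 2                      # plays the role of sizeof(TCHAR) in a Unicode build

text = "Uygur Pen"
buffer = utf16_buffer(text)
byte_count = len(buffer)                # size of the buffer in bytes, terminator included
char_count = byte_count // BYTES_PER_UNIT - 1   # 16-bit units, terminator excluded
```

Code that treats `byte_count` as a character count would overrun by a factor of two, which is why buffer sizes and pointer offsets should be expressed in units of the character type.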
Management window
Windows CE has few window styles and functions, so there is no available resize processing, and the window can only be the size defined when it was created.
Experimental results and analysis
Experimental evaluation index
This paper continues to use the following four groups of indicators to characterize the performance of the system (there are two calculation modes I and II), so as to evaluate the advantages and disadvantages of the algorithm. See Reference [19] for detailed indicator system and data set.
Accuracy (CRSR); error segmentation/recognition rate (ERSR); segmentation/recognition rejection rate (RRSR); average time of segmentation/recognition (ATSR); segmentation/recognition accuracy (PSR).
Segmentation results of common syllables
In this experiment, we made a statistical analysis of the segmentation and recognition operations on the two test sets. Among the 400 commonly used syllables of the random data, 366 were correctly segmented, 20 were segmented but with incorrect or partially incorrect results, and 14 were rejected. Therefore, according to the second mode,
The segmentation accuracy is:
Segmentation error rate is calculated according to mode II:
Or
The segmentation rejection rate of available RCD test data is:
Similarly, four groups of index values of SCD test data can be obtained, as shown in Table 1.
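The arithmetic behind such index values can be sketched from the counts reported above for the 400 random syllables. The sketch assumes that each rate is simply the corresponding count divided by the total number of test syllables; the exact definitions of calculation modes I and II are given in Reference [19] and may differ, so the numbers below only illustrate the computation and are not the values of Table 1. The function name is hypothetical.

```python
def segmentation_rates(correct, wrong, rejected):
    """Plain ratios of each outcome to the total number of test items."""
    total = correct + wrong + rejected
    return {
        "accuracy": correct / total,
        "error": wrong / total,
        "rejection": rejected / total,
    }


# Counts reported for the random-data test set: 366 correct, 20 wrong, 14 rejected.
rates = segmentation_rates(correct=366, wrong=20, rejected=14)
```

Under this assumption the three rates sum to one, which gives a quick consistency check on the reported counts.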
Index value of segmentation test
The statistical results of RCD and SCD identification experiments are shown in the following table.
Test results of RCD and SCD
Based on the above two tables, compared with the previous traditional methods, we can conclude that the CNN-based method improves segmentation and recognition ability to a certain extent and significantly reduces the rejection rate. (With the traditional method, the combined segmentation error and rejection ratio is close to 14%, and the recognition rate for single characters is below 90%; see Reference [19] for details.) Moreover, this paper has completed a relatively complete test on smart devices for the first time. Although the data set has not changed, the test also verifies, to a certain extent, the effectiveness of the proposed hybrid method on smart devices. The main problems encountered are that the time and space complexity of the algorithm is relatively high, and that battery consumption and device heating are considerable. Next, we plan to continue optimizing the algorithm to reduce its time and space complexity, and to test it on the latest basic database Ucpen2.0.
Acknowledgments
This work was supported by Natural Science Foundation of Xinjiang Province (No. 2018D01A28), Scientific Research Program of the Higher Education Institution of XinJiang (XJEDU2017M021), National Natural Science Foundation of China (No. U1903215, 61363062), Xinjiang University Research Foundation(No. BS180250).
