Abstract
Existing recognition systems cannot effectively speed up target recognition in dynamic scenes, and as the amount of target recognition increases, their recognition accuracy is relatively low. To solve this problem, a DSP-based target recognition system for dynamic scenes was designed. Following the principles of DSP system design, the design process and composition of the target recognition system are described. A recognition algorithm based on spatial-temporal condition information was used to implement the designed recognition system. By introducing a visual attention mechanism, a spatial-temporal domain model based on visual saliency was built. Pixel neighborhood-weighted condition information was used as the classification feature to enhance the linear separability of target and background and to improve the recognition accuracy of moving targets in dynamic scenes. Finally, combined with an image block modeling strategy, efficient, real-time recognition of moving targets in dynamic scenes was realized. Experimental results show that the proposed target recognition system can effectively improve the accuracy of target recognition.
Introduction
Dynamic scene target recognition in artificial intelligence vision is a research hotspot in the field of computer vision. Effective recognition and tracking of moving targets is the key to further video analysis and processing [1, 2]. The essence of moving target recognition is to extract the changing regions of images from a video sequence. Therefore, accurate recognition and effective segmentation of moving targets are an important basis of the whole recognition and tracking system [3, 4]. In recent years, satisfactory results have been obtained for moving target recognition against simple static backgrounds, but these methods lack practicability. Research on the recognition and tracking of moving targets under artificial intelligence vision is growing, but the effect in practical applications is not ideal [5]. Therefore, designing and developing a dynamic scene target recognition system under artificial intelligence vision is of great significance [6].
Reference [7] proposes a moving target recognition algorithm based on a multi-level background model. This algorithm effectively overcomes the influence of many kinds of interference under complex backgrounds, but its recognition accuracy is low. Reference [8] proposes a moving object recognition algorithm based on background classification. This algorithm obtains a complete and clear foreground and processes video quickly; it is simple, practical and robust to noise interference, but its recognition accuracy is low. Reference [9] proposes a real-time recognition method for mobile communication target signals based on machine learning. Although this method achieves real-time recognition, its recognition accuracy is poor. Reference [10] proposes an algorithm for recognizing infrared weak and small targets based on a coherent filter. Compared with classical recognition algorithms, it effectively decreases the false alarm probability and has good real-time performance, which makes it suitable for real-time recognition of weak and small targets against complex backgrounds, but its recognition accuracy is poor. Reference [11] proposes an algorithm for recognizing moving targets based on optical flow analysis. This algorithm needs no prior assumptions and can process both dynamic and static backgrounds. It outperforms existing algorithms in recognition and processing speed, but its recognition accuracy is poor.
The general architecture of the different modules in the dynamic scene target recognition system is designed.
The recognition algorithm based on spatial-temporal condition information is used to implement the designed target recognition system.
Experiments are conducted to evaluate the target recognition system, and problems requiring further improvement are identified.
Material and methods
Composition scheme of target recognition system
The structure of the DSP-based target recognition system for dynamic scenes is shown in Fig. 1. It mainly includes a power control module, a signal processing module, a central processing and control unit, and an identification module. During target recognition, the system collects and preprocesses signals, performs A/D conversion, and sends the processed signals to the central processing unit for feature extraction and target recognition. Meanwhile, to reduce noise, improve the signal-to-noise ratio, and reduce electromagnetic interference, volume and power consumption, the target recognition system also needs some auxiliary circuits, which are introduced in the following parts [12].

Composition scheme of hardware in target recognition system.
A three-stage operational amplifier circuit is used to meet the signal requirements between the sensor and the A/D converter. The operational amplifiers must have high input impedance, low input offset voltage, low input bias current and low noise. As the interface to the A/D converter, the operational amplifier must provide rail-to-rail output and be able to drive heavy loads. In addition, because the target recognition system is designed for low power, the operational amplifiers must have very low power consumption. The amplifying circuit is shown in Fig. 2 [13].
The system amplifying circuit consists of three stages: two symmetrical non-inverting amplifiers, U1A and U1D, form the first stage; the second stage is a differential amplifier; and the third stage is a simple amplifier. To improve common-mode rejection and suppress drift, the feedback resistors of each stage are strictly matched in the circuit design. The closed-loop gain of the whole amplifier is:

Schematic diagram of signal amplifying circuit.
Experiments show that the input offset error of the amplifier is greatly reduced because the parameters of operational amplifiers U1A and U1D are matched. The equivalent offset of the second-stage amplifier U1B, referred to the amplifier input, is smaller, so the requirements on operational amplifier U1C can be appropriately relaxed. Therefore, the three-stage amplifier not only satisfies the requirement of circuit stability but also provides a gain of about 6000.
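The gain of a cascaded amplifier is the product of its stage gains. As a rough check of the figure of about 6000, the sketch below assumes a textbook topology (a symmetric non-inverting input pair, a difference amplifier and a non-inverting output stage); the resistor names and values are hypothetical and are not read from Fig. 2.

```python
import math

def stage_gains(r_gain, r_fb1, r3, r4, r5, r6):
    """Gains of the three stages for an assumed textbook topology (hypothetical).

    stage 1: symmetric non-inverting pair (U1A/U1D), G1 = 1 + 2*r_fb1/r_gain
    stage 2: difference amplifier (U1B),              G2 = r4/r3
    stage 3: non-inverting amplifier,                 G3 = 1 + r6/r5
    """
    g1 = 1 + 2 * r_fb1 / r_gain
    g2 = r4 / r3
    g3 = 1 + r6 / r5
    return g1, g2, g3

# Hypothetical resistor values in ohms, chosen to give about 6000x overall.
g1, g2, g3 = stage_gains(r_gain=1e3, r_fb1=29.5e3, r3=1e3, r4=10e3, r5=10e3, r6=90e3)
total = g1 * g2 * g3
print(f"G1={g1:.0f}, G2={g2:.0f}, G3={g3:.0f}, total={total:.0f} ({20 * math.log10(total):.1f} dB)")
# prints: G1=60, G2=10, G3=10, total=6000 (75.6 dB)
```

Spreading the gain over three stages keeps each stage's gain moderate, which is consistent with the text's point that input-stage matching dominates the offset performance.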
This system uses TI's TLV2544, a single-chip, four-channel, 12-bit serial A/D converter, to collect target signals. The chip is a typical sampling A/D converter with a maximum sampling rate of 200 kSPS. It has a built-in reference voltage and an internal conversion clock. Its output is a synchronous serial data stream with a maximum transfer rate of 20 MHz. It operates from a 2.7 V to 5.5 V DC supply with low power consumption. The digital interface of the chip consists of three inputs and one three-state output, providing a simple four-wire interface to a general host chip. The chip has four working modes: single conversion mode, repeat mode, sweep mode and repeat sweep mode. The second, third and fourth modes require the internal FIFO to store data. To meet the real-time requirement of the target recognition system, the first mode is selected. Sampling of the TLV2544 is controlled by a channel control word issued by the main control chip. Because a single chip performs multi-channel signal conversion, time-slot allocation is also controlled through the channel control word issued by the main chip. This simplifies the peripheral hardware design of the system and moves some tasks into software [14].
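For reference, an ideal unipolar 12-bit converter maps an output code back to its input voltage as code × Vref / 4096, and dividing by the front-end gain refers that voltage back to the sensor. The sketch below assumes a 4 V reference and a 6000× gain purely for illustration, and assumes the 12-bit result has already been extracted from the serial data word.

```python
def adc_code_to_volts(code, vref, bits=12):
    """Ideal unipolar conversion: code * vref / 2**bits."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range for the given resolution")
    return code * vref / (1 << bits)

def sensor_volts(code, vref, gain, bits=12):
    """Refer the A/D input voltage back to the sensor through the amplifier gain."""
    return adc_code_to_volts(code, vref, bits) / gain

# Hypothetical operating point: 4.0 V reference, 6000x front-end gain.
v_adc = adc_code_to_volts(2048, vref=4.0)          # mid-scale code -> 2.0 V
v_sensor = sensor_volts(2048, vref=4.0, gain=6000)  # about 0.33 mV at the sensor
lsb_at_sensor = 4.0 / 4096 / 6000                   # resolution referred to the input
```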
Considering the real-time requirement of the system and the characteristics of the DSP chip, the acquisition and transmission of target signals are completed by combining the DSP's multi-channel buffered serial port (McBSP) with DMA, which greatly reduces the load on the DSP CPU and meets the real-time requirement. In this system, data collection and transmission are handled by McBSP0. First, the internal clock generator divides the DSP working clock to generate the transmit clock signal CLKX0. Second, the frame synchronization generator divides the CLKX0 frequency to generate the transmit frame synchronization signal FSX0. Because the receive timing is the same as the transmit timing, the receive clock signal CLKR0 and the receive frame synchronization signal FSR0 can use CLKX0 and FSX0 directly. Meanwhile, two DMA channels, DMA0 and DMA1, are used to send ADC control words and to receive data, respectively. The synchronization event of DMA0 is the interrupt signal from the ADC, and the synchronization event of DMA1 is the receive interrupt of McBSP0. During the whole sampling process, the DSP only needs to use its XF pin to start the first A/D conversion; everything else is handled by the DMA controller.
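The two clock divisions described above can be planned with the McBSP sample-rate generator relations CLKG = CPU clock / (CLKGDV + 1) and frame period = (FPER + 1) CLKG cycles. The sketch below picks CLKGDV so that the serial clock stays within the converter's 20 MHz limit and FPER so that the frame rate does not exceed a target sampling rate. The 100 MHz CPU clock in the example is an assumption, and the sketch ignores the A/D conversion time, which a real frame period must also cover.

```python
import math

def mcbsp_srg_settings(cpu_hz, sclk_max_hz, fs_target_hz, word_bits=16):
    """Choose McBSP sample-rate-generator dividers (sketch).

    Assumed relations, with the CPU clock as the generator input:
      CLKG         = cpu_hz / (CLKGDV + 1)    -> serial clock (CLKX0)
      frame period = (FPER + 1) CLKG cycles   -> frame sync (FSX0)
    Returns (CLKGDV, FPER, serial clock in Hz, frame rate in Hz).
    """
    # Smallest divider that keeps the serial clock at or below the A/D limit.
    clkgdv = max(0, math.ceil(cpu_hz / sclk_max_hz) - 1)
    if clkgdv > 255:
        raise ValueError("CLKGDV is an 8-bit field")
    sclk = cpu_hz / (clkgdv + 1)
    # A frame must hold at least one data word and must not exceed the target rate.
    fper = max(word_bits - 1, math.ceil(sclk / fs_target_hz) - 1)
    if fper > 4095:
        raise ValueError("FPER is a 12-bit field")
    return clkgdv, fper, sclk, sclk / (fper + 1)

# Hypothetical: 100 MHz CPU clock, 20 MHz serial-clock limit, 200 kHz frame rate.
print(mcbsp_srg_settings(100e6, 20e6, 200e3))  # (4, 99, 20000000.0, 200000.0)
```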
DSP memory storage
The DSP chips of the TMS320C54x series have abundant fast on-chip memory. Running program code from internal memory allows full-speed execution and thus the highest performance of the chip, so making full use of internal memory gives the best overall performance of the DSP system. The VC5409 has 16K words of on-chip ROM and 32K words of on-chip RAM. Figure 3 shows the organization of the on-chip memory. The on-chip RAM and ROM can be configured flexibly through the OVLY and DROM bits in the PMST register. In the data space, 00H∼5FH holds the memory-mapped registers, 60H∼7FH is dual-access RAM (DARAM), 80H∼1FFFH is DARAM, and 2000H∼7FFFH is single-access RAM (SARAM). When DROM = 1, the internal C000H∼7FFFH is also mapped into the data space. When OVLY = 1, the internal 80H∼1FFFH and 2000H∼7FFFH are also mapped into the program space. The interrupt vectors are stored starting at FF80H. When the chip works in microcomputer mode, the 16K-word ROM starting at C000H is also mapped into the program space. DARAM can be accessed twice in one machine cycle, either one read and one write or two reads [15].

Organization chart of TMS320VC5409 memory.
For the application of VC5409 in this system, we mainly use the internal RAM. Therefore, the
At present, common EPROMs operate at 5 V, so level conversion must be considered when they are connected to a 3.3 V DSP chip, and they are also physically large. Compared with EPROM, FLASH memory is more cost-effective, smaller and lower in power consumption, and it is electrically erasable and easy to use. A 3.3 V FLASH can be connected directly to the DSP. Therefore, this system uses the SST39LF100 FLASH memory from SST to store the DSP system program. The SST39LF100 has a storage capacity of 64K words (1 Mbit), single 3.3 V supply read and write operation, low power consumption, fast erasure and word programming, and an access time of 45 ns. Its interface circuit with the DSP is shown in Fig. 4.
Interface between the DSP and FLASH.
There are several ways to interface DSP memory with FLASH memory, such as separating the program and data areas, combining the program and data areas, and an optimized mixture of program and data areas. This system uses the mixed program and data area connection. As Fig. 4 shows, this connection form does not use
Figure 5 shows the power control circuit of the target recognition system. The DSP processor of the recognition system works at low voltage: its core voltage is 1.8 V and its I/O pin voltage is 3.3 V. To give the system low power consumption and small volume, 3.3 V peripheral devices are used. Therefore, the TPS767D318 from TI is chosen as the power chip. This regulator takes a 5 V input and produces 3.3 V and 1.8 V outputs. Its maximum output current is 1000 mA, which satisfies the needs of the system.

Power control circuit.
Because the analog input signal of the A/D converter has a small amplitude, the digital switching noise produced by the DSP digital circuitry would seriously degrade the A/D conversion precision. The 3.3 V supplies of the digital circuit and the analog circuit are therefore each isolated with an ACF-153 power filter chip, to reduce interference from the digital circuit on the analog circuit. The digital ground and the analog ground on the circuit board are strictly separated and joined at a single point, which prevents digital switching noise from affecting A/D accuracy. In Fig. 5, the electrolytic capacitor mainly filters out power supply noise; during layout it should be connected to the main power and ground lines. The other capacitors, of different capacitances, are used for chip decoupling and should be placed near the chip power pins to filter out noise at different frequencies.
In a typical DSP system, the system is in an unknown state at power-up and must be reset to bring it into a known state. Because the reset operation terminates memory operations and initializes the status bits of the DSP processor registers, the system must be reinitialized after each reset. For convenience of debugging, the system uses the reset circuit in Fig. 6. To initialize the system correctly, the RESET signal is generally held low for at least 3 CLKOUT periods. After power-up, the crystal oscillator of the system often needs a stabilization period of several hundred milliseconds, generally 100∼200 ms. In the reset circuit, the reset time is mainly determined by R24 and C20. Let R24 = R and C20 = C, and let V1 = 1.5 V be the threshold between the low and high levels; then the reset time t is:
Reset circuit of target recognition system.
A Schmitt trigger then ensures that the low level lasts for at least t, satisfying the reset requirement of the target recognition system [17].
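Assuming that C20 charges through R24 from the 3.3 V rail (an assumption about the topology of Fig. 6), the capacitor voltage follows V(t) = VCC(1 − e^(−t/RC)), so the time to reach V1 is t = RC·ln(VCC / (VCC − V1)). The sketch below evaluates this with hypothetical values R24 = 100 kΩ and C20 = 2.2 μF.

```python
import math

def rc_reset_time(r_ohm, c_farad, v_th, vcc):
    """Time for an RC node charging from 0 V toward vcc to reach v_th.

    Solves v_th = vcc * (1 - exp(-t / (R*C))) for t.
    """
    if not 0 < v_th < vcc:
        raise ValueError("threshold must lie between 0 and vcc")
    return r_ohm * c_farad * math.log(vcc / (vcc - v_th))

# Hypothetical component values; V1 = 1.5 V from the text, VCC = 3.3 V assumed.
t = rc_reset_time(100e3, 2.2e-6, v_th=1.5, vcc=3.3)  # about 0.133 s
```

With these values the low period falls inside the 100∼200 ms oscillator start-up window quoted above; larger R or C lengthens it proportionally.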
The VC5409 provides two clock pins, X2 and X1. X2, also called CLKIN, is an input pin, and X1 is an output pin. The clock generator lets the designer choose the clock source: either the DSP internal oscillator is started with a crystal connected between X1 and X2/CLKIN, as shown in Fig. 7(a), or an external clock is connected directly to X2 with X1 left unconnected, as shown in Fig. 7(b). An active crystal oscillator does not need the internal oscillator of the DSP, and its signal is more stable. Therefore, this system adopts the clock circuit shown in Fig. 7(b). The VC5409 clock generator includes the on-chip oscillator and a phase-locked loop (PLL), and the PLL can be configured in hardware. The VC5409 CPU has three clock mode selection pins, CLKMD1, CLKMD2 and CLKMD3, whose states determine whether the DSP working frequency is obtained by multiplying or dividing the frequency of the external clock.
DSP typical clock circuit.
For convenient operation of the system, the values of the three pins can be set with a DIP switch, which makes the selection of the clock frequency more flexible.
Like most DSPs, the VC5409 has an IEEE-standard JTAG emulation interface. It is accessed directly by the emulator, which provides scan-based emulation. This design greatly facilitates hardware and software debugging of the VC5409. For convenient hardware and software debugging and programming of this system, the designed target recognition circuit board has a JTAG emulation interface, whose circuit is shown in Fig. 8.
DSP and emulator interface circuit.
In general, for the emulator to work, the distance between the emulation header and the JTAG target chip should not exceed six inches. The EMU0 and EMU1 signals must be connected to the power supply through pull-up resistors that give a signal rise time of less than 10 μs; R1 and R2 can be 10 kΩ [18].
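Whether a given pull-up meets the 10 μs rise-time limit depends on the capacitance of the line it drives. For an RC-limited edge, the 10%-90% rise time is R·C·ln 9 ≈ 2.2RC. The sketch below checks the 10 kΩ pull-up against a hypothetical 50 pF line capacitance and computes the largest capacitance that would still meet the limit.

```python
import math

def rise_time_10_90(r_ohm, c_farad):
    """10%-90% rise time of an RC-limited edge: R*C*ln(9), about 2.2*R*C."""
    return r_ohm * c_farad * math.log(9)

def max_load_capacitance(r_ohm, t_max_s):
    """Largest line capacitance for which the rise time stays within t_max_s."""
    return t_max_s / (r_ohm * math.log(9))

# 10 kOhm pull-up from the text; the 50 pF line capacitance is hypothetical.
tr = rise_time_10_90(10e3, 50e-12)        # about 1.1 us, well under 10 us
cmax = max_load_capacitance(10e3, 10e-6)  # about 455 pF
```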
Some improvements were made to the design, which will be developed with a DSP and a PC. The main process is shown in Fig. 9.
Flow chart of color recognition.
The function of each step is briefly explained as follows:
Extraction: the 16-bit color image is extracted from the image converted to RGB format.
Color classification: each pixel is compressed from 16 bits to 8 bits, consisting of a 4-bit color value and a 4-bit grayscale value.
Color selection: the specified color is selected from the above 16 colors (4 bits) and the other, useless colors are discarded; the output 4-bit data is the grayscale value of the specified color.
Low-pass filtering: a 3×3 low-pass (mean) filter is applied to filter out high-frequency components.
Binarization: pixels of the useful color are set to 1, and pixels of useless colors are set to 0.
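The five steps can be sketched in Python as follows. The text does not specify how the 16 color classes are formed, so this sketch assumes RGB565 input pixels, 16 equal hue bins for the 4-bit color class, and the top 4 bits of BT.601 luma for the 4-bit gray value; the binarization threshold applied after mean filtering is likewise a hypothetical parameter.

```python
import colorsys

def rgb565_to_rgb888(p):
    """Expand an RGB565 pixel (assumed input format) to 8-bit R, G, B."""
    r, g, b = (p >> 11) & 0x1F, (p >> 5) & 0x3F, p & 0x1F
    return r * 255 // 31, g * 255 // 63, b * 255 // 31

def classify(p):
    """Pack 16 bits into 8: high nibble = color class, low nibble = gray."""
    r, g, b = rgb565_to_rgb888(p)
    h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    color = int(h * 16) % 16                      # hypothetical: 16 hue bins
    gray = (299 * r + 587 * g + 114 * b) // 1000  # BT.601 luma, 0..255
    return (color << 4) | (gray >> 4)

def select_color(packed, target):
    """Keep the 4-bit gray value of pixels in the target class, else 0."""
    return [[(v & 0x0F) if (v >> 4) == target else 0 for v in row] for row in packed]

def mean3x3(img):
    """3x3 mean (low-pass) filter with edge replication."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    s += img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            out[y][x] = s / 9
    return out

def binarize(img, thresh):
    """Useful color -> 1, everything else -> 0."""
    return [[1 if v > thresh else 0 for v in row] for row in img]

def recognize_color(frame565, target, thresh=2.0):
    """Classification, selection, low-pass filtering and binarization in sequence."""
    packed = [[classify(p) for p in row] for row in frame565]
    return binarize(mean3x3(select_color(packed, target)), thresh)
```

Filtering before binarization removes isolated pixels of the selected color, so small noise specks do not survive into the binary mask.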
Combined with the description of the different modules in the dynamic scene target recognition system, the recognition algorithm based on spatial-temporal condition information is used to implement the designed recognition system.
The nonparametric kernel density estimation method is used to estimate the conditional probability.
where δ is a kernel function that is simple to compute and whose computation can be accelerated with a histogram; x is the feature vector of pixel y in the current dynamic scene image; B is the reference background set; b is a sample in the reference background set B; and |B| is a normalization factor that denotes the total number of samples in dataset B. Substituting the conditional probability p(x|B) into formula (3) gives the spatial-temporal condition information (SCI) of pixel y, namely $Q_{SCI}(x)$.
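With a Kronecker-delta kernel, the kernel density estimate reduces to a normalized histogram, p(x|B) = (1/|B|) Σ_{b∈B} δ(x, b), which is why a histogram accelerates it. The sketch below assumes quantized (integer) feature values and, following the negative logarithmic transformation mentioned later for the NWSCI, assumes that formula (3) has the form Q_SCI(x) = −log p(x|B); the small eps guarding against log 0 is an added safeguard.

```python
import math
from collections import Counter

def cond_prob(x, background_samples):
    """p(x|B) with a Kronecker-delta kernel: the normalized histogram count of x in B."""
    hist = Counter(background_samples)        # histogram over the reference set B
    return hist[x] / len(background_samples)  # divide by |B|

def sci(x, background_samples, eps=1e-6):
    """Assumed form of formula (3): Q_SCI(x) = -log p(x|B); eps avoids log(0)."""
    return -math.log(cond_prob(x, background_samples) + eps)

# Hypothetical quantized background samples at one pixel.
B = [3, 3, 3, 5]
print(cond_prob(3, B))        # 0.75
print(sci(9, B) > sci(3, B))  # True: values never seen in B get a larger SCI
```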
Considering the local spatial-temporal consistency between pixels in dynamic scene video, the neighborhood-weighted SCI (NWSCI) of a pixel can be used as the classification feature, which effectively suppresses isolated noise interference.
The NWSCI of pixel y in the current dynamic scene image is defined as:
where (i, j) is the image coordinate of pixel y; $x_{kl}$ is the feature vector of a neighborhood pixel of y; (k, l) is the relative coordinate of the neighborhood pixel; $\alpha_{kl}$ is a weight; $S_n$ is the size of the neighborhood, which equals the size of the central domain $S_c$; $H_{ij}(x_{kl})$ is the value for the feature vector of the neighborhood pixel of y, obtained from the spatial-temporal domain histogram $H_{ij}$ of the dynamic scene reference background; and $|H_{ij}|$ is a normalization factor, obtained by summing all values of $H_{ij}$.
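A minimal sketch of the NWSCI, assuming it is the α-weighted sum of −log(H_ij(x_kl) / |H_ij|) over the neighborhood S_n. The Gaussian form of the weights α_kl, the integer feature values and the edge replication at image borders are assumptions for illustration; a pixel would then be labeled foreground when its NWSCI exceeds a threshold.

```python
import math

def gaussian_weights(radius, sigma):
    """Hypothetical alpha_kl: normalized Gaussian weights over a (2r+1)x(2r+1) window."""
    w = {(k, l): math.exp(-(k * k + l * l) / (2 * sigma ** 2))
         for k in range(-radius, radius + 1) for l in range(-radius, radius + 1)}
    total = sum(w.values())
    return {kl: v / total for kl, v in w.items()}

def nwsci(frame, i, j, hist_ij, alpha, eps=1e-6):
    """Assumed NWSCI of pixel (i, j): sum of alpha_kl * -log(H_ij(x_kl) / |H_ij|).

    frame   -- 2-D list of quantized feature values x
    hist_ij -- dict {feature value: count}, the reference-background histogram H_ij
    alpha   -- dict {(k, l): weight} over the neighborhood S_n
    """
    norm = sum(hist_ij.values())                # |H_ij|
    h, w = len(frame), len(frame[0])
    q = 0.0
    for (k, l), a in alpha.items():
        r = min(max(i + k, 0), h - 1)           # edge replication at the borders
        c = min(max(j + l, 0), w - 1)
        p = hist_ij.get(frame[r][c], 0) / norm  # H_ij(x_kl) / |H_ij|
        q += a * -math.log(p + eps)
    return q
```

Because the weights are normalized, a single noisy pixel contributes only its weight α_kl to the score, which is how the neighborhood weighting suppresses isolated noise.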
In practical applications, dynamic scene video contains many unchanged regions, such as building surfaces and the ground. Target motion inevitably changes the scene; conversely, unchanged regions contain no moving objects. Therefore, a simple and fast image block difference can be used.
The changed regions recognized in this preliminary step are taken as foreground candidate regions, and the spatial-temporal condition information of the dynamic scene image blocks is then used for secondary recognition.
This reduces the amount of data to be processed and improves recognition speed. Here, $L_{IB}$ is the label matrix of the dynamic scene image blocks; (m, n) is the location coordinate of an image block; t is the threshold of the image block difference classification; 1 denotes foreground and 0 denotes background; $Q_{IBSCI}$ is the image-block SCI (IBSCI), and τ is its classification threshold [19–21]. The image block difference can be calculated quickly as the sum of absolute differences (SAD), defined as:
where $S_b$ is the size of the image block; (k, l) denotes the relative position of a pixel in the dynamic scene image block; $I_{cur}$ is the current image; and $I_{bck}$ is the reference background image.
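A minimal sketch of the block difference and the first-stage marking, assuming that $S_b$ is the block side length, that images are grayscale nested lists, and that a block becomes a foreground candidate when its SAD exceeds t.

```python
def block_sad(cur, bck, m, n, sb):
    """SAD between block (m, n) of the current image and of the reference background."""
    return sum(abs(cur[m * sb + k][n * sb + l] - bck[m * sb + k][n * sb + l])
               for k in range(sb) for l in range(sb))

def premark_blocks(cur, bck, sb, t):
    """First-stage labels L_IB: 1 = foreground candidate (SAD > t), 0 = background."""
    rows, cols = len(cur) // sb, len(cur[0]) // sb
    return [[1 if block_sad(cur, bck, m, n, sb) > t else 0 for n in range(cols)]
            for m in range(rows)]
```

Only additions, subtractions and comparisons are needed per pixel, which is what makes this pre-test cheap enough to run on every block before the histogram-based recognition.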
Similarly to formula (4), the IBSCI of the image block at (m, n) is defined as:
where $H_{mn}$ is the spatial-temporal domain histogram of the reference background of the image block located at (m, n); $x_{kl}$ is the feature vector of a pixel in the dynamic scene image block; and (k, l) denotes the relative position coordinate of the pixel in the image block.
The typical spatial-temporal domain model is initialized by caching N frames of background sequence images, and a sliding window method is used to update the model. However, this model suffers from slow initialization, large storage requirements and missed detections. Instead of caching N background frames, we initialize the model directly from the neighborhood color distribution of pixels in a single frame, and then use a selective data update strategy to avoid missed recognition of targets. The specific method is as follows:
where $H_{ij}$ is the color distribution of the reference background at position (i, j).
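A sketch of single-frame initialization, assuming that each $H_{ij}$ is the histogram of quantized colors in a (2r+1)×(2r+1) neighborhood of (i, j), with edge replication at the image borders; the radius r is a hypothetical parameter.

```python
from collections import Counter

def init_model(frame, radius):
    """Build H_ij for every pixel from a single frame (sketch).

    H_ij is taken to be the histogram of quantized colors in the (2r+1)x(2r+1)
    neighborhood of (i, j), with edge replication at the image borders.
    """
    h, w = len(frame), len(frame[0])
    model = []
    for i in range(h):
        row = []
        for j in range(w):
            row.append(Counter(
                frame[min(max(i + k, 0), h - 1)][min(max(j + l, 0), w - 1)]
                for k in range(-radius, radius + 1)
                for l in range(-radius, radius + 1)))
        model.append(row)
    return model
```

Using the spatial neighborhood of one frame in place of N cached frames trades temporal samples for spatial ones, which is what removes the initialization delay and the frame buffer.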
In the image block difference pre-recognition process, the reference background $I_{bck}$ is extracted by the background initialization method and updated by:
where (i, j) denotes the position of a pixel; L is the label matrix (1 is foreground, 0 is background); and the update coefficients satisfy β1 > β2, that is, the foreground region is updated slowly and the background region is updated rapidly.
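One update rule consistent with "β1 > β2 means the foreground is updated slowly" is the running average I_bck ← β·I_bck + (1 − β)·I_cur, with β = β1 where L = 1 and β = β2 where L = 0: a larger β retains more of the old background. The sketch below assumes this form; the default coefficient values are hypothetical.

```python
def update_background(bck, cur, labels, beta1=0.99, beta2=0.9):
    """Selective running-average update of the reference background (assumed form).

    I_bck <- beta * I_bck + (1 - beta) * I_cur, with beta = beta1 where L = 1
    (foreground) and beta = beta2 where L = 0 (background). Because beta1 > beta2,
    foreground pixels change the background slowly and background pixels quickly.
    """
    if not beta1 > beta2:
        raise ValueError("expected beta1 > beta2")
    out = []
    for row_b, row_c, row_l in zip(bck, cur, labels):
        new_row = []
        for b, c, lab in zip(row_b, row_c, row_l):
            beta = beta1 if lab == 1 else beta2
            new_row.append(beta * b + (1 - beta) * c)
        out.append(new_row)
    return out
```

Updating foreground pixels slowly keeps a slow-moving target from being absorbed into the background, which is the missed-detection problem the selective strategy targets.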
The basic flow of the proposed SCI-based algorithm for moving target recognition in dynamic scenes is shown in Fig. 10. First, the visually salient spatial-temporal reference background of pixel y in the input image is constructed, and the conditional probability of y is calculated by kernel density estimation. Then, the negative logarithmic kernel transformation is used to calculate its NWSCI. Finally, with the NWSCI as the classification feature, a linear classifier gives the recognition result for pixel y.

Framework of the SCI-based moving target recognition algorithm.
The image block acceleration strategy improves recognition speed; its procedure is shown in Fig. 11. First, the image is divided into blocks, and formula (5) is used to determine whether the current image block is a foreground candidate. If it is, formula (6) is used for secondary recognition; otherwise, the pixels at its corresponding position are directly marked as background. In the secondary recognition of an image block, we first determine the visually salient spatial-temporal reference neighborhood of the block, then calculate the IBSCI with this reference neighborhood, and finally classify the block with a linear classifier.
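The two-stage flow can be sketched as follows. Assumptions: integer-valued (grayscale or quantized) images as nested lists, one reference histogram H_mn per block given as a dict, the IBSCI taken as the mean of −log(H_mn(x_kl) / |H_mn|) over the block (averaging keeps τ independent of the block size), and a plain threshold standing in for the linear classifier.

```python
import math

def two_stage_blocks(cur, bck, block_hists, sb, t, tau, eps=1e-6):
    """Block labels after the image-block acceleration flow (sketch).

    cur, bck    -- current and reference background images (2-D lists of ints)
    block_hists -- block_hists[m][n] is a dict {value: count}, the histogram H_mn
    sb          -- block side length; t -- SAD threshold; tau -- IBSCI threshold
    """
    rows, cols = len(cur) // sb, len(cur[0]) // sb
    labels = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            coords = [(m * sb + k, n * sb + l) for k in range(sb) for l in range(sb)]
            # Stage 1: SAD against the reference background.
            sad = sum(abs(cur[r][c] - bck[r][c]) for r, c in coords)
            if sad <= t:
                continue  # background: no secondary recognition needed
            # Stage 2: IBSCI as the mean negative log-probability under H_mn.
            hist = block_hists[m][n]
            norm = sum(hist.values())  # |H_mn|
            q = sum(-math.log(hist.get(cur[r][c], 0) / norm + eps)
                    for r, c in coords) / len(coords)
            labels[m][n] = 1 if q > tau else 0  # threshold as the linear classifier
    return labels
```

A block that changes but whose new values are still well explained by its reference histogram (for example, a swaying background texture) passes stage 1 yet is rejected in stage 2.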

Framework of the recognition algorithm based on image block acceleration.
In summary, the hardware modules of the DSP-based dynamic scene recognition system have been described, and the designed recognition system has been implemented with the recognition algorithm based on spatial-temporal condition information, completing the research on the target recognition system of dynamic scenes based on artificial intelligence vision.
To verify the effectiveness of the proposed dynamic scene target recognition system, a simulation experiment was carried out. Experimental conditions: Visual Studio 2017 development platform, with OpenCV 2.2 for programming. Computer configuration: Intel Core i5-3210M CPU at 2.50 GHz, 4 GB of memory, and the Windows 7 operating system.
First, two videos were selected to test the sensitivity of the algorithm. The improved algorithm could recognize more complete moving targets. As shown in Fig. 12, panel (a) was the 308th frame taken from the first video, and panel (d) was the 27th frame taken from the second video.
Comparison of recognition results with different target recognition algorithms.
Panels (b) and (e) were the processing results of the algorithm of reference [7], and panels (c) and (f) were the processing results of the proposed recognition system. The comparison shows that the algorithm of reference [7] could not extract the complete area of the target object and extracted only part of the moving targets, indicating low sensitivity in moving target recognition. The proposed recognition system identified most of the targets and extracted complete target objects, and the recognition of large, slow-moving targets was also improved significantly.
Performance evaluation indices were used to compare four target recognition algorithms: the algorithm of reference [7], the algorithm of reference [8], the algorithm of reference [9], and the proposed algorithm based on spatial-temporal condition information. The evaluation indices were (1) precision (accuracy rate); (2) recall; (3) F-measure (comprehensive evaluation index); (4) false detection rate; and (5) missing rate, defined as follows:
where $T_P$ was the number of foreground pixels correctly recognized as foreground; $T_N$ was the number of background pixels correctly recognized as background; $F_P$ was the number of background pixels wrongly recognized as foreground; and $F_N$ was the number of foreground pixels wrongly recognized as background.
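The five indices can be computed from the four pixel counts. The sketch below assumes the standard change-detection definitions: precision = TP/(TP+FP), recall = TP/(TP+FN), F = 2PR/(P+R), false detection rate = FP/(FP+TN) and missing rate = FN/(TP+FN). The counts in the example are hypothetical.

```python
def cd_metrics(tp, tn, fp, fn):
    """Standard change-detection indices from pixel counts (assumed definitions)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "false_detection_rate": fp / (fp + tn),
        "missing_rate": fn / (tp + fn),
    }

# Hypothetical counts for one frame.
print(cd_metrics(tp=80, tn=900, fp=20, fn=20))
```

The F-measure is the harmonic mean of precision and recall, so it rewards a method only when both are high; this is why it is reported as the comprehensive index.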
The performance evaluation code for all four algorithms came from the ChangeDetection2017 dataset. Running this evaluation code gave the results shown in Table 1. Table 1 shows that the evaluation results on the dataset were broadly consistent with the recognition results on the experimental videos. Compared with the other three recognition algorithms, the algorithm based on spatial-temporal condition information achieved markedly higher evaluation indices, such as precision and recall. Compared with the algorithms of references [8] and [9], its comprehensive index F increased by 6.9% and 4.4%, respectively. These results demonstrate the practicability and reliability of the proposed algorithm based on spatial-temporal condition information.
Results of performance evaluation indexes of different algorithms
Evaluation results of computing efficiency with different algorithms
To address the low accuracy of dynamic scene target recognition in artificial intelligence vision, a DSP-based dynamic scene target recognition system was proposed. The different modules of the target recognition system were explained. The spatial-temporal information recognition algorithm and the negative logarithmic kernel transformation were used to enhance the linear separability of foreground and background in dynamic scenes, and a visual attention mechanism was introduced. Moreover, the neighborhood-weighted condition information (CI) was used as the recognition and classification feature of moving targets, which effectively suppressed background disturbance and isolated noise interference in dynamic scenes and improved the accuracy of moving target recognition. The image block strategy was used to simplify and accelerate the algorithm, enabling real-time recognition of moving targets in dynamic scenes. Finally, a strategy of randomly selecting updates was used to update the model, reducing missed recognition of slow-moving targets.
Future research will study the application of nonlinear transformation methods to dynamic scene target detection, including the use of nonlinear kernels in SVM classifiers for moving target recognition in dynamic scenes, and will apply the algorithm based on spatial-temporal condition information to an embedded intelligent camera platform to enhance the camera's ability to process dynamic scenes.
Acknowledgments
Funded by the Key Laboratory of the Advanced Technology (IOT2017B04).
