Abstract
The rapid development of Internet technology has been a game-changer for the automotive industry. Advances in, and the widespread use of, high-precision maps make accurate real-time vehicle positioning possible. Meanwhile, the extensive application of intelligent driving technology makes driving easier and more intelligent. This paper comprehensively reviews the application of artificial intelligence (AI) in autonomous driving and also explores innovative studies of other unmanned motion systems. First, the hardware architecture of the autonomous driving system is introduced, which comprises five modules: sensing, autonomous driving computer, power supply, signal communication, and execution and braking. A General Motors autonomous vehicle is used as an example to show how its hardware differs from that of traditional vehicles. Next, the autonomous driving software is divided by function into four modules (positioning, sensing, planning, and control), and the innovative application of AI algorithms in each is introduced. Finally, the paper extends beyond autonomous driving technology and puts forward innovative research ideas for intelligent unmanned systems.
Introduction
The rapid development of Internet technology has been a game-changer for the automotive industry. Advances in, and the widespread use of, high-precision maps make accurate real-time vehicle positioning possible. Meanwhile, the extensive application of intelligent driving technology makes driving easier and more intelligent [1, 3]. Together, the Internet, high-precision maps, and intelligent driving technology drive the progress of autonomous vehicle technology [2]. Unmanned systems are increasingly expected to replace humans in some activities. From small sweeping robots that clean floors automatically to large unmanned aerial vehicles that assist manned aircraft in battlefield situation sensing and coordinated operations, unmanned systems have penetrated all aspects of human activity [4]. As one class of unmanned system, autonomous vehicles face broad market demand, ranging from battlefield operations and port freight to passenger vehicles. In recent years, driven by this demand, many technological breakthroughs have been made in autonomous vehicles. At the same time, the field has attracted growing investment and scientific and technological effort, making it a vibrant emerging technology domain [5].
In recent years, with growing market demand for active safety and intelligence in vehicles, the tremendous social and economic value of autonomous driving has become increasingly evident. More and more companies and scientific research organizations are actively engaging in and promoting autonomous driving [6, 7]. Vehicles capable of fully autonomous driving have yet to be officially mass-produced and sold. However, a considerable number of experimental models can already perform highly autonomous driving behaviors based on environmental sensing, such as starting, accelerating, braking, lane tracking, lane changing, collision avoidance, and parking [8]. The National Highway Traffic Safety Administration (NHTSA) defines autonomous driving in five levels: advanced driver-assistance system (ADAS), specific function assistance, composite function assistance, highly automated driving, and fully autonomous driving. Currently, the majority of models remain at the level of composite function assistance (Level 2), and there is still a long way to go before fully autonomous vehicles can be mass-produced [9, 10].
Autonomous driving refers to an autonomous driving system that replaces the human driver, partially or completely, in driving a vehicle safely. The autonomous driving system of an automobile is a complex system that combines many functional modules and technologies [11, 12]. Before machine learning, big data, and AI technologies rose to prominence, autonomous driving systems, like other robotic systems, depended on traditional optimization techniques for the overall solution. As AI and machine learning have achieved significant breakthroughs in computer vision, natural language processing, and intelligent decision systems, academia and industry have gradually explored their application in the various modules of autonomous vehicle systems, and have achieved some results so far.
In this paper, the applications of AI to autonomous driving are reviewed comprehensively, and the innovative studies of the other unmanned motion systems are explored at the same time.
Autonomous driving hardware system architecture
In the industrialization of autonomous vehicles, two main bottlenecks are present: 1) How to integrate multi-sensor information more efficiently and rapidly; 2) How to minimize equipment cost while ensuring the performance of autonomous driving.
The autonomous driving system is generally added to traditional vehicles to construct the whole system, as shown in Table 1.
Definition of autonomous driving (driverless)
From Fig. 1, it can be clearly seen that the autonomous driving hardware system mainly includes five modules as follows: sensing, autonomous driving computer, power supply, signal communication, execution and braking.

Autonomous vehicle hardware system architecture.
In autonomous vehicles, the comprehensive sensing module, which traditional vehicles lack, takes over the role played by the driver's eyes and "vehicle sense" in manned driving. It generally consists of cameras, lidar (laser radar), millimeter-wave radar, and GNSS/IMU.
The camera is mainly used to acquire image information in order to identify pedestrians, cars, trees, traffic lights, and traffic signs, and to support positioning.
The lidar acquires laser scanning reflection data in order to identify pedestrians, cars, trees, and other obstacles, and to support positioning. For three-dimensional ranging, the distance is determined by measuring the time difference and phase difference of the laser signal, and the angle is measured through horizontal rotational scanning. These two quantities define a two-dimensional polar coordinate system. The third dimension, height, is obtained by acquiring signals at different pitch angles. Figure 2 shows lidar data after identification, classification, and color labeling.

Laser information.
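The ranging principle described above can be illustrated with a minimal Python sketch. It assumes a pure time-of-flight measurement (the phase-difference method is not modeled) and a coordinate convention chosen only for illustration: x forward, y to the left, z up. The function names are hypothetical and do not correspond to any particular lidar driver.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_to_range(round_trip_time_s: float) -> float:
    """Distance to the target from the round-trip time of a laser pulse."""
    # The pulse travels to the target and back, hence the division by two.
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0


def polar_to_cartesian(range_m: float, azimuth_rad: float, pitch_rad: float):
    """Convert (range, horizontal scan angle, pitch angle) to (x, y, z).

    Range and azimuth form the 2D polar coordinate system; the pitch
    angle of the beam supplies the third (height) dimension.
    """
    horizontal = range_m * math.cos(pitch_rad)  # projection on the ground plane
    x = horizontal * math.cos(azimuth_rad)
    y = horizontal * math.sin(azimuth_rad)
    z = range_m * math.sin(pitch_rad)
    return x, y, z


if __name__ == "__main__":
    r = tof_to_range(200e-9)  # a 200 ns round trip
    print(r, polar_to_cartesian(r, math.radians(30), math.radians(5)))
```

Repeating the conversion for every beam and every rotation step yields the point cloud that later stages classify and label.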
The millimeter-wave radar acquires reflection data mainly to identify obstacles and measure distances. It is already installed in traditional vehicles to assist in obstacle avoidance. The GNSS/IMU combination is used to obtain global positioning information in real time.
In the sensing module, the most important is the lidar because it has high accuracy and high reliability, which meet the functional requirement of high precision positioning and recognition in autonomous driving. It can be said that lidar has directly accelerated the engineering application of autonomous driving technology.
The autonomous driving computer, as its name indicates, performs the calculations related to autonomous driving. It generally consists of five parts: CPU, GPU, large-capacity memory, large-capacity hard disk storage, and abundant hardware interfaces.
The CPU is suited to logic-heavy software and therefore handles logic judgments, process flow, and other control and planning functions. The GPU is suited to large volumes of uniform computation and therefore processes sensor data for tasks such as identification, classification, sensing, and positioning. The large-capacity memory is used to process massive amounts of data and to load high-precision maps. The large-capacity hard disk is used to store high-precision maps. The abundant hardware interfaces, such as serial ports, CAN, Ethernet, and USB, are used to connect multiple sensors.
Execution and braking module
Execution and braking systems have also progressed along with autonomous driving technology. The execution system receives instructions from the autonomous driving control module and operates the vehicle accordingly: it controls the vehicle power (throttle and gear), chassis (steering and braking), and electronic and electrical systems, and thereby implements the speed and direction control of autonomous driving. The chassis braking system of a traditional automobile, in contrast, relies on hydraulic or pneumatic braking. To keep the body structure stable and to extend intelligent driving functions, brake-by-wire (BBW) will be the long-term development trend of automobile braking technology. BBW allows deep integration with the intelligent driving functional modules, much as flight control systems in aviation have gradually moved from hydraulic control to fly-by-wire (FBW).
Autonomous driving software system architecture
Whereas the autonomous driving hardware system is a retrofit and upgrade of the traditional vehicle, the software system can be described as brand new. The autonomous driving software is mainly divided into four modules based on function: positioning, sensing, planning, and control. Among them, the positioning module is generally regarded as the foundation. The content of each module is shown in Fig. 3.

Autonomous driving software system architecture.
The positioning solution is inseparable from high-precision maps. Based on the positioning information, environmental sensing, route planning, driving behavior decision-making, and vehicle motion control can be carried out. Route planning, behavior decision-making, and motion control are three progressively more specific and lower-level problems, in which the output of each serves as the input of the next. That is to say, the action produced by decision planning is taken as the input of the control module, which calculates the steering angle and throttle to be executed.
The most challenging problem facing research and development personnel is how to enhance automobile vision. So far, computer vision systems remain fairly primitive, and giving a computer system vision close or equal to that of humans is highly challenging. Autonomous vehicles must be aware of their surroundings (vehicles and pedestrians) at all times. In addition, they should be able to identify lanes and road markings in real time, understand traffic signs and lights, and cope with a range of complex environmental factors such as wind, frost, rain, dew, strong light, and low light. When road signs cannot be “seen” for some reason, or are absent altogether, the only feasible way to implement fully autonomous driving at present is to base decisions on multi-sensor information fusion and to fully exploit the strengths of each sensor. For example, millimeter-wave radar is suited to high-resolution, short-range target monitoring and capture, and it has strong penetration. Vision systems, in turn, can acquire lane lines, traffic light colors, and other information, but their ranging is less accurate than that of lidar. Hence, integrating lidar, millimeter-wave radar, and vision sensors effectively compensates for the deficiencies of each sensor used alone, and allows both spatial ranging and image recognition to be carried out when detecting objects.
Vehicle-mounted sensors must be considered comprehensively from multiple aspects in the development of autonomous vehicles, including accuracy, sensitivity, and whether the sensor is active or passive. The performance indexes of various vehicle-mounted sensors are compared in Table 2.
Comparison of vehicle-mounted sensor indexes
Analysis of sensors and sensing parameters required to implement different autonomous driving functions in detail
Positioning
To meet the demands of driving, autonomous driving currently requires a positioning accuracy of about 10 cm. Such a high-precision positioning system generally fuses multiple sensors with high-precision maps, specifically GNSS, IMU, lidar, camera, and high-precision maps. The Global Navigation Satellite System (GNSS) mainly provides an approximate absolute position (latitude and longitude). A more accurate position is then obtained by matching the lidar and camera data collected from the surrounding environment against high-precision maps. The Inertial Measurement Unit (IMU) provides the acceleration and angular velocity used in the state equation (the prediction step) of the state estimation algorithm.
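The roles of IMU and GNSS described above can be illustrated with a deliberately simplified Kalman filter. This sketch is an assumption-laden toy, not the fusion scheme of any production system: it tracks only one-dimensional position and velocity, treats IMU acceleration as the control input of the prediction step, and treats a GNSS position fix as the measurement. Lidar and map matching are omitted, and all noise variances are illustrative values.

```python
import numpy as np


def predict(x, P, accel, dt, accel_var):
    """Prediction step: propagate [position, velocity] with IMU acceleration."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    B = np.array([0.5 * dt ** 2, dt])       # effect of acceleration on the state
    x_pred = F @ x + B * accel
    Q = np.outer(B, B) * accel_var          # process noise from IMU noise
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def update(x, P, gnss_pos, gnss_var):
    """Update step: correct the prediction with a GNSS position fix."""
    H = np.array([[1.0, 0.0]])              # GNSS observes position only
    innovation = gnss_pos - (H @ x)[0]
    S = (H @ P @ H.T)[0, 0] + gnss_var      # innovation variance
    K = (P @ H.T)[:, 0] / S                 # Kalman gain
    x_new = x + K * innovation
    P_new = (np.eye(2) - np.outer(K, H[0])) @ P
    return x_new, P_new


if __name__ == "__main__":
    x, P = np.zeros(2), np.eye(2)
    x, P = predict(x, P, accel=2.0, dt=1.0, accel_var=0.1)
    x, P = update(x, P, gnss_pos=1.2, gnss_var=1.0)
    print(x, P)
```

The prediction step grows the position uncertainty and the GNSS update shrinks it again, which is why a high-rate IMU and a lower-rate absolute position source complement each other.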
The block diagram of the positioning scheme adopted by Baidu autonomous vehicle team is shown in Fig. 4. Currently, this is a relatively common and effective positioning algorithm architecture. The subtlety of the positioning algorithm is that some minor processing and changes can also lead to a relatively large accuracy gap. Hence, scholars continue to make breakthroughs in the positioning algorithms.

Block diagram of positioning algorithm implementation.
High-precision (HP) maps play a pivotal role in the positioning scheme. HP maps are built by acquiring road information with high-precision lidar, cameras, GNSS, and other sensors. The more sensors are used, the more comprehensive the information coverage and the more accurate the HP map. For use in autonomous driving, the map is expressed in machine-readable form and stored on the hard disk of the autonomous driving computer. During driving, high-precision positioning is achieved through real-time comparison with the HP map.
Due to the enormous amount of collected data, AI algorithms are required for data processing. High-precision maps mainly include the following information: lane latitude and longitude, lane width, curvature, and elevation; lane intersection location, width, curvature, and the number of crossings; sign location and meaning; signal light location, and so on.
There are a large number of classification problems in the process of establishing a high-precision map. The application of CNN (convolutional neural network) in the field of computer vision has provided a good solution to the problem.
A CNN includes one or more convolutional layers followed by fully connected layers (corresponding to a classic neural network, and optional). The calculations performed in the convolutional layers are convolution and pooling. Convolution produces each output value as the inner product of a window of the data with a filter matrix (a fixed set of weights), that is, by multiplying element by element and then summing. Pooling divides the data into blocks and takes the maximum or mean of each block as its representative value. Both operations are illustrated in Fig. 5.

Operation diagram of convolution (left) and pooling (right).
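The two operations can be written out directly. The following is a minimal NumPy sketch for a single-channel input with stride 1 and no padding. These settings, and the function names, are assumptions made for illustration. As in most CNN implementations, the "convolution" here slides the filter without flipping it.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the filter over the image; each output is an inner product."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)  # multiply element-wise, then sum
    return out


def pool2d(feature_map, size=2, mode="max"):
    """Split the map into size x size blocks and keep one value per block."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    blocks = feature_map[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))


if __name__ == "__main__":
    image = np.arange(16, dtype=float).reshape(4, 4)
    print(conv2d_valid(image, np.ones((2, 2))))
    print(pool2d(image, 2, "max"))
    print(pool2d(image, 2, "mean"))
```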
Another feature of the CNN algorithm is weight sharing: for every point in an image, the weights of the convolution operation in a given layer are the same. Training a CNN therefore reduces to training the filter matrices (convolution kernels), which significantly reduces the number of parameters. Through multiple convolutional layers, a CNN extracts geometric features in different directions. These features capture the correlations in the input data, and exploiting these correlations reduces the training complexity. The method gives very good results in image and speech processing.
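A quick count makes the effect of weight sharing concrete. The sketch below compares a fully connected layer with a convolutional layer. The 32 x 32 input size and the 3 x 3 kernel are illustrative assumptions, not values from any particular network.

```python
def dense_params(in_h: int, in_w: int, out_units: int) -> int:
    """Fully connected layer: one weight per (input pixel, output unit) plus biases."""
    return in_h * in_w * out_units + out_units


def conv_params(kh: int, kw: int, in_ch: int, out_ch: int) -> int:
    """Convolutional layer: one shared kernel per (input, output) channel pair plus biases."""
    return kh * kw * in_ch * out_ch + out_ch


if __name__ == "__main__":
    # Mapping a 32x32 single-channel image to a 32x32 output.
    print(dense_params(32, 32, 32 * 32))  # weights differ at every position
    print(conv_params(3, 3, 1, 1))        # one 3x3 kernel shared by all positions
```

With these sizes, the fully connected layer needs 1,049,600 parameters, whereas the shared 3 x 3 kernel needs 10, independent of the image size.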
Given a sound vector s of length N, a spectral line f is obtained from each highly overlapped, windowed frame of $w_s$ samples. The spectral line $f_F$ for frame F is therefore:
$$f_F(k) = \left| \sum_{n=0}^{w_s-1} s(F\delta + n)\, w(n)\, e^{-j 2\pi n k / w_s} \right|, \qquad k = 0, \ldots, \frac{w_s}{2} - 1$$
In the above equation, δ is the advance between analysis frames in samples, and w(n) denotes a $w_s$-point Hamming window. The B-bin frequency resolution of the SAI-based method is matched by down-sampling, that is, by averaging over $B' = \lfloor w_s / 2B \rfloor$ samples. An overlapped spectrogram S is then formed by stacking the spectra.
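The framing, windowing, and down-sampling steps described above can be sketched in NumPy as follows. The sketch assumes that each spectral line is the magnitude of the windowed DFT over the lower half of the spectrum, and that the window length equals the frame length. The function and argument names (`ws`, `delta`, `B`) mirror the symbols in the text but are otherwise hypothetical.

```python
import numpy as np


def overlapped_spectrogram(s, ws, delta, B):
    """Return S with one row per frame F and one column per down-sampled bin."""
    window = np.hamming(ws)
    n_frames = 1 + (len(s) - ws) // delta
    b_prime = ws // (2 * B)                  # B' = floor(ws / 2B) bins per average
    S = np.zeros((n_frames, B))
    for F in range(n_frames):
        frame = s[F * delta:F * delta + ws] * window
        line = np.abs(np.fft.fft(frame))[:ws // 2]   # spectral line f_F
        # Average groups of B' neighbouring bins to reach B-bin resolution.
        S[F] = line[:B * b_prime].reshape(B, b_prime).mean(axis=1)
    return S


if __name__ == "__main__":
    n = np.arange(1024)
    tone = np.sin(2 * np.pi * 32 * n / 256)  # 32 cycles per 256-sample frame
    S = overlapped_spectrogram(tone, ws=256, delta=64, B=32)
    print(S.shape, S[0].argmax())
```

Here `ws` must be at least `2 * B` so that every down-sampled bin averages at least one DFT bin.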
In practical applications, the spectrogram S includes up to D consecutive spectral lines of history (m = 0 ⋯ D − 1). These spectral lines are concatenated to fill a (B·D + 1)-dimensional feature vector v, which is augmented by a scalar energy measure. The feature vector v contains the elements v(i) = S(⌊i/B⌋, i − B·⌊i/B⌋) for i = 0 ⋯ (B·D − 1), and its final element v(B·D) is the energy metric.
The scalar energy metric conveys frame energy information, because very low-energy frames discriminate between sound classes less well than higher-energy frames. In fact, our tests show that under noisy conditions, including this single energy metric improves classification performance by 10% to 20%. This value is studied later as a scaling ratio for the DNN frame output. The (B·D + 1)-dimensional feature vector v forms the input to the first layer of the DNN, which fixes the input layer size.
Each noise-corrupted file in the test data set is expressed as multiple overlapped analysis frames of down-sampled spectrogram information, and every frame yields a spectrum vector of length B. For noise reduction, the minimum value at each of the B frequency bins is computed over the whole audio file, and each minimum is then subtracted from the corresponding element of every spectrum vector before the feature matrix is formed. Denoising is applied to S from Equation (3), giving the denoised spectrogram $S_{dn}$:
$$S_{dn}(F, m) = S(F, m) - \min_{F'} S(F', m)$$
where m = 0 ⋯ (B − 1). The first B·D elements of the final feature vector v are composed from $S_{dn}$ rather than S, whereas the energy metric v(B·D) is computed from the raw spectrogram data, as shown in Equation (4).
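The per-bin minimum subtraction and the assembly of the feature vector can be sketched as below. The exact energy metric is not reproduced here, so the sketch takes the energy as a caller-supplied scalar. The demo line uses the sum of the raw spectrogram block purely as a placeholder assumption. Function names are hypothetical.

```python
import numpy as np


def denoise(S):
    """Subtract, for each frequency bin, its minimum over the whole file."""
    return S - S.min(axis=0, keepdims=True)


def build_feature_vector(S_dn_block, energy):
    """Flatten D denoised spectral lines (shape D x B) and append the energy.

    Row-major flattening gives v(i) = S_dn(i // B, i % B) for
    i = 0 .. B*D - 1, with the scalar energy stored at v(B*D).
    """
    return np.concatenate([S_dn_block.ravel(), [energy]])


if __name__ == "__main__":
    S = np.array([[3.0, 5.0], [1.0, 7.0], [2.0, 6.0]])  # 3 frames, B = 2 bins
    S_dn = denoise(S)
    placeholder_energy = S[:2].sum()                    # assumption, see lead-in
    print(build_feature_vector(S_dn[:2], placeholder_energy))
```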
Given an input feature vector $v = [v_1, v_2, \ldots, v_V]^T$ of length V, where $v \in \mathbb{R}^V$, and the corresponding K-vector $y = [y_1, y_2, \ldots, y_K]^T$ with $y \in \{1, -1\}^K$, a linear-kernel SVM is used to optimize the normal vector w of the separating hyperplane.
The method of online environment sensing is similar to the process of establishing a high-precision map. The collected data are identified and classified online in real time. The difference is that the input data are dynamic, which poses a new challenge.
Planning
The planning problem is to decide on a sequence of motions based on the perceived dynamic environment and the predicted behavior of moving objects. Decision-making becomes highly complicated in a complex environment. How a vehicle negotiates an extremely complex intersection, for example, is a crucial test of the level of intelligence in autonomous driving. Traditional route planning algorithms such as A* and Dijkstra can support conservative driving. However, their high time complexity makes them unsuitable for complex dynamic environments. In contrast, reinforcement learning is well suited to sequential decision problems, and many simulation studies have demonstrated its use in autonomous driving planning.
Reinforcement learning is a branch of machine learning on the same level as supervised learning and unsupervised learning. It derives from the psychology of animal learning and can be traced back to Pavlov's conditioned reflex experiments, in which feedback is used to improve learning continuously.
The basic structure of a reinforcement learning problem is interaction. An agent is situated in an environment. At every time step, the agent takes an action (a) and then receives an observation (state measurement) and a reward (r) from the environment. The goal of reinforcement learning is to determine how to take a series of actions in an unknown environment so as to maximize the cumulative reward received by the agent. The iterative relationship of state, action, and reward in this interaction is shown in Fig. 6.
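For reference, the classical graph search mentioned above can be sketched on a small occupancy grid. This is a minimal illustration of Dijkstra's algorithm under assumed conditions: a static 4-connected grid with unit step cost. It is not a planner for real traffic.

```python
import heapq


def dijkstra_grid(grid, start, goal):
    """Shortest path on a grid (0 = free, 1 = obstacle); returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            path = [node]
            while node in prev:          # walk back to the start
                node = prev[node]
                path.append(node)
            return path[::-1]
        if d > dist[node]:
            continue                     # stale queue entry
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None


if __name__ == "__main__":
    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(dijkstra_grid(grid, (0, 0), (2, 0)))
```

Because the search explores the whole reachable grid in the worst case, it has to be rerun whenever obstacles move, which illustrates why such methods become costly in dynamic scenes.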

Schematic diagram of the iterative relationship between status, action, and reward.
Reinforcement learning has the following features. It is essentially a closed-loop system, in which input and output depend on each other. Feedback is delayed rather than instant, and the impact of an action may only become apparent after several steps. No direct guidance tells the agent what to do; there is only the reward function. Time matters: observations, rewards, and so on are time sequences and do not satisfy the assumption of being independent and identically distributed. The actions of an agent directly affect the data it subsequently receives.
Based on these features, let the set of environmental states be S and the set of actions be A. Reinforcement learning then has four elements. Strategy (π): the learned mapping from environmental state to action, denoted π: S⟶A. Reward (R): a quantitative expression of the effect produced by a state and an action, denoted R: S×A⟶ℝ. Value function: the cumulative reward over the next h steps; the goal of reinforcement learning is the strategy π that maximizes the value function. Model: the model is either known or unknown.
If the model is known (white box), the probability P of the system transitioning to the next state S' and the reward r generated by action a are both known. If the model is unknown (black box), neither P nor r is known. Most real scenarios have unknown models.
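The black-box case can be illustrated with tabular Q-learning, a standard model-free method chosen here as an example (the text does not single it out). The toy environment is a hypothetical five-state corridor in which the agent starts at the left end and receives a reward of 1 on reaching the right end. The agent never inspects the transition rule and learns only from sampled states and rewards.

```python
import random

N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment: returns (next_state, reward, done). Hidden from the learner."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q(s, a) by trial and error with an epsilon-greedy strategy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when the estimates tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q = train()
    print(["right" if row[1] > row[0] else "left" for row in q[:-1]])
```

The learned table plays the role of the value function, and picking the larger entry in each row gives the strategy π.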
Reinforcement learning theory is relatively deep and has a high barrier to entry. In practice, reinforcement learning solves problems through offline training (trial and error) followed by online inference and decision-making. Neural networks are good at describing tasks that humans perform easily but that are difficult to formalize (analyze), so they are used extensively in machine learning. Reinforcement learning that uses a neural network as the function approximator for the learned strategy (a neural network can be regarded as a nonlinear fit) is referred to as deep reinforcement learning. Deep reinforcement learning is regarded by some researchers as a path toward general AI. Theoretical studies and simulation tests of deep reinforcement learning for the route planning problems of autonomous driving have so far shown it to be highly effective.
In the control task, the trajectory points output by the upper-level action planning module are converted, through a series of dynamics calculations, into control signals for the accelerator, brake, and steering wheel, so that the vehicle follows these trajectory points as closely as possible in practice. In general, the problem is reduced to finding the steering wheel angle (lateral control of the vehicle) and the driving speed (longitudinal control of the vehicle) that satisfy the vehicle's dynamic attitude constraints. The classic PID control algorithm can be used to control these state variables. However, owing to the limitations of the underlying model, its errors are relatively high. Fuzzy control, neural networks, and other intelligent control algorithms have been widely studied and applied to autonomous vehicle control. Neural network control treats control as a pattern recognition problem: the recognized pattern is mapped to a behavior-change signal. The controller can even be trained on recorded data of a human driver's manipulation to obtain the control algorithm.
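The classic PID loop mentioned above can be sketched for longitudinal (speed) control. The vehicle model is a hypothetical first-order plant with linear drag, and the gains are illustrative values chosen for this toy model. Real vehicles require tuned gains, actuator limits, and anti-windup handling.

```python
class PID:
    """Textbook PID controller acting on the error setpoint - measurement."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, when no previous error exists.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(target_speed=20.0, steps=3000, dt=0.01, drag=0.1):
    """Drive a toy vehicle (dv/dt = command - drag * v) towards target_speed."""
    pid = PID(kp=2.0, ki=0.5, kd=0.0)
    speed = 0.0
    for _ in range(steps):
        accel_cmd = pid.update(target_speed, speed, dt)
        speed += (accel_cmd - drag * speed) * dt
    return speed


if __name__ == "__main__":
    print(simulate())
```

The integral term is what removes the steady-state error caused by drag. With proportional action alone, the speed would settle below the target.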
As market demand for automotive active safety and intelligence grows, more and more enterprises are investing in related fields and driving the development of autonomous vehicles. Meanwhile, as computer performance improves and costs fall, the various sensors used in autonomous vehicles have progressed greatly. The threshold for technological research and development has fallen accordingly, and the prospects are highly promising. In recent years, deep learning has achieved breakthroughs in image recognition, which has injected strong vitality into the maturing of autonomous driving technology, and the efficient image processing performance of GPUs enables real-time image processing. Research on smart vehicles has advanced significantly both at home and abroad. However, recent accidents involving autonomous vehicles show that many problems remain to be solved before smart vehicles can be put into practical use. Progress is therefore needed in the following aspects.
First, better sensors and optimized sensor configurations are needed to enhance autonomous driving functions. Only through high-precision sensors can the environment around the vehicle be detected in complex traffic. At the current stage, sensors have yet to overcome the interference caused by vehicle movement and weather, as well as their limited working range. As a result, accurate detection of all driving factors cannot be guaranteed. In addition, the high price of lidar sensors has limited the practical application of this technology. In the future, lidar will become smaller, lighter, more integrated, cheaper, and solid-state. To meet the sensing requirements of complex environments, the current development direction of autonomous vehicles is to implement sensing, positioning, decision-making, and planning by integrating data from multiple sensors.
Second, implementing autonomous driving requires combining an integrated control system, a new type of bus distribution, and an autonomous driving architecture. Vehicle software with multiple sensing and decision-making algorithms can improve the safety and robustness of autonomous driving technology. Adopting the Internet of Vehicles (IoV) allows vehicles to share information and effectively expands the sensing range. High-precision maps and GPS positioning can reduce the number of vehicle-mounted sensors required, thereby making autonomous driving easier. In addition, combining deep learning technology with a high-performance vehicle computing platform can raise the level of autonomous driving.
Furthermore, autonomous vehicle technology requires the support of a high-performance computing platform. Combining the vehicle computing platform with deep learning technology can enhance the intelligence of autonomous vehicles and bring new breakthroughs in AI technology to autonomous driving.
The ultimate goal of developing autonomous vehicles is to establish a connected, information-rich automated platform that integrates humans and vehicles and is capable of real-time, all-weather, efficient autonomous driving. Autonomous driving technology can significantly increase social productivity and produce tremendous social benefits, while improving the way people travel and making our living environment better.
Conclusion
The autonomous driving problem for a vehicle, a land-based moving object, closely resembles the autonomous navigation and control of moving objects in the air (unmanned aerial vehicles) and underwater (autonomous underwater vehicles), in both hardware and software systems. Studying the technical route of autonomous driving may therefore yield ideas for the intelligent innovation of unmanned moving objects. First, improving the capacity to acquire information, raising the quality of information, and increasing the types of information and the volume of data all help to enhance the intelligence of the system. Second, in future battlefield environments, environmental models will be complex and changeable, and traditional methods adapt poorly to them. It is therefore necessary to explore decision-making methods that do not rely on models, and reinforcement learning may be one option. Finally, innovation needs to be problem-oriented. Problems are the starting point of practice and innovation, and they should be identified and solved accordingly.
