A Study of Unmanned Aerial Vehicle Path Planning using Reinforcement Learning

(1)

반도체디스플레이기술학회지 제17권 제1호(2018년 3월) Journal of the Semiconductor & Display Technology, Vol. 17, No. 1. March 2018.

A Study of Unmanned Aerial Vehicle Path Planning using Reinforcement Learning

Cheong Ghil Kim

^*†

*†

Department Of Computer Science, Namseoul University

ABSTRACT

Currently drone industry has become one of the fast growing markets and the technology for unmanned aerial vehicles are expected to continue to develop at a rapid rate. Especially small unmanned aerial vehicle systems have been designed and utilized for the various field with their own specific purposes. In these fields the path planning problem to find the shortest path between two oriented points is important. In this paper we introduce a path planning strategy for an autonomous flight of unmanned aerial vehicles through reinforcement learning with self-positioning technique. We perform Q-learning algorithm, a kind of reinforcement learning algorithm. At the same time, multi sensors of acceleraion sensor, gyro sensor, and magnetic are used to estimate the position. For the functional evaluation, the proposed method was simulated with virtual UAV environment and visualized the results. The flight history was based on a PX4 based drones system equipped with a smartphone.

Key Words : Unmanned Aerial Vehicles, Path Planning, Multi-sensor, Self-positioning, Reinforce Learning

1. 서 론

1

Drones are defined as Unmanned Aerial Vehicles (UAVs). In other words, drones are flying devices that are autonomously programmed or remotely controlled, either by a remote control or a ground station, and are categorized as networked robotic technologies [1, 2]. At the beginning, drones were developed for the military;

now their markets are growing with the interest of large companies of Google, Facebook, UPS, and Amazon. Also, the consumer market is rapidly expanding and DJI in China becomes as a representative company in recent years. Fig. 1 shows samples of drone fields including military, logistics, and consumer market.

Nowadays, the use of UAV is increasing in various fields such as videography, photography, disaster response, leisure, marketing, industry, and leisure. At the same time, the technology for unmanned aerial vehicles are expected

†

E-mail: [email protected]

to continue to develop at a rapid rate. In these fields path planning is the most important and basic element.

The path planning problem is to find the shortest path between two oriented points. For this purpose it should reflect various factors such as movement to the target point, obstacle avoidance, and shortest path setting [3].

Many researches have been proposed. Methods for path planning are proposed using genetic algorithm [4] and TVFG (tangent vector field guidance) algorithm [5]. Some algorithms used multi steps methods [6-8]. However, the hardware performance of UAV is limited. Therefore in order to operate the path setting module in real time under the limited hardware performance environment of UAV, complex operations must be reduced and optimized [5].

This paper propose a path planning method for an autonomous flight with Q-learning algorithm which is one kind of reinforcement learning algorithm and self- positioning method. Based on the learned results, the UAV determines its own path and suggests a path planning method to avoid the obstacle to the target point and fly.

For the performance evaluation, simulation environment

(2)

A Study of Unmanned Aerial Vehicle Path Planning using Reinforcement Learning 89

Fig. 1. Drone applications.

similar to the indoor environment is constructed the learning results such as the target point arrival time, the obstacle avoidance, the average path length, and the generated path.

The rest of the paper is organized as follows. In Section 2, we review the basic concept of path panning and self- positioning methods. Section 3 introduce a prototype systems architecture for autonomous drone delivery systems for catering. Section 4 introduces the simulation environment and results. Section 5 concludes this paper.

2. Background

In general, it is understood that the most interesting application of UAVs is to perform missions in hazardous environments with the function of the path planning. This mission shall be expected to become more intelligent with the development of drone technology. This section introduces several studies for the path planning of UAV.

Also, this includes the issues of avoiding obstacles and self-positioning.

Genetic algorithms and particle group optimization algorithms to calculate unmanned aircraft flight paths in complex environments [4]. Here, in order to shorten solution execution time parallel programming such as single-program, multi-data paradigms was used. A dynamic routing algorithm was proposed to set the path of an unmanned aerial vehicle to track ground targets in the constraints of wind resistance and obstacle avoidance [5].

For this, tangent vector field guidance (TVFG) and the lyapunov vector field guidance (LVFG) algorithm are

presented. They are implemented in hybrid form and complement each other.

Other efforts were made by making two steps for path planning [6-8]. First, a polygonal path is generated from the Voronoi graph by applying Djiktra’s algorithm; the initial polygonal path is then refined to a navigable path by considering the UAV’s maneuverability constraints or by using the dynamics of a set of virtual masses in a virtual force field emanating from each radar site [9].

As for the UAV positioning, several researches have been introduced for improving the performance. A technique that integrates measurements provided by inertial sensor, GPS, and video systems to estimate the position and attitude of UAVs was proposed [10]. Here, a vision-based system with a low-cost camera device was used. Sensor fusion algorithms integrated the information carried by the camera with the classical data from other sensors in UAVs. Another work proposed a framework for automatic tracking and landing on moving targets using VTOL UAV [11]. This work combined stereo vision-based visual ranging for relatively accurate attitude estimation with IMU data.

3. Proposed Path Planning

3.1 Learning Algorithm

Q-learning algorithm is a kind of model-free reinforcement learning algorithm. It is used for optimal action-selection policy in the Markov decision process (MDP). To do this, Q-learning uses action-value function to learn how to maximize the reward for the current action.

That is, the agent can operate according to the policy, and the optimal policy can be configured by selecting the action having the highest value among the action-value functions as the reward for each action. One of the strengths of Q-learning is that it is able to compare the expected utility of available operations without the need for an environmental model [3]. The specific algorithm is seen in Eq. 1.

Q( , ) = Q( , ) + α ∙ ∙

, , (1)

In this equation is the current state and is the

current action. Then we can know the next state , next

(3)

Cheong Ghil Kim 90

action and its reward ∙ α is the learning rate and is the discount factor. Q is the learned action-value function and it approximates the optimal action-value. Fig.

2 shows the pseudo code.

Fig. 2. Pseudo code of the algorithm.

Fig. 3 shows the operational overview of the proposed path planning with the reinforcement learning on the simulator environment. The simulator is compatible with DQN [12].

Fig. 3. Overview of the proposed path planning.

3.2 UAV Positioning

The current position of UAV can estimated by simply integrating the acceleration value twice. Unfortunately, we are not able to use the measured raw data directly because of inclusion of the acceleration of gravity. The net acceleration value can be obtained with UAV’s orientation in pitch, roll, and yaw value by by subtracting the acceleration of gravity extracted in , , components.

We used Kalman filter to calculate UAV’s orientation from acceleration, gyro, magnetic sensors.

Eq. 2 shows the method. If we let 0, 0 and , , … , be the acceleration value we measured at time , , … , , solving the following expression will get us the velocity at each measured time , , … , and position , , … , , where is the final position of the UAV that we need.

∑ /2 ∆

∑ /2 ∆ (2)

Here, there is a possibility that the acceleration data may accumulate to form big errors in the final stage because the final position should be acquired with double integration of the data measured by acceleration sensor alone. To minimize this problem, the lazy calibration method was appled by re-calibrating at the landing state of the UAV. If the UAV landed at some point, the velocity should be 0. However, if the error stacked during integration, the calculation would not result in 0.

So if 0, we can re-calibrate to 0 and change to accordingly [13].

4. Simulation

We provide learning environment by creating virtual spaces for autonomous flight of UAV shown in Fig. 4 and it shows its results visually. Each component of the simulator has its own reward value. The simulator gives directions to the UAV so that it can be operated. Currently, the simulator does not consider external environmental variables such as air resistance, wind and air temperature are reflected in the simulator. As for self-positioning PX4 based drone was used. Gyroscope and accelerometer sensor data were measured on LG Optimus, an Android smartphone loaded on our UAV.

For the simulation, the path planning results with Q- learning is compared with that of A* algorithm. The learning conditions are set as learning rate: 0.1 and discount factor: 0.9, which are repeated 300, 500, and 1000 times, respectively. According to iteration: 300, 500, 1000, the execution time is 15 sec, 26 sec, 52 sec, respectively. After 500 runs, the results converge. It can be seen that the final path is similar to that of the A*

algorithm. The four learning cases are denote as C1 through C4 in Table 1.

Fig. 4. Drone applications.

(4)

A Study of Unmanned Aerial Vehicle Path Planning using Reinforcement Learning 91

Table 1. Execution time and number of instructions

Execution time (sec)

Iteration 300

İnteration 500

İteration

1000 Instructions

C1 0.8 1 1.5 84

C2 1.5 2.5 5 216

C3 6 10 18 568

C4 15 26 52 1080

We used PX4 based drone as our UAV environment.

Gyroscope and accelerometer sensor data were measured on LG Optimus, an Android smartphone loaded on our UAV. Recording was done in about every 20ms, but since the measuring process also consumed up to 20ms, the measuring took 20-40ms period in practice. We used UAV model AR.Drone2.0 from Parrot to load the sensor hub and fly in our track. The UAV was programmed to move forward by 10m forward, turn left, move 50m forward, turn left, and move 50m forward again. The flight took about total of 198 seconds.

Fig. 5 shows the difference between the original UAV path programmed and the path estimated by the proposed method. Fig. 6 shows the estimated error calculated as distance between actual UAV position and estimated position on each measure. The UAV moved total of 110m and resulted error of 4.8m after the landing. The result showed least errors during first 50 seconds, but the errors are integrated to reach up to 6.37m at 98.93 second. The errors start to decrease in later half of the flight.

Fig. 5. Path difference between actual UAV and estimated UAV.

Fig. 6. Error calculated as distance between actual UAV position and estimated position.

5. Conclusion

In this paper we introduce a path planning strategy for an autonomous flight of unmanned aerial vehicles through reinforcement learning with self-positioning technique. Q- learning algorithm was performed on the simulation environment. At the same time, multi sensors of acceleration sensor, gyro sensor, and magnetic are used to estimate the position. The experimental results show that the learning results converge to a path similar to the A * algorithm for finding existing target points.

Acknowledgment

Funding for this paper was provided by Namseoul University year 2017.

Reference

1. Borenstein, J., Miller, K., “Robots and the Internet:

Causes for Concern,” IEEE Technology and Society Magazine, 32 (1), pp. 60-65, (2013).

2. Weis, A., Smart drones, Project Report, Tallinn University of Technology. March 26, 2017, from http://193.40.244.77/idu0310/wp-content/uploads/2015/

09/140605_IoT-project_Weis.pdf

3. Jinbae Kim, J., Shin, S., Wu, J., Kim, S.D., Kim, C.G.,

“Obstacle Avoidance Path Planning For UAV Using Reinforcement Learning Under Simulated Environment,”

Science and Engineering Letters, 3, pp. 34-36, (2017).

4. Roberge, V., Tarbouchi, M., Labonté, G., “Comparison of

(5)

Cheong Ghil Kim 92

parallel genetic algorithm and particle swarm opti- mization for real-time UAV path planning”, IEEE Transactions on Industrial Informatics, 9(1), pp. 132- 141, (2017).

5. Chen, H., Chang, K., & Agate, C. S., “UAV path planning with tangent-plus-Lyapunov vector field guidance and obstacle avoidance”, IEEE Transactions on Aerospace and Electronic Systems, 49(2), pp. 840- 856, (2013).

6. Bortoff, S. A., “Path planning for UAVs,” Proc. of the 2000 American Control Conference, pp. 364-368, (2000).

7. Zabarankin, M., Uryasev, S., and Pardalos, P., “Optimal risk path algorithm,” Technical Report 2001-4, Depart- ment of Industrial and Systems Engineering, University of Florida, (2001).

8. McLain, T.W., Beard, R. W., “Trajectory planning for coordinated rendezvous of unmanned air vehicles,” Proc.

of AIAA Guidance, Navigation and Control Conference, pp. 1-8, (2000).

9. Jun, M., D’Andrea, R., “Path Planning for Unmanned Aerial Vehicles in Uncertain and Adversarial Envi- ronments”, Cooperative Control: Models, Applications

and Algorithms, Springer, 1, pp. 95-110, (2003).

10. Angelino, C. V., Baraniello, V. R., Cicala, L., “UAV position and attitude estimation using IMU, GNSS and camera,” Proc. of 15th International Conference on Information Fusion, pp. 735-742, (2012).

11. Xu, L., Luo, H., “Towards autonomous tracking and landing on moving target,” Proc. of IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 620-628, (2016).

12. Lee, K.H., Kim, S.D., “Path Planning of Unmanned Aerial Vehicle based Reinforcement Learning using Deep Q Network under Simulated Environment”, Journal of the Semiconductor & Display Technology, 16(3): pp. 127-130, (2017).

A Study of Unmanned Aerial Vehicle Path Planning using Reinforcement Learning