Attribution-NonCommercial-NoDerivs 2.0 Korea

Users are free to copy, distribute, transmit, display, perform, and broadcast this work, provided they follow the conditions below:

- For any reuse or distribution, you must clearly state the license terms applied to this work.
- Any of these conditions can be waived if you obtain separate permission from the copyright holder.

Your rights under copyright law are not affected by the above. This is an easy-to-understand summary of the license (Legal Code).

Disclaimer

Attribution. You must attribute the original author.

NonCommercial. You may not use this work for commercial purposes.

NoDerivatives. You may not alter, transform, or build upon this work.

Rotational motion estimation with contrast maximization using an event camera only

A Thesis

by

HARAM KIM

Presented to the Faculty of the Graduate School of Seoul National University

in Partial Fulfillment of the Requirements

for the Degree of MASTER OF SCIENCE

Department of Mechanical & Aerospace Engineering
Seoul National University

Supervisor : Professor H. Jin Kim


to my

FAMILY

with love


Abstract

Rotational motion estimation with contrast maximization using an event camera only

Haram Kim
Department of Mechanical & Aerospace Engineering
The Graduate School
Seoul National University

Event cameras, bio-inspired vision sensors, are a new type of camera that records the position, time, and polarity of brightness changes. By operating in a different way from conventional cameras, event cameras offer a high dynamic range (HDR), no motion blur, and microsecond latency. However, asynchronous event information cannot be processed by existing image processing methods, so new approaches are needed. In order to utilize the advantages of this vision sensor, studies on feature point extraction, depth, and optical flow estimation for event cameras have been performed. In recent years, research has focused on intensity image reconstruction and simultaneous localization and mapping (SLAM) systems. Because it is difficult to perform visual navigation using only asynchronous event information, many studies have used external sensors such as conventional cameras and an inertial measurement unit (IMU). Such methods pay a high cost (e.g. computational load, platform weight, expense) for the performance improvement. In this paper, we suggest an algorithm that estimates angular motion using only event information, in order to maximize the advantages of the event camera. We can obtain images by accumulating the event points over a certain period of time. Since brightness changes are caused mainly by motion rather than by illumination changes of the environment, event points are obtained mostly at edges. Therefore, we can extract meaningful information by aligning event points on edges. We propose an algorithm that estimates the angular motion accurately by extending existing research that finds the angular velocity maximizing the contrast of the image. We create a spherical map using event points warped with the estimated angular velocity, and maximize the contrast between the warped event image and the spherical map projected onto the 2D image. We also apply the Lie algebra on the angle variables and utilize the spherical map instead of a spherical mosaic map, in order to estimate the omni-directional angle without the gimbal lock problem.

Keywords: Event camera (EC), Visual odometry (VO), Rotational motion estimation.

Student Number : 2017-21960


Table of Contents

Page

Abstract . . . iii

Table of Contents . . . v

List of Figures . . . vii

List of Tables . . . ix

Chapter 1 Introduction . . . 1

1.1 Literature review . . . 2

1.2 Thesis contribution . . . 4

1.3 Thesis outline . . . 4

2 Background . . . 5

2.1 Event bundle . . . 5

2.2 Pinhole camera model . . . 7

2.3 Rotational motion . . . 8

2.4 Non-linear optimization . . . 9

2.4.1 Gradient ascent . . . 9

2.4.2 Adaptive gradient optimization (Adagrad) . . . 9

2.4.3 Root mean square propagation optimization (RMSprop) . . . 10

3 Angular velocity estimation . . . 11

3.1 Event data element-wise warping . . . 12

3.2 Jacobian matrix derivation . . . 13

4 Rotational motion estimation . . . 16

4.1 3D spherical mapping . . . 17

4.2 Rotational position estimation . . . 19

5 Experimental results . . . 23

5.1 Qualitative evaluation: rotational position estimation . . . 24

5.2 Quantitative evaluation: rotational position estimation . . . 25


6 Conclusion . . . 35


List of Figures

1.1 Data examples by conventional cameras and event cameras: (a) gray image (b) event points top view (c) event points diagonal view . . . 2

2.1 Polarity-time graph for the fixed time interval method. . . 6

2.2 Pinhole camera geometry. . . 7

3.1 Algorithm flow . . . 12

3.2 Event data element-wise warping. (a),(c) event points before warping. (b),(d) event points warped with the angular velocity which maximizes the event image contrast . . . 13

3.3 The iteration lapse of optimization for angular velocity estimation. Event points are warped to maximize the contrast. . . 15

4.1 Polarity time graph for mapping . . . 17

4.2 3D spherical map example for 360° motion . . . 18

4.3 Polarity time graph for rotation motion estimation . . . 19

4.4 (a) The projected 3D map π(M) and (b) the corresponding grey image. The image in the red square of (a) corresponds to (b) . . . 21

4.5 The iteration lapse of optimization for rotational motion estimation. The blue points represent the projected 3D map π(M) and the red points represent the event image I. The projected map changes less due to the accurate initial pose from Eq. (4.8). . . . 22

5.1 The equipment for obtaining data set. (a) DAVIS240C, (b) VICON tracker . . . . 24

5.2 The grey image of VICON-free data sets. . . 27

5.3 The projected map from [1] and the proposed method in the room. . . 28

5.4 The projected map from [1] and the proposed method in the lobby. . . 29

5.5 The projected map from [1] and the proposed method outside of the building. . . . 30

5.6 The grey image and the projected map with the proposed method in sequence turn around . . . 31

5.7 The rotational position estimation result of turn around. (a) rotational position and (b) rotational position error . . . 32

5.8 The grey image and the projected map with the proposed method in sequence high speed . . . 33

5.9 The rotational position estimation result of high speed. (a) rotational position and (b) rotational position error . . . 34

List of Tables

5.1 Root mean square error of rotational position in sequence turn around . . . 25

5.2 Root mean square error of rotational position in sequence high speed . . . 25

1

Introduction

In autonomous systems, it is essential to determine the position and attitude. In the field of computer vision, many studies have been conducted to estimate the position and the attitude by utilizing images, which contain rich information [2–6]. Research using conventional cameras can be applied only in limited situations, because the quality of the image degrades when the camera or target object moves fast or when the brightness range of the scene is large. To broaden the applicability of vision algorithms, new bio-inspired cameras called event cameras have been developed, which may enable visual navigation under such fast motion and high dynamic range.

Conventional cameras must adjust the sensitivity (ISO), aperture size, and exposure time according to the surrounding environment in order to measure absolute light intensity.

Since event cameras measure brightness changes and are free from the frame-based data format, they have HDR, no motion blur, and low latency [7]. Fig. 1.1 shows the difference in recording method between conventional cameras and event cameras. Event cameras record a brightness change as a pixel position, a time, and a polarity: a brightening pixel is recorded with polarity +1 (blue points in Fig. 1.1b and Fig. 1.1c), and a darkening pixel with polarity -1 (red points). Because of the asynchronous form of event data, as shown in Fig. 1.1c, it is difficult to apply existing computer vision methods, so new approaches are needed.


Figure 1.1: Data examples by conventional cameras and event cameras: (a) gray image (b) event points top view (c) event points diagonal view

In this paper, we propose a method of estimating angular motion using an event camera. We obtain an event image by warping event points with an angular velocity model, and the angular position is estimated by creating a spherical map in the three-dimensional space and localizing the event points on the map. We also propose a simple method to reconstruct the intensity image.

The experimental data were acquired using a DAVIS240C and a VICON tracker.

1.1 Literature review

In [8], the ego-motion was estimated using a black square pattern. The authors estimated the ego-motion by tracking lines with the event points measured at the edges of the black square, and by minimizing the distances between each tracked line and the events belonging to it. The method performed well on ego-motion estimation of fast-moving drones that flip at up to 1200°/s.

Some studies [9, 10] estimated the camera ego-motion through a probabilistic approach. The authors of [9] estimated 6-DOF motion using an external RGB-D camera. They built a photometric depth map by applying an existing dense reconstruction method with a conventional RGB-D camera, and the filter state consisted of the camera poses, the contrast threshold, and the inlier parameters of the sensor model. They suggested a measurement model that describes how well the measured event points match the constructed map, considering a resilient sensor model. The study of [10] first performed 6-DOF motion estimation using an event camera only, under the assumption that the map and poses were known. They simultaneously estimated and updated the ego-motion, the log-intensity map, and the inverse depth through the extended Kalman filter (EKF).

Like [10], the authors of [11] utilized an event camera without any external sensor. They obtained depth information by applying the disparity space image (DSI) of [12] and constructed a 3D edge map with reliable depth. Then, pose tracking was performed through an image-to-model alignment method instead of probabilistic approaches. In order to bootstrap the system, they assumed that the motion of the camera is planar: the ego-motion was first obtained and then mapping was conducted. The method showed very good performance on the given data sets.

There are thus some papers that estimate 6-DOF motion without using an external sensor, but the performance of these algorithms degrades when the event camera moves fast in new environments where a map is not yet available. In addition, the algorithms required constraints such as a planar motion assumption, or external sensors, in the initialization phase, because it is difficult to interpret and process the event information. Such limitations prevent taking full advantage of the event camera.

If the motion estimation is limited to rotational motion, the advantages of the event camera (e.g. low latency, high time resolution) can be fully utilized. Studies such as [1, 13] have been conducted to estimate rotational motion. In [13], the authors proposed rotational position estimation and intensity image reconstruction based on filtering theory. Using a constant-position motion model, and assuming that the Gaussian noise on the three axes is independent, the rotational position was estimated event-wise through a particle filter. The gradient of the intensity image was estimated with the EKF by measuring events at the position to which each pixel is expected to move under the rotational motion. Gradient images are recorded on a spherical mosaic map, and the objective function suggested in [14] was solved in closed form through the Euler-Lagrange equation to reconstruct the intensity image. In [1], the authors warped the event points considering the time at which each event occurred. The contrast of the image obtained from the warped event points was used as the objective function, and the authors estimated the angular velocity by solving an optimization problem which maximizes this contrast. They extended the idea and proposed a method to calculate the depth and optical flow in addition to the rotational motion in [15].

1.2 Thesis contribution

In this paper, we propose a method to obtain high accuracy estimate for angular position as well as angular velocity by modifying the method proposed in [1]. Our main contributions can be summarized as follows:

1. We propose an algorithm that estimates the angular velocity and the angular position in a fast moving situation using only an event camera without utilizing any other external sensors.

2. We design an algorithm that reliably estimates angular positions in all angular sections without a singularity problem through a spherical map in a three-dimensional space rather than a two-dimensional spherical mosaic map.

3. We utilize the spherical map to solve the problem that occurs when the number of events in a fixed time interval is small; thus, the proposed algorithm robustly estimates angular motion even in slowly moving situations.

1.3 Thesis outline

The remainder of this paper is organized as follows. Section 2 presents the background knowledge on motion estimation and the event camera. Section 3 explains the angular velocity estimation method. Section 4 introduces the mapping and the rotational motion estimation method, and Section 5 details the real-time experimental results. The final section discusses the results and the improved rotational motion estimation performance.

2

Background

Event cameras record information differently from ordinary cameras. Generally, one event datum (event point) includes the following spatio-temporal information.

e_k = (x_{e_k}, t_{e_k}, p_{e_k}), \quad \text{where } x_{e_k} = (x_{e_k}, y_{e_k})    (2.1)

For the k-th event point e_k, x is the pixel coordinate, t is the time when the event occurred, and p ∈ {−1, 1} is the event polarity. When the brightness changes at a certain pixel of the event camera, event data are recorded as in Eq. (2.1) if the following condition, Eq. (2.2), is satisfied.

\Delta \ln I(x, t) := \ln I(x, t) - \ln I(x, t_{prev}) > p \cdot C_{th},    (2.2)

where I(x, t) is the intensity at pixel x, C_{th} is the event contrast threshold, and t_{prev} is the time of the most recent event recorded at that pixel coordinate. If \Delta \ln I(x, t) > C_{th}, the event data is recorded as e = (x, t, +1) (ON-event); if \Delta \ln I(x, t) < -C_{th}, it is recorded as e = (x, t, -1) (OFF-event).
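As an illustration of the event generation model in Eqs. (2.1) and (2.2), the following is a minimal Python sketch (not part of the thesis; the function and variable names are hypothetical) that emits an ON- or OFF-event whenever the log-intensity change at a pixel exceeds the contrast threshold.

```python
import math

def generate_event(x, t, intensity, prev_log_intensity, C_th=0.15):
    """Emit an event (x, t, p) if the log-intensity change exceeds the threshold C_th (Eq. 2.2).

    x: pixel coordinate (u, v), t: current time,
    intensity: current intensity I(x, t),
    prev_log_intensity: ln I(x, t_prev) stored per pixel.
    Returns (event or None, updated reference log-intensity).
    """
    log_I = math.log(intensity)
    delta = log_I - prev_log_intensity            # Delta ln I(x, t) in Eq. (2.2)
    if delta > C_th:                              # brightening: ON-event, p = +1
        return (x, t, +1), log_I
    if delta < -C_th:                             # darkening: OFF-event, p = -1
        return (x, t, -1), log_I
    return None, prev_log_intensity               # below threshold: no event is recorded
```

The contrast threshold value 0.15 is only a placeholder; the actual threshold is a property of the sensor.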

2.1 Event bundle

Since a single event point is hard to distinguish from noise and carries little information on its own, most algorithms process a bundle of event points, collected either over a fixed time interval or in windows with a constant number of events.

Figure 2.1: Polarity-time graph for the fixed time interval method.

The fixed time interval methods can be represented as follows.

E|_{t-\Delta t}^{t} = \{ e_k \mid t - \Delta t < t_{e_k} \leq t \},    (2.3)

where \Delta t is the constant time interval and E|_{t-\Delta t}^{t} is the bundle of event points from time t - \Delta t to t. The fixed time interval method literally collects the events which occurred during a specific time interval \Delta t and runs the algorithm on the collected events. The algorithm outputs results at a constant rate, regardless of the environment in which the events occur (feature richness, speed of the camera ego-motion). However, in the optimization algorithm, there is a problem that the algorithm often diverges in time intervals where no events occur.

The method of windows with a constant number of events can be represented as follows.

E_{(N-\Delta N):N} = \{ e_k \mid N - \Delta N < k \leq N \}    (2.4)

where E_{(N-\Delta N):N} is the bundle of event points, N is the index of the latest event point, and \Delta N is the constant number of events. This method takes advantage of the asynchronous nature of event cameras: it always processes the same number of events, so it can be used for neural networks, and it does not perform redundant computation when there is no event. However, it is difficult to combine with other algorithms because the results are not produced at regular times; if no event occurs for a long time, we cannot obtain any output from the algorithm. Also, in the rotational motion estimation problem, it is difficult to find a proper rotational motion model because of its strong non-linearity.

In this paper, we use the fixed time interval method, which gives output periodically. The polarity-time graph for the fixed time interval method is depicted in Fig. 2.1.
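For concreteness, a minimal sketch of the fixed time interval slicing of Eq. (2.3) might look as follows (hypothetical names; the events are assumed to be sorted by timestamp). Note that windows in which no event occurs simply yield empty bundles, which is exactly the case where the optimization described above can diverge.

```python
def fixed_interval_bundles(events, dt):
    """Group events e = (x, y, t, p), sorted by time, into bundles of duration dt (Eq. 2.3)."""
    bundles = []
    if not events:
        return bundles
    t_start = events[0][2]          # left edge of the current window
    current = []
    for e in events:
        while e[2] > t_start + dt:  # event lies beyond the current window
            bundles.append(current) # close E|_{t_start}^{t_start + dt} (possibly empty)
            current = []
            t_start += dt
        current.append(e)
    bundles.append(current)         # close the last, possibly partial, window
    return bundles
```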


Figure 2.2: Pinhole camera geometry.

2.2 Pinhole camera model

The pinhole camera model maps a point in the three-dimensional world onto the two-dimensional image plane by projection, as shown in Fig. 2.2.

When the intensity changes at a 3D point E, the event camera writes the time and polarity of the event e and records the coordinates of the event, x, through the following projection function.

\pi(E) = x    (2.5)
= \left( f_x \frac{X}{Z} + c_x, \; f_y \frac{Y}{Z} + c_y \right),    (2.6)

where f_x and f_y are the focal lengths, and c_x and c_y are the principal points. We can also represent the projection as a linear mapping between homogeneous coordinates using the camera matrix. The camera matrix K (with the skew coefficient omitted) is given in Eq. (2.8).

\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K \cdot \begin{bmatrix} X/Z \\ Y/Z \\ 1 \end{bmatrix}    (2.7)

K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}    (2.8)

In order to express a two-dimensional point acquired by the event camera as a three-dimensional point, the depth information is necessary, and obtaining depth information requires translational motion. However, we deal only with rotational motion and do not need to know the depth value. Therefore, we assume that the event points in 3D space are all at the same distance from the camera, and express the inverse projection as follows.

E = \pi^{-1}(x) = K^{-1} \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}    (2.9)
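A minimal sketch of the projection π of Eq. (2.6) and the unit-depth inverse projection π⁻¹ of Eq. (2.9), using NumPy (the intrinsic values below are illustrative only, not the DAVIS240C calibration):

```python
import numpy as np

# Illustrative intrinsics (placeholder values)
fx, fy, cx, cy = 200.0, 200.0, 120.0, 90.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def project(E):
    """pi(E): project a 3D point E = (X, Y, Z) onto the image plane, Eq. (2.6)."""
    X, Y, Z = E
    return np.array([fx * X / Z + cx, fy * Y / Z + cy])

def inverse_project(x):
    """pi^{-1}(x): back-project a pixel x = (u, v) assuming unit depth, Eq. (2.9)."""
    u, v = x
    return np.linalg.inv(K) @ np.array([u, v, 1.0])
```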

2.3 Rotational motion

There are many ways to describe rotational motion. In this paper, we use rotation matrices R ∈ SO(3), also known as special orthogonal matrices. To obtain a minimal parameterization of rotation for the numerical optimization, we adopt the Euler vector ω ∈ R^3. The Euler vector ω can be transformed into a rotation matrix through the Lie algebra \hat{ω} ∈ so(3), the tangent space of the manifold SO(3), as follows.

\omega = \begin{bmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \end{bmatrix}    (2.10)

R = e^{\hat{\omega}} = \exp \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix}    (2.11)

In order to warp an event point, we inversely project the event point on the image plane into 3D space, multiply it by the rotation matrix, and then re-project the rotated point onto the image plane. Since the warping method must be modified due to the asynchronous nature of the events, we cover the detailed warping method in Section 3.1.
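A small sketch of Eq. (2.11) and of the back-project / rotate / re-project warping described above (a sketch only; it reuses the project and inverse_project helpers and the K matrix assumed in the previous sketch, and scipy is used solely for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    """Map an Euler vector w = (w1, w2, w3) to its skew-symmetric matrix in so(3), Eq. (2.11)."""
    w1, w2, w3 = w
    return np.array([[0.0, -w3,  w2],
                     [ w3, 0.0, -w1],
                     [-w2,  w1, 0.0]])

def warp_pixel(x, w):
    """Warp a pixel by the rotation R = exp(hat(w)): back-project, rotate, re-project."""
    R = expm(hat(w))                         # rotation matrix from the Euler vector
    return project(R @ inverse_project(x))   # assumes project / inverse_project from the sketch above
```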


2.4 Non-linear optimization

We use an optimization method to find the rotational motion from the event data. Methods such as gradient ascent, the Newton optimizer, and the Levenberg-Marquardt optimizer can be used to solve the optimization problem. This section outlines the methods that lead to root mean square propagation optimization (RMSprop), which we use.

2.4.1 Gradient ascent

The most basic tool to solve optimization problems is the gradient ascent (gradient descent) method [16]. Given an objective function J(x), it measures the gradient of the objective function with respect to the current state, \nabla_x J(x), and iteratively updates the state as in Eq. (2.12) so that the objective function is maximized.

x \leftarrow x + \eta \nabla_x J(x),    (2.12)

where x is the current state and \eta is the step size. The gradient ascent method is often not used in complex problems because of its slow convergence and poor optimization performance on non-convex problems.

2.4.2 Adaptive gradient optimization (Adagrad)

The optimization method called Adagrad [17] is a method to independently set the step size of each element at every update. The optimizer increases the step size for elements that have changed little, and decreases the step size for elements that have changed a lot.

G \leftarrow G + \eta \left( \nabla_x J(x) * \nabla_x J(x) \right)    (2.13)

x \leftarrow x + \frac{\eta}{\sqrt{G} + \epsilon} \cdot \nabla_x J(x),    (2.14)

where * is an element-wise multiplication operator and \epsilon, usually a small number, prevents the denominator from becoming zero. G stores the accumulated update size of each variable as an element-wise squared sum, and helps to give more weight to elements that have changed little. However, Adagrad has the issue that the step size becomes smaller as the iterations progress, since G monotonically increases.

2.4.3 Root mean square propagation optimization (RMSprop)

RMSprop has been proposed to solve the problem of slow convergence as the iterations increase in Adagrad. RMSprop computes G as an exponential average of the squared gradient, instead of summing the squared values. This allows adaptive weights for each element regardless of the iteration count.

G \leftarrow \gamma G + (1 - \gamma) \left( \nabla_x J(x) * \nabla_x J(x) \right)    (2.15)

x \leftarrow x + \frac{\eta}{\sqrt{G} + \epsilon} \cdot \nabla_x J(x)    (2.16)

where \gamma is the smoothing factor, 0 < \gamma < 1.
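A compact sketch of one RMSprop gradient-ascent step, Eqs. (2.15)-(2.16) (hyperparameter values are illustrative):

```python
import numpy as np

def rmsprop_ascent_step(x, grad, G, eta=0.01, gamma=0.9, eps=1e-8):
    """One RMSprop gradient-ascent step.

    x: current state, grad: gradient of the objective J at x,
    G: running exponential average of the squared gradient.
    Returns the updated state and the updated G.
    """
    G = gamma * G + (1.0 - gamma) * grad * grad   # Eq. (2.15)
    x = x + eta / (np.sqrt(G) + eps) * grad       # Eq. (2.16), ascent for maximization
    return x, G
```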


3

Angular velocity estimation

We extend the study of [1] to estimate rotational motion. We first estimate the angular velocity in order to initialize our system, as shown in the overall algorithm flow in Fig. 3.1.

The cost function for estimating the angular velocity is Eq. (3.1).

\underset{\omega_m}{\text{maximize}} \;\; \left\| I(w(\vec{x}, \omega_m, \vec{\delta t})) \right\|^2    (3.1)

I(x) = \sum_{k=1}^{N} p_{e_k} \cdot \delta_d(x - x_{e_k}),    (3.2)

where I(x) is the event image and w(x, \omega_m, \delta t) is the event data element-wise warping function. In an event bundle E|_{t_m}^{t_{m+1}} = \{ e_k \}_{k=1}^{N}, \omega_m is the angular velocity and \delta t is the time difference between the event point time t_{e_k} and the reference time. We set the reference time as t_m, which gives \delta t = t_{e_k} - t_m. The Dirac delta function \delta_d is used to represent the event image. \vec{\delta t} \in R^{1 \times N} is the stack of the \delta t_{e_k}, and \vec{x} \in R^{2 \times N} is the stack of the x_{e_k}.
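To make the objective of Eqs. (3.1)-(3.2) concrete, the sketch below accumulates warped event points into an event image and evaluates its contrast as the squared norm of the image. This is a simplified illustration, not the thesis implementation: nearest-pixel accumulation stands in for the Dirac delta (in practice a bilinear or Gaussian kernel is often used so that the objective is differentiable), and the default image size follows the 240×180 resolution of the DAVIS240C mentioned in Chapter 5.

```python
import numpy as np

def event_image(warped_xy, polarities, height=180, width=240):
    """I(x) of Eq. (3.2): accumulate event polarities at their (warped) pixel positions."""
    img = np.zeros((height, width))
    for (u, v), p in zip(warped_xy, polarities):
        ui, vi = int(round(u)), int(round(v))
        if 0 <= vi < height and 0 <= ui < width:   # discard events warped out of the frame
            img[vi, ui] += p
    return img

def contrast(img):
    """The objective of Eq. (3.1): the squared norm of the event image."""
    return float(np.sum(img ** 2))
```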


Figure 3.1: Algorithm flow

3.1 Event data element-wise warping

The warping function for the event point ek is defined as in Eq. (3.3).

w(x, \omega, \delta t) = K \cdot \exp(\hat{\omega} \, \delta t) \, \dot{x}    (3.3)

where \dot{x} = K^{-1} x is the inverse-projected 3D point. In the event bundle E|_{t_m}^{t_{m+1}}, we should compute the warping function for the event points element-wise with different \delta t_{e_k}. However, iterative calculations for element-wise warping have a high computational load. The authors of [1] proposed the vector warping function, which transforms the iterative operation into a matrix operation with a first-order approximation of the rotation matrix, as in Eq. (3.4).

w(\vec{x}, \omega, \vec{\delta t}) = K \cdot \left( \dot{\vec{x}} + \hat{\omega} \cdot \dot{\vec{x}} * \vec{\delta t} \right)    (3.4)

Since \hat{\omega} \in R^{3 \times 3}, \dot{\vec{x}} \in R^{3 \times N}, and \vec{\delta t} \in R^{1 \times N}, the right-hand side of Eq. (3.4) satisfies w(\vec{x}, \omega, \vec{\delta t}) \in R^{3 \times N}.

In order to compute the vector warping accurately, we adopt Rodrigues' formula. Our proposed vector warping function, Eq. (3.5), not only avoids the iterative operation but also provides accurate warping results, as shown in Fig. 3.2.

w(\vec{x}, \omega, \vec{\delta t}) = K \cdot \left( \dot{\vec{x}} + \frac{\hat{\omega}}{\| \omega \, \vec{\delta t} \|} \sin(\| \omega \, \vec{\delta t} \|) \cdot \dot{\vec{x}} * \vec{\delta t} + \frac{\hat{\omega}^2}{\| \omega \, \vec{\delta t} \|^2} \left( 1 - \cos(\| \omega \, \vec{\delta t} \|) \right) \cdot \dot{\vec{x}} * \vec{\delta t} \right)    (3.5)
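The following sketch contrasts the first-order vector warp of Eq. (3.4) with an exact per-event warp equivalent to Eq. (3.3); the Rodrigues form of Eq. (3.5) achieves the exact result without the per-event loop. This is illustrative code only (it assumes the K matrix and the hat() helper from the earlier sketches; event pixels are stacked column-wise, and the final division simply converts homogeneous coordinates back to pixels):

```python
import numpy as np
from scipy.linalg import expm

def warp_events_first_order(xs, w, dts):
    """Eq. (3.4): first-order vector warp. xs: 2xN pixels, w: angular velocity, dts: length-N times."""
    x_dot = np.linalg.inv(K) @ np.vstack([xs, np.ones(xs.shape[1])])  # inverse-projected points
    rotated = x_dot + hat(w) @ x_dot * dts                            # each column scaled by its dt
    warped = K @ rotated
    return warped[:2] / warped[2]                                     # back to pixel coordinates

def warp_events_exact(xs, w, dts):
    """Eq. (3.3) applied per event: rotate each point by exp(hat(w) * dt_k)."""
    x_dot = np.linalg.inv(K) @ np.vstack([xs, np.ones(xs.shape[1])])
    out = np.empty_like(x_dot)
    for k, dt in enumerate(dts):
        out[:, k] = expm(hat(w) * dt) @ x_dot[:, k]
    warped = K @ out
    return warped[:2] / warped[2]
```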


Figure 3.2: Event data element-wise warping. (a),(c) event points before warping. (b),(d) event points warped with the angular velocity which maximizes the event image contrast

3.2 Jacobian matrix derivation

In order to solve the cost function in Eq. (3.1) with a non-linear optimizer in Section 2.4, we derive the Jacobian matrix as follows.

J = \left\| I(w(x, \omega, \delta t)) \right\|^2    (3.6)

The cost function is differentiated as Eq. (3.7).

\frac{dJ}{d\omega_m} = 2 \, I(w(x, \omega_m, \delta t)) \cdot \frac{d I(w(x, \omega_m, \delta t))}{d\omega_m}    (3.7)

where

\frac{d I(w(x, \omega_m, \delta t))}{d\omega_m} = \sum_{k=1}^{N} p_{e_k} \cdot \nabla \delta_d(x - w(x_{e_k}, \omega_m, \delta t_{e_k})) \cdot \frac{d w(x_{e_k}, \omega_m, \delta t_{e_k})}{d\omega_m}
= \sum_{k=1}^{N} \begin{bmatrix} I_{x_{e_k}} & I_{y_{e_k}} \end{bmatrix} \cdot \frac{d w(x_{e_k}, \omega_m, \delta t_{e_k})}{d\omega_m}    (3.8)

Then, we differentiate the warping function with \dot{x}_{e_k} = (x, y, z).

\frac{d w(x_{e_k}, \omega_m, \delta t_{e_k})}{d\omega_m} = \frac{d \left( K \cdot \exp(\hat{\omega}_m \delta t_{e_k}) \cdot \dot{x}_{e_k} \right)}{d\omega_m}
= \begin{bmatrix} \frac{f_x}{z} & 0 & -\frac{x f_x}{z^2} \\ 0 & \frac{f_y}{z} & -\frac{y f_y}{z^2} \end{bmatrix} \cdot \begin{bmatrix} x I_{3\times3} & y I_{3\times3} & z I_{3\times3} \end{bmatrix} \cdot \frac{d \exp(\hat{\omega}_m \delta t_{e_k})}{d\omega_m}    (3.9)

Let \omega_m = (\omega_{m1}, \omega_{m2}, \omega_{m3}), and let e_1, e_2, e_3 be the standard basis vectors.

\frac{d \exp(\hat{\omega}_m \delta t_{e_k})}{d\omega_m} = \frac{d \exp\left( (\omega_{m1} \hat{e}_1 + \omega_{m2} \hat{e}_2 + \omega_{m3} \hat{e}_3) \, \delta t_{e_k} \right)}{d\omega_m}
= \begin{bmatrix} \frac{\partial \exp(\hat{\omega}_m \delta t_{e_k})}{\partial \omega_{m1}} & \frac{\partial \exp(\hat{\omega}_m \delta t_{e_k})}{\partial \omega_{m2}} & \frac{\partial \exp(\hat{\omega}_m \delta t_{e_k})}{\partial \omega_{m3}} \end{bmatrix}
= \begin{bmatrix} \hat{e}_1 \\ \hat{e}_2 \\ \hat{e}_3 \end{bmatrix} \cdot \exp(\hat{\omega}_m \delta t_{e_k}) \cdot \delta t_{e_k} \approx \begin{bmatrix} \hat{e}_1 \\ \hat{e}_2 \\ \hat{e}_3 \end{bmatrix} \cdot \delta t_{e_k},    (3.10)

Therefore, from Eqs. (3.7), (3.9) and (3.10),

\left. \frac{dJ}{d\omega_m} \right|_{x = x_{e_k}} = 2 \, I(w(x_{e_k}, \omega_m, \delta t_{e_k})) \cdot \begin{bmatrix} I_{x_{e_k}} & I_{y_{e_k}} \end{bmatrix} \cdot \begin{bmatrix} \frac{f_x}{z} & 0 & -\frac{x f_x}{z^2} \\ 0 & \frac{f_y}{z} & -\frac{y f_y}{z^2} \end{bmatrix} \cdot \begin{bmatrix} x I_{3\times3} & y I_{3\times3} & z I_{3\times3} \end{bmatrix} \cdot \begin{bmatrix} \hat{e}_1 \\ \hat{e}_2 \\ \hat{e}_3 \end{bmatrix} \cdot \delta t_{e_k}    (3.11)
= 2 \, I(w(x_{e_k}, \omega_m, \delta t_{e_k})) \cdot \begin{bmatrix} I_{x_{e_k}} & I_{y_{e_k}} \end{bmatrix} \cdot \begin{bmatrix} -\frac{xy}{z^2} f_x & \left( 1 + \frac{x^2}{z^2} \right) f_x & -\frac{y}{z} f_x \\ -\left( 1 + \frac{y^2}{z^2} \right) f_y & \frac{xy}{z^2} f_y & \frac{x}{z} f_y \end{bmatrix} \cdot \delta t_{e_k}

With this Jacobian matrix, we solve the optimization problem with the RMSprop optimizer. The optimization results are shown in Fig. 3.3.

(a) iteration 0 (b) iteration 15

(c) iteration 30 (d) iteration 30

Figure 3.3: The iteration lapse of optimization for angular velocity estimation. Event points are warped to maximize the contrast.


4

Rotational motion estimation

In order to estimate rotational motion, we utilize a map of event points as a milestone in new environments. We use the aligned event points in Fig. 3.3d for 3D spherical mapping. After mapping the event points, the algorithm maximizes not only the contrast with respect to the angular velocity, which is defined in Eq. (3.1), but also the contrast between the 3D spherical map M, which lives in the 3D domain, and the event image I, which lives in the 2D image plane. The cost function for estimating rotational motion is Eq. (4.1).

\underset{\omega_m, R_{m-1}}{\text{maximize}} \;\; \left\| I(w(\vec{x}, \omega_m, \vec{\delta t})) \right\|^2 + \lambda \left\| I(w(\vec{x}, \omega_m, \vec{\delta t})) + \pi\!\left( M\!\left( w(E|_{t_0}^{t_m}, R_{m-1}) \right) \right) \right\|^2    (4.1)

where \lambda is the weight parameter, and the warping function in M is K R K^{-1} x.


Figure 4.1: Polarity time graph for mapping

4.1 3D spherical mapping

There are several methods to construct the map. In [13], the authors used a spherical mosaic map to estimate the motion and to reconstruct the intensity image. However, the spherical mosaic map, which projects the sphere image onto a 2D plane, does not represent areas uniformly: the upper and lower regions are severely distorted. Thus, we use a 3D spherical map to estimate rotational motion stably regardless of the angular position.

We assume that all event points are at the same distance from the origin, since the depth of an event point is not necessary for pure rotational motion estimation. Our algorithm continuously constructs the map with respect to the initial pose. The algorithm warps the event points element-wise to the reference time t_ref, and then the global warping to the initial time t_0 is applied, as shown in Fig. 4.1. The 3D spherical map M is the set of aligned event points, and the aligned event bundle in the time interval (t_m, t_{m+1}) is denoted as \bar{E}|_{t_m}^{t_{m+1}}. An example of the 3D spherical map is depicted in Fig. 4.2.
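As a rough sketch of the mapping step described above (hypothetical names; it assumes the K matrix from the earlier sketch), aligned event pixels are back-projected, rotated into the frame of the initial pose, and normalized to unit length so that all map points lie on the sphere:

```python
import numpy as np

def add_to_spherical_map(map_points, aligned_xs, R_0m):
    """Append aligned event points (2xN pixels, already warped to the reference time)
    to the 3D spherical map M, expressed in the initial (t_0) frame."""
    rays = np.linalg.inv(K) @ np.vstack([aligned_xs, np.ones(aligned_xs.shape[1])])
    rays = R_0m @ rays                               # global warp into the initial frame
    rays = rays / np.linalg.norm(rays, axis=0)       # same distance from the origin (unit sphere)
    return np.hstack([map_points, rays]) if map_points.size else rays
```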


Figure 4.2: 3D spherical map example for 360° motion

Figure 4.3: Polarity-time graph for rotational motion estimation

4.2 Rotational position estimation

The proposed algorithm solves the cost function in Eq. (4.1) to estimate the angular velocity and the rotational position simultaneously. The first term of the cost function, Eq. (4.2), is for estimating the angular velocity, and the second term, Eq. (4.3), is for estimating the rotational position. We warp the 3D map points to the reference time t_m, and the event points in (t_m, t_{m+1}) are also warped element-wise to the reference time t_m. We project the 3D map points onto the 2D image plane and compute the contrast by squaring the sum of the event image I and the projected map. As the projected map image overlaps the event image, the contrast value in Eq. (4.3) becomes larger. The projected spherical map and the time lapse of the optimization are illustrated in Figs. 4.4 and 4.5.

J_\omega = \left\| I(w(\vec{x}, \omega_m, \vec{\delta t})) \right\|^2    (4.2)

J_R = \left\| I(w(\vec{x}, \omega_m, \vec{\delta t})) + \pi\!\left( M\!\left( w(E|_{t_0}^{t_m}, R_{m-1}) \right) \right) \right\|^2    (4.3)

The Jacobian matrix can be derived in the same way as Eqs. (3.7) to (3.11) and is given in Eq. (4.4).

\left. \frac{d J_R}{d R_{m-1}} \right|_{x} = 2 \, (I + \pi(M)) \cdot \begin{bmatrix} M_x & M_y \end{bmatrix} \cdot \begin{bmatrix} \frac{f_x}{z} & 0 & -\frac{x f_x}{z^2} \\ 0 & \frac{f_y}{z} & -\frac{y f_y}{z^2} \end{bmatrix} \cdot \begin{bmatrix} x I_{3\times3} & y I_{3\times3} & z I_{3\times3} \end{bmatrix} \cdot \begin{bmatrix} \hat{e}_1 \\ \hat{e}_2 \\ \hat{e}_3 \end{bmatrix}    (4.4)
= 2 \, (I + \pi(M)) \cdot \begin{bmatrix} M_x & M_y \end{bmatrix} \cdot \begin{bmatrix} -\frac{xy}{z^2} f_x & \left( 1 + \frac{x^2}{z^2} \right) f_x & -\frac{y}{z} f_x \\ -\left( 1 + \frac{y^2}{z^2} \right) f_y & \frac{xy}{z^2} f_y & \frac{x}{z} f_y \end{bmatrix}    (4.5)

Using the Jacobian matrices, we adopt the RMSprop optimizer of Eq. (2.16) to update \omega_m and R_{m-1} as follows.

\omega \leftarrow \omega + \frac{\eta_\omega}{\sqrt{G_\omega} + \epsilon} \cdot \nabla_\omega J_\omega    (4.6)

R \leftarrow R + \frac{\eta_R}{\sqrt{G_R} + \epsilon} \cdot \nabla_R J_R    (4.7)

The initial state is crucial in non-linear optimization. We set the initial state so as to avoid local maxima, as in Eqs. (4.8) and (4.9).

R_m^{init} = R_{m-1} \cdot \exp(\hat{\omega}_m \Delta t)    (4.8)

\omega_{m+1}^{init} = \omega_m    (4.9)
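The initialization of Eqs. (4.8)-(4.9) is a constant-velocity prediction; a minimal sketch (names illustrative; hat() from the earlier sketch, scipy only for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

def initialize_next_state(R_prev, w_m, dt):
    """Eqs. (4.8)-(4.9): propagate the previous rotation with the current angular velocity
    and reuse the angular velocity as the initial guess for the next interval."""
    R_init = R_prev @ expm(hat(w_m) * dt)   # Eq. (4.8)
    w_init = w_m.copy()                     # Eq. (4.9)
    return R_init, w_init
```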


Figure 4.4: (a) The projected 3D map π(M) and (b) the corresponding grey image. The image in the red square of (a) corresponds to (b).

(a) iteration 0 (b) iteration 20

(c) iteration 40 (d) iteration 60

Figure 4.5: The iteration lapse of optimization for rotational motion estimation. The blue points represent the projected 3D map π(M) and the red points represent the event image I. The projected map changes less due to the accurate initial pose from Eq. (4.8).

5

Experimental results

Because there are no existing event data sets for pure rotational motion, we obtained rotational motion data sets using a DAVIS240C and a VICON tracker. The DAVIS240C is a vision sensor that can operate as a dynamic vision sensor (DVS, event camera) and as an active pixel sensor (APS, grey camera). The pixel resolution of the vision sensor is 240×180.

An inertial measurement unit (IMU) is also included in the DAVIS240C. The IMU provides translational acceleration and gyro angular velocity data at 2500 Hz (Fig. 5.1a). We compare the proposed algorithm against the integrated gyro velocity data in terms of rotational position estimation accuracy.

We use the VICON tracker for the ground truth pose, which can estimate the position within a millimeter error range at 100 Hz (Fig. 5.1b), and we quantitatively evaluate the proposed algorithm with the VICON ground truth data. However, the event camera detects near-infrared (IR) light, so unwanted artifacts are recorded in the obtained data sets: event points with negative polarity spike continuously in shadow regions, and event points with positive polarity spike continuously when the event camera faces the VICON camera, which emits IR light. Thus, we also acquire event data sets without the VICON tracker and evaluate the proposed algorithm qualitatively on those data sets.


Figure 5.1: The equipment for obtaining data set. (a) DAVIS240C, (b) VICON tracker

5.1 Qualitative evaluation: rotational position estimation

We acquire VICON-free data sets in the room, in the lobby, and outside of the building to evaluate the proposed algorithm in various environments. Since the raw event images do not help in recognizing the environments, we illustrate the data sets as grey image sequences in Fig. 5.2.

The sequence the room is a normal environment suitable for imaging. In the room, we reveal the advantages of the 3D spherical mapping by moving the event camera through various angles.

The sequence the lobby contains a high dynamic range scene. The grey images recorded by the conventional camera cannot represent the image intensity consistently across the very bright sunlight and the stairs in the dark. Thus, the sequence the lobby reveals the advantage of the HDR property of the event camera.

In the sequence outside of the building, the proposed method runs on abundant and less noisy event data. We compare the proposed method, which estimates the rotational position, with the method that integrates the angular velocity proposed in [1]. The event mapping results are shown in Figs. 5.3 to 5.5. The last row in each figure shows a scene for which a map has already been constructed. Comparing the images of the table captured at the beginning and at the end of the lobby sequence in Fig. 5.4, marked with blue (current) and red (previous) ellipsoids, the proposed algorithm shows a smaller drift, while the method with integrated angular velocity from [1] shows significant drift.

RMS error | Proposed | Angular velocity integration [1] | IMU
X-axis | 1.4772 | 17.8808 | 3.1826
Y-axis | 0.3603 | 10.7691 | 12.2468
Z-axis | 2.0894 | 13.6386 | 4.1243

Table 5.1: Root mean square error of rotational position in sequence turn around.

RMS error | Proposed | Angular velocity integration [1] | IMU
X-axis | 1.6620 | 9.3317 | 3.2792
Y-axis | 1.0380 | 4.4999 | 7.6538
Z-axis | 1.3823 | 28.2835 | 8.8019

Table 5.2: Root mean square error of rotational position in sequence high speed.

5.2 Quantitative evaluation: rotational position estimation

We acquire data sets with the ground truth VICON pose in the experimental room. We evaluate the proposed algorithm with a 360° rotational motion data set, i.e. the sequence named turn around, and also with a high speed rotational motion data set, the sequence high speed. The grey images and mapping results for turn around are shown in Fig. 5.6 and those for high speed are shown in Fig. 5.8. We use the root mean square (RMS) error of the rotational position as the evaluation metric, with the VICON data as ground truth. For each sequence, we compare the proposed algorithm with the other methods, i.e. the IMU and the integrated angular velocity from [1]. The overall RMS errors are shown in Table 5.1 and Table 5.2.

In the sequence turn around, the event camera rotates 360° about the Y-axis. There are many events in the directions perpendicular to the Y-axis, resulting in a much smaller drift error on the Y-axis than on the other axes in Table 5.1. In contrast, the IMU accumulates more drift error on the Y-axis and less drift error on the other axes.

The grey images of the sequence high speed are severely blurred, as shown in Fig. 5.8a, and could not be processed by conventional vision algorithms. In this sequence, the proposed method estimates the rotational position stably despite the high-speed ego-motion, whereas the integrated angular velocity fails to give a proper rotational position estimate.

(a) The room (b) The lobby (c) Outside of the building

Figure 5.2: The grey image of VICON-free data sets.


(a) The result from [1] (b) The proposed method

Figure 5.3: The projected map from [1] and the proposed method in the room.


(a) The result from [1] (b) The proposed method

Figure 5.4: The projected map from [1] and the proposed method in the lobby.


(a) The result from [1] (b) The proposed method

Figure 5.5: The projected map from [1] and the proposed method outside of the building.


(a) The grey images (b) The brightened grey images (c) The projected map images

Figure 5.6: The grey image and the projected map with the proposed method in sequence turn around.


Figure 5.7: The rotational position estimation result of turn around. (a) rotational position and (b) rotational position error.

(a) The grey images (b) The projected map images

Figure 5.8: The grey image and the projected map with the proposed method in sequence high speed.


Figure 5.9: The rotational position estimation result of high speed. (a) rotational position and (b) rotational position error.

6

Conclusion

This paper presented a rotational motion estimation method using only an event camera. We used a 3D spherical map in order to reliably estimate rotational angles in all angular regions without a singularity or gimbal lock problem. We evaluated the proposed algorithm with various data sets, including a high-speed ego-motion sequence and high dynamic range sequences.

The proposed method gives more accurate results than the integral of the angular velocity for angular position estimation, and shows even higher accuracy than the IMU, within a maximum error of 2 degrees. In conclusion, the proposed method maximizes the advantages of the event camera for estimating rotational motion. The algorithm can be applied to fast angular motion estimation for AR/VR applications, where pure rotational motion occurs frequently. As future work, we will conduct 6-DOF motion estimation using both event cameras and grey cameras, which also takes advantage of the properties of event cameras.

References

[1] G. Gallego and D. Scaramuzza, "Accurate angular velocity estimation with an event camera," IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 632–639, April 2017.

[2] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM: A versatile and accurate monocular SLAM system," IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, Oct 2015.

[3] R. Wang, M. Schwörer, and D. Cremers, "Stereo DSO: Large-scale direct sparse visual odometry with stereo cameras," in 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017, pp. 3923–3931.

[4] F. Steinbrücker, J. Sturm, and D. Cremers, "Real-time visual odometry from dense RGB-D images," in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Nov 2011, pp. 719–722.

[5] G. Klein and D. Murray, "Parallel tracking and mapping for small AR workspaces," in 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nov 2007, pp. 225–234.

[6] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, "DTAM: Dense tracking and mapping in real-time," in 2011 International Conference on Computer Vision, Nov 2011, pp. 2320–2327.

[7] P. Lichtsteiner, C. Posch, and T. Delbruck, "A 128×128 120 dB 15 µs latency asynchronous temporal contrast vision sensor," IEEE Journal of Solid-State Circuits, vol. 43, no. 2, pp. 566–576, Feb 2008.

[8] E. Mueggler, B. Huber, and D. Scaramuzza, "Event-based, 6-DOF pose tracking for high-speed maneuvers," in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sep. 2014, pp. 2761–2768.

[9] G. Gallego, J. E. A. Lund, E. Mueggler, H. Rebecq, T. Delbruck, and D. Scaramuzza, "Event-based, 6-DOF camera tracking from photometric depth maps," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 10, pp. 2402–2412, Oct 2018.

[10] H. Kim, S. Leutenegger, and A. J. Davison, "Real-time 3D reconstruction and 6-DoF tracking with an event camera," in European Conference on Computer Vision (ECCV), vol. 9910, Oct 2016, pp. 349–364.

[11] H. Rebecq, T. Horstschaefer, G. Gallego, and D. Scaramuzza, "EVO: A geometric approach to event-based 6-DOF parallel tracking and mapping in real time," IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 593–600, April 2017.

[12] H. Rebecq, G. Gallego, and D. Scaramuzza, "EMVS: Event-based multi-view stereo," Tech. Rep., 2016.

[13] H. Kim, A. Handa, R. Benosman, S.-H. Ieng, and A. Davison, "Simultaneous mosaicing and tracking with an event camera," in Proceedings of the British Machine Vision Conference. BMVA Press, 2014.

[14] J. Tumblin, A. Agrawal, and R. Raskar, "Why I want a gradient camera," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, June 2005, pp. 103–110.

[15] G. Gallego, H. Rebecq, and D. Scaramuzza, "A unifying contrast maximization framework for event cameras, with applications to motion, depth, and optical flow estimation," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2018, pp. 3867–3876.

[16] P. Debye, "Näherungsformeln für die Zylinderfunktionen für große Werte des Arguments und unbeschränkt veränderliche Werte des Index," Mathematische Annalen, vol. 67, no. 4, pp. 535–558, 1909.

[17] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol. 12, no. Jul, pp. 2121–2159, 2011.

Abstract (in Korean)

The event camera, a bio-inspired camera, is a new concept of camera that stores position, time, and polarity information by measuring changes in the brightness of light. Unlike an ordinary camera, which measures the brightness of light over a reference time, it operates in a different way and therefore has a high dynamic range (HDR), no motion blur, and a latency on the order of microseconds. However, asynchronous event information cannot be handled by existing image processing techniques, so new research is needed. To utilize these advantages of the vision sensor, studies on feature point extraction, depth, and optical flow estimation using event cameras have been carried out, and recently research has focused on intensity image reconstruction and on visual navigation that estimates the camera motion in space. However, since it is difficult to perform visual navigation using only asynchronous event information, most studies have used external sensors such as conventional cameras and inertial measurement units (IMU). Integrating external sensors can require a large cost (computational load, weight, monetary expense, etc.) for a small performance improvement.

Therefore, in order to make the most of the advantages of the event camera, this thesis proposes an algorithm that estimates angular motion using only pure event camera information, without integration with other sensors. With an event camera, an image can be obtained by accumulating the event points that arrive during a certain period of time. Since the brightness change of an image is caused mainly by motion rather than by illumination changes, event points are obtained mostly at edges where the brightness changes within the image. Therefore, aligning the event points on the edges yields meaningful information. This thesis extends the existing research that finds the angular velocity maximizing the contrast of the image, and proposes an algorithm that estimates the angular motion accurately. A spherical map is constructed using event points warped with the angular velocity that maximizes the image contrast, and the angular motion is estimated by again maximizing the contrast between the image obtained from the warped event points and the image obtained by projecting the spherical map into 2D. By using a spherical map in 3D space instead of a spherical mosaicing map and by utilizing the Lie algebra, the proposed algorithm can estimate angles in all directions without the gimbal lock problem.

Keywords: event camera, visual odometry, rotational motion estimation.

Student Number: 2017-21960
