Superpixel-based Vehicle Detection using Plane Normal Vector in Dispar ity Space

(1)

1. INTRODUCTION

Over the past decade, the incidence of urban traffic accidents still has increased, while the in- cidence of an overall traffic accident has slowly decreased. According to “European accident re- search and safety report 2013” [1], more than 90 percent of those driving accidents originated from the mistakes of drivers. In order to decrease the accidents occurred by the mistakes, there is a ten- dency to obligatory installation of intelligent ve- hicle system using various sensors such as Lidar, Laser, Radar, and camera.

Those sensor-based ADASs have achieved great developments, which provide better under- standing of the environment in order to improve traffic safety and efficiency. ADAS mainly assists human drivers by alerting them in the situation such as lane and road departure, obstacle collision, pedestrian and vehicle detection, and parking assistance. Among the various sensing modalities, imaging technology has extremely been developed in recent years. Vision-based sensors are cheaper, smaller, and of higher quality than ever before.

Simultaneously, computing power has dramatically increased. Vision-based sensors especially give

Superpixel-based Vehicle Detection using Plane Normal Vector in Disparity Space

Jeonghyun Seo

^†

, Kwanghoon Sohn

^††

ABSTRACT

This paper proposes a framework of superpixel-based vehicle detection method using plane normal vector in disparity space. We utilize two common factors for detecting vehicles: Hypothesis Generation (HG) and Hypothesis Verification (HV). At the stage of HG, we set the regions of interest (ROI) by estimating the lane, and track them to reduce computational cost of the overall processes. The image is then divided into compact superpixels, each of which is viewed as a plane composed of the normal vector in disparity space. After that, the representative normal vector is computed at a superpixel-level, which alleviates the well-known problems of conventional color-based and depth-based approaches.

Based on the assumption that the central-bottom of the input image is always on the navigable region, the road and obstacle candidates are simultaneously extracted by the plane normal vectors obtained from K-means algorithm. At the stage of HV, the separated obstacle candidates are verified by employing HOG and SVM as for a feature and classifying function, respectively. To achieve this, we trained SVM classifier by HOG features of KITTI training dataset. The experimental results demonstrate that the proposed vehicle detection system outperforms the conventional HOG-based methods qualitatively and quantitatively.

Key words: Normal Vector Computation, Stereo Matching, Superpixel Segmentation, Free Space Estimation, Vehicle Detection

※ Corresponding Author : Kwanghoon Sohn, Address:

C129, The 3rd Engineering Building, Yonsei University, 50 Yonsei-ro, Seodaemun-Gu, Seoul 120-749, Korea, TEL : +82-2-2123-2879, FAX : +82-2-2123-7712, E-mail : [email protected]

Receipt date : Dec . 25, 2015 Revision date : Apr . 10, 2016 Approval date : May . 18, 2016

††

Dept. of Electrical and Electronic Engineering, Yonsei University (E-mail : [email protected])

††

Dept. of Electrical and Electronic Engineering, Yonsei University

※ This work was supported by the National Research

Foundation of Korea(NRF) grant funded by the Korea

government(MSIP) (NRF-2013R1A2A2A01068338)

(2)

lots of information such as colors, shapes, depth and have advantages that driver can effectively understand surrounding environment by seeing image.

With advances in camera sensing and computa- tional technologies, advances in vehicle detection using monocular / stereo vision, and sensor fusion has been an extremely active research area in the intelligent vehicles community. Basically, resear- chers use the monocular vision-based system for vehicle detection. Most monocular vision-based systems utilize the low-level information such as color, texture, edges, and motion information ob- tained from difference images between temporal neighborhoods. The low-level information is pow- erful to understand environment around vehicles and to detect obstacles. However, there are some limitations which cause the loss of the depth and structure information in monocular vision- based ones. To overcome these limitations, the stereo vi- sion-based system is introduced and researched in recent years. Not just supplement the depth and structure information, stereo vision-based systems can also provide the real distance to obstacles and inform the accurate location of the obstacles to drivers. However, the quality of these information is extremely dependent on depth information. The depth is measured by calculating the difference be- tween the position of a feature in a reference image and the one of the target image. In this paper, ve- hicle detection method is proposed to alleviate the problems of conventional color-based and depth- based approaches. It is based on a superpixel- based plane normal vector in disparity space, where Histogram of Gradient (HOG) and Support Vector Machine (SVM) are used as a feature and classi- fier, respectively.

The rest of this paper is organized as follows.

Chapter 2 provides a brief review of vision-based vehicle detection. We then describe the proposed vehicle detection methods in Chapter 3. Chapter 4 shows the experimental results and their evalua-

tions on the KITTI dataset. Finally, we conclude the paper with discussion for further works in Chapter 5.

2. RELATED WORKS

Vehicle detection methods can be categorized into two groups: 1) correlation based approaches using template matching and 2) learning based ap- proaches using an object classifier.

1) Template Matching: The correlation based method uses a predefined template and determines a similarity measure by calculating correlation be- tween the ROI and the template. Since there are various models of vehicles with different appear- ances, a general template with common features of a vehicle is used. These features include the rear window and number plate, a rectangular box with specific aspect ratio and a “U” shaped pattern with one horizontal and two vertical edges [2], [3]. A vehicle could appear in different sizes depending on its distance from the camera; therefore, the cor- relation test is usually performed at several scales of ROI. Intensity of the image is also normalized before the correlation test to get a consistent result.

2) Classifier-Based Vehicle Verification: This

approach uses two-class image classifiers to dif-

ferentiate vehicle from non-vehicle candidates. A

classifier learns the characteristics of a vehicle’s

appearance from training images. Training is

based on a supervised learning approach where a

large set of labeled positive (vehicle) and negative

(non-vehicle) images are used. The most common

classification schemes for vehicle verification in-

clude SVM [4,20], AdaBoost [5]. To facilitate the

classification, training images are first prepro-

cessed to extract descriptive features. Selection of

features is important for achieving good classi-

fication results. A fine set of features should cap-

ture most of the variability in the appearance of

a vehicle. Numerous features for vehicle classi-

fication have been proposed such as HOG [6],

(3)

Gabor [7], Principal Component Analysis (PCA) [8].

HOG feature captures local histogram of an im- age’s gradient orientations in a dense grid and it was first proposed by Dalal [6] for human classi- fication. System developed using a linear SVM classifier trained on HOG features was able to spot vehicles in different traffic scenarios [9].

Gabor features had been known for texture analysis of images [10] as they capture local lines and edges information at different orientations and scales. SVM classifier trained on Gabor features were tested for vehicle detection and achieved 94.5% detection rate at 1.5% false detection [7]. It was shown that the classifier outperformed a PCA feature based ANN classifier. A systematic and general evolutionary Gabor filter optimization (EGFO) approach to select optimal Gabor parame- ters (orientation and scale) was investigated to im- prove the performance of vehicle detection [11].

SVM classifier trained on Boosted Gabor features with parameters (orientation and scale) selected from learning had reported 96% detection rate [12].

3. PROPOSED METHOD

3.1 Algorithm overview

The proposed vehicle detection method aims at effective collision avoidance by identifying not only the vehicle but also the road. The overview of the proposed framework is illustrated in Fig. 1. We uti- lize the two factors common to vision-based ve- hicle detection methods: Hypothesis Generation (HG) and Hypothesis Verification (HV).

At the stage of HG, the lane estimation is per- formed for an adaptive ROI selection. We employ Canny edge detection and Hough transform, where the lane is tracked every frame. Simultaneously, the disparity estimation is performed using a Semi Global Matching (SGM) [14], and Simple Linear Iterative Clustering (SLIC) superpixels [13] are then extracted based on the color information, which divides the estimated depth map to compute the plane normal vectors in disparity space. A clustering is employed to the normal vector of each plane at a superpixel level. With the assumption that the central-bottom of the input image is al- ways on the road region, the K-means clustering algorithm is utilized to decide whether each plane

Fig. 1. Overview of the proposed framework.

(4)

belongs to the road or obstacle. In this process, we extract the obstacle candidates in the selected ROI.

Next, at the stage of HV, the obstacle candidate regions are verified by employing HOG and SVM as a feature and classifying function, respectively.

To achieve this, we trained SVM classifier with HOG features of KITTI training dataset. We then extract HOG features of the obtained obstacle can- didates and classify the vehicles by the trained SVM classifier. Each step of the proposed frame- work is described in the following subsections.

3.2 Selection of a region of interest (ROI) using lane detection and tracking

The definition of vanishing point is the image point at which the projection of parallel line intersect. It is invariant feature of the image and has been used for both qualitative and quantitative image analysis. The methods using local oriented textures are able to detect more reliable vanishing points for unstructured road, but edge-based van- ishing point detection method is suitable for re- al-time implementation. In this paper, since we mainly deal with vehicle detection for structured environments such as highway and urban scene in KITTI dataset, we employ an edge-based vanish- ing point detection. First, the edges are detected by Canny edge detection method. Lines are esti- mated by Hough transform. The detection of the straight line is a necessary prerequisite process for all the methods of vanishing point detection. Lines are comprised of the basis of grouping connected regions of pixels which have similar gradient orientations. In many lines, we extract two domi- nant lines by angle of lines. We assume that the two dominant lines is lane, and designate un- der-region as a region of interest (ROI). These ROIs are essential to reduce the computational cost of the following process. Vehicle detection in ROI has advantages that are robust to damaged roads, various traffic environment. Thus, the ROI using vanishing point detection is able to provide a

strong constraint in challenging traffic environ- ment.

3.3 Road and obstacle region extraction using plane normal vector in disparity space Disparity is the measure of difference between the position of a feature in a reference image and the target image. Since it provides depth in- formation robust to illumination variation, we esti- mate disparity to distinguish the road from the obstacles. In this paper, dense disparity maps are computed using a semi-global matching (SGM) [14]. Although the SGM is one of the most reliable stereo matching algorithms, the estimated dis- parity map is still sparse in challenging traffic scene including low-textured road surfaces or shady regions. As shown in Fig. 2(a), it can be seen that the disparity values in low-textured region are presented sparse due to the failure of estimation.

In order to handle this problem, the image is seg- mented by superpixels and the representative dis- parity value is allocated to each superpixel. Notic- ing that the disparity estimation technique relies on the smoothness assumption, i.e., nearby pixels with similar appearance have similar disparities.

The low-texture and shadow problem thus can be handled effectively by the manner of the over-seg- mentation since superpixels are likely to be uni- form in color and texture. We perform the over- segmentation of the input image into superpixels using SLIC [13]. As shown in Fig. 2(b), superpixels produce spatially appealing segments of road and tend to preserve boundaries effectively.

In order to estimate geometric information of the input stereo pair, we find the best-fit plane for each

(a) (b)

Fig. 2. Scene including low textured road surface. (a)

Original image, (b) its corresponding disparity

map.

(5)

superpixel. The best-fit plane is obtained from a set of points which consists of image coordinates and disparity. Let us denote given stereo pairs



_

 I →R

^

and ^



 I →R

^

, and their corresponding disparity map ^{  I →L} that assigns each pixel index p to a disparity ^



∈L , where ^{I ⊂ N}

^

is a dense dis- crete image domain and ^L is a discrete candidate.

Our goal is to estimate plane normal vectors N

_

 I →R

^

and ^N



 I →R

^

. Without loss of general- ity, we concentrate on deriving ^N



in the following.

Let ^S



denote the  -th index set of pixels in ^



. The plane is defined as follows;

  A

_

  B

_

  C

_

(1)

where the ^A



, ^B



, and ^C



represent the constants of the plane equation in the ^ -th superpixel, and u, v, and d are the variables of X, Y, and Z coor- dinates, respectively. We then yield the error func- tion ^



to fit the best-fit plane in ^S



as follows;



_

 

∈ S_

 ^A





_

 B

_



_

 C

_

 ^{ }





^

(2) where ^ ^S



denotes the ^ -th index set of non-error disparity pixels in ^S



. Since equation (2) shows the quadratic form, the optimal values for ^A



B

_

C

_

 can be computed by its derivative as follows;

∇

_

  

∈ S_

 ^A





_

 B

_



_

 C

_

 ^{ }



 ^





_



 

(3)

Then, the representative normal vector ^ ^



of the plane in ^S



is estimated as follows;

 

_

  ^ ^C



A

_

  C

_

B

_

  C

_

  ⁽⁴⁾

As a result, the representative normal vector in equation (4) is assigned to each superpixel. Noting that the normal vector is estimated by non-error disparity values in each superpixel, the proposed framework shows robustness to the error of stereo matching as shown in Fig. 3.

Here we are targeting to determine the region of road for each superpixel by considering the sim- ilarity of ^ ^



. Based on the assumption that the

central-bottom of input image is included in the road region, we apply the K-means clustering al- gorithm for ^ ^



to decide whether each superpixel belongs to the road or not [15]. Let us denote ^ ^



is clustered into a set of K clusters, where ^



is an index set of the ^ -th cluster. K-means algorithm finds a partition which minimizes the squared error between the mean of a cluster and the normal vec- tors in the cluster. Let ^



be the mean of normal vectors in the ^ -th cluster. The sum of squared er- ror between ^



and the normal vectors in the ^ -th cluster is defined as

  ^



 ^

_{∈ }





∥ ^ ^



 

_

∥

^

(5)

The goal of K-means clustering is to minimize the sum of squared error over all the K clusters as follows:

  

  

  ^



 (6)

Then, the K sets of normal vectors is extracted.

Let the cluster of normal vectors on the cen- tral-bottom of given scene be ^



. Finally, the road region ^S



be determined as follows;

S

_

  obstace ^road otherwise ^∈

^

(7) As illustrated in Fig. 4. the proposed scheme effectively classifies the road since the clustering is performed in 3D space. In the obtained ROI as shown in Fig. 5(a), the road and obstacle are rep- resented as in Fig. 5(b).

3.4 Vehicle detection using active learning HOG descriptors presented in [6] provides ex- cellent performance relative to other existing

(a) (b)

Fig. 3. Effectiveness of the superpixel-based computa-

tion. (a) pixel-wise estimation, (b) superpixel-

based estimation.

(6)

feature sets including wavelets. The basic hy- pothesis is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradient or edge directions, even without precise knowledge of the corresponding gradient or edge positions.

In [6], each detection window is divided into cells of size 8 × 8 pixels and each group of 2 × 2 cells is integrated into a block in a sliding fashion, so blocks overlap with each other. For each pixel I(x,y), the gradient magnitude m(x,y) (equation (10)) and orientation ^ (x,y) (equation (11)) is computed in these cells. Then, a local one-dimensional orientation histogram of gra- dients is formed from the gradient orientations of sample points within a cell. Each histogram di- vides the gradient angle range into a predefined number of bins (e.g. 9 bins). The gradient mag- nitudes vote into the orientation histogram.

Each block contains a concatenated vector of all its cells. In other words, each block is repre- sented by a 36-D feature vector that is normal- ized to an L2 unit length (equation (12)). Each 64

× 128 detection window is represented by 7 × 15 blocks, producing a total of 3780 features per de- tection window. Apparently this feature extrac- tion is a dense representation that map local im-

age regions to high-dimension feature spaces.

These features are then used to train a linear SVM classifier.

         (8)

         (9)

   ^ 

^

 

^

(10)

  tan

^{ }

 ^ 

  ⁽¹¹⁾

v←v  ^ ^∥v∥



 

^

(12)

We use linear SVM other than Radial Basis Function (RBF) kernel SVM as our binary classi- fier because the number of HOG features is large, one may not require map data into a higher dimen- sional space and linear SVM is computationally faster. It trains with easy case, in which the train- ing patterns are linearly separable. It is a linear function of the form as

  

^

   (13)

such that for each training example ^



, the func- tion yields ^



 ≥  for ^y



  , ^



   for y

_i

  , and ^{  }

^

^{    } is the hyperplane.

The SVM [16] is a supervised learning model with related learning algorithms that examine data and recognize patterns based on the structure risk minimization principle. In the simple binary classi- fication case, the objective of the SVM is to find a best separating hyperplane with a maximum margin. The simple definition of a SVM classifier form as

y 

_{  }



^

^y

^

^

^

^{}

^

^{  } ⁽¹⁴⁾

where ^ is feature vector of an observation data, y    is a class label, ^



is feature vector of the ^ -th training sample, N is a total number of training samples, and ^{}



 is a kernel function.

In this SVM learning process, ^{ }  ^





_

⋯

_

 is computed.

The training data include positive and negative examples which are fixed resolution image win- dows. Each positive window usually contains only Fig. 4. Colorization of plane normal vector.

(a) (b)

Fig. 5. Extraction of obstacle and road region in ROI.

(a) ROI, (b) Result.

(7)

one centered instance of the vehicle, and negative windows are usually randomly sub-sampled and cropped from set of images not containing any in- stances of the vehicle. We extract HOG features from all the training data, and train the linear SVM classifier using these high-dimension feature vec- tors as shown in Fig. 1.

Actually in the set of images not containing any instances of the object used in the training, running the preliminarily trained classifier would generate many false positives. To reduce false positives and make full use of the training images, we use the preliminary detector exhaustively scan the neg- ative training images for hard examples (false pos- itives), and then the classifier is re-trained using this augmented training set (original positives and negatives, and hard examples) to produce the final detector.

During the detection phase, the binary window classifier is scanned across the regions of obstacle candidates obtained in HG part. This typically pro- duces multiple overlapping detections for each ob- ject instance. Thus, we separate from each obstacle candidates by using segmented disparity map. The representative disparity value of the superpixel is estimated by dominant value as shown in Fig. 6(a), and the regions of each obstacle candidates are ex- tracted by similarity of disparity. They are verified by our classifier, and we detect vehicles as shown in Fig. 6(b).

4. EXPERIMENTAL RESULTS

Our local processing platform is a standard PC with a CPU Intel I7 of 3.4GHz clock and 8GB of RAM, and the computation environment is

MATLAB R2014a. The KITTI data set [17] is se- lected as the test set for performance evaluation, because it provides scenes which are captured by driving around the city, in rural areas and on highways. In addition to providing all the data in raw format, they give benchmarks for each task.

The KITTI data set contains 7,500 images of street scenes which are divided into 150 sequences with varying duration. The stereo cameras are mounted approximately level with the ground plane. The camera images are cropped to a size of 1382 × 512 pixels which are presented a calibrated, synchron- ized and rectified autonomous driving dataset.

The KITTI dataset consists of 7,500 images of various traffic scenes, which is comprising a total of 40,000 labeled objects with various aspects, poses, and illumination conditions. Among them, we use the vehicle-labeling image patch as the positives and randomly cropped image patch as the negatives. The positive dataset includes 1,701 im- age patches containing front, rear, and side view vehicles as shown in Fig. 7. Since the resolution of each image patch is various, we normalized the size of each image to 128 × 128 pixels.

One very important issue in the classifier train-

(a) (b)

Fig. 6. Result of vehicle detection. (a) Representative disparity value of the superpixel, (b) Result of vehicle detection.

Fig. 7. Training examples Posotives(left) and negtives (right).

Fig. 8. The bootstrap training diagram.

(8)

(a) (b)

Fig. 9. Examples of the proposed vehicle detection: (a) Region of vehicle candidates, (b) Detection result.

(9)

ing for one object class is how to select effective negative training samples. As negative training samples include all kinds of images, a prohibitively large set is needed in order to be representative, which would also require infeasible amount of

computation in training. To alleviate this problem, a bootstrapping method proposed by Sung and Poggio [18] is used to incrementally training the classifier as illustrated in Fig. 8.

Fig. 9 shows some of the results of detection ex-

(a) (b)

Fig. 10. Comparison with simple HOG-based vehicle detection methods.

Fig. 11. False positive rate of vehicle detection methods. (a) HOG-based vehicle detection [6], Result of the proposed

vehicle detection.

(10)

amples using the proposed method. Fig. 9(a) shows example images which extract obstacle candidates, and Fig. 9(b) shows the corresponding detection results. Fig. 10 shows comparison results for the proposed method with simple HOG-based vehicle detection methods. Vehicle detection performance is evaluated by using the measure discussed in [17]. As summarized in Fig. 11, the average false positive rate of the proposed method decrease by 6.15% compared with the HOG-based method [6].

5. CONCLUSION

This paper proposed the framework of super- pixel-based vehicle detection using normal vector in disparity space. We utilize the two factors com- mon to detect vehicles: HG and HV. Conventional stereo-based HG methods provide unsatisfactory results for stereo pairs under an uncontrolled envi- ronment such as illumination distortions. A ma- jority of efforts to address this problem have de- voted to develop a robust post process. The main advantage of our HG method is that the dis- parity-based normal vector yields road geometry insensitive to different illumination, and the super- pixel-based processing enables a robust disparity estimation on low textured road regions and re- flective vehicle surfaces. The experiments per- formed on the various road scenes have shown that the proposed framework robustly extract the road and obstacle candidates under various real traffic scenes, and outperforms current state of the art methods both qualitatively and quantitatively.

Subsequently, it makes the proposed vehicle de- tection system possible to assist efficiently de- tection on the vehicle, since regions to be verified are dramatically reduced. This framework is also applicable to other feature-based object detection problems. The performance of the proposed vehicle detection system is evaluated on the KITTI data- set, and it shows the reduced rate of false detection by over 6.15% compared to conventional HOG- based method.

REFERENCE

[ 1 ] Volvo Trucks, European Accident Research and Safety Report, 2013.

[ 2 ] P. Parodi and G. Piccioli, “A Feature-based Recognition Scheme for Traffic Scenes,”

Proceeding of IEEE Intelligent Vehicle Symposium, pp. 229-234, 1995.

[ 3 ] A. Bensrhair, M. Bertozzi, and A. Broggi, “A Cooperative Approach to Vision-based Vehicle Detection,” P roceeding of IEEE Intelligent Transportation Systems, pp. 207-212, 2001.

[ 4 ] V.N. Vapnik, The Nature of Statistical Learn- ing Theory, Springer, NewYork, 1995.

[ 5 ] Y. Freund and R.E. Schapire, “A Decision- Theoretic Generalization of On-line Learning and an Application to Boosting,” Proceeding of Conference Computational Learning Theory, pp. 23-37, 1995.

[ 6 ] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,”

P roceeding of IEEE Computer Vision and P attern Recognition, pp. 886-893, 2005.

[ 7 ] S. Zehang, G. Bebis, and R. Miller, “On-road Vehicle Detection Using Gabor Filters and Support Vector Machines,” Proceeding of International Conference Digital Signal Processing, Vol. 2, pp. 1019-1022, 2002.

[ 8 ] Q. Truong and B. Lee, “Vehicle Detection Algorithm Using Hypothesis Generation and Verification,” Emerging Intelligent Computing Technology and Applications, pp. 534-543, 2009.

[ 9 ] L. Mao, M. Xie, Y. Huang, and Y. Zhang,

“Preceding Vehicle Detection Using Histo- grams of Oriented Gradients,” P roceeding of IEEE International Conference Communica- tions, Circuits and Systems, pp. 354-358, 2010.

[10] S.E. Grigorescu, N. Petkov, and P. Kruizinga,

“Comparison of Texture Features Based on

Gabor Filters,” IEEE Transactions on Image

P rocessing, Vol. 11, No. 10, pp. 1160-1167,

2002.

(11)

[11] S. Zehang, G. Bebis, and R. Miller, “On-road Vehicle Detection Using Evolutionary Gabor Filter Optimization,” IEEE Transactions on Intelligent Transportation Systems, Vol. 6, No. 2, pp. 125-137, 2005.

[12] H. Cheng, N. Zheng, and C. Sun, “Boosted Gabor Features Applied to Vehicle Detection,”

IEEE P roceeding of International Confer- ence P attern Recognition, pp. 662-666, 2006.

[13] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P.

Fua, and S. Susstrunk, “SLIC Superpixels Compared to State-of-the-art Superpixel Methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 11, pp. 2274-2282, 2012.

[14] H. Hirschmuller, “Accurate and Efficient Stereo Processing by Semiglobal Matching and Mutual Information,” Proceeding of IEEE Conference Computer Vision and Pattern Recognition, Vol. 2, pp. 807-814, 2005.

[15] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, and A.Y. Wu, “An Efficient k-Means Clustering Algorithm:

Analysis and Implementation,” IEEE Tran- sactions on P attern Analysis and Machine Intelligence, Vol. 24, No. 7, pp. 881-892, 2002.

[16] T. Malisiewicz, A. Gupta, and A.A. Efros,

"Ensemble of Exemplar-svms for Object Detection and Beyond," P roceeding of IEEE International Conference Computer Vision, pp. 89-96, 2011.

[17] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun.

“Vision Meets Robotics: The KITTI Dataset,”

The International J . of Robotics Research, 2013.

[18] K.K. Sung and T. Poggio, “Example-based Learning for View-based Human Face Detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 39-51, 1998.

[19] N. Dalal, B. Triggs, and C. Schmid, “Human Detection Using Oriented Histograms of Flow

and Appearance,” P roceeding of European Conference on Computer Vision, pp. 428-441, 2006.

[20] M.S. Choi, J.H. Lee, J.H. Suk, T.M. Roh, and J.C. Shim, “Vehicle Detection based on the Haar-like feature and Image Segmentation, “ J ournal of Korea Multimedia Society, 13(9), 1314-1321, 2010.

Superpixel-based Vehicle Detection using Plane Normal Vector in Dispar ity Space

1. INTRODUCTION

Simultaneously, computing power has dramatically increased. Vision-based sensors especially give

Superpixel-based Vehicle Detection using Plane Normal Vector in Disparity Space

Jeonghyun Seo

, Kwanghoon Sohn

ABSTRACT

Key words: Normal Vector Computation, Stereo Matching, Superpixel Segmentation, Free Space Estimation, Vehicle Detection

※ Corresponding Author : Kwanghoon Sohn, Address:

C129, The 3rd Engineering Building, Yonsei University, 50 Yonsei-ro, Seodaemun-Gu, Seoul 120-749, Korea, TEL : +82-2-2123-2879, FAX : +82-2-2123-7712, E-mail : [email protected]

Receipt date : Dec . 25, 2015 Revision date : Apr . 10, 2016 Approval date : May . 18, 2016

Dept. of Electrical and Electronic Engineering, Yonsei University (E-mail : [email protected])

Dept. of Electrical and Electronic Engineering, Yonsei University

※ This work was supported by the National Research

Foundation of Korea(NRF) grant funded by the Korea

government(MSIP) (NRF-2013R1A2A2A01068338)

lots of information such as colors, shapes, depth and have advantages that driver can effectively understand surrounding environment by seeing image.

The rest of this paper is organized as follows.

Chapter 2 provides a brief review of vision-based vehicle detection. We then describe the proposed vehicle detection methods in Chapter 3. Chapter 4 shows the experimental results and their evalua-

tions on the KITTI dataset. Finally, we conclude the paper with discussion for further works in Chapter 5.

2. RELATED WORKS

Vehicle detection methods can be categorized into two groups: 1) correlation based approaches using template matching and 2) learning based ap- proaches using an object classifier.

2) Classifier-Based Vehicle Verification: This

approach uses two-class image classifiers to dif-

ferentiate vehicle from non-vehicle candidates. A

classifier learns the characteristics of a vehicle’s

appearance from training images. Training is

based on a supervised learning approach where a

large set of labeled positive (vehicle) and negative

(non-vehicle) images are used. The most common

classification schemes for vehicle verification in-

clude SVM [4,20], AdaBoost [5]. To facilitate the

classification, training images are first prepro-

cessed to extract descriptive features. Selection of

features is important for achieving good classi-

fication results. A fine set of features should cap-

ture most of the variability in the appearance of

a vehicle. Numerous features for vehicle classi-

fication have been proposed such as HOG [6],

Gabor [7], Principal Component Analysis (PCA) [8].

HOG feature captures local histogram of an im- age’s gradient orientations in a dense grid and it was first proposed by Dalal [6] for human classi- fication. System developed using a linear SVM classifier trained on HOG features was able to spot vehicles in different traffic scenarios [9].

SVM classifier trained on Boosted Gabor features with parameters (orientation and scale) selected from learning had reported 96% detection rate [12].

3. PROPOSED METHOD

3.1 Algorithm overview

Fig. 1. Overview of the proposed framework.

belongs to the road or obstacle. In this process, we extract the obstacle candidates in the selected ROI.

Next, at the stage of HV, the obstacle candidate regions are verified by employing HOG and SVM as a feature and classifying function, respectively.

To achieve this, we trained SVM classifier with HOG features of KITTI training dataset. We then extract HOG features of the obtained obstacle can- didates and classify the vehicles by the trained SVM classifier. Each step of the proposed frame- work is described in the following subsections.

3.2 Selection of a region of interest (ROI) using lane detection and tracking

strong constraint in challenging traffic environ- ment.

In order to estimate geometric information of the input stereo pair, we find the best-fit plane for each

(a) (b)

Fig. 2. Scene including low textured road surface. (a)

Original image, (b) its corresponding disparity

map.

superpixel. The best-fit plane is obtained from a set of points which consists of image coordinates and disparity. Let us denote given stereo pairs



 I →R

and 

 I →R

, and their corresponding disparity map   I →L that assigns each pixel index p to a disparity 

∈L , where I ⊂ N

is a dense dis- crete image domain and L is a discrete candidate.

Our goal is to estimate plane normal vectors N

 I →R

and N

 I →R

. Without loss of general- ity, we concentrate on deriving N

in the following.

Let S

denote the  -th index set of pixels in 

. The plane is defined as follows;

  A

  B

  C

(1)

where the A

, B

, and C

represent the constants of the plane equation in the  -th superpixel, and u, v, and d are the variables of X, Y, and Z coor- dinates, respectively. We then yield the error func- tion 

and ^

, and their corresponding disparity map ^{  I →L} that assigns each pixel index p to a disparity ^

∈L , where ^{I ⊂ N}

is a dense dis- crete image domain and ^L is a discrete candidate.

and ^N

. Without loss of general- ity, we concentrate on deriving ^N

Let ^S

denote the  -th index set of pixels in ^

where the ^A

, ^B

, and ^C

represent the constants of the plane equation in the ^ -th superpixel, and u, v, and d are the variables of X, Y, and Z coor- dinates, respectively. We then yield the error func- tion ^

to fit the best-fit plane in ^S

 ^A

 ^{ }

(2) where ^ ^S

denotes the ^ -th index set of non-error disparity pixels in ^S

. Since equation (2) shows the quadratic form, the optimal values for ^A

 ^A

 ^{ }

 ^

Then, the representative normal vector ^ ^

of the plane in ^S

  ^ ^C

  ⁽⁴⁾

Here we are targeting to determine the region of road for each superpixel by considering the sim- ilarity of ^ ^

central-bottom of input image is included in the road region, we apply the K-means clustering al- gorithm for ^ ^

to decide whether each superpixel belongs to the road or not [15]. Let us denote ^ ^

is clustered into a set of K clusters, where ^

is an index set of the ^ -th cluster. K-means algorithm finds a partition which minimizes the squared error between the mean of a cluster and the normal vec- tors in the cluster. Let ^

be the mean of normal vectors in the ^ -th cluster. The sum of squared er- ror between ^

and the normal vectors in the ^ -th cluster is defined as

  ^

 ^

∥ ^ ^