
https://doi.org/10.7848/ksgpc.2017.35.2.81

Methodology of Spatio-temporal Matching for Constructing an Analysis Database Based on Different Types of Public Data

Jung, In taek 1) ㆍ Chong, Kyu soo 2)

Abstract

This study aimed to construct an integrated database with a single spatio-temporal unit from various types of public data that have different real-time information provision cycles and spatial units. Towards this end, three temporal interpolation methods (piecewise constant interpolation, linear interpolation, and nonlinear interpolation) and a spatial matching method based on district boundaries were proposed. The case study revealed that linear interpolation was the best of the three methods, and the spatial matching method also showed good results. It is hoped that various prediction models and data analysis methods will be developed in the future using the different types of data in the analysis database.

Keywords : Public Data, Analysis Database, Spatio-Temporal Matching, Interpolation

Original article

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://

Received 2016. 11. 28, Revised 2016. 12. 20, Accepted 2017. 04. 30

1) Member, Korea Institute of Civil Engineering and Building Technology (E-mail: [email protected])

2) Corresponding Author, Member, Korea Institute of Civil Engineering and Building Technology (E-mail: [email protected])

1. Introduction

In accordance with the Government 3.0 policy declared in 2013, all public data have been opened to the private sectors in various fields, and have been actively utilized until now.

According to NISA (National Information Society Agency), the number of open data increased from 5,272 in 2013 to 15,912 in 2015. The utilization of open data also increased from 13,923 in 2013 to 783,773 in 2015. This trend means that the utilization of open data will continuously increase for many years to come.

A data analysis platform for utilizing open data was developed by KICT (Korea Institute of Civil Engineering and Building Technology) in 2015. This platform aims to provide real-time and prediction information about road weather and traffic conditions using individual vehicle sensing data and different types of public data related to the weather observation and road transport fields. The public data in this platform are collected and stored in real time using the Open API (Application Program Interface) provided by various public portal services.

It is difficult, however, to construct an analysis database utilizing predictor variables because the public data being collected by the KICT platform have different temporal cycles and spatial units for information provision. For example, KMA (Korea Meteorological Administration) offers weather data in real time every hour, by administrative district unit. NTIC (National Transport Information Center), on the other hand, offers traffic data in real time every 5 minutes, by ITS (Intelligent Transport System) standard link. Due to this difference in the collection type, the public data cannot be used directly as variables for developing prediction models using temporal and spatial data. Thus, the different types of public data need to be converted to data with the same spatio-temporal unit.

This study aimed to suggest effective methods of spatial matching and temporal data interpolation for constructing an analysis database using different types of public data.

Table 1. Types of public data in the KICT platform

Public institution | Public data | Provision cycle | Spatial unit
NTIS | Travel speed, Travel time | 5 minutes | ITS standard link
KMA | Temperature, Humidity, Precipitation, Wind direction, Wind speed | 1 hour, 3 hours | Administrative district
UTIS (Urban Traffic Information System) | Incident/Accident | 5 minutes | ITS standard link

Through the construction of an analysis database, it is expected that various studies related to driving environment analysis can be carried out.

2. Literature Review

This section explains the KICT platform in detail and reviews the interpolation methods that previous studies have applied to match data to the same spatio-temporal unit.

KICT is currently developing a platform system for predicting driving environments using the sensing data observed from individual vehicles and different types of public data. Public data can be fused and analyzed with vehicle sensor data using the web-based analysis tool in the KICT platform. Public data are collected in real time based on the Open API offered by the public sector. These data include traffic, weather, and incident data, as shown in Table 1. In the future, driving environment data (road freezing, bad weather, traffic jam, incident, etc.) are going to be offered in three information types (real-time/prediction/historical) with a more segmentalized spatio-temporal unit (Ha and Chong, 2016).

Interpolation methods can be classified into three kinds: temporal interpolation, spatial interpolation, and spatio-temporal interpolation. These methods are classified according to the kind of data to be used, and are applied to fill in the missing values of each dataset.

Temporal interpolation is used to fill in the missing values of time series data at a specific point, without considering the spatial data. It estimates the values between the given discontinuous data points in the time series. There are basically three kinds of temporal interpolation methods: piecewise constant interpolation, linear interpolation, and nonlinear interpolation (polynomial interpolation, spline interpolation, etc.).
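The first two methods can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the use of minutes as the time unit, and the sample temperatures are all assumptions introduced here.

```python
from bisect import bisect_right


def piecewise_constant(times, values, t):
    """Hold the most recent observation at or before time t (minutes)."""
    idx = bisect_right(times, t) - 1
    if idx < 0:
        raise ValueError("t precedes the first observation")
    return values[idx]


def linear(times, values, t):
    """Interpolate linearly between the two observations that bracket t."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the observed range")
    # Clamp so that t == times[-1] uses the last interval.
    idx = min(bisect_right(times, t) - 1, len(times) - 2)
    t0, t1 = times[idx], times[idx + 1]
    w0, w1 = values[idx], values[idx + 1]
    return w0 + (w1 - w0) * (t - t0) / (t1 - t0)


# Hypothetical hourly temperatures (deg C) observed at minutes 0, 60, 120.
obs_times = [0, 60, 120]
obs_temps = [17.0, 16.0, 15.4]

# Resample onto a 5-minute grid with both methods.
grid = list(range(0, 121, 5))
held = [piecewise_constant(obs_times, obs_temps, t) for t in grid]
lined = [linear(obs_times, obs_temps, t) for t in grid]
```

Piecewise constant interpolation needs only the latest observation, so it can run as soon as a value arrives, whereas linear interpolation needs the observations on both sides of the target time.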

Spatial interpolation is used to estimate the value at an unknown point using the actual values observed nearby in a specific space. In one study, the spatial interpolation techniques that were analyzed include the commonly used Thiessen polygon, the classical polynomial interpolation using the least-square or Lagrange approach, multiquadric interpolation, and the Kriging technique (Guillermo and Jose, 1985). Spatial interpolation is used to evaluate physical data in a continuous domain in many fields, and the many different techniques offer different performances according to the characteristics of the initial data points. In another study, four different methods were applied to several test cases: the inverse square distance method, the Kriging method, Hardy’s multiquadric method, and the tension finite-difference method (Caruso and Quarta, 1998). Interpolation methods such as inverse distance weighting, Kriging, and k-nearest neighbor (KNN) were compared to find the optimal interpolation of the digital elevation model of a hydraulic model, obtained through close-range digital photogrammetry, so that a computer could display the actual terrain (Choi, 2005). The prediction errors of the various spatial interpolation methods that were used to model the values at unmeasured locations were compared, and the accuracy of the predictions was evaluated. The RMS (Root Mean Square) was calculated by processing the different parameters associated with spatial interpolation using techniques like inverse distance weighting, Kriging, local polynomial interpolation, and radial basis function to determine the elevation data of the eastern coastal area under the same condition (Lee, 2010). The noise decrease according to the distance from the road was calculated by applying the ArcGIS interpolation method to the noise level of each lot of land within a residential area (Eo and Yoo, 2011). Spatial analysis was conducted on fine dust less than 10㎛ in diameter in Seoul by applying the inverse distance weighting method to interpolate the point-based fine dust (PM10) observation values to the administrative district unit (Jeong, 2014).

Fig. 1. Flow chart of this study

Finally, spatio-temporal interpolation is an interpolation method that considers the temporal elements in spatial interpolation. GIS (Geographic Information System) applications often require the spatio-temporal interpolation of an input dataset, which means estimating the unknown values at unsampled location-time pairs with a satisfactory level of accuracy. Using an actual real estate dataset with house prices, such methods were compared with other spatio-temporal interpolation methods based on inverse distance weighting and Kriging (Li and Revesz, 2004).

The three kinds of interpolation methods applied in the previous studies were reviewed, and it was found that various interpolation methods are applied depending on the type of given data. This study proposes a spatio-temporal matching methodology for constructing an analysis database using various types of public data. In the case of temporal matching, the problem is that the data collected every hour have to be converted to 5-minute data. It was therefore necessary in this study to compare all three existing kinds of temporal interpolation methods. In the case of spatial matching, the problem of converting the spatial data of regional units to the spatial data of the ITS standard link was dealt with. Here, the regional spatial data cannot be applied directly to links that cross regional boundaries. Therefore, it is difficult to apply the existing spatial interpolation methods, and another spatial matching method was needed for this study.

3. Methodology

3.1 Concept of spatio-temporal matching

This study dealt with the problem of converting macroscopic public data to microscopic public data. As shown in Table 1, the problem concerns the conversion of the spatio-temporal unit of KMA’s data to the spatio-temporal unit of NTIS’s data, which is the minimum unit of public data. This problem can be solved using spatio-temporal matching, which can be divided into temporal interpolation and spatial matching. Temporal interpolation is a method of interpolating the real-time weather data collected every hour to 5-minute data. Based on the literature review, there are three methods of temporal interpolation (piecewise constant interpolation, linear interpolation, and nonlinear interpolation). In this study, the best interpolation method was selected by applying all three interpolation methods. Spatial matching is a method of converting the spatial data of regional units to GIS link (ITS standard link) units. If a link is contained within a regional boundary, the weather data of that region are applied to the link. Otherwise, the weighted average of the weather data of the regions through which the link passes is applied to the link. By applying the methodology of this study, various types of public data were finally constructed as an integrated analysis database in 5-minute intervals and ITS standard links. The data from this database will be used as predictive variables for the development of driving environment prediction models using various public data. The flowchart of this study is shown in Fig. 1.

3.2 Building input data

The input data in this study consisted of public data based on Open API and GIS data. As shown in Table 1, "public data" refer to the eight kinds of public data collected in real time from three public institutions. It can be seen that the spatio-temporal collection unit differs among these public data. These public data should be converted to the same collection unit to construct an analysis database. Therefore, the input data that were used in this study were the weather data collected from KMA, because these had to be converted to the spatio-temporal collection unit of NTIS.

In the case of GIS data, the ITS standard node and link provided by MOLIT (Ministry of Land, Infrastructure, and Transport) were used. These GIS data consisted of the node information, rotation information, link information, and link additional information.

3.3 Temporal interpolation method

The public data that were used in this study were collected in different information provision cycles. It was necessary to convert those into the same storage unit to construct an analysis database. Thus, the weather data collected every hour in real time had to be interpolated to 5-minute data. As mentioned earlier, there are three kinds of temporal interpolation methods: piecewise constant interpolation, linear interpolation, and nonlinear interpolation. After the application of all three methods, the best method was identified through a case study, and was adopted.

Piecewise constant interpolation is a method of applying the weather data of the current cycle unchanged every 5 minutes until the weather data of the next cycle is updated. This method assumes that the weather data do not change over the 5-minute intervals within each hour.

$\hat{W}_i(t + 5k) = W_i(t), \quad k = 0, 1, \ldots, 11$ (1)

where $W_i(t)$: weather data at time $t$ (every 1 hour) in district $i$, $k$: index of the intervals (every 5 minutes)

Linear interpolation is a method of interpolating between the weather data of the current time and the weather data of the previous time to obtain 5-minute data, using a linear equation. The current time means the update time of the public data, and the previous time means 1 hour before the current time. As shown in Fig. 2, given that the current time in district $i$ is $t_1$ and the previous time is $t_0$, the weather data at $t_1$ is $W_i(t_1)$ and the weather data at $t_0$ is $W_i(t_0)$. The estimated value $\hat{W}_i(t)$ at a time $t$ between the two observation points can be calculated using Eq. (2).

$\hat{W}_i(t) = W_i(t_0) + \frac{W_i(t_1) - W_i(t_0)}{t_1 - t_0}(t - t_0), \quad t_0 \le t \le t_1$ (2)

Fig. 2. Concept of linear interpolation

Finally, cubic spline interpolation was applied as the nonlinear interpolation method in this study. This method is the most widely used, and it is an algorithm that smoothly connects the given discrete points. The curves connecting two adjacent points are called "cubic" because a third-order polynomial is used, as in Eq. (3). Cubic spline interpolation minimizes the variation and can provide an excellent approximation of rapidly changing functions.

$f_i(x) = a_i x^3 + b_i x^2 + c_i x + d_i$ (3)

Four unknown values must be determined for the n+1 data points (k = 0, 1, 2, ..., n). A third-order spline equation (n = 3), which requires four unknown values, was derived from Eq. (4) and Eq. (5) (Cheney and Kincaid, 2008). The n-1 second-derivative values were calculated using Eq. (5). The n+1 unknown values were calculated by applying the condition that the second derivative is 0 at both ends. As shown in Fig. 3, $f_1(x)$, $f_2(x)$, and $f_3(x)$ can be calculated using Eq. (4).
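A natural cubic spline of this kind can be sketched as follows. The sketch uses the standard textbook formulation in terms of the second derivatives at the data points, with the second derivative set to 0 at both ends. It is assumed here, not confirmed by the source, that this corresponds to the Eq. (4) and Eq. (5) referred to above. The function names and the sample data are hypothetical.

```python
from bisect import bisect_right


def second_derivatives(x, y):
    """Second derivatives at the data points of a natural cubic spline.

    Rows 0 and n of the tridiagonal system force the end second
    derivatives to 0; rows 1..n-1 are the interior continuity equations.
    """
    n = len(x) - 1  # number of intervals
    if n < 2:
        raise ValueError("need at least three data points")
    lower = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)
    upper = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for i in range(1, n):
        h0 = x[i] - x[i - 1]
        h1 = x[i + 1] - x[i]
        lower[i] = h0
        diag[i] = 2.0 * (h0 + h1)
        upper[i] = h1
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h1 - (y[i] - y[i - 1]) / h0)
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        factor = lower[i] / diag[i - 1]
        diag[i] -= factor * upper[i - 1]
        rhs[i] -= factor * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - upper[i] * m[i + 1]) / diag[i]
    return m


def spline_value(x, y, m, t):
    """Evaluate the spline at t using the cubic on the interval containing t."""
    i = min(max(bisect_right(x, t), 1), len(x) - 1)  # interval [x[i-1], x[i]]
    h = x[i] - x[i - 1]
    a = x[i] - t
    b = t - x[i - 1]
    return ((m[i - 1] * a ** 3 + m[i] * b ** 3) / (6.0 * h)
            + (y[i - 1] / h - m[i - 1] * h / 6.0) * a
            + (y[i] / h - m[i] * h / 6.0) * b)


# Hypothetical hourly temperatures (deg C) at minutes 0, 60, 120, 180.
obs_times = [0, 60, 120, 180]
obs_temps = [17.0, 16.0, 15.4, 15.9]
derivs = second_derivatives(obs_times, obs_temps)
estimates = [spline_value(obs_times, obs_temps, derivs, t)
             for t in range(0, 181, 5)]
```

Unlike linear interpolation, every estimate depends on all observations through the tridiagonal solve, which is one reason the spline costs more to compute on a live feed.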

3.4 Spatial matching method

The spatial unit for constructing an analysis database is the ITS standard link, so the weather data for each administrative district should be converted to the weather data for each link. This study therefore adopted a spatial matching method using administrative district boundaries. This method is applied in one of two ways, depending on whether or not the link is completely within one administrative district. If it is, the weather data of that administrative district are applied to the link. Otherwise, the link passes through more than one administrative district. In this case, the link data are determined using Eq. (6), as the average of the weather data of each administrative district weighted by the length of the link included in that district.

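$W_{link} = \frac{\sum_{n} W_n L_n}{\sum_{n} L_n}$ (6)

where $W_{link}$: link weather data, $W_n$: weather data of the nth administrative district, $L_n$: length of the link segment within the nth administrative district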

As shown in Fig. 4, link 1 belongs to district A, link 3 belongs to district B, and links 5 and 6 belong to district C. Therefore, the weather data of link 1 become the weather data of district A, the weather data of link 3 become the weather data of district B, and the weather data of links 5 and 6 become the weather data of district C. Links 2, 4, and 7 are not completely contained within a single administrative district, so the weather data of these links are calculated using Eq. (6). Links 2 and 4 pass through two districts, and link 7 passes through three districts. Each of these links is divided into segments at the district boundaries, and the weather data of the link are calculated as a weighted average using the segment lengths.
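The length-weighted average described for boundary links can be sketched as below. The function name, the data layout (a list of value and length pairs per link), and the sample numbers are assumptions for illustration only.

```python
def link_weather(segments):
    """Length-weighted average of district weather values for one link.

    segments: list of (district_value, segment_length) pairs, one pair
    for each administrative district that the link passes through.
    """
    total_length = sum(length for _, length in segments)
    if total_length <= 0:
        raise ValueError("total link length must be positive")
    weighted_sum = sum(value * length for value, length in segments)
    return weighted_sum / total_length


# Hypothetical link fully inside one district: the district value is used as is.
inside = link_weather([(17.0, 250.0)])

# Hypothetical link with 300 m in a district at 17.0 deg C
# and 100 m in a neighbouring district at 16.0 deg C.
crossing = link_weather([(17.0, 300.0), (16.0, 100.0)])
```

A link contained in one district is simply the single-segment case, so the same function covers both branches of the method.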

Fig. 3. Concept of cubic spline interpolation

Fig. 4. Concept of spatial matching

4. Application and Evaluation

Some road sections of Jayouro (Seongdong IC~Isanpo IC) and Seoul Beltway (Ilsan IC~Tongilro IC), around the cities of Goyang and Paju in the Republic of Korea, were set up as the case study sections for the application of this study’s methodology. These road sections are the same as the test section for the current KICT platform development. The GIS data in this study were based on the ITS standard link. Jayouro has 44 links, and Seoul Beltway has 14 links in total. As shown in Table 1, the public data were based on the real-time weather data provided by KMA among the collected public data. These data were the weather data of the administrative district unit, which are collected in real time every 3 hours using Open API. Offline observation data by administrative district for October 1 to 3, 2016 were received from KMA (Paju meteorological observatory), and these data were used for validation and evaluation. It was necessary to convert the weather data to 5-minute data and to the ITS standard link unit for constructing an analysis database. As shown in Fig. 1, the methodology of this study was applied to the spatio-temporal matching of the weather data.

In the case of temporal interpolation, the three methods proposed in this study were applied, and the method that minimized the evaluation index was selected as the optimal method. The MAPE (Mean Absolute Percent Error, %), which is the error rate between the observed and estimated values, was used as the evaluation index. Air temperature (℃) and humidity (%) were used as the weather data for evaluation.
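For reference, the MAPE can be computed as in the following minimal sketch. The function name and the sample values are hypothetical, and the sketch assumes that no observed value is 0, since the percentage error is undefined there (which matters for temperatures near 0 ℃).

```python
def mape(observed, estimated):
    """Mean absolute percent error (%) between observed and estimated values."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumes every observed value is non-zero.
    ratios = [abs((obs - est) / obs) for obs, est in zip(observed, estimated)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical 5-minute observations and interpolated estimates.
observed_temps = [10.0, 20.0]
estimated_temps = [11.0, 18.0]
error_rate = mape(observed_temps, estimated_temps)
```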

The 5-minute data estimated using the three methods were compared with the observation data. As shown in Table 2, linear interpolation was determined to be the best method. The MAPE ranged from 1.14% to 2.82%, however, so the overall error rate was low, and there was almost no difference between linear interpolation and cubic spline interpolation. As shown in Figs. 5 and 6, this is because the temperature and humidity fluctuations at intervals of 5 minutes are not large.

Therefore, either of these two methods can be selected without much problem. Linear interpolation, however, is more effective than cubic spline interpolation in terms of the computation speed when applied to the system.

Table 2. Comparison of error rate (MAPE, %)

Weather data | Piecewise constant interpolation | Linear interpolation | Cubic spline interpolation
Temperature (℃) | 2.46 | 1.14 | 1.49
Humidity (%) | 2.82 | 1.58 | 1.79

Fig. 5. Comparison of temperature time-series plots (OCT 1, 2016)


As shown in Table 3, an analysis database was constructed in this study for the development of the future driving environment prediction model. The temporal unit of this database is 5 minutes, and the spatial unit is the ITS standard link. Of the proposed temporal interpolation methods, linear interpolation and cubic spline interpolation showed similarly low error rates, and the analysis database was constructed by applying the linear interpolation method considering efficient algorithm implementation in the KICT platform.

For the spatial data matching, the weather data of four administrative district units were matched with each link. The links included in each administrative district were matched with the weather data of the administrative district concerned, and the matching of the links at the administrative district boundary was calculated as presented in Fig. 7. As spatial matching was carried out for the case study in this work considering only some road sections, accuracy evaluation was not required because the accuracy could be intuitively judged. When the spatial range is extended nationwide, however, a systematic evaluation method should be established.

Fig. 6. Comparison of humidity time-series plots (OCT 1, 2016)

Table 3. Results of the analysis database construction

Collection time | Link ID | Travel speed (kph) | Travel time (sec) | Temperature (℃) | Precipitation (mm) | Humidity (%) | Wind direction (degrees) | Wind speed (m/s) | Incident (count) | …
2016-09-01 0:05 | 2180005300 | 80 | 48 | 17.1 | 0 | 93.4 | 197 | 0 | 0 | …
2016-09-01 0:10 | 2180005300 | 79 | 48 | 17.2 | 0 | 93.0 | 146.6 | 0 | 0 | …
2016-09-01 0:15 | 2180005300 | 80 | 48 | 17.0 | 0 | 92.9 | 291 | 0 | 0 | …
2016-09-01 0:20 | 2180005300 | 78 | 49 | 16.6 | 0 | 93.2 | 232.6 | 0 | 0 | …
2016-09-01 0:25 | 2180005300 | 80 | 48 | 16.4 | 0 | 93.2 | 178.4 | 0 | 0 | …
2016-09-01 0:30 | 2180005300 | 81 | 47 | 16.2 | 0 | 92.9 | 173.5 | 0.4 | 0 | …
2016-09-01 0:35 | 2180005300 | 81 | 47 | 16.3 | 0 | 93.1 | 180.6 | 0 | 0 | …
2016-09-01 0:40 | 2180005300 | 79 | 48 | 16.4 | 0 | 93.5 | 171.5 | 0.2 | 0 | …
2016-09-01 0:45 | 2180005300 | 78 | 49 | 16.2 | 0 | 94.0 | 193.5 | 0.3 | 0 | …
2016-09-01 0:50 | 2180005300 | 78 | 49 | 16.0 | 0 | 94.2 | 205.3 | 0.1 | 0 | …
2016-09-01 0:55 | 2180005300 | 80 | 48 | 16.0 | 0 | 94.4 | 237.3 | 0 | 0 | …
2016-09-01 1:00 | 2180005300 | 80 | 48 | 16.0 | 0 | 94.5 | 275.4 | 0.4 | 0 | …
2016-09-01 1:05 | 2180005300 | 82 | 47 | 15.9 | 0 | 94.6 | 270.4 | 0.8 | 0 | …
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | …

5. Conclusion

This study suggested a spatio-temporal matching method for constructing an analysis database using different types of public data. This method consists of temporal interpolation and spatial matching. Three methods (piecewise constant interpolation, linear interpolation, and nonlinear interpolation) were proposed for temporal interpolation.

Cubic spline interpolation was applied for the nonlinear interpolation. A spatial matching method based on the administrative district boundaries was also applied because the weather data for each administrative district should be converted to the weather data for each link.

As a result of the case analysis and evaluation, linear interpolation was the best among the three methods. There was, however, almost no difference between linear interpolation and cubic spline interpolation, so either of the two can be selected without much problem. Linear interpolation is nevertheless more effective than cubic spline interpolation in terms of the computation speed when applied to the system. The spatial matching method was also excellent.

It was judged that a driving environment prediction model could be developed using the integrated database consisting of various types of public data that was constructed in this study. Besides, nationwide spatial-range expansion is needed, and a comparative analysis will be required by applying various spatio-temporal data interpolation methods in addition to the methods suggested in this study. Finally, further study is needed to verify the feasibility of the spatial matching method after the future national network expansion.

Acknowledgments

This research was supported by a grant from a Strategic Research Project (Development of Driving Environment Observation, Prediction and Safety Technology Based on Automotive Sensors) funded by the Korea Institute of Civil Engineering and Building Technology.

References

Caruso, C. and Quarta, F. (1998), Interpolation methods comparison, Computers & Mathematics with Applications, Vol. 35, No. 12, pp. 109-126.

Cheney, W. and Kincaid, D. (2008), Numerical Mathematics and Computing 6th Edition, Thomson Brooks/Cole, a part of The Thomson Corporation.

Choi, H. (2005), Evaluation of the optimum interpolation for creating hydraulic model form close range digital photogrammetry, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 23, No. 3, pp. 251-260. (in Korean with English abstract)

Eo, J. and Yoo, H. (2011), Noise mapping of residential areas by estimating urban traffic noise, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 29, No. 3, pp. 229-235. (in Korean with English abstract)

Guillermo, Q. and Jose, D. (1985), A comparative analysis of techniques for spatial interpolation of precipitation, the Water Resources Bulletin American Water Resources Association, Vol. 21, No. 3, pp. 365-380.

Fig. 7. Result of spatial matching

Ha, J. and Chong, K. (2016), Development of a road big-data storage platform for predicting the driving environment, Journal of Emerging Trends in Computing and Information Sciences, Vol. 7, No. 1, pp. 14-22.

Jeong, J. (2014), A spatial distribution analysis and time-series change of PM10 in Seoul city, Journal of the Korean Association of Geographic Information Studies, Vol. 17, No. 1, pp. 61-69. (in Korean with English abstract)

Lee, H. (2010), Comparison and evaluation of root mean square for parameter settings of spatial interpolation method, Journal of the Korean Association of Geographic Information Studies, Vol. 13, No. 3, pp. 29-41. (in Korean with English abstract)

Li, L. and Revesz, P. (2004), Interpolation methods for spatio-temporal geographic data, Computers, Environment and Urban Systems, Vol. 28, No. 3, pp. 201-227.
