Construction of Spatiotemporal Big Data Using Environmental Impact Assessment Information


1. Introduction

An environmental impact assessment (EIA) collects information on development sites pertaining to natural environmental factors, such as air quality and soil quality; ecological factors, such as the current state of animals and plants; and socio-economic factors, such as population and housing status. Although previously collected environmental, statistical, and spatial data can be used (Sung et al., 2019; Cho et al., 2017; Song et al., 2015), it is also necessary to collect information directly via field surveys to acquire detailed data on the target site (Kim et al., 2017; Yoo et al., 2011). Preparing and reviewing environmental impact statements (EIS) therefore requires considerable time and cost.

Since an EIS is used as reference material for a specific project, it contains very detailed information about the surrounding area. Accordingly, information on past projects in the same area can be used as reference data when preparing an EIA, although the practical use of such information requires institutional improvement (Cho et al., 2019).


Namwook Cho¹⁾ · Yunjee Kim²⁾ · Moung-Jin Lee³⁾†

Abstract: In this study, information from environmental impact statements was converted into spatial data, because environmental data from development sites are collected during the environmental impact assessment (EIA) process. Spatiotemporal big data were built from environmental spatial data for each environmental medium for 2,235 development sites during 2007-2018, available from public data portals. For air quality, 33,863 measurement points were constructed, approximately 75 times the 452 points in Air Korea’s real-time measurement network. In total, spatiotemporal big data comprising 2,677,260 records from EIAs were constructed. In the future, such data might be used not only for EIAs but also for various spatial plans.

Key Words: Environmental Impact Assessment, Spatial Information, Big Data, Data Science

https://doi.org/10.7780/kjrs.2020.36.4.11 ISSN 2287-9307 (Online)

Letter

Received August 13, 2020; Accepted August 18, 2020; Published online August 25, 2020

¹⁾ Invited Research Fellow, Environmental Assessment Group, Korea Environment Institute

²⁾ Researcher, Environmental Assessment Group, Korea Environment Institute

³⁾ Research Fellow, Center for Environmental Data Strategy, Korea Environment Institute

† Corresponding Author: Moung-Jin Lee ([email protected])

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


The existing EIA information system is based on the Environmental Impact Assessment Support System, which is an information system operated by the Ministry of the Environment of Korea to collect and disclose information generated during the EIA process.

This system not only discloses information according to administrative procedure, such as the original text of the EIS and its annexes, but also supports the preparation of the EIS by providing basic spatial data necessary for the EIA (Yoo, 2018; Lee et al., 2018).

To establish a system that uses more detailed EIA information, we constructed spatiotemporal big data of spatial and attribute information using the measured values for each environmental medium in the projects and adjacent areas subject to EIA during 2007-2018 (Lee, 2018; Ahn et al., 2013). These big data are based on location, measurements, and other administrative information that the developer must provide in the process of submitting the EIS. Under the current system, when promoting a new project subject to EIA, the relevant data for the area surrounding the target site can be derived from a literature survey. Allowing the use of information obtained in other projects makes it possible to reduce the time and cost of the EIA process and improve the quality of the assessment. This paper also introduces and discusses methods that can be used in fields such as spatial planning (Kim et al., 2016) as well as in existing EIA.

2. Method of EIA spatial big data construction

EIA spatiotemporal big data were constructed from the original EIS texts of 2,235 EIA projects and 541 EIA follow-up reports, collected from the Environmental Impact Assessment Support System for the period from 2007 to 2018.

The data construction process is divided into three steps.

First, to extract attribute information, the environmental quality measurement information included in the EIS is classified and standardized for each environmental medium, item, and substance and constructed as data.

Fig. 1. Flow Chart of data processing. [Flow chart steps: Collecting raw data (EIA Statement); Data extraction using OCR; Does the location information exist?; Does the coordinate information exist?; Geocode the address; Construct measurement data; Construct location data; Open Database (via Public Data Portal).]


To this end, the original EIS data in PDF files are converted to text using optical character recognition (OCR).

Then, to extract spatial information, the standard spatial big data is defined and geocoded through standardization of the coordinates or address data extracted in the first step. Finally, the big data are stored as an open database (DB) to facilitate processing and utilization. The attribute and spatial information extracted in this process is integrated, refined into an open DB, and stored in a form that can be used in conjunction with OpenAPI and CSV format (Ahn et al., 2009). These processes are summarized in Fig. 1.
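The location step summarized in Fig. 1 can be illustrated with a minimal sketch. It is one reading of the decision steps in the flow chart, not the authors' implementation: coordinates printed in the EIS are used directly, an address is geocoded when no coordinates are available, and spots with neither are flagged. The `ExtractedSpot` schema, the `resolve_location` function, and the lookup-table geocoder are all hypothetical names introduced here; a real pipeline would call an actual geocoding service. The sample values come from Table 2, with spot A-2 treated as if only its address had been extracted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Coordinate = Tuple[float, float]  # (X, Y) in a projected system, metres


@dataclass
class ExtractedSpot:
    """One measurement spot as extracted from an EIS (hypothetical schema)."""
    spot_id: str
    address: Optional[str] = None
    coordinate: Optional[Coordinate] = None


def resolve_location(
    spot: ExtractedSpot,
    geocode: Callable[[str], Optional[Coordinate]],
) -> Tuple[Optional[Coordinate], str]:
    """Return (coordinate, source) for a spot.

    Printed coordinates are used directly; otherwise the address is
    geocoded; spots with neither are reported as unresolved.
    """
    if spot.coordinate is not None:
        return spot.coordinate, "coordinate"
    if spot.address:
        located = geocode(spot.address)
        if located is not None:
            return located, "geocoded"
    return None, "unresolved"


# Stand-in geocoder (assumption): a lookup table instead of a real service.
A2_ADDRESS = ("77-6, Ochanggajwa 4-gil, Ochang-eup, Cheongwon-gu, "
              "Cheongju-si, Chungcheongbuk-do")
KNOWN_ADDRESSES = {A2_ADDRESS: (233672.0, 360038.0)}


def lookup_geocoder(address: str) -> Optional[Coordinate]:
    return KNOWN_ADDRESSES.get(address)


spots = [
    ExtractedSpot("A-1", coordinate=(232332.0, 361855.0)),
    ExtractedSpot("A-2", address=A2_ADDRESS),  # pretend no coordinates were printed
    ExtractedSpot("A-3"),                      # hypothetical spot with no location
]
resolved = {s.spot_id: resolve_location(s, lookup_geocoder) for s in spots}
for spot_id, (coord, source) in resolved.items():
    print(spot_id, coord, source)
```

Keeping the source label alongside each coordinate makes it possible to audit later which locations were taken from the document and which were derived from an address.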

1) Extraction of attribute information

To extract attribute information from an EIS, the form of the attribute information is identified first. When the attribute information consists entirely of text layers, the original PDF file is defined as a Text PDF, and the text is extracted by OCR. However, when the attribute information includes only some or no text layers, the text is extracted using options such as alphabet (letter) + number or Korean + Chinese. The text extracted under these two categories is then constructed as an attribute DB via verification processes, such as data-attribute value, address typo, and null value checks (Table 1).
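The value and null checks mentioned above can be sketched as follows. This is an illustrative assumption about what such checks might look like, not the verification procedure actually used: the function names, status labels, and the treatment of the markers "N/D" (non-detection) and "–" (no value) are hypothetical, although both markers do appear in Table 1.

```python
from typing import Dict, List, Optional

NON_DETECT = "N/D"               # marker used in Table 1 for non-detection
NOT_MEASURED = {"–", "-", ""}    # assumed markers for an absent value


def parse_measurement(raw: Optional[str]) -> Dict[str, object]:
    """Turn one OCR-extracted cell into a value plus a status label."""
    text = (raw or "").strip().rstrip("*").strip()  # drop footnote asterisk
    if text in NOT_MEASURED:
        return {"value": None, "status": "missing"}
    if text.upper() == NON_DETECT:
        return {"value": None, "status": "not_detected"}
    try:
        return {"value": float(text), "status": "ok"}
    except ValueError:
        # Typical OCR confusions (e.g. letter "l" for digit "1") end up here.
        return {"value": None, "status": "invalid"}


def check_row(row: Dict[str, Optional[str]]) -> List[str]:
    """List the items in a row that need manual review."""
    issues = []
    for item, raw in row.items():
        status = parse_measurement(raw)["status"]
        if status in ("missing", "invalid"):
            issues.append(f"{item}: {status}")
    return issues


# First measurement at spot A-1, as printed in Table 1.
a1_first = {
    "PM-10": "34", "PM-2.5": "–", "SO2": "0.005", "NO2": "0.009",
    "CO": "0.2", "O3": "0.011", "Pb": "N/D*", "Benzene": "N/D",
}
print(check_row(a1_first))
```

Non-detection is kept distinct from a missing value here because it is a valid measurement outcome, whereas an empty or unparseable cell signals something to recheck against the original document.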

2) Extraction of spatial information

First, the data must be cleaned to extract accurate spatial information. This proceeds in the following order: coordinate verification, address verification, address cleaning, and checking the shapefile format. To determine the exact location of spatial information, location data are constructed after verifying the coordinates and checking the accuracy of the addresses written in the original text. Coordinate verification is a process of checking whether the coordinate system is correct, such as longitude and latitude or transverse Mercator. This is the most important process when amassing spatial information because geocoding cannot be performed when there are coordinate errors. When transverse Mercator coordinates are used, it is necessary to identify the origin point and check the map index information. After coordinate verification, standardized spatial information is constructed by unifying the coordinate system of all spatial data (Table 2).

Table 1. Example of Environmental quality measurement data in EIS (Cheongju Ochang Technopolis General Industrial Complex Development)

Classification | | PM-10 (µg/m³) | PM-2.5 (µg/m³) | SO₂ (ppm) | NO₂ (ppm) | CO (ppm) | O₃ (ppm) | Pb (µg/m³) | Benzene
A-1 | 1st | 34 | – | 0.005 | 0.009 | 0.2 | 0.011 | N/D* | N/D
A-1 | 2nd | 33 | – | 0.002 | 0.010 | 0.1 | 0.013 | N/D | N/D
A-1 | 3rd | 35 | 16 | 0.003 | 0.006 | 0.2 | 0.027 | N/D | N/D
A-2 | 1st | 32 | – | 0.004 | 0.010 | 0.2 | 0.011 | N/D | N/D
A-2 | 2nd | 33 | – | 0.002 | 0.011 | 0.1 | 0.014 | N/D | N/D
A-2 | 3rd | 34 | 16 | 0.003 | 0.004 | 0.1 | 0.026 | N/D | N/D
A-n | 1st | … | … | … | … | … | … | … | …
A-n | 2nd | … | … | … | … | … | … | … | …
A-n | 3rd | … | … | … | … | … | … | … | …

* N/D : Non-Detection

Table 2. Example of measuring location information in EIS (Cheongju Ochang Technopolis General Industrial Complex Development)

Investigation spot | Address | Coordinates (TM) X | Coordinates (TM) Y
A-1 | 535, Hugi-gil, Ochang-eup, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do | 232332 | 361855
A-2 | 77-6, Ochanggajwa 4-gil, Ochang-eup, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do | 233672 | 360038
A-n | Address of spot A-n | … | …
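As a rough illustration of the coordinate verification step, the sketch below sorts a coordinate pair into longitude/latitude, swapped latitude/longitude, projected (for example, transverse Mercator), or invalid. The bounding ranges and the magnitude threshold are assumptions chosen for illustration, and `classify_coordinate` is a hypothetical helper. The heuristic does not identify the TM origin point, which still has to be determined from the document and map index information as described above.

```python
# Assumed rough bounds for geographic coordinates around the Korean peninsula.
LON_RANGE = (124.0, 132.0)
LAT_RANGE = (33.0, 39.0)
# Assumption: projected coordinates in metres are far larger than any degree value.
PROJECTED_MIN = 1000.0


def _within(value: float, bounds: tuple) -> bool:
    return bounds[0] <= value <= bounds[1]


def classify_coordinate(x: float, y: float) -> str:
    """Heuristically label a coordinate pair by the kind of system it uses."""
    if _within(x, LON_RANGE) and _within(y, LAT_RANGE):
        return "lonlat"
    if _within(x, LAT_RANGE) and _within(y, LON_RANGE):
        return "latlon_swapped"   # axes recorded in the opposite order
    if abs(x) >= PROJECTED_MIN and abs(y) >= PROJECTED_MIN:
        return "projected"        # e.g. TM; origin still needs to be identified
    return "invalid"


samples = {
    "A-1 (Table 2)": (232332.0, 361855.0),
    "degrees": (127.4, 36.7),       # hypothetical lon/lat pair
    "swapped": (36.7, 127.4),       # hypothetical swapped pair
    "empty": (0.0, 0.0),
}
for name, (x, y) in samples.items():
    print(name, classify_coordinate(x, y))
```

Pairs labeled "invalid" or "latlon_swapped" are candidates for the manual review that precedes unification of the coordinate system.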

3. Results of constructing EIA spatial big data

After building the big data, 2,677,260 records were included in a DB by integrating attribute and spatial information for 15 assessment items, such as air quality.

The result was deemed national key data and made available as an open DB through public data portals (https://www.data.go.kr/).

Fig. 2. (a) Development project areas and (b) air quality measurement points in the EIA spatial big data; (c) case study of the 「Cheongju Ochang Technopolis General Industrial Complex Development」 project. [Map legend: Development Project Area; Air Quality Measurement Point; Local 2nd class river. Place labels: Dongnam-gu, Cheonan-si; Jincheon-gun; Cheongwon-gu, Cheongju-si; Heungdeok-gu, Cheongju-si.]


The open DB is provided via OpenAPI or in CSV format and can be mapped as shown in Fig. 2. Details of the EIA spatial big data are shown in Table 3.

The EIA spatial big data built here can be characterized as follows. First, the measurement outcome for each environmental medium in the EIS includes the address and coordinate data of the measurement location. Since spatial information can be created and used on this basis, the data can be updated continuously as EIA projects are implemented. Second, the big data contain detailed measurements for a specific area; the environmental quality measured during the EIA of a project covers the project site and surrounding areas. For example, in terms of the air environment in Fig. 2(c), the air quality data provided by the national monitoring network, i.e., the Air Korea real-time measurement network, were measured at 452 locations as of August 2020 (https://www.airkorea.or.kr/), while the time-series EIA DB currently provides data measured at 33,863 locations and therefore contains denser spatiotemporal data. Third, the big data contain the outcomes of various environmental quality measurements at a measurement point. Since the existing environmental spatial information is established separately for each environmental medium, much time and cost were involved in collecting and pre-processing the data before use, owing to differences in the resolution and precision of the data and in spatial information standards. In comparison, the EIA spatial big data include data for various media for the same region; when analyzing various environmental quality measurements, detailed data for each environmental substance can be obtained (Lee, 2018). This has the advantage of allowing insight via preliminary predictions of the EIA and saving additional measurement costs (Cho et al., 2019).
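One way such data could be reused for a new project is a simple radius search around the project site, sketched below. This is an illustration only: it assumes all coordinates have already been unified into the same projected system with units of metres, the site location is invented for the example, and `points_within` is a hypothetical helper rather than part of the open DB. The two measurement points are taken from Table 2.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (X, Y) in the same projected system, metres

# Measurement points from Table 2.
MEASUREMENT_POINTS: Dict[str, Point] = {
    "A-1": (232332.0, 361855.0),
    "A-2": (233672.0, 360038.0),
}


def points_within(
    site: Point, points: Dict[str, Point], radius_m: float
) -> List[Tuple[str, float]]:
    """Return (spot_id, distance) pairs within radius_m of the site, nearest first."""
    hits = []
    for spot_id, (x, y) in points.items():
        distance = math.hypot(x - site[0], y - site[1])
        if distance <= radius_m:
            hits.append((spot_id, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical new project site close to spot A-1.
new_site: Point = (232400.0, 361800.0)

within_1km = points_within(new_site, MEASUREMENT_POINTS, 1000.0)
within_3km = points_within(new_site, MEASUREMENT_POINTS, 3000.0)
for spot_id, distance in within_3km:
    print(f"{spot_id}: {distance:.0f} m")
```

A linear scan is enough for a handful of points; with tens of thousands of measurement locations, a spatial index would be the more usual choice.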

Table 3. Example of EIA spatial Big Data

Category | Item | Contents | Volume
Atmospheric | Air Quality | 139 types including fine dust | 160,663
Atmospheric | Foul Odor | 69 types including compound malodor | 18,857
Atmospheric | Greenhouse Gas | 13 types including emissions during construction | 894
Water | Water Quality (Surface or ground water) | 114 types including pesticides | 163,338
Water | Hydrology and Hydraulicity | 31 types including total length | 45,913
Water | Marine Environment | 84 types including chemical oxygen demand | 175,967
Land | Land Use | 92 types including mine location | 297,067
Land | Soil | 39 types including development restricted area | 15,714
Land | Topography and Geology | 44 types including cadmium | 73,685
Natural Ecology | Flora and Fauna | 98 types including scientific name | 1,487,986
Life | Noise, Vibration | 22 types including noise | 197,135
Life | Hygiene and Public Health | 25 types including formaldehyde | 20,885
Life | Green Resources Cycle | 18 types including household waste | 3,952
Socio-economic | Population and Residence | 20 types including population | 15,204

4. Conclusion and Discussion

This study examined the advantages of spatiotemporal accumulation of environmental information recorded in environmental impact assessments and used it to construct spatiotemporal big data. Covering the past 12 years, big data comprising 2,677,260 records from EIAs, including 160,663 on air quality, 163,338 on water quality, and 73,685 on soil quality, have been established. In the construction process, data that existed in book form were extracted using OCR and implemented as spatial information based on coordinates. The results were stored as an open DB to increase the data usability.
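The reported figures can be cross-checked with simple arithmetic. The sketch below sums the item volumes listed in Table 3 and computes the ratio of EIA air quality measurement points to Air Korea locations. The sum reproduces the reported total of 2,677,260, which suggests that this figure counts data records across the assessment items, and the ratio rounds to the reported factor of 75. The dictionary keys follow the item labels as printed in Table 3; the variable names are introduced here for illustration.

```python
# Item volumes as listed in Table 3 (Noise and Vibration share one row).
TABLE3_VOLUMES = {
    "Air Quality": 160_663,
    "Foul Odor": 18_857,
    "Greenhouse Gas": 894,
    "Water Quality": 163_338,
    "Hydrology and Hydraulicity": 45_913,
    "Marine Environment": 175_967,
    "Land Use": 297_067,
    "Soil": 15_714,
    "Topography and Geology": 73_685,
    "Flora and Fauna": 1_487_986,
    "Noise, Vibration": 197_135,
    "Hygiene and Public Health": 20_885,
    "Green Resources Cycle": 3_952,
    "Population and Residence": 15_204,
}

total_volume = sum(TABLE3_VOLUMES.values())

# Measurement locations quoted in the text.
EIA_AIR_POINTS = 33_863
AIR_KOREA_POINTS = 452
density_ratio = EIA_AIR_POINTS / AIR_KOREA_POINTS

print(f"Total volume: {total_volume:,}")
print(f"EIA / Air Korea measurement points: {density_ratio:.1f}")
```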

An EIA requires efficient analysis of accumulated environmental impacts and damage, and the EIA spatial big data established here can support this process.

As a result, more data are available than the measurement information for each environmental medium provided by the public sector. To develop this further, it is necessary to find use cases by applying the data in actual EIAs and to make institutional improvements that increase the usability of the data for EIA.

Acknowledgements

This research was conducted at Korea Environment Institute (KEI) with support from Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07041203), and R&D Program of Responding Technology to Climate Disaster such as Heat wave (20010017) funded by the Ministry of Interior and Safety (MOIS, Korea).

References

Ahn, J.S., H.T. Kim, H.W. Kim, and Y.H. Lim, 2009. A Study on the Implementation Method for Distributing Public Sector Real Estate Information based on OpenAPI using FOS GIS, The Geographical Journal of Korea, 43(2): 173-185 (in Korean with English abstract).

Ahn, J.W., M.S. Lee, and D.B. Shin, 2013. Study for Spatial Big Data Concept and System Building, Spatial Information Research, 21(5): 43-51 (in Korean with English abstract).

Cho, N.W., M.J. Lee, and J.G. Choi, 2019. Evaluation and Improvement of EIA Information Disclosure System – Focused on the Aarhus Convention –, Journal of Environmental Impact Assessment, 35(4): 400-412 (in Korean with English abstract).

Cho, N.W., J.H. Maeng, and M.J. Lee, 2017. Use of Environmental Geospatial Information to Support Environmental Impact Assessment Follow-Up Management, Korean Journal of Remote Sensing, 33(5): 799-807 (in Korean with English abstract).

Kim, G.H., C.M. Jun, H.C. Jung, and J.H. Yoon, 2016. Providing Service Model Based on Concept and Requirements of Spatial Big Data, Journal of the Korean Society for Geospatial Information Science, 24(4): 89-96 (in Korean with English abstract).

Kim, H.J., S.H. Han, S.J. Kim, H.M. Yun, S.C. Jun, and Y. Son, 2017. Spatio-Temporal Monitoring of Soil CO₂ Fluxes and Concentrations after Artificial CO₂ Release, Journal of Environmental Impact Assessment, 26(2): 93-104 (in Korean with English abstract).

Lee, M.J., 2018. Opening of environmental assessment monitoring DB for providing environmental information, Ministry of the Interior and Safety, Sejong, Korea.

Lee, M.J., J.H. Maeng, Y.J. Lee, J.H. Yoon, J.H. Lee, S.M. Lee, and N.W. Cho, 2018. Establishment of Spatial Information Application System for Advanced Environmental Impact Assessment, Korea Environment Institute, Sejong, Korea.

Song, D.H., J.W. Ryu, and E.H. Jung, 2015. A Study on Application of Open Platform of Spatial Information for Improvement of Environment Impact Assessment Supporting System, Journal of the Korean Association of Geographic Information Studies, 18(1): 105-119 (in Korean with English abstract).

Sung, H.C., Y.Y. Zhu, and S.W. Jeon, 2019. Study on Application Plan of Forest Spatial Information Based on Unmanned Aerial Vehicle to Improve Environmental Impact Assessment, Journal of the Korean Society of Environmental Restoration Technology, 22(6): 63-76 (in Korean with English abstract).

Yoo, H.S., 2018. Operation of Environmental impact assessment support system 2018, Ministry of