1. Introduction
The analysis of land cover is a key in land-use plans, and land cover is closely connected with various human/physical phenomena. The study and interpretation of land covers requires much detail for the
understanding of the underlying processes. However, it is difficult to get accurate data about land cover features due to both the necessary manual activity and statistical analysis of large data sets. Nevertheless, it becomes a common knowledge that land cover classification can be greatly facilitated by the use of remotely sensed data
Integration of Multi-spectral Remote Sensing Images and GIS Thematic Data for
Supervised Land Cover Classification
Dong-Ho Jang* and Chung, Chang-Jo F**
National Research Laboratory (Harmful Algal Blooming Control), Kongju National University*
Geological Survey of Canada**
Abstract : Nowadays, interests in land cover classification using not only multi-sensor images but also thematic GIS information are increasing. Often, although useful GIS information for the classification is available, the traditional MLE (maximum likelihood estimation techniques) does not allow us to use the information, due to the fact that it cannot handle the GIS data properly. This paper propose two extended MLE algorithms that can integrate both remote sensing images and GIS thematic data for land- cover classification. They include modified MLE and Bayesian predictive likelihood estimation technique (BPLE) techniques that can handle both categorical GIS thematic data and remote sensing images in an integrated manner. The proposed algorithms were evaluated through supervised land-cover classification with Landsat ETM+ images and an existing land-use map in the Gongju area, Korea. As a result, the proposed method showed considerable improvements in classification accuracy, when compared with other multi-spectral classification techniques. The integration of remote sensing images and the land-use map showed that overall accuracy indicated an improvement in classification accuracy of 10.8% when using MLE, and 9.6% for the BPLE. The case study also showed that the proposed algorithms enable the extraction of the area with land-cover change. In conclusion, land cover classification results produced through the integration of various GIS spatial data and multi-spectral images, will be useful to involve complementary data to make more accurate decisions.
Key Words : Land Cover Classification, Maximum Likelihood Estimation, Bayesian Predictive Likelihood Estimation, Classification Accuracy.
Received 24 August 2004; Accepted 16 September 2004.
owing to their wide area coverage, synchronism, periodicity and economical efficiency.
To perform land cover classification, the integration of data from multi-sources data including remote sensing images and GIS thematic information can be utilized to reduce the classification error obtained by single-source classification. Conventional parametric classification methods require that the multi-spectral data be described by a common statistical model. Such the models cannot be easily established for combining different data types, e.g., spectral data from remote sensing image and categorical data from a GIS. Another problem with the conventional approach is that the different data sources might not be equally reliable. The development of an appropriate model for the classification of data from multi-sources and different numeric mode (e.g., a mixture of continuous and categorical data) is essential for land cover classification.
Several new methodologies for multi-source integration in land cover classification have been proposed in the recent years. In a methodological point of view, they include the Bayesian probabilistic approach (Solberg et al., 1996; Tso and Mather, 1999;
Warrender and Augusteijn, 1999), the neural networks (Serpico et al., 1996; Benediktsson and Sveinsson, 1997; Dai and Khorram, 1999), fuzzy sets (Solaiman et al., 1999) and evidence theory (Peddle, 1995; Peddle and Ferguson, 2002; Franklin et al., 2002). These studies include land cover classification: the development of new land cover classification techniques through the integration of multi-sensor/source remote sensing images, and the accuracy verification of these techniques (Solberg, 1999).
Only a limited set of studies have involved contextual multi-source classification. Richards et al. (1982) extended the methods used for spatial contextual classification based on probabilistic relaxation to incorporate ancillary data. Binaghi et al. (1997) presented a knowledge-based framework for con-textual
classification based on fuzzy set theory. Wan and Fraser (1994) used multiple self-organizing maps for contextual classification. Le H’ egarat-Mascle et al.
(1997) combined the use of a Markov random field model with Dempster-Shafer theory. Smits and Dellepiane (1997) used a multichannel image segmentation method based on Markov random fields with adaptive neighborhoods.
In relation to the use of categorical data in the classification of remote sensing data, Solberg et al.
(1996) proposed a Markov Random Field model for the classification of multisource data including Landsat TM images, ERS-1 SAR images, and categorical GIS ground cover data. Solberg (1999) also proposed a methodology for forest map revision by using a Markov Random Field Model, an existing forest map and remote sensing data.
Also, recent progress in data acquisition technology has enabled the simultaneous use of high-resolution remote sensing images and GIS-based thematic maps.
These all techniques, in tern, require a new analytical method that is capable of synchronous analysis of GIS multi-source data.
In this paper, we have extended the Maximum Likelihood Estimation technique (MLE) to accommodate both remote sensing imageries and GIS data for land cover classification. MLE is based on the assumption that the distribution function of remote sensing images of each land cover class is normal distribution with known parameters, the mean vector and the variance-covariance matrix (Press, 1972).
However, in practice, although the normality
assumption of the distribution functions is reasonable, it
is not possible to assume that we know the mean vectors
and the variance-covariance matrices of the distribution
functions. In reality, these parameters are estimated from
the input database. When the parameters are estimated
from the data, theoretically MLE is not a right procedure
(Press, 1972; McLallen, 1990) and the proper procedure
is Bayesian predictive likelihood estimation technique (BPLE). If there are small samples, it is known that BPLE shows better performance than that from MLE.
While, for large samples, the performance between BPLE and MLE is similar.
We also have extended BPLE to handle both remote sensing images and GIS categorical data. The accuracy of this classification integration technique was verified by comparisons made with actual-measured validation data. The proposed procedure was evaluated using Landsat ETM+ images and an existing land-use maps, in the Gongju-si, in Korea.
2. Methodologies
1) Basic idea
Consider r land cover types to be constructed using a classification method in a study area. We will use k layers of categorized data layers and h layers of remotely sensed image data layers for the classification.
We also assume that we have sufficient training data which is available for each land cover type in the study area. For a pixel p, let (x
1, …, x
k, y
1, …, y
h) denote pixels values at p where the first k values, x
1, …, x
kcorrespond to the categorical data layers and subsequent h values, y
1, …, y
hrepresent the continuous data layers.
P{M
i| x
1, …, x
k, y
1, …, y
h}, (i = 1, …, r) (1) As usual, let denote the conditional probability that the pixel p belongs to the i
thclass, M
igiven that (x
1, …, x
k, y
1, …, y
h) are pixels values at p. Often, the probability in (1) is also called “posterior” probability meaning that it is the probability that p belongs to M
iafter observing k+h pixel values, (x
1, …, x
k, y
1, …, y
h) at p.
If we know how to compute P{M
i| x
1, …, x
k, y
1, …, y
h} for every class i (i = 1, …, r) for a given pixel p with (x
1, …, x
k, y
1, …, y
h), then the classification problem becomes very simple, in that, we assign the pixel p to
the class g, if P{M
g| x
1, …, x
k, y
1, …, y
h} is the largest among all P{M
i| x
1, …, x
k, y
1, …, y
h} (i = 1, …, r).
Using Bayes’ theorem (Richards, 1995), we express the probability in (1) in the following form:
P{M
i| x
1, …, x
k, y
1, …, y
h} =
, (2)
where P{M
i} is the “prior” probability that the pixel p belongs to the i
thclass, M
ibefore observing (x
1, …, x
k, y
1, …, y
h) at p and it contrasts to the posterior probability in (1) and
P{x
1, …, x
k, y
1, …, y
h} =
P{x
1, …, x
k, y
1, …, y
h| M
i}P{M
i}. (3)
There are several ways to estimate P{M
i| x
1, …, x
k, y
1, …, y
h}, (i = 1, …, r) from the training data. Instead of estimating P{M
i| x
1, …, x
k, y
1, …, y
h} directly in (1), we can try to obtain it by estimating, equivalently, P{x
1, …, x
k, y
1, …, y
h| M
i} and P{M
i} from the training data by using (2). Consider P{x
1, …, x
k, y
1, …, y
h| M
i}, the conditional probability that the pixel p has (x
1, …, x
k, y
1, …, y
h) observations, assuming that the pixel p comes from the i
thclass, M
i. In other words, it is the k+h multivariate frequency distribution function of the pixel values in the M
i. Therefore, to estimate the posterior probability in (1) is equivalent to estimating the frequency distribution functions of the pixel values in r classes using the training data. The frequency distribution function of the class, M
iis usually expressed as f{x
1, …, x
k, y
1, …, y
h| M
i} instead of P{x
1, …, x
k, y
1, …, y
h| M
i}.
2) Separation of categorized data layers and continuous data layers
To handle the multivariate frequency distribution functions of two different types of data layers, we have made the following assumption of conditional independence:
S
r i=1P{x
1, …, x
k, y
1, …, y
h| M
i}P{M
i}
P{x
1, …, x
k, y
1, …, y
h}
f{x
1, …, x
k, y
1, …, y
h| M
i} =
f{x
1, …, x
k| M
i} f{y
1, …, y
h| M
i}, (i = 1, …, r). (4) Under the assumption of (4), the k + h dimensional multivariate frequency distribution function, f{x
1, …, x
k, y
1, …, y
h| M
i} is expressed as a multiple of the k dimensional multivariate discrete distribution function for the categorical data and the h dimensional multivariate continuous distribution function for the continuous data and hence we will estimate the frequency distribution function as a multiple of two distribution functions, one for categorical data layers and the other for continuous data layers.
Under most practical circumstances, the conditional independent assumption in (4) is reasonable except for some of the categorized data layers were directly deducted from the continuous data layers
3) Estimation of frequency distribution function for categorized data layers
Among r frequency distribution functions, consider one f{x
1, …, x
k| M
i} for a pixel p with k pixel values (x
1, …, x
k), each x
jrepresenting a category in the j
thdata layer where the pixel p belongs to M
iclass. f{x
1, …, x
k| M
i} is the m
1x m
2x … x m
kcross-classified contingency table, where mj represents the number of categories in the j
thlayer. The x
1x x
2x … x x
kcell of the k-dimensional table represents the number of M
iclass pixels that belongs to x
1category in the 1
stlayer, x
2category in the 2
ndlayer and x
kcategory in the k
thlayer. Let us denote U
x1…
xkas the unique condition sub-area (Chung et al., 1995; Clerici et al., 2002) with the classes, x
1, …, x
kand the pixels in the sub-area belong to the x
1category in the 1
stlayer, the x
2category in the 2
ndlayer and the x
kcategory in the k
thlayer. # of M
iclass pixels within U
x1…
xkis the x
1x x
2x … x x
kcell of the k-dimensional table.
The first non-parametric estimate of f{x
1, …, x
k| M
i} is the empirical frequency distribution function from the
training data:
˘f{x
1, …, x
k| M
i} =
# of M
iclass pixels within U
x1…
xkfrom the
training data. (5)
However, if we assume that f{x
1, …, x
k| M
i} is the multivariate multinomial frequency distribution functions (Johnson and Kotz, 1972). Then, the maximum likelihood estimate of this multinomial distribution function is the empirical k-dimensional contingency table in (5). Although it is simple to compute, but we strongly suggest NOT using the estimate in (5), when the number of categorized data layers is more than two.
The second procedure to estimate f{x
1, …, x
k| M
i}, (i
= 1, …, r) assumes, as in (4), that the k categorical layers are conditionally independent and hence, we have:
f{x
1, …, x
k| M
i} = f{x
1| M
i} … f{x
k| M
i}, (i = 1, …, r).(6) Instead of considering all the k layers at the same time, we may estimate the distribution function as a multiple of k separate distribution functions. Each separate function is estimated by the empirical frequency distribution function using each single categorical data layer:
˜f{x
j| M
i} = # of M
iclass pixels within x
jcategory
within the j
thlayer (7)
Using (6) and (7) under the conditional independence assumption, we obtain an empirical estimate of f{x
1, …, x
k| M
i} by:
˜f{x
1, …, x
k| M
i} = ˜f{x
1| M
i} … ˜f{x
k| M
i}, (i = 1, …, r).(8)
As before, if we assume that f{x
1, …, x
k| M
i} is the
multivariate multinomial frequency distribution
functions and the conditional independence in (6), then,
the empirical frequency function in (8) is also the
maximum likelihood estimate of f{x
1, …, x
k| M
i}. We
strongly recommend the use of equation (8) rather than
(5), when the number of categorized data layers is more
than two. When the number is greater than two and the
number of the classes in each layer is more than five, then the sizes of many of the unique condition sub-areas become very small and consequently many of the ˘f{x
1,
…, x
k| M
i} are equal to zero. When we integrate the zero value with the estimate of the frequency distribution function from the continuous data layers, they generate an undesirable negative impact on the classification maps.
4) Estimation of frequency distribution function for continuous data layers
Consider the frequency distribution function for continuous data layers, f{y
1, …, y
h| M
i} for a pixel p with h pixel values (y
1, …, y
h), where each y
jrepresents a real value at y within the j
thcontinuous data layer. The simplest non-parametric estimate for the function is an empirical frequency distribution function based on either histogram or kernel method. When the number of layers is more than two, then such empirical distribution is not recommended as an estimate. The next non-parametric procedure requires conditional independence as in (4).
We assume that the h continuous layers are conditionally independent and hence, we have:
f{y
1, …, y
h| M
i} = f{y
1| M
i} … f{y
h| M
i}, (i = 1, …, r).(9) Instead of considering k-dimensional multivariate distribution functions, we may estimate the frequency distribution function as a multiple of k separate univariate frequency distribution functions, where each univeriate individual distribution is estimated by an empirical frequency distribution:
¨f{y
1, …, y
k| M
i} = ¨f{y
1| M
i} … ¨f{y
k| M
i}, (i = 1, …, r).(10) In many situations, the estimate in (7) is an effective estimate of f{y
1, …, y
h| M
i}. A parametric estimate of f{y
1, …, y
h| M
i} is obtained by assuming that it is the multivariate normal distribution functions, N(m
Mi, S
Mi) where m
Miand S
Miare the k-dimensional mean vectors for M
iclass and the k x k dimensional covariance
matrix for M
i, respectively. Suppose that ˆ m
Miand ˆS
Miare the sample mean vector and the sample covariance matrix computed from the training data, then they are the maximum likelihood estimates of m
Miand S
Mi(Hubert-Moy et al., 2001). Using this MLE (Richards, 1995), we obtain the following estimate:
…