Bayesian inference in finite population sampling under measurement error model †
You Mee Goo 1 · Dal Ho Kim 2
1,2 Department of Statistics, Kyungpook National University
Received 3 October 2012, revised 7 November 2012, accepted 12 November 2012
Abstract
The paper considers empirical Bayes (EB) and hierarchical Bayes (HB) predictors of the finite population mean under a linear regression model with measurement errors.
We discuss how to calculate the mean squared prediction errors of the EB predictors using jackknife methods and the posterior standard deviations of the HB predictors based on the Markov Chain Monte Carlo methods. A simulation study is provided to illustrate the results of the preceding sections and compare the performances of the proposed procedures.
Keywords: Empirical Bayes, finite population mean, Gibbs sampler, hierarchical Bayes, jackknife method, mean squared prediction error, posterior standard deviation.
1. Introduction
We consider a finite population $U$ with units labeled $1, 2, \ldots, N$. Let $y_i$ denote the value of a single characteristic attached to unit $i$. The vector $y = (y_1, \ldots, y_N)^T$ is the unknown state of nature and is assumed to belong to $\Theta = R^N$. Here we are concerned exclusively with the finite population mean $\gamma = N^{-1} \sum_{i=1}^{N} y_i$. A subset $s$ of $\{1, 2, \ldots, N\}$ is called a sample. A sample $s$ of size $n$ is selected from $U$ according to some specified sampling plan, and we let $\bar{s} = U - s$ denote the unobserved part of $U$.
In this paper we use the superpopulation approach (Cassel et al., 1977) to survey sampling. Under this perspective, according to the conditionality principle (Basu, 1975), the sampling plan is not relevant for inference. Extensive bibliographies on the model-based superpopulation approach are given in Bolfarine and Zacks (1992). Throughout, we will use the notations $y = (y_1, \ldots, y_n, y_{n+1}, \ldots, y_N)^T$ and $y = (y_s^T, y_{\bar{s}}^T)^T$ with $y_s = (y_1, \ldots, y_n)^T$, $y_{\bar{s}} = (y_{n+1}, \ldots, y_N)^T$. Also write $1_{N-n}$ for the $(N-n)$-dimensional column vector of 1's, $J_{N-n} = 1_{N-n} 1_{N-n}^T$, and $I_{N-n}$ for the identity matrix of order $N - n$.
Measurement errors may occur when the measuring device is biased or inaccurate. Regarding human populations, the respondents may not possess accurate information or they may give biased information. As shown in Table 1.1 of Fuller (1987), simple characteristics like sex or age may also present some measurement errors. More complex population characteristics like unemployment, income or salary may present a much more serious measurement bias. According to Fuller (1987), measurement error is about 15 percent of the total variation for income. The effect of such errors upon estimated regression coefficients has long been recognized as a serious problem. Cochran (1968) and Fuller (1975, 1987) report the distortions that are introduced into the regression coefficient estimates when the variables in the regression equation are measured with error. There has been little explicit analytical treatment of the prediction problem for the finite population mean under regression superpopulation models with measurement errors.

† This research was supported by Kyungpook National University Research Fund, 2010.
1 Ph.D. candidate, Department of Statistics, Kyungpook National University, Daegu 702-701, Korea.
2 Corresponding author: Professor, Department of Statistics, Kyungpook National University, Daegu 702-701, Korea. E-mail: [email protected]
We consider the superpopulation model
$$ y_i = \alpha + \beta x_i + e_i, \quad i = 1, \ldots, N; \tag{1.1} $$
$$ X_i = x_i + \eta_i, \quad i = 1, \ldots, N. \tag{1.2} $$
It is assumed that the $x_i$, $\eta_i$ and $e_i$ are mutually independent with $x_i \overset{iid}{\sim} N(\mu_x, \sigma_x^2)$, $\eta_i \overset{iid}{\sim} N(0, \sigma_\eta^2)$ and $e_i \overset{iid}{\sim} N(0, \sigma_e^2)$. The vector of model parameters is denoted by $\phi = (\alpha, \beta, \mu_x, \sigma_x^2, \sigma_\eta^2, \sigma_e^2)^T$. We also assume that $X = (X_1, X_2, \ldots, X_N)^T$ is known. As with $y$, we write $X = (X_s^T, X_{\bar{s}}^T)^T$ with $X_s = (X_1, \ldots, X_n)^T$, $X_{\bar{s}} = (X_{n+1}, \ldots, X_N)^T$.
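As an illustration, data from the superpopulation model (1.1) and (1.2) can be generated as follows. This is a minimal sketch in Python with NumPy and is not part of the original analysis; the function name `simulate_population`, the seed and all parameter values are hypothetical choices made only for the example.

```python
import numpy as np

def simulate_population(N, alpha, beta, mu_x, sigma2_x, sigma2_eta, sigma2_e, rng):
    """Draw (y, X, x) for N units from the model (1.1)-(1.2)."""
    x = rng.normal(mu_x, np.sqrt(sigma2_x), size=N)      # true covariate x_i
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), size=N)   # measurement error eta_i
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=N)       # regression error e_i
    y = alpha + beta * x + e                             # equation (1.1)
    X = x + eta                                          # equation (1.2)
    return y, X, x

# Hypothetical parameter values, chosen only for illustration.
rng = np.random.default_rng(0)
y, X, x = simulate_population(N=1000, alpha=1.0, beta=2.0, mu_x=5.0,
                              sigma2_x=4.0, sigma2_eta=1.0, sigma2_e=1.0, rng=rng)
gamma = y.mean()  # finite population mean gamma = N^{-1} sum_i y_i
```

In practice only $X$ and the sampled part of $y$ would be available; the true covariate `x` is returned here solely so that the simulated measurement error can be inspected.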
Bolfarine and Cordani (1993) provided likelihood-based inferences on the slope parameter of a simple linear regression model with measurement errors when the reliability ratio is known. Bolfarine et al. (1996) considered the frequentist approach to the prediction of the finite population total under a regression superpopulation model with measurement errors. They studied the asymptotic behavior of the naive predictor based on the ordinary least squares estimator as well as a bias-adjusted estimator, establishing asymptotic normality. Later, Torabi et al. (2009) showed that the naive predictor given in Bolfarine et al. (1996) is essentially identical to the EB predictor.
In this article, we consider EB and HB predictors of the finite population mean when the covariate, say $x$, is measured with error. We also assume that $x$ is stochastic. Here the EB procedure needs to estimate the hyperparameters and does not require any approximation of the posterior. The HB procedure likewise does not rely on any normal approximation.
The outline of the remaining sections is as follows. In Section 2, we provide the EB and HB predictors of the finite population mean when the covariates are measured with error. We also discuss how to calculate the mean squared prediction error (MSPE) of the EB predictors using jackknife methods and the posterior standard deviation of the HB predictors based on Markov Chain Monte Carlo (MCMC) methods. In Section 3, a simulation study is provided to illustrate the results of the preceding sections and compare the performances of the proposed procedures. Finally, we provide the summary and conclusion.
2. EB and HB predictors of the finite population mean
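A sample $s$ of size $n$ is drawn from the finite population, and the sample data are denoted by $(y_i, X_i; i \in s)$. From (1.1) and (1.2), the incidental parameters $x_i$ can be eliminated, so that $(y_i, X_i)$ has a bivariate normal distribution with
$$ \begin{pmatrix} y_i \\ X_i \end{pmatrix} \sim N\left[ \begin{pmatrix} \alpha + \beta\mu_x \\ \mu_x \end{pmatrix}, \begin{pmatrix} \beta^2\sigma_x^2 + \sigma_e^2 & \beta\sigma_x^2 \\ \beta\sigma_x^2 & \sigma_x^2 + \sigma_\eta^2 \end{pmatrix} \right], \quad i = 1, \ldots, N. $$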
Using the well-known properties of the bivariate normal distribution, it follows that
$$ y_i \mid X_i \overset{ind}{\sim} N[\mu_y + \beta k_x (X_i - \mu_x), \; \sigma_e^2 + \beta^2\sigma_x^2(1 - k_x)], \quad i = 1, \ldots, N, $$
where $\mu_y = \alpha + \beta\mu_x$ and $k_x = \sigma_x^2/(\sigma_x^2 + \sigma_\eta^2)$. We are interested in estimating the finite population mean $\gamma = N^{-1} \sum_{i=1}^{N} y_i$ from the sample data. It can be rewritten as $\gamma = N^{-1}(1_n^T y_s + 1_{N-n}^T y_{\bar{s}})$.
First we derive the EB predictor of $\gamma$. The Bayes predictor of $\gamma$ under squared error loss is
$$ \hat{\gamma}^{B} = E(\gamma \mid y_s, X, \phi) = (1 - f)\bar{y}_s + N^{-1} 1_{N-n}^T E(y_{\bar{s}} \mid y_s), $$
where $f = (N - n)/N$ is the finite population correction factor and $\bar{y}_s = n^{-1} \sum_{i=1}^{n} y_i$. The basic problem in finite population sampling is to draw predictive inference about $y_{\bar{s}}$ conditional on $y_s$. Since the conditional distribution of $y_{\bar{s}}$ given $y_s$ is
$$ y_{\bar{s}} \mid y_s \sim N[\mu_y 1_{N-n} + \beta k_x (X_{\bar{s}} - \mu_x 1_{N-n}), \; \{\sigma_e^2 + \beta^2\sigma_x^2(1 - k_x)\} I_{N-n}], $$
we have $E(y_{\bar{s}} \mid y_s) = \mu_y 1_{N-n} + \beta k_x (X_{\bar{s}} - \mu_x 1_{N-n})$, where the components correspond to the units $i \in \bar{s}$. Thus the Bayes predictor of $\gamma$ is given by
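$$ \hat{\gamma}^{B} = (1 - f)\bar{y}_s + f\mu_y + f\beta k_x (\bar{X}_{\bar{s}} - \mu_x), \tag{2.1} $$
where $\bar{X}_{\bar{s}} = (N - n)^{-1} 1_{N-n}^T X_{\bar{s}}$ is the mean of the covariate values for the nonsampled units. Also, the posterior variance of $\gamma$ given $\phi$ is
$$ V(\gamma \mid y_s, X, \phi) = \frac{1}{N} f \{\sigma_e^2 + \beta^2\sigma_x^2(1 - k_x)\}. \tag{2.2} $$
Note that $V(\gamma \mid y_s, X, \phi)$ does not depend on $y_s$ and $X$. Hence the MSPE of $\hat{\gamma}^{B}$, $E(\hat{\gamma}^{B} - \gamma)^2$, is equal to the posterior variance of $\gamma$. Also, note that the posterior variance of $\gamma$ depends only on $\delta = (\beta, \sigma_x^2, \sigma_\eta^2, \sigma_e^2)^T$. We denote $g_1(\delta) \equiv MSPE(\hat{\gamma}^{B}) = E(\hat{\gamma}^{B} - \gamma)^2$. If $N$ is large and $n/N \approx 0$, then $g_1(\delta) \approx 0$.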
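For a known parameter vector, the Bayes predictor (2.1) and its posterior variance (2.2) reduce to a few arithmetic steps. The following Python sketch, assuming NumPy, is only illustrative; the function names `bayes_predictor` and `g1` and the toy numbers are hypothetical and not taken from the paper.

```python
import numpy as np

def bayes_predictor(y_s, X_ns, N, alpha, beta, mu_x, sigma2_x, sigma2_eta):
    """Bayes predictor (2.1) of the finite population mean for known phi."""
    y_s = np.asarray(y_s, dtype=float)
    X_ns = np.asarray(X_ns, dtype=float)      # X values of the nonsampled units
    n = y_s.size
    f = (N - n) / N                           # f = (N - n)/N as defined in the text
    k_x = sigma2_x / (sigma2_x + sigma2_eta)  # k_x as defined in the text
    mu_y = alpha + beta * mu_x
    return (1 - f) * y_s.mean() + f * mu_y + f * beta * k_x * (X_ns.mean() - mu_x)

def g1(N, n, beta, sigma2_x, sigma2_eta, sigma2_e):
    """Posterior variance (2.2), equal to the MSPE of the Bayes predictor."""
    f = (N - n) / N
    k_x = sigma2_x / (sigma2_x + sigma2_eta)
    return f / N * (sigma2_e + beta**2 * sigma2_x * (1 - k_x))

# Toy numbers (hypothetical): N = 10 units, of which n = 4 are sampled.
gamma_B = bayes_predictor([1, 2, 3, 4], [2, 2, 2, 2, 2, 2], N=10,
                          alpha=0.0, beta=1.0, mu_x=1.0,
                          sigma2_x=1.0, sigma2_eta=1.0)
mspe_B = g1(N=10, n=4, beta=1.0, sigma2_x=1.0, sigma2_eta=1.0, sigma2_e=1.0)
print(gamma_B, mspe_B)
```

Note that `g1` takes no data arguments, which mirrors the remark that the posterior variance does not depend on $y_s$ and $X$.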
The EB predictor $\hat{\gamma}^{EB}$ of $\gamma$ is obtained by replacing $\phi$ in the Bayes predictor $\hat{\gamma}^{B}$ by a consistent estimator $\hat{\phi}$. The components of $\phi$ are unknown and need to be estimated from the data. Let $\bar{X}_s = n^{-1} \sum_{i=1}^{n} X_i$, $SS_X = \sum_{i=1}^{n} (X_i - \bar{X}_s)^2$, $SS_y = \sum_{i=1}^{n} (y_i - \bar{y}_s)^2$, $S_{yX} = \sum_{i=1}^{n} (X_i - \bar{X}_s)(y_i - \bar{y}_s)$, $MS_X = (n - 1)^{-1} SS_X$, $MS_y = (n - 1)^{-1} SS_y$ and $MS_{yX} = (n - 1)^{-1} S_{yX}$.
Under some regularity conditions, $\bar{y}_s$ and $\bar{X}_s$ are consistent estimators of $\mu_y$ and $\mu_x$, respectively, i.e., $\hat{\mu}_y = \bar{y}_s$, $\hat{\mu}_x = \bar{X}_s$. Under the superpopulation model (1.1) and (1.2), it can be shown that (see Fuller, 1987) $E[\hat{\beta}_{OLS}] = E(S_{yX}/SS_X) = k_x\beta$, where $\hat{\beta}_{OLS}$ is the ordinary least-squares estimator of $\beta$, and thus $k_x\beta$ is consistently estimated by $\hat{\beta}_{OLS}$. Thus the EB predictor of $\gamma$ is given by
$$ \hat{\gamma}^{EB} = \bar{y}_s + f(\bar{X}_{\bar{s}} - \bar{X}_s)\hat{\beta}_{OLS}. \tag{2.3} $$
Now we obtain a nearly unbiased estimator of $MSPE(\hat{\gamma}^{EB}) = E(\hat{\gamma}^{EB} - \gamma)^2$, using the jackknife methods proposed by Jiang et al. (2002) and Chen and Lahiri (2002). We have the following orthogonal decomposition:
$$ MSPE(\hat{\gamma}^{EB}) = E(\hat{\gamma}^{B} - \gamma)^2 + E(\hat{\gamma}^{EB} - \hat{\gamma}^{B})^2 = M_1 + M_2, \tag{2.4} $$
where $M_1 = g_1(\delta)$ is given by (2.2).
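The EB predictor (2.3) needs only the sample means and the OLS slope of $y$ on the observed $X$. A minimal Python sketch, assuming NumPy, is given below; the function name `eb_predictor` and the toy data are hypothetical and serve only to make the formula concrete.

```python
import numpy as np

def eb_predictor(y_s, X_s, X_ns, N):
    """EB predictor (2.3): y_bar_s + f * (X_bar_ns - X_bar_s) * beta_OLS."""
    y_s = np.asarray(y_s, dtype=float)
    X_s = np.asarray(X_s, dtype=float)
    X_ns = np.asarray(X_ns, dtype=float)   # X values of the nonsampled units
    n = y_s.size
    f = (N - n) / N                        # f = (N - n)/N as defined in the text
    S_yX = np.sum((X_s - X_s.mean()) * (y_s - y_s.mean()))
    SS_X = np.sum((X_s - X_s.mean()) ** 2)
    beta_ols = S_yX / SS_X                 # OLS slope, which estimates k_x * beta
    return y_s.mean() + f * (X_ns.mean() - X_s.mean()) * beta_ols

# Toy data (hypothetical): n = 4 sampled units out of N = 10.
gamma_EB = eb_predictor(y_s=[1, 3, 5, 7], X_s=[0, 1, 2, 3],
                        X_ns=[2, 3, 4, 5, 2, 2], N=10)
print(gamma_EB)
```

No correction for the measurement error variance enters the predictor itself, since the attenuated slope $k_x\beta$ is exactly what (2.1) requires.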
A plug-in estimator of $g_1(\delta)$ is $g_1(\hat{\delta})$. Following Fuller (1987), when $\sigma_\eta^2$ is assumed to be known, the estimators are given by $\hat{\sigma}_x^2 = MS_X - \sigma_\eta^2$ and $\hat{\sigma}_e^2 = MS_y - \hat{\beta} MS_{yX}$, where $\hat{\beta} = MS_{yX}/(MS_X - \sigma_\eta^2)$. We apply the jackknife method of bias reduction to $g_1(\hat{\delta})$ to get a nearly unbiased estimator of $M_1 = g_1(\delta)$. Let $\hat{\phi}_{-l}$ be the estimator of $\phi$ obtained by deleting the $l$th observation $(y_s^{(l)}, X_s^{(l)})$ from the full data set $(y_i, X_i; i \in s)$ and then applying the method of moments. This calculation is done for each $l$ in turn to get $n$ estimators of $\phi$: $(\hat{\phi}_{-l}; l = 1, \ldots, n)$. A jackknife estimator of $M_1$ is given by
$$ \hat{M}_{1J} = g_1(\hat{\delta}) - \sum_{l=1}^{n} \frac{n - 1}{n} \{g_1(\hat{\delta}_{-l}) - g_1(\hat{\delta})\}. \tag{2.5} $$
Turning to jackknife estimation of the last term, $M_2$, in (2.4), let $\hat{\gamma}^{B} = k(y_s, X_s, \phi)$ be the Bayes predictor expressed as a function of $y_s$, $X_s$ and $\phi$, so that $\hat{\gamma}^{EB} = k(y_s, X_s, \hat{\phi})$. Now replace $\hat{\phi}$ by $\hat{\phi}_{-l}$ to get $\hat{\gamma}^{EB}_{-l} = k(y_s, X_s, \hat{\phi}_{-l})$ $(l = 1, \ldots, n)$. Then a jackknife estimator of $M_2$ is given by
$$ \hat{M}_{2J} = \sum_{l=1}^{n} \frac{n - 1}{n} (\hat{\gamma}^{EB}_{-l} - \hat{\gamma}^{EB})^2. \tag{2.6} $$
By taking the sum of (2.5) and (2.6), a jackknife estimator of $MSPE(\hat{\gamma}^{EB})$ is obtained as
$$ mspe_J(\hat{\gamma}^{EB}) = \hat{M}_{1J} + \hat{M}_{2J}. \tag{2.7} $$
Next, we consider a hierarchical Bayesian framework to predict the population mean $\gamma$.
To this end, we begin with the following model:
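I. $y_i \mid \alpha, \beta, \sigma_e^2 \overset{ind}{\sim} N(\alpha + \beta x_i, \sigma_e^2)$, $i = 1, \ldots, n$, where $e_i \overset{iid}{\sim} N(0, \sigma_e^2)$.
II. $X_i \mid x_i, \sigma_\eta^2 \overset{ind}{\sim} N(x_i, \sigma_\eta^2)$, $i = 1, \ldots, n$, where $\eta_i \overset{iid}{\sim} N(0, \sigma_\eta^2)$.
III. $x_i \overset{iid}{\sim} N(\mu_x, \sigma_x^2)$.
IV. $\alpha, \beta, \mu_x, \sigma_e^2, \sigma_\eta^2, \sigma_x^2$ are mutually independent with $\alpha, \beta, \mu_x \overset{iid}{\sim} \text{uniform}(-\infty, \infty)$, $\sigma_e^2 \sim IG(a_e/2, b_e/2)$, $\sigma_\eta^2 \sim IG(a_\eta/2, b_\eta/2)$, $\sigma_x^2 \sim IG(a_x/2, b_x/2)$. Here $IG(a, b)$ denotes an inverse gamma distribution with pdf $f_{a,b}(z) \propto \exp(-a/z) z^{-b-1} I_{[z>0]}$.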
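Because the parametrization of $IG(a, b)$ used here places $a$ in the exponent and $b$ in the power of $z$, it may help to see how such a variate can be drawn. If $Z \sim IG(a, b)$ with pdf proportional to $\exp(-a/z) z^{-b-1}$, then $1/Z$ follows a gamma distribution with shape $b$ and rate $a$. The Python sketch below, assuming NumPy, uses this fact; the function name `sample_inverse_gamma` and the hyperparameter values are hypothetical.

```python
import numpy as np

def sample_inverse_gamma(a, b, size, rng):
    """Draw from IG(a, b) with pdf proportional to exp(-a/z) * z**(-b-1), z > 0.

    Under this parametrization 1/Z ~ Gamma(shape=b, rate=a); NumPy's gamma
    sampler takes a scale argument, so scale = 1/a.
    """
    return 1.0 / rng.gamma(shape=b, scale=1.0 / a, size=size)

# Hypothetical hyperparameters for the prior sigma_e^2 ~ IG(a_e/2, b_e/2).
rng = np.random.default_rng(1)
a_e, b_e = 2.0, 6.0
draws = sample_inverse_gamma(a_e / 2, b_e / 2, size=200_000, rng=rng)

# For b > 1 the mean of IG(a, b) is a / (b - 1); here that is 1 / (3 - 1) = 0.5.
print(draws.mean())
```

The same helper could serve for any inverse gamma draw written in this parametrization, provided the two arguments are passed in the order (exponent term, power term).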
The implementation of the Bayesian procedure is greatly facilitated by the MCMC numerical integration technique, in particular the Gibbs sampler. This requires generating samples from the full conditionals of each of $x_i$, $\alpha$, $\beta$, $\mu_x$, $\sigma_e^2$, $\sigma_\eta^2$ and $\sigma_x^2$ given the remaining parameters and the data. The details are given below.
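By the HB model I to IV, the joint posterior distribution is given by
$$ \begin{aligned} \pi(\alpha, \beta, \mu_x, \sigma_e^2, \sigma_\eta^2, \sigma_x^2 \mid y_s, X_s) \propto{} & (\sigma_e^2)^{-n/2} \exp\Big[-\frac{1}{2\sigma_e^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2\Big] \\ & \times (\sigma_\eta^2)^{-n/2} \exp\Big[-\frac{1}{2\sigma_\eta^2} \sum_{i=1}^{n} (X_i - x_i)^2\Big] \times (\sigma_x^2)^{-n/2} \exp\Big[-\frac{1}{2\sigma_x^2} \sum_{i=1}^{n} (x_i - \mu_x)^2\Big] \\ & \times \exp\Big[-\frac{a_x}{2\sigma_x^2}\Big] (\sigma_x^2)^{-(b_x/2+1)} \times \exp\Big[-\frac{a_\eta}{2\sigma_\eta^2}\Big] (\sigma_\eta^2)^{-(b_\eta/2+1)} \times \exp\Big[-\frac{a_e}{2\sigma_e^2}\Big] (\sigma_e^2)^{-(b_e/2+1)}. \end{aligned} $$
Then the full conditionals are obtained as follows: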
(i) $[\alpha \mid x, \beta, \mu_x, \sigma_e^2, \sigma_\eta^2, \sigma_x^2, y, X] \sim N(\bar{y} - \beta\bar{x}, \; \sigma_e^2/n)$;
(ii) [β | x, α, µ x , σ e 2 , σ 2 η , σ x 2 , y, X] ∼ N (
P
ni=1
(y
i−α)x
iP
ni=1