Bayesian pattern mixture model under nonignorable nonresponse for binary data


Sukyoung An ¹ · Balgobin Nandram ² · Dal Ho Kim ³

1,3 Department of Statistics, Kyungpook National University

2 Department of Mathematical Sciences, Worcester Polytechnic Institute

Received 24 June 2019, revised 12 July 2019, accepted 12 July 2019

Abstract

We consider a Bayesian pattern mixture model for estimating a finite population proportion when some responses are missing. The pattern mixture approach is one way to model missing data. We describe the Bayesian model under two cases, fixed and random, for a hyperparameter of the prior distribution. To fit the model, we use Markov chain Monte Carlo methods, specifically the Gibbs sampler with a grid method, to draw samples of the parameters. We use the National Crime Survey data summarized by Stasny (1991) to estimate the finite population proportion. Across the two cases for the hyperparameter, the inference for the parameters was not sensitive in our proposed model.

Keywords: Bayesian estimation, grid methods, latent variable, pattern mixture model.

1. Introduction

Many data are collected to understand what is happening in society. In the data collection process, missing data arise for a variety of reasons. If nothing is known about the missing data, inference becomes difficult and conclusions may be inaccurate. Missing data can be ignored when they carry no significant information, but they must be taken into account when they do. Many studies have addressed how to deal with missing data. To make inferences from data with missing values, we need several kinds of background knowledge: an understanding of the missing data mechanism, a model for the missing data, and a method for fitting the analysis model. Ma and Chen (2018) reviewed developments and applications of Bayesian methods for handling missing data.

The missing data mechanism is divided into three types (Little and Rubin, 2002). When the missing values do not depend on the missing or observed data, the values are called missing completely at random (MCAR). When the missing values depend only on the observed data, the values are called missing at random (MAR). When the missing values depend on the missing data, the values are called not missing at random (NMAR). The MCAR and MAR mechanisms are called ignorable missing mechanisms, and the NMAR mechanism is called a nonignorable missing mechanism.

1 Ph.D. candidate, Department of Statistics, Kyungpook National University, Daegu 41566, Korea.

2 Professor, Department of Mathematical Sciences, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA.

3 Corresponding author: Professor, Department of Statistics, Kyungpook National University, Daegu 41566, Korea. E-mail: [email protected]

There are a variety of nonignorable models for missing data (Little, 1993; Little and Rubin, 2002). The selection model and the pattern mixture model both specify the joint distribution of the random variable X and the missing indicator M, but they factor it differently. The selection model combines the marginal distribution of X with the conditional distribution of M given X, that is, p(X, M) = p(X)p(M | X). The pattern mixture model combines the marginal distribution of M with the conditional distribution of X given M, that is, p(X, M) = p(X | M)p(M). If the missing values are not MCAR and distributional assumptions are added, the selection model and the pattern mixture model differ.
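As a small numerical illustration of the two factorizations, the sketch below uses a hypothetical 2 x 2 joint table for a binary X and a binary M (the probabilities are invented for illustration and are not taken from the NCS data). It checks that both factorizations recover the same joint probability and shows that the conditional distribution of X can differ between the two missingness patterns.

```python
# Hypothetical joint probabilities p(X = x, M = m); illustrative values only.
joint = {(0, 0): 0.56, (0, 1): 0.14, (1, 0): 0.18, (1, 1): 0.12}


def marginal(table, axis):
    """Sum the joint table over the other variable (axis 0 = X, axis 1 = M)."""
    out = {}
    for key, prob in table.items():
        out[key[axis]] = out.get(key[axis], 0.0) + prob
    return out


p_x = marginal(joint, 0)
p_m = marginal(joint, 1)


def selection(x, m):
    """Selection factorization: p(X) * p(M | X)."""
    return p_x[x] * (joint[(x, m)] / p_x[x])


def pattern_mixture(x, m):
    """Pattern mixture factorization: p(X | M) * p(M)."""
    return (joint[(x, m)] / p_m[m]) * p_m[m]


def x_given_m(x, m):
    """Conditional p(X = x | M = m), the building block of the pattern mixture model."""
    return joint[(x, m)] / p_m[m]


for key in joint:
    print(key, selection(*key), pattern_mixture(*key))
print("p(X=1 | M=0) =", x_given_m(1, 0))
print("p(X=1 | M=1) =", x_given_m(1, 1))
```

In this invented table the two conditional probabilities of X = 1 differ across the patterns M = 0 and M = 1, which is the situation a pattern mixture model is meant to describe.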

There have been many studies on the selection model and the pattern mixture model. The selection model is known to be a useful model for missing data. Stasny (1991) used a selection approach to estimate the finite population proportion; she proposed a hierarchical Bayesian selection model and used the empirical Bayes method. Nandram and Choi (2002) also used the selection approach to estimate the finite population proportion. They proposed an expansion model that applies regardless of the missing mechanism by including an uncertainty parameter in the hierarchical Bayesian selection model, and they used the full Bayesian method.

Ekholm and Skinner (1998) compared the selection approach and the pattern mixture approach using the Muscatine Coronary Risk Factor Study data. Little (1993) described pattern mixture models with various examples of multivariate incomplete data. Little explained that the selection model is useful, but that the pattern mixture model should also be considered.

We estimate the finite population proportion using the Bayesian pattern mixture model. Section 2 introduces the National Crime Survey (NCS) data that we use. Section 3 describes the Bayesian pattern mixture model under two cases for a hyperparameter of the prior distribution, and describes the grid method used to obtain samples of the parameters. Section 4 presents numerical results. We conclude in Section 5.

2. National crime survey data

The data we use for estimation are the NCS data summarized by Stasny (1991). The NCS is a primary source of information on criminal victimization, collected by the Bureau of Justice Statistics in the United States. Survey respondents provide information about the crimes of which they have been victims.

Stasny constructed a subset of the large dataset to make the NCS data easier to handle. The subset was taken from a random start at the record of the eighth household and then every fifteenth record after that. The subset was post-stratified according to three characteristics. The first characteristic is urban (U) or rural (R). The second is central city (C), other incorporated place (I), or unincorporated or not a place (N). The third is low poverty level (L) (9% or fewer of families below the poverty level) or high poverty level (H) (10% or more of families below the poverty level). For example, UCL denotes an urban central city with a low poverty level, and UCH denotes an urban central city with a high poverty level. The other areas are labelled by combining the three characteristics in the same way. Combining the three characteristics gives a total of 12 areas, but it is virtually impossible for a rural area to be a central city, so the data are divided into 10 areas.

Table 2.1 The NCS data summarized by Stasny (1991)

Areas     y   r − y   n − r      p̂      π̂
UCL     156     555     104   0.219   0.872
UCH      95     364      73   0.207   0.863
UIL     162     557     101   0.225   0.877
UIH      72     262      36   0.216   0.903
UNL      92     297      79   0.237   0.831
UNH      15      40       9   0.273   0.859
RIL      11      36       7   0.234   0.870
RIH      10     105      20   0.087   0.852
RNL      35     274      32   0.113   0.906
RNH      79     413      64   0.161   0.885

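In Table 2.1, y is the number of responding households that experienced a crime, r is the number of households that responded, and n is the total number of households. Hence r − y is the number of responding households that did not experience a crime, and n − r is the number of nonresponding households. The direct estimates are p̂ = y/r, the proportion of respondents who experienced a crime, and π̂ = r/n, the proportion of households that responded.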
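The direct estimates can be reproduced from the three count columns of Table 2.1. The following is a minimal sketch; the dictionary and function names are illustrative choices, not part of the original analysis.

```python
# Counts from Table 2.1: area -> (y, r - y, n - r).
NCS_COUNTS = {
    "UCL": (156, 555, 104),
    "UCH": (95, 364, 73),
    "UIL": (162, 557, 101),
    "UIH": (72, 262, 36),
    "UNL": (92, 297, 79),
    "UNH": (15, 40, 9),
    "RIL": (11, 36, 7),
    "RIH": (10, 105, 20),
    "RNL": (35, 274, 32),
    "RNH": (79, 413, 64),
}


def direct_estimates(y, r_minus_y, n_minus_r):
    """Return (p_hat, pi_hat) = (y / r, r / n) from the tabulated counts."""
    r = y + r_minus_y          # number of respondents
    n = r + n_minus_r          # number of households
    return y / r, r / n


for area, counts in NCS_COUNTS.items():
    p_hat, pi_hat = direct_estimates(*counts)
    print(f"{area}: p_hat = {p_hat:.3f}, pi_hat = {pi_hat:.3f}")
```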

3. Pattern mixture model

In this section, we describe the Bayesian pattern mixture model and the procedure for estimating parameters using the Bayesian methods.

Let $r_j$ be the response variable of the jth individual, where j = 1, ..., n. Specifically,

\[
r_j = \begin{cases} 1 & \text{if the $j$th individual responds}, \\ 0 & \text{otherwise}. \end{cases}
\]

Let $y_j$ be the characteristic variable of the jth individual. Specifically,

\[
y_j = \begin{cases} 1 & \text{if the $j$th individual has the characteristic}, \\ 0 & \text{otherwise}. \end{cases}
\]

The response proportion is denoted by π, the characteristic proportion among respondents by p, and the characteristic proportion among nonrespondents by q. If p and q are equal, the distributions of the characteristic for respondents and nonrespondents coincide, and the nonignorable nonresponse model reduces to the ignorable model.

The pattern mixture model under nonignorable nonresponse is given by

\[
r_j \mid \pi \overset{iid}{\sim} \text{Bernoulli}(\pi),
\]

\[
y_j \mid r_j = 1, p \overset{iid}{\sim} \text{Bernoulli}(p), \qquad y_j \mid r_j = 0, q \overset{iid}{\sim} \text{Bernoulli}(q).
\]

The parameters of interest are π, p, q, and δ, where δ = pπ + q(1 − π) is the proportion of individuals who have the characteristic. The characteristic proportions among respondents and among nonrespondents are also of interest, but it is important to assess the overall characteristic proportion in the finite population, which we denote by δ.
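To make the role of δ concrete, the sketch below evaluates δ = pπ + q(1 − π) and compares it with the proportion observed in data simulated from the pattern mixture model. The parameter values and function names are illustrative assumptions, not estimates from the paper.

```python
import random


def overall_proportion(p, q, pi):
    """delta = p * pi + q * (1 - pi): overall proportion with the characteristic."""
    return p * pi + q * (1.0 - pi)


def simulate_pattern_mixture(n, pi, p, q, seed=0):
    """Simulate n individuals and return the realized proportion with the characteristic.

    Each individual responds with probability pi; respondents have the
    characteristic with probability p and nonrespondents with probability q.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        responded = rng.random() < pi
        prob = p if responded else q
        total += rng.random() < prob
    return total / n


# Illustrative values: nonrespondents are more likely to have the characteristic.
pi, p, q = 0.85, 0.20, 0.40
print("delta     =", overall_proportion(p, q, pi))
print("simulated =", simulate_pattern_mixture(200_000, pi, p, q, seed=1))
```

When q equals p, the function returns p itself, which is the ignorable case described above.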

For the Bayesian analysis, we take noninformative uniform priors for π and p, and a beta prior with parameters pτ and (1 − p)τ for q. The hyperparameter τ, which corresponds to the sample size of the prior distribution of q, is treated in two ways. In the fixed case, τ is set to one of the constants 5, 10, 20, 50, 100, and 200. In the random case, τ follows a discrete uniform distribution on (5, 10, 20, 50, 100, 200).

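Details are as follows.

\[
q \mid p, \tau \overset{iid}{\sim} \text{Beta}(p\tau, (1-p)\tau), \qquad \pi \overset{iid}{\sim} \text{Uniform}(0, 1), \qquad p \overset{iid}{\sim} \text{Uniform}(0, 1),
\]

\[
\tau = 5, 10, 20, 50, 100, \text{ and } 200, \qquad \text{or} \qquad \tau \overset{iid}{\sim} \text{Discrete Uniform}(5, 10, 20, 50, 100, 200).
\]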
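The interpretation of τ as a prior sample size can be seen from the first two moments of the beta prior. The following short derivation uses only the standard mean and variance of a Beta(a, b) distribution with a = pτ and b = (1 − p)τ.

```latex
\[
E(q \mid p, \tau) = \frac{p\tau}{p\tau + (1-p)\tau} = p,
\qquad
\operatorname{Var}(q \mid p, \tau)
  = \frac{p\tau \,(1-p)\tau}{\tau^{2}(\tau + 1)}
  = \frac{p(1-p)}{\tau + 1}.
\]
```

The prior is therefore centred at p, so nonrespondents are assumed a priori to resemble respondents on average, and a larger τ concentrates q more tightly around p.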

Let $r = \sum_{j=1}^{n} r_j$ and $y = \sum_{j: r_j = 1} y_j$, where r is the number of respondents, y is the number of respondents with the characteristic, and n is the number of individuals. Because we do not know the number of nonrespondents with the characteristic, we consider the latent variable $z = \sum_{j: r_j = 0} y_j$, where z is the number of nonrespondents with the characteristic. Using the latent variable simplifies the computation.

Under the model, the likelihood function is the multinomial density

\[
f(y, r, z \mid \pi, p, q) = \frac{n!\,(p\pi)^{y}\,((1-p)\pi)^{r-y}\,(q(1-\pi))^{z}\,((1-q)(1-\pi))^{n-r-z}}{y!\,(r-y)!\,z!\,(n-r-z)!}.
\]

The joint posterior density when τ is fixed is

\[
f(\pi, p, q, z \mid y, r) \propto \frac{n!\,\pi^{r}(1-\pi)^{n-r}\,p^{y}(1-p)^{r-y}\,q^{z+p\tau-1}(1-q)^{n-r-z+(1-p)\tau-1}}{y!\,(r-y)!\,z!\,(n-r-z)!\,B(p\tau, (1-p)\tau)}.
\]

We need to make inference about π, p, and q from the joint posterior density, but this density is complex. We therefore apply Markov chain Monte Carlo (MCMC) methods, using the Gibbs sampler with the grid method. To run the Gibbs sampler, we derive the full conditional densities.

When τ is fixed, the full conditional densities are given below.

(i) $\pi \mid y, r \sim \text{Beta}(r + 1, n - r + 1)$;

(ii) $f(p \mid q, z, y, r) \propto \dfrac{p^{y}(1-p)^{r-y}\,q^{p\tau}(1-q)^{-p\tau}}{B(p\tau, (1-p)\tau)}$;

(iii) $q \mid p, z, y, r \sim \text{Beta}(z + p\tau, n - r - z + (1-p)\tau)$;

(iv) $f(z \mid p, q, y, r) \propto \dfrac{q^{z}(1-q)^{-z}}{z!\,(n-r-z)!}$.
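As a side remark that is not stated in the text, the kernel in (iv) can be recognized as a binomial probability function. Multiplying by the factors $(n-r)!$ and $(1-q)^{n-r}$, which are constant in z, gives:

```latex
\[
f(z \mid p, q, y, r)
  \propto \frac{q^{z}(1-q)^{-z}}{z!\,(n-r-z)!}
  \propto \binom{n-r}{z}\, q^{z} (1-q)^{\,n-r-z},
\qquad z = 0, 1, \ldots, n-r,
\]
```

so that z given q follows a Binomial(n − r, q) distribution. This offers one way to check draws of z, whichever sampling method is used for this conditional.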


The parameter π is independent of the other parameters. The parameters p, q, and z depend on each other. We use the grid method to obtain the samples. For example, the procedure to generate samples of p is shown below.

Step 1. Divide the interval between 0 and 1 into 100 subintervals $i_k$ (k = 1, ..., 100).

Step 2. Calculate the mid-point $m_k$ of each subinterval (k = 1, ..., 100).

Step 3. Calculate the value $a_k$ of the conditional posterior density at the mid-point $m_k$ (k = 1, ..., 100).

Step 4. Calculate the cumulative value $b_k = \sum_{t=1}^{k} a_t / \sum_{t=1}^{100} a_t$ (k = 1, ..., 100), and set $b_0 = 0$.

Step 5. Generate $u_1 \sim \text{Uniform}(0, 1)$.

Step 6. Select the (k + 1)th subinterval, where k satisfies $b_k < u_1 \le b_{k+1}$ (k = 0, ..., 99).

Step 7. Generate $u_2$ from the uniform distribution on the subinterval $i_{k+1}$ and set $p_1 = u_2$.

Then repeat Step 1 to Step 7 to generate $p_2$, and so on. Samples of the other parameters are generated by the same procedure.
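The seven steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the density is evaluated on the log scale for numerical stability (an implementation choice not described in the text), and the example values q = 0.2 and τ = 20 are arbitrary.

```python
import math
import random


def log_cond_p(p, q, y, r, tau):
    """Log of the full conditional (ii) for p, up to an additive constant."""
    a, b = p * tau, (1.0 - p) * tau
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (y * math.log(p) + (r - y) * math.log(1.0 - p)
            + a * (math.log(q) - math.log(1.0 - q)) - log_beta)


def grid_draw(log_density, rng, n_grid=100):
    """Draw one value on (0, 1) by the grid method (Steps 1 to 7)."""
    width = 1.0 / n_grid                                  # Step 1: equal subintervals
    mids = [(k + 0.5) * width for k in range(n_grid)]     # Step 2: mid-points
    logs = [log_density(m) for m in mids]                 # Step 3: density at mid-points
    top = max(logs)
    a = [math.exp(v - top) for v in logs]                 # rescaled to avoid underflow
    total = sum(a)
    u1 = rng.random()                                     # Step 5
    cum, chosen = 0.0, n_grid - 1
    for k, a_k in enumerate(a):                           # Steps 4 and 6: cumulative sums
        cum += a_k / total
        if u1 <= cum:
            chosen = k
            break
    lo = chosen * width
    return rng.uniform(lo, lo + width)                    # Step 7: uniform within subinterval


# Example with the RNH counts from Table 2.1 (y = 79, r = 79 + 413 = 492).
rng = random.Random(2019)
draws = [grid_draw(lambda p: log_cond_p(p, 0.2, 79, 492, 20), rng) for _ in range(1000)]
print("mean of draws:", sum(draws) / len(draws))
```

Within a Gibbs sampler, `grid_draw` would be called once per iteration with the current values of q and τ, so that Steps 3 to 7 always use the most recent conditional density.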

The joint posterior density when τ is random is

\[
f(\pi, p, q, \tau, z \mid y, r) \propto \frac{n!\,\pi^{r}(1-\pi)^{n-r}\,p^{y}(1-p)^{r-y}\,q^{z+p\tau-1}(1-q)^{n-r-z+(1-p)\tau-1}}{y!\,(r-y)!\,z!\,(n-r-z)!\,B(p\tau, (1-p)\tau)}.
\]

The full conditional densities when τ is random are given below.

(i) $\pi \mid y, r \sim \text{Beta}(r + 1, n - r + 1)$;

(ii) $f(p \mid q, \tau, z, y, r) \propto \dfrac{p^{y}(1-p)^{r-y}\,q^{p\tau}(1-q)^{-p\tau}}{B(p\tau, (1-p)\tau)}$;

(iii) $q \mid p, \tau, z, y, r \sim \text{Beta}(z + p\tau, n - r - z + (1-p)\tau)$;

(iv) $p(\tau \mid p, q, z, y, r) \propto \dfrac{q^{p\tau}(1-q)^{(1-p)\tau}}{B(p\tau, (1-p)\tau)}$;

(v) $f(z \mid p, q, \tau, y, r) \propto \dfrac{q^{z}(1-q)^{-z}}{z!\,(n-r-z)!}$.

The parameter π is independent of the other parameters. The parameters p, q, τ, and z are correlated.
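Because τ takes only six values in the random case, its full conditional (iv) can be normalized exactly over that support. The sketch below illustrates one way to do this; the function names are hypothetical and the log-scale computation is an implementation choice, not something described in the text.

```python
import math
import random

TAU_SUPPORT = (5, 10, 20, 50, 100, 200)


def tau_probabilities(p, q, support=TAU_SUPPORT):
    """Normalized full conditional (iv) of tau over its discrete support."""
    logs = []
    for tau in support:
        a, b = p * tau, (1.0 - p) * tau
        log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
        logs.append(a * math.log(q) + b * math.log(1.0 - q) - log_beta)
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]   # subtract the max to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


def draw_tau(p, q, rng, support=TAU_SUPPORT):
    """Draw one value of tau from its full conditional."""
    probs = tau_probabilities(p, q, support)
    return rng.choices(support, weights=probs, k=1)[0]


rng = random.Random(7)
print(tau_probabilities(0.2, 0.2))   # q equal to p favours large tau
print(tau_probabilities(0.2, 0.6))   # q far from p favours small tau
print(draw_tau(0.2, 0.3, rng))
```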

4. Numerical study

We estimated the finite population proportion based on the real NCS data summarized by Stasny (1991). We used the grid method to obtain samples of all parameters. We ran 150,000 iterations for each area and each τ, and discarded the first 75,000 as the burn-in period. We chose the thinning interval for each model so as to obtain approximately independent samples, keeping at least 1,000 samples. For example, when τ is discrete uniform in the RNH area, we drew 150,000 iterates, discarded the first 75,000 as burn-in, and kept every thirtieth of the remaining iterates, which leaves 2,500 samples.
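The burn-in and thinning arithmetic for the RNH example can be checked with a few lines of Python. The helper name is illustrative, and the list of integers merely stands in for a stored chain.

```python
def thin_chain(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th remaining draw."""
    return chain[burn_in::thin]


# Stand-in for 150,000 stored iterates (the values are just iteration indices).
chain = list(range(150_000))
kept = thin_chain(chain, burn_in=75_000, thin=30)
print(len(kept))   # number of retained draws
```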

We used several methods to diagnose the convergence of the samples obtained from the Markov chain (Cowles and Carlin, 1996; Sahlin, 2011). The trace plot and Geweke's diagnostic were used to assess convergence, and the autocorrelation plot and the effective sample size were used to assess the correlation among the samples. Figure 4.1 shows the trace plots and the autocorrelation plots for the RNH area when τ is discrete uniform. The trace plots indicate convergence, and the autocorrelation plots indicate no appreciable correlation between the samples. Table 4.1 shows the p-values of Geweke's diagnostic and the effective sample sizes for the RNH area when τ is discrete uniform. Because the trace plots, autocorrelation plots, Geweke p-values, and effective sample sizes for the other areas and values of τ are too numerous, we do not present them in this paper.
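For readers unfamiliar with Geweke's diagnostic, the following simplified sketch compares the mean of an early segment of a chain with the mean of a late segment. It is an assumption-laden illustration: the 10% and 50% segment fractions are the conventional defaults, and the variances are estimated by plain sample variances, which is reasonable only for thinned, nearly uncorrelated draws. Standard implementations use spectral density estimates instead, so the numbers here need not match those in Table 4.1.

```python
import math
import random
import statistics


def geweke_p_value(chain, first=0.1, last=0.5):
    """Two-sided p-value comparing the means of the first and last chain segments.

    Simplification: segment variances are plain sample variances, which
    ignores any remaining autocorrelation in the chain.
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    z = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


rng = random.Random(11)
stationary = [rng.gauss(0.0, 1.0) for _ in range(2500)]
drifting = [i / 2500 + rng.gauss(0.0, 0.05) for i in range(2500)]
print("stationary chain:", geweke_p_value(stationary))
print("drifting chain:  ", geweke_p_value(drifting))
```

A small p-value suggests that the early and late parts of the chain have different means, which is evidence against convergence.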

Figure 4.1 Trace plots and ACF plots when τ is a discrete uniform in RNH area

Table 4.1 The p-values of Geweke's diagnostic and effective sample sizes when τ is a discrete uniform in RNH area

Parameter   p-value of Geweke's diagnostic   Effective sample size
π           0.849                            2500
p           0.830                            2691
q           0.272                            2500
τ           0.719                            2500
z           0.076                            2338

The posterior means (PM), posterior standard deviations (PSD), and 95% highest posterior density (HPD) intervals of the parameters were calculated for each τ in each area, and are reported in Tables 4.2, 4.3, 4.4, and 4.5. In summary, the PMs of the response proportion π are similar to the direct estimates, and the PSDs and HPD intervals are similar for every τ. The PMs of the characteristic proportion among respondents p are similar to the direct estimates except in areas where y is small, and the PSDs and HPD intervals of p are similar for every τ. The PMs of the characteristic proportion among nonrespondents q, which is not observed, are similar for fixed τ and random τ. However, as τ increases in the fixed case, the PSDs decrease and the HPD intervals become narrower. Estimates of π and p are not very sensitive to the hyperparameter. The PMs, PSDs, and HPD intervals of the overall characteristic proportion δ are similar for every τ in each area.
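The three summaries can be computed from a set of posterior draws as sketched below. The HPD interval is taken to be the shortest interval containing 95% of the sorted draws, a common sample-based approximation; the paper does not state which HPD algorithm was used, so this is an assumption, and the function names are illustrative.

```python
import math
import random
import statistics


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the sorted draws."""
    xs = sorted(samples)
    n = len(xs)
    m = max(1, math.ceil(mass * n))            # number of draws inside the interval
    start = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[start], xs[start + m - 1]


def summarize(samples):
    """Posterior mean, posterior standard deviation, and 95% HPD interval."""
    return {
        "PM": statistics.fmean(samples),
        "PSD": statistics.stdev(samples),
        "HPD": hpd_interval(samples),
    }


# Illustration with simulated Beta draws standing in for posterior samples.
rng = random.Random(5)
draws = [rng.betavariate(80, 414) for _ in range(2500)]
print(summarize(draws))
```

For a skewed posterior, such as that of q when τ is small, the HPD interval can start at or near zero, as seen in several rows of Table 4.4.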


Table 4.2 Posterior means, posterior standard deviations, and highest posterior density intervals for π

Areas τ PM PSD HPD Areas τ PM PSD HPD

UCL 5 0.872 0.012 (0.848, 0.894) UNH 5 0.849 0.043 (0.763, 0.930)

10 0.871 0.012 (0.848, 0.894) 10 0.848 0.043 (0.763, 0.931)

20 0.871 0.012 (0.849, 0.895) 20 0.848 0.044 (0.765, 0.930)

50 0.871 0.012 (0.849, 0.894) 50 0.848 0.043 (0.758, 0.926)

100 0.872 0.012 (0.849, 0.894) 100 0.849 0.044 (0.764, 0.931)

200 0.872 0.012 (0.849, 0.894) 200 0.849 0.044 (0.764, 0.927)

DU 0.871 0.012 (0.849, 0.894) DU 0.849 0.044 (0.762, 0.933)

UCH 5 0.862 0.014 (0.831, 0.887) RIL 5 0.858 0.046 (0.764, 0.939)

10 0.862 0.015 (0.832, 0.890) 10 0.857 0.047 (0.761, 0.941)

20 0.862 0.015 (0.831, 0.890) 20 0.857 0.045 (0.772, 0.945)

50 0.861 0.015 (0.832, 0.889) 50 0.857 0.046 (0.764, 0.942)

100 0.862 0.015 (0.832, 0.890) 100 0.857 0.046 (0.768, 0.942)

200 0.861 0.015 (0.830, 0.888) 200 0.857 0.047 (0.762, 0.939)

DU 0.861 0.015 (0.831, 0.888) DU 0.857 0.046 (0.765, 0.940)

UIL 5 0.875 0.012 (0.853, 0.898) RIH 5 0.846 0.030 (0.782, 0.900)

10 0.876 0.011 (0.854, 0.898) 10 0.847 0.031 (0.788, 0.907)

20 0.876 0.012 (0.852, 0.897) 20 0.847 0.030 (0.783, 0.900)

50 0.876 0.011 (0.854, 0.898) 50 0.846 0.031 (0.786, 0.906)

100 0.876 0.012 (0.852, 0.897) 100 0.847 0.031 (0.784, 0.903)

200 0.876 0.012 (0.852, 0.897) 200 0.847 0.031 (0.786, 0.907)

DU 0.876 0.012 (0.853, 0.898) DU 0.846 0.031 (0.790, 0.908)

UIH 5 0.900 0.015 (0.868, 0.928) RNL 5 0.904 0.016 (0.874, 0.934)

10 0.901 0.016 (0.870, 0.932) 10 0.904 0.016 (0.872, 0.933)

20 0.901 0.016 (0.869, 0.929) 20 0.904 0.016 (0.871, 0.933)

50 0.901 0.015 (0.870, 0.931) 50 0.904 0.016 (0.871, 0.933)

100 0.901 0.015 (0.870, 0.930) 100 0.904 0.016 (0.873, 0.934)

200 0.901 0.015 (0.871, 0.931) 200 0.904 0.016 (0.873, 0.935)

DU 0.901 0.016 (0.872, 0.932) DU 0.904 0.016 (0.873, 0.935)

UNL 5 0.830 0.017 (0.797, 0.865) RNH 5 0.883 0.013 (0.859, 0.911)

10 0.830 0.017 (0.794, 0.862) 10 0.883 0.014 (0.856, 0.909)

20 0.829 0.017 (0.796, 0.862) 20 0.884 0.014 (0.855, 0.908)

50 0.830 0.017 (0.794, 0.862) 50 0.884 0.014 (0.857, 0.910)

100 0.830 0.017 (0.796, 0.864) 100 0.884 0.014 (0.857, 0.910)

200 0.830 0.018 (0.796, 0.864) 200 0.883 0.014 (0.857, 0.910)

DU 0.830 0.017 (0.794, 0.862) DU 0.884 0.013 (0.858, 0.910)

NOTE: DU is a discrete uniform(5, 10, 20, 50, 100, 200)

Table 4.3 Posterior means, posterior standard deviations, and highest posterior density intervals for p

Areas τ PM PSD HPD Areas τ PM PSD HPD

UCL 5 0.220 0.015 (0.190, 0.249) UNH 5 0.280 0.059 (0.165, 0.393)

10 0.221 0.016 (0.190, 0.250) 10 0.282 0.060 (0.170, 0.400)

20 0.220 0.016 (0.190, 0.250) 20 0.279 0.060 (0.166, 0.393)

50 0.220 0.016 (0.190, 0.250) 50 0.280 0.059 (0.168, 0.396)

100 0.220 0.016 (0.190, 0.250) 100 0.281 0.059 (0.170, 0.399)

200 0.220 0.016 (0.191, 0.250) 200 0.280 0.060 (0.163, 0.395)

DU 0.220 0.016 (0.188, 0.250) DU 0.281 0.059 (0.171, 0.399)

UCH 5 0.208 0.020 (0.171, 0.250) RIL 5 0.245 0.061 (0.126, 0.363)

10 0.209 0.019 (0.170, 0.243) 10 0.244 0.061 (0.125, 0.361)

20 0.208 0.019 (0.171, 0.246) 20 0.245 0.061 (0.135, 0.367)

50 0.208 0.019 (0.170, 0.245) 50 0.245 0.061 (0.131, 0.366)

100 0.208 0.019 (0.170, 0.244) 100 0.244 0.060 (0.124, 0.358)

200 0.208 0.019 (0.170, 0.244) 200 0.244 0.060 (0.135, 0.369)

DU 0.208 0.019 (0.170, 0.246) DU 0.245 0.061 (0.134, 0.369)

UIL 5 0.226 0.016 (0.199, 0.258) RIH 5 0.094 0.027 (0.045, 0.148)

10 0.226 0.016 (0.198, 0.260) 10 0.094 0.027 (0.040, 0.144)

20 0.226 0.015 (0.200, 0.260) 20 0.094 0.027 (0.044, 0.148)

50 0.226 0.016 (0.198, 0.260) 50 0.095 0.027 (0.046, 0.150)

100 0.226 0.016 (0.197, 0.260) 100 0.094 0.027 (0.047, 0.150)

200 0.226 0.016 (0.199, 0.260) 200 0.094 0.027 (0.042, 0.145)

DU 0.226 0.016 (0.198, 0.260) DU 0.095 0.028 (0.043, 0.150)

UIH 5 0.217 0.023 (0.175, 0.264) RNL 5 0.115 0.018 (0.083, 0.149)

10 0.217 0.023 (0.172, 0.260) 10 0.116 0.018 (0.081, 0.151)

20 0.217 0.023 (0.172, 0.260) 20 0.116 0.018 (0.080, 0.150)

50 0.217 0.023 (0.172, 0.260) 50 0.116 0.018 (0.080, 0.151)

100 0.217 0.023 (0.171, 0.260) 100 0.116 0.018 (0.080, 0.151)

200 0.217 0.023 (0.173, 0.260) 200 0.116 0.019 (0.079, 0.151)

DU 0.217 0.023 (0.172, 0.260) DU 0.116 0.018 (0.080, 0.150)

UNL 5 0.239 0.022 (0.200, 0.287) RNH 5 0.161 0.017 (0.130, 0.195)

10 0.239 0.022 (0.197, 0.280) 10 0.162 0.017 (0.130, 0.196)

20 0.238 0.021 (0.200, 0.283) 20 0.162 0.017 (0.130, 0.195)

50 0.238 0.022 (0.195, 0.278) 50 0.162 0.017 (0.130, 0.196)

100 0.238 0.021 (0.197, 0.280) 100 0.162 0.017 (0.130, 0.194)

200 0.238 0.022 (0.197, 0.280) 200 0.162 0.017 (0.130, 0.196)

DU 0.238 0.022 (0.196, 0.280) DU 0.162 0.017 (0.131, 0.196)

NOTE: DU is a discrete uniform(5, 10, 20, 50, 100, 200)

Table 4.4 Posterior means, posterior standard deviations, and highest posterior density intervals for q

Areas τ PM PSD HPD Areas τ PM PSD HPD

UCL 5 0.214 0.167 (0, 0.543) UNH 5 0.280 0.189 (0, 0.633)

10 0.219 0.126 (0.012, 0.455) 10 0.283 0.149 (0.029, 0.567)

20 0.220 0.091 (0.060, 0.398) 20 0.280 0.115 (0.069, 0.500)

50 0.220 0.059 (0.107, 0.335) 50 0.281 0.086 (0.118, 0.445)

100 0.220 0.043 (0.140, 0.306) 100 0.281 0.074 (0.145, 0.428)

200 0.220 0.033 (0.156, 0.284) 200 0.279 0.067 (0.148, 0.405)

DU 0.218 0.095 (0.001, 0.389) DU 0.281 0.120 (0.049, 0.531)

UCH 5 0.212 0.165 (0, 0.553) RIL 5 0.247 0.185 (0, 0.603)

10 0.211 0.125 (0.010, 0.452) 10 0.246 0.141 (0.012, 0.516)

20 0.208 0.091 (0.053, 0.393) 20 0.244 0.112 (0.050, 0.463)

50 0.208 0.059 (0.100, 0.327) 50 0.245 0.085 (0.089, 0.416)

100 0.208 0.044 (0.126, 0.295) 100 0.245 0.073 (0.102, 0.386)

200 0.208 0.034 (0.144, 0.278) 200 0.244 0.068 (0.112, 0.374)

DU 0.202 0.096 (0.001, 0.354) DU 0.245 0.118 (0.009, 0.455)

UIL 5 0.217 0.165 (0.001, 0.550) RIH 5 0.095 0.124 (0, 0.365)

10 0.225 0.127 (0.010, 0.465) 10 0.095 0.092 (0, 0.288)

20 0.227 0.091 (0.067, 0.408) 20 0.095 0.070 (0, 0.232)

50 0.226 0.060 (0.118, 0.348) 50 0.095 0.050 (0.010, 0.190)

100 0.227 0.044 (0.143, 0.315) 100 0.094 0.039 (0.020, 0.170)

200 0.227 0.033 (0.163, 0.292) 200 0.094 0.034 (0.029, 0.156)

DU 0.225 0.097 (0.016, 0.411) DU 0.095 0.074 (0, 0.230)

UIH 5 0.217 0.171 (0, 0.547) RNL 5 0.116 0.130 (0, 0.386)

10 0.218 0.129 (0.016, 0.479) 10 0.114 0.097 (0, 0.309)

20 0.217 0.093 (0.044, 0.391) 20 0.116 0.072 (0.004, 0.255)

50 0.217 0.062 (0.106, 0.342) 50 0.115 0.048 (0.032, 0.210)

100 0.217 0.046 (0.125, 0.302) 100 0.116 0.037 (0.048, 0.190)

200 0.217 0.037 (0.148, 0.289) 200 0.116 0.029 (0.058, 0.170)

DU 0.215 0.098 (0.013, 0.401) DU 0.117 0.081 (0, 0.262)

UNL 5 0.234 0.169 (0.001, 0.566) RNH 5 0.160 0.150 (0, 0.476)

10 0.237 0.125 (0.026, 0.478) 10 0.161 0.111 (0.001, 0.372)

20 0.238 0.093 (0.066, 0.416) 20 0.160 0.080 (0.026, 0.315)

50 0.238 0.062 (0.122, 0.362) 50 0.162 0.054 (0.062, 0.267)

100 0.238 0.047 (0.149, 0.329) 100 0.162 0.040 (0.084, 0.237)

200 0.238 0.037 (0.169, 0.315) 200 0.162 0.031 (0.101, 0.220)

DU 0.236 0.105 (0.026, 0.442) DU 0.162 0.088 (0, 0.324)

NOTE: DU is a discrete uniform(5, 10, 20, 50, 100, 200)

Table 4.5 Posterior means, posterior standard deviations, and highest posterior density intervals for δ

Areas τ PM PSD HPD Areas τ PM PSD HPD

UCL 5 0.219 0.027 (0.172, 0.270) UNH 5 0.280 0.065 (0.158, 0.405)

10 0.221 0.023 (0.179, 0.266) 10 0.282 0.064 (0.160, 0.408)

20 0.220 0.019 (0.182, 0.256) 20 0.279 0.062 (0.170, 0.406)

50 0.220 0.017 (0.187, 0.254) 50 0.280 0.060 (0.162, 0.392)

100 0.220 0.016 (0.188, 0.251) 100 0.281 0.060 (0.169, 0.401)

200 0.220 0.016 (0.190, 0.252) 200 0.279 0.060 (0.163, 0.397)

DU 0.220 0.020 (0.183, 0.260) DU 0.281 0.061 (0.167, 0.400)

UCH 5 0.209 0.030 (0.157, 0.269) RIL 5 0.245 0.066 (0.118, 0.373)

10 0.209 0.025 (0.161, 0.258) 10 0.244 0.064 (0.123, 0.369)

20 0.208 0.022 (0.164, 0.251) 20 0.245 0.063 (0.128, 0.369)

50 0.208 0.021 (0.169, 0.249) 50 0.245 0.061 (0.126, 0.362)

100 0.208 0.020 (0.171, 0.248) 100 0.244 0.061 (0.129, 0.363)

200 0.208 0.020 (0.172, 0.247) 200 0.244 0.061 (0.128, 0.364)

DU 0.207 0.023 (0.160, 0.253) DU 0.245 0.062 (0.128, 0.369)

UIL 5 0.225 0.027 (0.178, 0.278) RIH 5 0.094 0.033 (0.038, 0.163)

10 0.226 0.022 (0.184, 0.272) 10 0.094 0.030 (0.040, 0.154)

20 0.226 0.019 (0.189, 0.263) 20 0.094 0.029 (0.040, 0.151)

50 0.226 0.018 (0.194, 0.262) 50 0.095 0.028 (0.045, 0.151)

100 0.226 0.017 (0.195, 0.259) 100 0.094 0.027 (0.046, 0.150)

200 0.226 0.016 (0.195, 0.258) 200 0.094 0.027 (0.045, 0.149)

DU 0.226 0.020 (0.187, 0.266) DU 0.095 0.029 (0.042, 0.155)

UIH 5 0.217 0.029 (0.161, 0.273) RNL 5 0.115 0.022 (0.078, 0.160)

10 0.217 0.027 (0.167, 0.271) 10 0.116 0.021 (0.076, 0.156)

20 0.217 0.025 (0.169, 0.265) 20 0.116 0.020 (0.078, 0.154)

50 0.217 0.023 (0.174, 0.264) 50 0.116 0.019 (0.080, 0.154)

100 0.217 0.023 (0.174, 0.264) 100 0.116 0.019 (0.079, 0.152)

200 0.217 0.023 (0.174, 0.263) 200 0.116 0.019 (0.080, 0.152)

DU 0.217 0.025 (0.170, 0.266) DU 0.116 0.020 (0.080, 0.156)

UNL 5 0.238 0.035 (0.175, 0.309) RNH 5 0.161 0.024 (0.125, 0.218)

10 0.238 0.030 (0.182, 0.299) 10 0.162 0.021 (0.121, 0.205)

20 0.238 0.026 (0.187, 0.289) 20 0.162 0.019 (0.127, 0.201)

50 0.238 0.024 (0.193, 0.284) 50 0.162 0.018 (0.129, 0.199)

100 0.238 0.022 (0.192, 0.279) 100 0.162 0.017 (0.128, 0.194)

200 0.238 0.022 (0.197, 0.284) 200 0.162 0.017 (0.130, 0.197)

DU 0.238 0.028 (0.186, 0.295) DU 0.162 0.019 (0.127, 0.201)

NOTE: DU is a discrete uniform(5, 10, 20, 50, 100, 200)

5. Concluding remarks

In this paper, we proposed a Bayesian pattern mixture model for binary data and considered two cases, fixed and random, for a hyperparameter of the prior distribution. Across the different specifications of the hyperparameter in each area, the inference for π and p was not sensitive. However, in the NCS data, the UNH, RIL, and RIH areas have relatively small samples compared with the other areas, and estimates based on small samples may not be reliable. Future work should consider methods to improve the estimates for small areas.

References

Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association, 91, 883-904.

Ekholm, A. and Skinner, C. (1998). The Muscatine children’s obesity data reanalysed using pattern mixture models. Applied Statistics, 47, 251-263.

Little, R. J. A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125-134.

Little, R. J. A. and Rubin, D. B. (2002). Statistical analysis with missing data, 2nd Ed., Wiley, New York.

Nandram, B. and Choi, J. W. (2002). Hierarchical Bayesian nonresponse models for binary data from small areas with uncertainty about ignorability. Journal of the American Statistical Association, 97, 381-388.

Sahlin, K. (2011). Estimating convergence of Markov chain Monte Carlo simulations, Master’s Thesis, Mathematical Statistics, Stockholm University.

Stasny, E. A. (1991). Hierarchical models for the probabilities of a survey classification and nonresponse: An example from the National Crime Survey. Journal of the American Statistical Association, 86, 296-303.

Ma, Z. and Chen, G. (2018). Bayesian methods for dealing with missing data problems. Journal of the Korean Statistical Society, 47, 297-313.
