Nonparametric Bayesian test of homogeneity using a discretization approach
MinSup Kim¹ · Balgobin Nandram² · Dal Ho Kim³
¹,³ Department of Statistics, Kyungpook National University
² Department of Mathematical Sciences, Worcester Polytechnic Institute
Received 24 October 2017, revised 9 November 2017, accepted 13 November 2017
Abstract
In this paper, we consider a nonparametric Bayesian test of homogeneity using a hierarchical multinomial model with Dirichlet process priors in a small area setup. If a continuous variable is discretized properly, the discretization approach can detect an association between the groups and the variable even when k-sample tests such as one-way ANOVA indicate that the groups are homogeneous. It can also be used to examine heterogeneity among groups at specific levels of the variable of interest. We use k-means clustering and Dirichlet process clustering to discretize the continuous variable.
Once the continuous variable is discretized, the problem becomes the analysis of a contingency table, for which the chi-squared test is the most common choice. However, the chi-squared test becomes less accurate as more slices are added, because the cell counts become small. We therefore use the Bayes factor from the nonparametric Bayesian model and apply it to the test of homogeneity.
Keywords: Bayesian nonparametrics, contingency table, Dirichlet process prior, discretization, homogeneity test, small areas.
1. Introduction
In a parametric Bayesian model, conjugate priors make the results easy to interpret and simplify computation, but they limit the flexibility of the model by fixing the form of the data distribution. When we want to avoid a parametric specification and need a more robust and flexible model, we can turn to a nonparametric Bayesian model. In this case, we use Dirichlet process priors in a hierarchical Bayesian setup. The Dirichlet process, introduced by Ferguson (1973), defines a distribution over distributions. When a random probability distribution $G$ follows a Dirichlet process, we write $G \sim \mathrm{DP}(\alpha, G_0)$, where $\alpha > 0$ is a scaling parameter and $G_0$ is the base distribution. $G \sim \mathrm{DP}(\alpha, G_0)$ means that for every finite measurable partition $(A_1, \dots, A_k)$ of the space $\Theta$, $(G(A_1), \dots, G(A_k)) \sim \mathrm{Dir}(\alpha G_0(A_1), \dots, \alpha G_0(A_k))$. Blackwell and MacQueen (1973) described the Dirichlet process through the Pólya urn scheme, which gives the following prediction rule. For a sequence $\{X_n, n \ge 1\}$ with $X_i \mid G \overset{iid}{\sim} G$ and $G \sim \mathrm{DP}(\alpha, G_0)$, and any measurable $A \subset \Theta$,
$$P(X_{n+1} \in A \mid X_1, \dots, X_n) = \frac{\alpha G_0(A) + \sum_{i=1}^{n} I(X_i \in A)}{\alpha + n}.$$
¹ Ph.D. candidate, Department of Statistics, Kyungpook National University, Daegu 41566, Korea.
² Professor, Department of Mathematical Sciences, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA.
³ Corresponding author: Professor, Department of Statistics, Kyungpook National University, Daegu 41566, Korea. E-mail: [email protected]
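The Pólya urn prediction rule of Blackwell and MacQueen (1973) can be simulated directly. The Python sketch below is illustrative only: the function name `polya_urn`, the standard normal base distribution and the seed are assumptions, not part of the paper. Each new draw is a fresh value from $G_0$ with probability $\alpha/(\alpha + n)$ and otherwise copies a previous draw chosen uniformly at random; this reproduces the rule because the chance of copying a value in $A$ is proportional to $\sum_i I(X_i \in A)$.

```python
import numpy as np

def polya_urn(n, alpha, base_sampler, rng):
    """Draw X_1, ..., X_n sequentially from the Blackwell-MacQueen urn.

    With probability alpha / (alpha + i) the (i+1)th draw is a new value
    from the base distribution G0; otherwise it copies one of the i
    previous draws chosen uniformly at random.
    """
    draws = []
    for i in range(n):
        if rng.uniform() < alpha / (alpha + i):      # always true for i = 0
            draws.append(base_sampler(rng))          # new atom from G0
        else:
            draws.append(draws[rng.integers(i)])     # reuse an existing value
    return np.array(draws)

rng = np.random.default_rng(0)
x = polya_urn(200, alpha=2.0, base_sampler=lambda r: r.normal(), rng=rng)
print(len(np.unique(x)), "distinct values among", len(x), "draws")
```

Because earlier draws are reused, the number of distinct values grows only slowly with $n$, which is the clustering property that makes the Dirichlet process useful as a prior.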
Sethuraman (1994) described the Dirichlet process through a stick-breaking representation, which is now the most common way of representing it. In this representation, $G$ can be written as
$$G = \sum_{j=1}^{\infty} w_j \delta_{\phi_j}, \qquad (1.1)$$
where $\delta_{\phi}$ denotes a point mass at $\phi$, the $\phi_j$ are independent and identically distributed random elements from $G_0$, the $w_j$ are random weights chosen independently of the $\phi_j$, $0 < w_j < 1$, and $\sum_{j=1}^{\infty} w_j = 1$ almost surely. The weights are given by
$$w_1 = \nu_1, \qquad w_j = \nu_j \prod_{l<j} (1 - \nu_l), \quad j = 2, 3, \dots, \qquad (1.2)$$
with $\nu_j \overset{iid}{\sim} \mathrm{Beta}(1, \alpha)$. The Dirichlet process in this form is widely used as a prior in nonparametric Bayesian models.
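A draw of $G$ can be approximated by truncating the stick-breaking sum (1.1) at a finite number of atoms. The sketch below is a hypothetical illustration: the truncation level `R`, the choice to set the last $\nu_R = 1$ so that the weights sum exactly to one (the truncation used in the blocked Gibbs sampler of Ishwaran and James, 2001), and the normal base distribution are all assumptions.

```python
import numpy as np

def stick_breaking(alpha, R, base_sampler, rng):
    """Truncated stick-breaking draw of G = sum_j w_j * delta_{phi_j} (eqs. 1.1-1.2).

    The infinite sum is cut at R atoms; setting nu_R = 1 lets the last
    piece absorb the leftover stick so the weights sum to one.
    """
    nu = rng.beta(1.0, alpha, size=R)          # nu_j ~ Beta(1, alpha)
    nu[-1] = 1.0                               # truncation: close the stick
    # w_1 = nu_1, w_j = nu_j * prod_{l<j} (1 - nu_l)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - nu[:-1])))
    w = nu * remaining
    phi = np.array([base_sampler(rng) for _ in range(R)])   # phi_j iid from G0
    return w, phi

rng = np.random.default_rng(1)
w, phi = stick_breaking(alpha=2.0, R=50, base_sampler=lambda r: r.normal(), rng=rng)
print(w[:5].round(3), w.sum())
```

Smaller values of $\alpha$ put most of the mass on the first few atoms; larger values spread it over many atoms and need a larger `R`.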
Recent methods for sampling from Dirichlet process mixture models include the blocked Gibbs sampler (Ishwaran and James, 2001), the retrospective sampler (Papaspiliopoulos and Roberts, 2008), and the slice sampler (Walker, 2007; Kalli et al., 2011). These are called conditional methods: instead of using the Pólya urn scheme, they sample a sufficient but finite number of variables at each iteration of a Markov chain. In this study, we use the slice sampling method.
Walker (2007) introduced this method, using latent variables that allow only a finite number of variables to be sampled at each iteration of a Gibbs sampler. Kalli et al. (2011) proposed a more efficient and more general version of Walker's (2007) slice sampler. One of our main concerns is therefore to perform the test of homogeneity between groups with the Dirichlet process mixture model using the slice sampling method.
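The following sketch shows two ingredients of the efficient slice sampler as we read them from Kalli et al. (2011) and from the joint density (2.4) in Section 2; it is not the authors' implementation. With the deterministic sequence $\xi_r = (1-k)k^{r-1}$, $0 < k < 1$, and latent $u_i \sim U(0, \xi_{d_i})$, only components with $\xi_r > \min_i u_i$ can be allocated, so each sweep touches finitely many atoms. The full conditional of $d_i$ is then proportional to $I(u_i < \xi_r)\, w_r/\xi_r$ times the likelihood of area $i$ under atom $r$. The function names, the value of `k` and the toy inputs are hypothetical, and the updates of $\nu$, the atoms and $\alpha$ are omitted.

```python
import numpy as np

def xi(r, k):
    """Deterministic decreasing sequence xi_r = (1 - k) * k**(r - 1), r = 1, 2, ..."""
    return (1.0 - k) * k ** (np.asarray(r, dtype=float) - 1.0)

def active_components(u, k):
    """Number of atoms R with xi_r > min_i u_i (u_i > 0). Atoms beyond R have
    I(u_i < xi_r) = 0 for every i, so no area can be allocated to them."""
    u_min = float(np.min(u))
    R = 0
    while xi(R + 1, k) > u_min:
        R += 1
    return R

def allocation_probs(u_i, w, loglik_i, k):
    """Full conditional of d_i over r = 1..len(w):
    P(d_i = r | ...) is proportional to I(u_i < xi_r) * (w_r / xi_r) * exp(loglik_i[r-1]),
    where loglik_i[r-1] stands for log f_1(n_i | pi_r)."""
    w = np.asarray(w, dtype=float)
    x = xi(np.arange(1, len(w) + 1), k)
    logp = np.where(u_i < x,
                    np.log(w) - np.log(x) + np.asarray(loglik_i, dtype=float),
                    -np.inf)
    logp -= logp.max()                  # stabilise before exponentiating
    p = np.exp(logp)
    return p / p.sum()

# Toy check with k = 0.5, so xi = 0.5, 0.25, 0.125, 0.0625, ...
print(active_components([0.3, 0.1], k=0.5))        # 3
print(allocation_probs(0.1, [0.5, 0.3, 0.2], [0.0, 0.0, 0.0], k=0.5))  # proportional to 1.0, 1.2, 1.6
```

Using a fixed sequence $\xi_r$ instead of the weights $w_r$ themselves in the slice is what distinguishes the Kalli et al. (2011) version from Walker's (2007) original sampler.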
Another concern is the discretization of continuous variables. Miller and Siegmund (1982) studied a discretization approach based on the maximally selected chi-square statistic: for each candidate cut point, a 2 × 2 contingency table of observation counts is formed, and the cut point that maximizes the standard chi-square statistic is selected. Discretization approaches can also be considered for the k-sample test. Jiang et al. (2015) studied a dynamic slicing method based on regularized likelihood-ratio testing. Their regularized likelihood-ratio test statistic is the mutual information with penalty terms, which prevent overslicing when a continuous variable is cut. It can be seen as a generalized version of the approach studied by Miller and Siegmund (1982).
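A minimal sketch of the maximally selected chi-square idea for two groups, assuming a numeric variable `x` and a group label for each observation; the function names and toy data are hypothetical. Each distinct value except the largest is tried as a cut point, a 2 × 2 table is formed, and the cut with the largest Pearson statistic is kept. Because the cut is chosen to maximize the statistic, the usual $\chi^2_1$ reference distribution does not apply to the maximum, which is why Miller and Siegmund (1982) study the distribution of the maximum itself.

```python
import numpy as np

def pearson_chi2(table):
    """Pearson chi-square statistic for a table of counts (rows = groups)."""
    t = np.asarray(table, dtype=float)
    expected = t.sum(axis=1, keepdims=True) * t.sum(axis=0, keepdims=True) / t.sum()
    return float(((t - expected) ** 2 / expected).sum())

def max_selected_cut(x, group):
    """Try every observed value except the largest as a cut point, build the
    2 x 2 table of group by (x <= cut), and keep the cut with the largest
    Pearson statistic."""
    x, group = np.asarray(x), np.asarray(group)
    best_cut, best_stat = None, -np.inf
    for cut in np.unique(x)[:-1]:          # cutting at the maximum leaves an empty column
        low = x <= cut
        table = [[np.sum(low & (group == g)), np.sum(~low & (group == g))]
                 for g in np.unique(group)]
        stat = pearson_chi2(table)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
cut, stat = max_selected_cut(x, group)
print(cut, stat)   # 4 8.0
```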
This kind of discretization approach can find associations between the groups and the variable. Even when the groups appear homogeneous, the proportions at a particular level of the variable may vary from group to group. Appropriate discretization of a continuous variable can therefore support the alternative hypothesis in a homogeneity test even when k-sample tests such as ANOVA do not reject the null hypothesis.
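A small hypothetical example makes this concrete. The two groups below have the same mean, so the one-way ANOVA F statistic is essentially zero, but one group concentrates in the middle and the other in the tails. After slicing at hand-picked cut points (an assumption of this sketch; the paper chooses slices by clustering), a chi-square test on the resulting $2 \times 3$ table clearly rejects homogeneity. The p-value uses the fact that the $\chi^2$ distribution with 2 degrees of freedom has survival function $e^{-x/2}$.

```python
import numpy as np

def one_way_f(*groups):
    """One-way ANOVA F statistic for k groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    allx = np.concatenate(groups)
    grand, k, n = allx.mean(), len(groups), len(allx)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def pearson_chi2(table):
    """Pearson chi-square statistic for a table of counts."""
    t = np.asarray(table, dtype=float)
    expected = t.sum(axis=1, keepdims=True) * t.sum(axis=0, keepdims=True) / t.sum()
    return float(((t - expected) ** 2 / expected).sum())

# Hypothetical data: equal means, but group b puts its mass in the tails.
a = np.linspace(-0.5, 0.5, 10)
b = np.concatenate([np.linspace(-3, -2, 5), np.linspace(2, 3, 5)])

f_stat = one_way_f(a, b)               # essentially 0: ANOVA sees no difference

cuts = [-1.0, 1.0]                     # hand-picked slices: low / middle / high
table = np.array([np.bincount(np.digitize(g, cuts), minlength=3) for g in (a, b)])
chi2 = pearson_chi2(table)             # 20.0 on (2-1)*(3-1) = 2 df
p_value = np.exp(-chi2 / 2)            # chi-square(2) survival function is exp(-x/2)
print(table, f_stat, chi2, p_value)
```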
In our study, we use k-means clustering and Dirichlet process clustering to discretize the continuous variable. The k-means method minimizes the within-cluster sum of squares, which, for a fixed total sum of squares, is equivalent to maximizing the between-cluster sum of squares. When clusters are present in the data, discretization by clustering can reveal heterogeneity between groups at a particular level.
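For a single continuous variable, k-means discretization can be sketched with Lloyd's algorithm. This covers only the k-means part (not Dirichlet process clustering), and the quantile initialization, the function name and the toy data are assumptions of the sketch. In one dimension, assigning each point to its nearest center is the same as cutting at the midpoints between neighbouring sorted centers, so the clusters translate directly into slices.

```python
import numpy as np

def kmeans_1d(x, k, n_iter=100):
    """Lloyd's algorithm for one-dimensional k-means.

    Centers start at evenly spaced quantiles (a deterministic choice made
    for this sketch). Returns the sorted centers and the cut points halfway
    between neighbouring centers, which reproduce the nearest-center rule.
    """
    x = np.asarray(x, dtype=float)
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    centers = np.sort(centers)
    return centers, (centers[:-1] + centers[1:]) / 2

x = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9])
centers, cuts = kmeans_1d(x, k=3)
levels = np.digitize(x, cuts)          # discretized variable with levels 0, ..., k-1
print(centers, cuts, levels)
```

Lloyd's algorithm only guarantees a local minimum of the within-cluster sum of squares; in practice several initializations are often compared.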
The discretization turns the problem into a homogeneity test on an $l \times c$ table. For $l \times c$ tables, the standard homogeneity test is the $\chi^2$ test. However, the finer the discretization, the fewer observations fall in each cell, which makes an accurate test difficult. We therefore use the Bayes factor for the homogeneity test and compare it with the p-values of the classical $\chi^2$ test and the Cressie-Read test (Cressie and Read, 1984). The Cressie-Read test uses the following power divergence statistic with $r = 2/3$:
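$$P^2 = \frac{2}{r(r+1)} \sum_{i,j} n_{ij} \left\{ \left( \frac{n_{ij}}{\hat{\lambda}_{ij}} \right)^{r} - 1 \right\}, \qquad -\infty < r < \infty,$$
where $n_{ij}$ are the cell counts and $\hat{\lambda}_{ij} = \sum_i n_{ij} \sum_j n_{ij} / \sum_i \sum_j n_{ij}$ is the expected cell count under homogeneity; the cases $r = 0$ and $r = -1$ are defined as limits, and $r = 1$ gives Pearson's $\chi^2$. In general, the Cressie-Read statistic ($P^2$ with $r = 2/3$) is known to be less sensitive to small cell counts than $\chi^2$.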
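The power divergence statistic can be computed directly from a table of counts. In this sketch (the function name is hypothetical), $\hat{\lambda}_{ij} = n_{i+} n_{+j}/n$; setting `r=1` recovers Pearson's $\chi^2$, because $\sum n_{ij}^2/\hat{\lambda}_{ij} - n = \sum (n_{ij} - \hat{\lambda}_{ij})^2/\hat{\lambda}_{ij}$, and `r=2/3` gives the Cressie-Read statistic.

```python
import numpy as np

def power_divergence(table, r=2/3):
    """Power-divergence statistic P^2 for an l x c table of counts.

    lambda_hat_ij = n_{i+} n_{+j} / n is the expected count under homogeneity.
    r = 1 gives Pearson's chi-square; r = 2/3 gives the Cressie-Read statistic.
    """
    if r in (0, -1):
        raise ValueError("r = 0 and r = -1 are defined only as limits")
    n = np.asarray(table, dtype=float)
    lam = n.sum(axis=1, keepdims=True) * n.sum(axis=0, keepdims=True) / n.sum()
    terms = n * ((n / lam) ** r - 1.0)      # for r > 0, empty cells contribute 0
    return float(2.0 / (r * (r + 1.0)) * terms.sum())

table = [[0, 10, 0], [5, 0, 5]]
print(power_divergence(table, r=1))         # Pearson chi-square
print(power_divergence(table, r=2/3))       # Cressie-Read
```

For p-values, both statistics are usually referred to a $\chi^2$ distribution with $(l-1)(c-1)$ degrees of freedom; SciPy's `scipy.stats.chi2_contingency` accepts a `lambda_` argument (for example `lambda_="cressie-read"`) for this purpose.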
There is a well-known literature on Bayesian methods for analyzing contingency table data. Agresti and Hitchcock (2005) studied Bayesian methods for categorical data analysis, with a focus on contingency tables. More recently, hierarchical Bayesian models for contingency tables from small areas have been studied by Woo and Kim (2015, 2016).
In this paper, we construct a Bayesian nonparametric test of homogeneity using a hierarchical multinomial model with Dirichlet process priors in a small area setup. In Section 2, we establish a nonparametric Bayesian hierarchical model with Dirichlet process priors for small areas and its computational procedure based on the slice sampler; the Bayes factor is then calculated to perform the test of homogeneity. In Section 3, we present the results of numerical studies and compare them with frequentist methods. Finally, we provide concluding remarks in Section 4.
2. Nonparametric Bayesian test of homogeneity
We consider an $l \times c$ contingency table with cell counts $n_{ij}$ and cell probabilities $\pi_{ij}$ ($i = 1, 2, \dots, l$, $j = 1, 2, \dots, c$). We write $n_i = (n_{i1}, n_{i2}, \dots, n_{ic})'$ and $\pi_i = (\pi_{i1}, \pi_{i2}, \dots, \pi_{ic})'$. Here $n_{i+} = \sum_{j=1}^{c} n_{ij}$ is the total count in the $i$th row, $n_{+j} = \sum_{i=1}^{l} n_{ij}$ is the total count in the $j$th column, and $n = \sum_{i=1}^{l} n_{i+}$ is the total count.
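Table 2.1 $l \times c$ contingency table
$$\begin{array}{c|cccc}
\text{Areas} & 1 & 2 & \cdots & c \\ \hline
1 & n_{11}, \pi_{11} & n_{12}, \pi_{12} & \cdots & n_{1c}, \pi_{1c} \\
2 & n_{21}, \pi_{21} & n_{22}, \pi_{22} & \cdots & n_{2c}, \pi_{2c} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l & n_{l1}, \pi_{l1} & n_{l2}, \pi_{l2} & \cdots & n_{lc}, \pi_{lc}
\end{array}$$
$n_{ij}$: cell count, $\pi_{ij}$: cell probability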
We consider the test of homogeneity
$$H_0: \pi_1 = \pi_2 = \cdots = \pi_l = \pi \quad \text{vs} \quad H_1: \text{not } H_0. \qquad (2.1)$$
We place a Dirichlet process prior on the probability parameters of the multinomial distributions for the contingency table.
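Under $H_1$, our hierarchical Bayesian model is
$$n_i \mid \pi_i \overset{ind}{\sim} \mathrm{Multinomial}(n_{i+}, \pi_i), \quad i = 1, 2, \dots, l,$$
$$\pi_i \mid G \overset{iid}{\sim} G, \quad i = 1, 2, \dots, l,$$
$$G \sim \mathrm{DP}(\alpha, G_0), \qquad (2.2)$$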
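where $\alpha > 0$ is a concentration parameter and $G_0$ is the base distribution. We assume the hyperprior $\pi(\alpha) = 1/(1+\alpha)^2$, $\alpha > 0$, and take $G_0 \equiv \mathrm{Dirichlet}(\mu)$. Note that $\pi_i \mid \mu \sim \mathrm{Dirichlet}(\mu)$ has density
$$f(\pi_i \mid \mu) = \frac{\prod_{j=1}^{c} \pi_{ij}^{\mu_j - 1}}{D(\mu)}, \qquad 0 < \pi_{ij} < 1, \quad \sum_{j=1}^{c} \pi_{ij} = 1,$$
where $D(\mu) = \prod_{j=1}^{c} \Gamma(\mu_j) / \Gamma\left(\sum_{j=1}^{c} \mu_j\right)$. The multinomial likelihood is
$$f_1(n_i \mid \pi_i) = \frac{n_{i+}!}{\prod_{j=1}^{c} n_{ij}!} \prod_{j=1}^{c} \pi_{ij}^{n_{ij}}, \qquad (2.3)$$
where $\pi_i = (\pi_{i1}, \dots, \pi_{ic})'$.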
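The pieces of this hierarchy can be sketched in Python. The code is illustrative and makes several assumptions not in the paper: the function names are hypothetical, $G$ is approximated by a stick-breaking draw truncated at `R` atoms (with $\nu_R = 1$), and the row totals, $\mu$, $\alpha$ and seed are arbitrary. The log-density functions follow the Dirichlet density and (2.3) term by term, on the log scale to avoid underflow. `alpha_from_uniform` samples the hyperprior by inversion: $\pi(\alpha) = 1/(1+\alpha)^2$ has CDF $\alpha/(1+\alpha)$, so $\alpha = u/(1-u)$ for $u \sim U(0,1)$.

```python
import math
import numpy as np

def log_D(mu):
    """log D(mu) = sum_j log Gamma(mu_j) - log Gamma(sum_j mu_j)."""
    return sum(math.lgamma(m) for m in mu) - math.lgamma(sum(mu))

def log_dirichlet_pdf(pi, mu):
    """log f(pi | mu) = sum_j (mu_j - 1) log pi_j - log D(mu), pi in the open simplex."""
    return sum((m - 1.0) * math.log(p) for p, m in zip(pi, mu)) - log_D(mu)

def log_multinomial_pmf(n, pi):
    """log f_1(n | pi) from (2.3); cells with n_j = 0 contribute nothing."""
    out = math.lgamma(sum(n) + 1) - sum(math.lgamma(k + 1) for k in n)
    return out + sum(k * math.log(p) for k, p in zip(n, pi) if k > 0)

def alpha_from_uniform(u):
    """Inverse-CDF draw from pi(alpha) = 1/(1 + alpha)^2, whose CDF is alpha/(1 + alpha)."""
    return u / (1.0 - u)

def simulate_h1(row_totals, mu, alpha, R, rng):
    """Draw one l x c table from model (2.2), approximating G by a
    stick-breaking draw truncated at R atoms (an assumption of this sketch)."""
    nu = rng.beta(1.0, alpha, size=R)                 # nu_r ~ Beta(1, alpha)
    nu[-1] = 1.0                                      # close the stick at R
    w = nu * np.concatenate(([1.0], np.cumprod(1.0 - nu[:-1])))
    atoms = rng.dirichlet(mu, size=R)                 # atoms drawn from G0 = Dirichlet(mu)
    d = rng.choice(R, size=len(row_totals), p=w)      # pi_i | G ~ G: pick an atom
    pi = atoms[d]
    counts = np.array([rng.multinomial(m, p) for m, p in zip(row_totals, pi)])
    return counts, d, pi

rng = np.random.default_rng(2)
counts, d, pi = simulate_h1([30, 25, 40, 35], mu=[1.0, 1.0, 1.0], alpha=1.0, R=30, rng=rng)
print(counts)
print(d)   # areas with the same index share the same probability vector
print(sum(log_multinomial_pmf(n_i, p_i) for n_i, p_i in zip(counts, pi)))
```

Because $G$ is discrete, several areas can draw the same atom, so rows with equal $d_i$ share one probability vector; this is the clustering of areas that the Dirichlet process prior induces.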
To implement the Dirichlet process prior, we use the efficient version of the slice sampler proposed by Kalli et al. (2011). Then, under $H_1$, the joint distribution of the data and the latent variables given $\nu$ and $\pi$ is
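$$P_1(n, d, u \mid \nu, \pi) = \prod_{i=1}^{l} I(u_i < \xi_{d_i})\, \frac{w_{d_i}}{\xi_{d_i}}\, f_1(n_i \mid \pi_{d_i})$$
$$\qquad = \prod_{i=1}^{l} I(u_i < \xi_{d_i})\, \frac{w_{d_i}}{\xi_{d_i}}\, n_{i+}! \prod_{j=1}^{c} \frac{\pi_{d_i j}^{n_{ij}}}{n_{ij}!}. \qquad (2.4)$$
Here $d = (d_1, \dots, d_l)$ is the vector of cluster indices, $u = (u_1, \dots, u_l)$ is the vector of latent slice variables, $\nu = (\nu_1, \dots, \nu_R)$ are the stick-breaking variables ($d_i \le R$), from which the weights $w_r$ are obtained as in (1.2), and $\xi_r = (1 - k)k^{r-1}$ for a fixed $0 < k < 1$. Here $\nu_r \mid \alpha \overset{iid}{\sim} \mathrm{Beta}(1, \alpha)$, $r = 1, 2, \dots, R$. Then the joint density function of all variables under $H_1$ is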
$$P_1(n, d, u, \nu, \pi, \alpha) = P_1(n, d, u \mid \nu, \pi)\, \pi_0(\pi)\, \pi(\nu \mid \alpha)\, \pi(\alpha) \qquad (2.5)$$
=
l
Y
i=1
I (u
i<ξ
di)
w d
iξ d
in i !
c
Y
j=1
π d n
iji