Bayesian inference for nonlinear functions of means in normal distributions
Woo Dong Lee 1 · Dal Ho Kim 2 · Sang Gil Kang 3
1 Pre-major of Cosmetics and Pharmaceutics, Daegu Haany University
2 Department of Statistics, Kyungpook National University
3 Department of Computer and Data Information, Sangji University
Received 13 February 2019, revised 22 February 2019, accepted 28 February 2019
Abstract
We consider objective Bayesian inference for the product of different powers of two means in normal distributions. Objective Bayesian inference requires noninformative priors. In this paper, we develop matching priors as noninformative priors for the product of different powers of two normal means. We derive the first order matching priors using an orthogonal parameterization, and we show that Jeffreys’ prior and the matching prior have different forms. We also prove the propriety of the posteriors under a general class of priors that includes the developed priors. Numerical studies indicate that the matching prior performs better than Jeffreys’ prior in terms of frequentist coverage probability.
Keywords: Matching prior, normal distribution, product of powers of two means.
1. Introduction
Suppose that $X_{i1}, \cdots, X_{in_i}$ are independent normal random variables with mean $\mu_i$ and variance $\sigma_i^2$ for $i = 1, 2$. The parameter of interest is $\theta_1 = \mu_1^{a_1} \mu_2^{a_2}$, the product of different powers of two means, where $a_1$ and $a_2$ are integers. The parameter $\theta_1$ includes well-known functions of the means, such as the product of means and the ratio of means.
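To make the two special cases mentioned above concrete, the following display writes out the exponent choices that produce them. The specific values of $a_1$ and $a_2$ are spelled out here for illustration and are not stated in this form in the text.

```latex
\begin{align*}
a_1 = 1,\ a_2 = 1 &: \quad \theta_1 = \mu_1 \mu_2 \quad \text{(product of means)},\\
a_1 = 1,\ a_2 = -1 &: \quad \theta_1 = \mu_1 / \mu_2 \quad \text{(ratio of means, } \mu_2 \neq 0\text{)}.
\end{align*}
```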
Inference for the product of means was initiated by Southwood (1978) and studied by Yfantis and Flatman (1991). In Bayesian work, Berger and Bernardo (1989) developed the reference prior for the product of two normal means under a common known variance. For several normal populations with positive means and a common known variance, the two group reference prior was derived by Sun and Ye (1995). Sun and Ye (1999) developed the two group reference prior for several normal distributions with positive means and unknown variances as a general situation. Also, Kim (2006) and Raubenheimer (2012) studied the development of the probability matching prior and the reference prior, respectively, for the nonlinear product of several Poisson rates. For the nonlinear product of several exponential means, Kang et al. (2015) derived the reference prior and the probability matching prior.
1 Professor, Pre-major of Cosmetics and Pharmaceutics, Daegu Haany University, Kyungsan 38610, Korea.
2 Professor, Department of Statistics, Kyungpook National University, Daegu 41566, Korea.
3 Corresponding author: Professor, Department of Computer and Data Information, Sangji University, Wonju 26339, Korea. E-mail: [email protected]
Inference about the ratio of means involves the Fieller-Creasy problem (Fieller, 1954; Creasy, 1954). Fieller (1954) proposed a confidence interval for the ratio of means based on a pivot. However, this procedure has a limitation: a confidence interval for the ratio of means may not exist for a given set of data. In Bayesian work on the Fieller-Creasy problem, Kappenman et al. (1970) analyzed the problem using noninformative priors, and it was later studied by Bernardo (1977), Stephens and Smith (1992), Liseo (1993), Philippe and Robert (1994) and Berger et al. (1999). In particular, Yin and Ghosh (2001) showed that likelihood based inference has severe limitations in the generalized Fieller-Creasy problem, whereas Bayesian analysis based on noninformative priors usually provided a satisfactory solution. Lee, Kim and Kang (2007) developed noninformative priors for the Fieller-Creasy problem using unbalanced data.
In this study, we are interested in the development of noninformative priors for the product of different powers of means in two normal distributions. There are two concepts for developing noninformative priors. The first is the probability matching prior suggested by Welch and Peers (1963), and more recently studied by Stein (1985), Tibshirani (1989), Mukerjee and Dey (1993), DiCiccio and Stern (1994), Datta and Ghosh (1995, 1996) and Mukerjee and Ghosh (1997). The basic idea is to find a prior for which the posterior probability of a one-sided Bayesian credible interval matches asymptotically the frequentist coverage probability of that interval. The second is to use a suitable divergence measure between the prior and the posterior, and to select a prior which maximizes this divergence. This idea was studied in several articles by Bernardo (1979), Berger and Bernardo (1989, 1992) and Ghosh and Mukerjee (1992). These priors are known as reference priors and are based on maximizing the Kullback-Leibler divergence between the prior and the posterior. Reference priors have shown good performance in many statistical problems (Lee et al. 2016; Ko et al. 2018).
The remainder of this paper is organized as follows. In Section 2, we derive the first order matching priors for the product of different powers of two normal means using an orthogonal parameterization, and we show that Jeffreys’ prior and the first order matching prior are different. In Section 3, we give the condition for propriety of the posterior distribution under a general class of priors that includes Jeffreys’ prior and the first order matching prior. In Section 4, the frequentist coverage probabilities of the posterior distributions based on the derived priors are calculated, and an example with real data is given.
2. The probability matching prior
Consider that $X_{i1}, \cdots, X_{in_i}$ are independent normal random variables with mean $\mu_i$ and variance $\sigma_i^2$ for $i = 1, 2$. We want to develop noninformative priors for the product of different powers of two means, $\theta_1 = \mu_1^{a_1} \mu_2^{a_2}$.
Let $\theta = (\theta_1, \cdots, \theta_t)^T$ denote the parameter vector. Suppose that $\theta_1$ is the parameter of interest and $(\theta_2, \cdots, \theta_t)$ is the nuisance parameter vector. Also let $\theta_1^{1-\alpha}(\pi; X)$ denote the $(1-\alpha)$th posterior quantile of $\theta_1$ based on the prior $\pi$. We consider priors $\pi$ for which the relation
$$P[\theta_1 \le \theta_1^{1-\alpha}(\pi; X) \mid \theta] = 1 - \alpha + o(n^{-r/2}), \quad r > 0, \tag{2.1}$$
holds for $r = 1$ or $r = 2$. Priors satisfying (2.1) with $r = 1$ and $r = 2$ are called first order matching priors and second order matching priors, respectively.
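The matching condition (2.1) can be illustrated by simulation in a toy setting. The sketch below assumes a much simpler model than the one studied in this paper: a single normal mean with known unit variance and a flat prior, where the posterior is $N(\bar{x}, 1/n)$ and the match is exact. The function name `coverage_flat_prior` and all numerical settings are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage_flat_prior(theta, n, alpha, reps, seed=0):
    """Estimate P[theta <= (1 - alpha) posterior quantile | theta] by simulation.

    Toy model (an assumption, not the model of this paper):
    X_1, ..., X_n ~ N(theta, 1) with a flat prior on theta,
    so the posterior of theta is N(xbar, 1/n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(theta, 1.0) for _ in range(n))
        upper = xbar + z / n ** 0.5  # (1 - alpha) posterior quantile
        hits += theta <= upper
    return hits / reps

cov = coverage_flat_prior(theta=2.0, n=10, alpha=0.05, reps=20000)
print(round(cov, 3))  # should be close to 1 - alpha = 0.95
```

In this toy case the frequentist coverage equals $1 - \alpha$ for every $n$; for the product of powers of two means the match holds only asymptotically, which is why the choice of prior matters.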
We want to develop the matching priors for $\theta_1$. Let
$$\theta_1 = \mu_1^{a_1} \mu_2^{a_2}, \quad \theta_2 = a_2 n_1 \mu_1^2 \sigma_2^2 - a_1 n_2 \mu_2^2 \sigma_1^2, \quad \theta_3 = \sigma_1 \quad \text{and} \quad \theta_4 = \sigma_2, \tag{2.2}$$
where $a_1$ and $a_2$ are integers. The Jacobian matrix of the transformation (2.2) is given by
$$\frac{\partial(\theta_1, \theta_2, \theta_3, \theta_4)}{\partial(\mu_1, \mu_2, \sigma_1, \sigma_2)} =
\begin{pmatrix}
a_1 \mu_1^{a_1 - 1} \mu_2^{a_2} & a_2 \mu_1^{a_1} \mu_2^{a_2 - 1} & 0 & 0 \\
2 a_2 n_1 \mu_1 \sigma_2^2 & -2 a_1 n_2 \mu_2 \sigma_1^2 & -2 a_1 n_2 \mu_2^2 \sigma_1 & 2 a_2 n_1 \mu_1^2 \sigma_2 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}. \tag{2.3}$$
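The absolute value of its determinant is $2\,|\mu_1^{a_1 - 1} \mu_2^{a_2 - 1}|\,(a_2^2 n_1 \mu_1^2 \sigma_2^2 + a_1^2 n_2 \mu_2^2 \sigma_1^2)$. Also, the Fisher information matrix for $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ is
$$I(\mu_1, \mu_2, \sigma_1, \sigma_2) =
\begin{pmatrix}
\dfrac{n_1}{\sigma_1^2} & 0 & 0 & 0 \\
0 & \dfrac{n_2}{\sigma_2^2} & 0 & 0 \\
0 & 0 & \dfrac{2 n_1}{\sigma_1^2} & 0 \\
0 & 0 & 0 & \dfrac{2 n_2}{\sigma_2^2}
\end{pmatrix}. \tag{2.4}$$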
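Thus, the inverse of the expected Fisher information matrix, obtained from (2.3) and (2.4), is given as follows.
$$I^{-1}(\theta_1, \theta_2, \theta_3, \theta_4)
= \frac{\partial(\theta_1, \theta_2, \theta_3, \theta_4)}{\partial(\mu_1, \mu_2, \sigma_1, \sigma_2)}\,
I^{-1}(\mu_1, \mu_2, \sigma_1, \sigma_2)
\left[\frac{\partial(\theta_1, \theta_2, \theta_3, \theta_4)}{\partial(\mu_1, \mu_2, \sigma_1, \sigma_2)}\right]^{t}$$
$$= \begin{pmatrix}
\dfrac{a_2^2 n_1 \mu_1^2 \sigma_2^2 + a_1^2 n_2 \mu_2^2 \sigma_1^2}{n_1 n_2 \mu_1^{2(1 - a_1)} \mu_2^{2(1 - a_2)}} & 0 & 0 & 0 \\
0 & I^{22} & -\dfrac{a_1 n_2 \mu_2^2 \sigma_1^3}{n_1} & \dfrac{a_2 n_1 \mu_1^2 \sigma_2^3}{n_2} \\
0 & -\dfrac{a_1 n_2 \mu_2^2 \sigma_1^3}{n_1} & \dfrac{\sigma_1^2}{2 n_1} & 0 \\
0 & \dfrac{a_2 n_1 \mu_1^2 \sigma_2^3}{n_2} & 0 & \dfrac{\sigma_2^2}{2 n_2}
\end{pmatrix}, \tag{2.5}$$
where
$$I^{22} = \frac{2 a_2^2 n_1^2 \mu_1^2 \sigma_2^4 (2 n_2 \sigma_1^2 + n_1 \mu_1^2) + 2 a_1^2 n_2^2 \mu_2^2 \sigma_1^4 (2 n_1 \sigma_2^2 + n_2 \mu_2^2)}{n_1 n_2}.$$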
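The zero off-diagonal entries in the first row of (2.5) can be checked numerically by forming the product of the Jacobian (2.3), the inverse of the diagonal matrix (2.4), and the transposed Jacobian. The sketch below does this in plain Python at one hypothetical parameter point; the function names and the chosen values are illustrative assumptions, not part of the paper.

```python
def jacobian(mu1, mu2, s1, s2, a1, a2, n1, n2):
    """Jacobian (2.3) of (theta1, ..., theta4) w.r.t. (mu1, mu2, sigma1, sigma2)."""
    return [
        [a1 * mu1 ** (a1 - 1) * mu2 ** a2, a2 * mu1 ** a1 * mu2 ** (a2 - 1), 0.0, 0.0],
        [2 * a2 * n1 * mu1 * s2 ** 2, -2 * a1 * n2 * mu2 * s1 ** 2,
         -2 * a1 * n2 * mu2 ** 2 * s1, 2 * a2 * n1 * mu1 ** 2 * s2],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def inverse_information_theta(mu1, mu2, s1, s2, a1, a2, n1, n2):
    """Compute J * I^{-1}(mu, sigma) * J^t, with I(mu, sigma) the diagonal matrix (2.4)."""
    J = jacobian(mu1, mu2, s1, s2, a1, a2, n1, n2)
    # Diagonal of the inverse of (2.4).
    d = [s1 ** 2 / n1, s2 ** 2 / n2, s1 ** 2 / (2 * n1), s2 ** 2 / (2 * n2)]
    return [[sum(J[i][k] * d[k] * J[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Hypothetical parameter point; a1 = 2, a2 = -1 gives theta1 = mu1^2 / mu2.
mu1, mu2, s1, s2, a1, a2, n1, n2 = 1.5, 2.0, 0.8, 1.2, 2, -1, 10, 15
V = inverse_information_theta(mu1, mu2, s1, s2, a1, a2, n1, n2)

# Closed-form (1,1) entry of (2.5) for comparison.
S = a2 ** 2 * n1 * mu1 ** 2 * s2 ** 2 + a1 ** 2 * n2 * mu2 ** 2 * s1 ** 2
v11 = S / (n1 * n2 * mu1 ** (2 * (1 - a1)) * mu2 ** (2 * (1 - a2)))

print([round(v, 10) for v in V[0][1:]])  # off-diagonal entries of the first row
print(V[0][0], v11)
```

The first row has vanishing off-diagonal entries up to rounding, which is the orthogonality of $\theta_1$ to $(\theta_2, \theta_3, \theta_4)$ used in the derivation of the matching prior.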
Therefore, the Fisher information matrix, computed by inverting (2.5), is given by
$$I(\theta_1, \theta_2, \theta_3, \theta_4) =
\begin{pmatrix}
\dfrac{n_1 n_2 \mu_1^{2(1 - a_1)} \mu_2^{2(1 - a_2)}}{a_2^2 n_1 \mu_1^2 \sigma_2^2 + a_1^2 n_2 \mu_2^2 \sigma_1^2} & 0 & 0 & 0 \\
0 & I_{22} & I_{23} & I_{24} \\
0 & I_{32} & I_{33} & I_{34} \\
0 & I_{42} & I_{43} & I_{44}
\end{pmatrix}, \tag{2.6}$$
where
$$I_{22} = \frac{1}{4 a_2^2 n_1 \mu_1^2 \sigma_1^2 \sigma_2^4 + 4 a_1^2 n_2 \mu_2^2 \sigma_1^4 \sigma_2^2},$$
$$I_{23} = I_{32} = \frac{a_1 n_2 \mu_2^2}{2 a_2^2 n_1 \mu_1^2 \sigma_1 \sigma_2^4 + 2 a_1^2 n_2 \mu_2^2 \sigma_1^3 \sigma_2^2},$$
$$I_{24} = I_{42} = \frac{-a_2 n_1 \mu_1^2}{2 a_2^2 n_1 \mu_1^2 \sigma_1^2 \sigma_2^3 + 2 a_1^2 n_2 \mu_2^2 \sigma_1^4 \sigma_2},$$
$$I_{33} = \frac{2 a_2^2 n_1^2 \mu_1^2 \sigma_2^4 + a_1^2 n_2 \mu_2^2 \sigma_1^2 (2 n_1 \sigma_2^2 + n_2 \mu_2^2)}{a_2^2 n_1 \mu_1^2 \sigma_1^2 \sigma_2^4 + a_1^2 n_2 \mu_2^2 \sigma_1^4 \sigma_2^2},$$
$$I_{34} = I_{43} = \frac{-a_1 a_2 n_1 n_2 \mu_1^2 \mu_2^2}{a_2^2 n_1 \mu_1^2 \sigma_1 \sigma_2^3 + a_1^2 n_2 \mu_2^2 \sigma_1^3 \sigma_2},$$
$$I_{44} = \frac{2 a_1^2 n_2^2 \mu_2^2 \sigma_1^4 + a_2^2 n_1 \mu_1^2 \sigma_2^2 (2 n_2 \sigma_1^2 + n_1 \mu_1^2)}{a_2^2 n_1 \mu_1^2 \sigma_1^2 \sigma_2^4 + a_1^2 n_2 \mu_2^2 \sigma_1^4 \sigma_2^2}.$$
Thus, from (2.6), $\theta_1$ and $(\theta_2, \theta_3, \theta_4)$ are orthogonal in the sense of Cox and Reid (1987). Therefore, under the orthogonal parametrization, the first order probability matching prior of Tibshirani (1989) is of the form
$$\pi_m(\theta_1, \theta_2, \theta_3, \theta_4) \propto \left( \frac{\mu_1^{2(1 - a_1)} \mu_2^{2(1 - a_2)}}{a_2^2 n_1 \mu_1^2 \sigma_2^2 + a_1^2 n_2 \mu_2^2 \sigma_1^2} \right)^{\frac{1}{2}} g(\theta_2, \theta_3, \theta_4), \tag{2.7}$$
where $g(\theta_2, \theta_3, \theta_4) > 0$ is any smooth function of $\theta_2$, $\theta_3$ and $\theta_4$.
Next, Jeffreys’ prior for $(\theta_1, \theta_2, \theta_3, \theta_4)$ is obtained from the determinant of the Fisher information matrix (2.6) and is given by
$$\pi_J(\theta_1, \theta_2, \theta_3, \theta_4) \propto \left[ \frac{\mu_1^{2(1 - a_1)} \mu_2^{2(1 - a_2)}}{\sigma_1^4 \sigma_2^4 (a_2^2 n_1 \mu_1^2 \sigma_2^2 + a_1^2 n_2 \mu_2^2 \sigma_1^2)^2} \right]^{\frac{1}{2}}. \tag{2.8}$$
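The determinant behind (2.8) can be traced in one line from (2.3) and (2.4), using the standard change-of-variables rule for the Fisher information. The shorthand $J$ for the Jacobian matrix (2.3) and $S$ for the recurring bracketed sum are introduced here only for this illustration.

```latex
\begin{align*}
S &= a_2^2 n_1 \mu_1^2 \sigma_2^2 + a_1^2 n_2 \mu_2^2 \sigma_1^2, \\
\det I(\theta_1, \theta_2, \theta_3, \theta_4)
  &= \det I(\mu_1, \mu_2, \sigma_1, \sigma_2)\, (\det J)^{-2}
   = \frac{4 n_1^2 n_2^2}{\sigma_1^4 \sigma_2^4} \cdot
     \frac{\mu_1^{2(1 - a_1)} \mu_2^{2(1 - a_2)}}{4 S^2}
   = \frac{n_1^2 n_2^2\, \mu_1^{2(1 - a_1)} \mu_2^{2(1 - a_2)}}{\sigma_1^4 \sigma_2^4 S^2}.
\end{align*}
```

Taking the square root and dropping the constant $n_1 n_2$ gives the form in (2.8).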
Remark 2.1 The class of matching priors (2.7) contains many priors, one for each choice of the function $g$. However, the choice of $g$ does not appear to materially improve the coverage probabilities. We therefore choose $g(\theta_2, \theta_3, \theta_4) = \theta_3^{-2} \theta_4^{-2}$, which gives the matching prior a simple form:
$$\pi_m(\theta_1, \theta_2, \theta_3, \theta_4) \propto \left( \frac{\mu_1^{2(1 - a_1)} \mu_2^{2(1 - a_2)}}{a_2^2 n_1 \mu_1^2 \sigma_2^2 + a_1^2 n_2 \mu_2^2 \sigma_1^2} \right)^{\frac{1}{2}} \theta_3^{-2} \theta_4^{-2}. \tag{2.9}$$
Remark 2.2 In the original parametrization $(\mu_1, \mu_2, \sigma_1, \sigma_2)$, the matching prior (2.9) is characterized by
$$\pi_m(\mu_1, \mu_2, \sigma_1, \sigma_2) \propto \sigma_1^{-2} \sigma_2^{-2} (a_2^2 n_1 \mu_1^2 \sigma_1^{-2} + a_1^2 n_2 \mu_2^2 \sigma_2^{-2})^{\frac{1}{2}}. \tag{2.10}$$
Also, in the original parametrization, Jeffreys’ prior (2.8) is of the form
$$\pi_J(\mu_1, \mu_2, \sigma_1, \sigma_2) \propto \sigma_1^{-2} \sigma_2^{-2}. \tag{2.11}$$
Remark 2.3 Jeffreys’ prior (2.11) differs from the matching prior (2.10) and is not a first order matching prior.
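The difference between (2.10) and (2.11) is easy to see numerically: the two unnormalised densities differ by a factor that depends on the means. The sketch below evaluates both log priors at two hypothetical points that share the same standard deviations; the function names and all numerical values are illustrative assumptions.

```python
import math

def log_matching_prior(mu1, mu2, s1, s2, a1, a2, n1, n2):
    """Unnormalised log of the matching prior (2.10).

    Requires the bracketed term to be positive (not both means equal to zero).
    """
    inner = a2 ** 2 * n1 * mu1 ** 2 / s1 ** 2 + a1 ** 2 * n2 * mu2 ** 2 / s2 ** 2
    return -2.0 * math.log(s1) - 2.0 * math.log(s2) + 0.5 * math.log(inner)

def log_jeffreys_prior(s1, s2):
    """Unnormalised log of Jeffreys' prior (2.11); it does not involve the means."""
    return -2.0 * math.log(s1) - 2.0 * math.log(s2)

# Hypothetical settings: a1 = 1, a2 = -1 corresponds to the ratio of means.
args = dict(s1=1.0, s2=2.0, a1=1, a2=-1, n1=12, n2=13)
ratio_small = log_matching_prior(0.5, 0.5, **args) - log_jeffreys_prior(1.0, 2.0)
ratio_large = log_matching_prior(5.0, 5.0, **args) - log_jeffreys_prior(1.0, 2.0)
print(ratio_small, ratio_large)
```

The log ratio of the two priors changes with $(\mu_1, \mu_2)$, so the matching prior puts relatively more weight on larger $|\mu_i|$ than Jeffreys’ prior does.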
3. Propriety of the posterior distribution
We investigate the conditions under which the posterior distribution is proper for a general class of priors that includes Jeffreys’ prior and the matching priors. We consider the class of priors
$$\pi_m(\mu_1, \mu_2, \sigma_1, \sigma_2) \propto \sigma_1^{-a} \sigma_2^{-b} (a_2^2 n_1 \mu_1^2 \sigma_1^{-2} + a_1^2 n_2 \mu_2^2 \sigma_2^{-2})^{c}, \tag{3.1}$$
where $a > 0$, $b > 0$ and $c \ge 0$. The following general theorem can be obtained.
Theorem 1. The posterior distribution under the general prior (3.1) is proper if $n_1 + a - 2 > 0$ and $n_2 + b - 2 > 0$.
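Proof. For a given prior (3.1), the joint posterior distribution of $\mu_1$, $\mu_2$, $\sigma_1$ and $\sigma_2$ is of the form
$$\pi(\mu_1, \mu_2, \sigma_1, \sigma_2 \mid x) \propto \sigma_1^{-n_1 - a} \sigma_2^{-n_2 - b} (a_2^2 n_1 \mu_1^2 \sigma_1^{-2} + a_1^2 n_2 \mu_2^2 \sigma_2^{-2})^{c}
\exp\left\{ -\frac{1}{2\sigma_1^2} \left( s_1^2 + n_1 (\bar{x}_1 - \mu_1)^2 \right) - \frac{1}{2\sigma_2^2} \left( s_2^2 + n_2 (\bar{x}_2 - \mu_2)^2 \right) \right\}, \tag{3.2}$$
where $s_i^2 = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$ and $\bar{x}_i = \sum_{j=1}^{n_i} x_{ij} / n_i$. Now we know that
$$(a_2^2 n_1 \mu_1^2 \sigma_1^{-2} + a_1^2 n_2 \mu_2^2 \sigma_2^{-2})^{c} \exp\left\{ -\frac{(\bar{x}_1 - \mu_1)^2}{2\sigma_1^2} - \frac{(\bar{x}_2 - \mu_2)^2}{2\sigma_2^2} \right\} < \infty. \tag{3.3}$$
Thus, if $n_1 > 1$ and $n_2 > 1$, then using (3.3) we obtain
$$\pi(\mu_1, \mu_2, \sigma_1, \sigma_2 \mid x) \propto \sigma_1^{-n_1 - a} \sigma_2^{-n_2 - b}
\exp\left\{ -\frac{1}{2\sigma_1^2} \left( s_1^2 + (n_1 - 1)(\bar{x}_1 - \mu_1)^2 \right) \right\}
\times \exp\left\{ -\frac{1}{2\sigma_2^2} \left( s_2^2 + (n_2 - 1)(\bar{x}_2 - \mu_2)^2 \right) \right\}. \tag{3.4}$$
Integrating with respect to $\mu_1$ and $\mu_2$, we have
$$\pi(\sigma_1, \sigma_2 \mid x) \propto \sigma_1^{-n_1 - a + 1} \sigma_2^{-n_2 - b + 1} \exp\left\{ -\frac{s_1^2}{2\sigma_1^2} - \frac{s_2^2}{2\sigma_2^2} \right\}. \tag{3.5}$$
Thus
$$\int_0^\infty \int_0^\infty \pi(\sigma_1, \sigma_2 \mid x)\, d\sigma_1\, d\sigma_2
\propto \int_0^\infty \int_0^\infty \sigma_1^{-n_1 - a + 1} \sigma_2^{-n_2 - b + 1} \exp\left\{ -\frac{s_1^2}{2\sigma_1^2} - \frac{s_2^2}{2\sigma_2^2} \right\} d\sigma_1\, d\sigma_2 < \infty, \tag{3.6}$$
if $n_1 + a - 2 > 0$ and $n_2 + b - 2 > 0$.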
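The finiteness condition in (3.6) can be made explicit, since each one-dimensional integral has a closed form. The display below is a worked version of that step; the shorthand $k_1 = n_1 + a - 2$ and $k_2 = n_2 + b - 2$ is introduced here for illustration, and $s_i^2 > 0$ is assumed.

```latex
\begin{align*}
\int_0^\infty \sigma_i^{-(k_i + 1)} \exp\left\{ -\frac{s_i^2}{2\sigma_i^2} \right\} d\sigma_i
  &= \frac{1}{2} \left( \frac{s_i^2}{2} \right)^{-k_i/2}
     \int_0^\infty u^{k_i/2 - 1} e^{-u}\, du
     \qquad \left( u = \frac{s_i^2}{2\sigma_i^2} \right) \\
  &= \frac{1}{2}\, \Gamma\!\left( \frac{k_i}{2} \right) \left( \frac{s_i^2}{2} \right)^{-k_i/2},
     \qquad i = 1, 2.
\end{align*}
```

The gamma integral converges exactly when $k_i > 0$, which gives the conditions $n_1 + a - 2 > 0$ and $n_2 + b - 2 > 0$.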
In Section 4, we compare the matching prior $\pi_m$ and Jeffreys’ prior $\pi_J$ in terms of frequentist coverage probabilities. Since the marginal posterior distribution of $\theta_1$ does not have a closed form, we use a Markov chain Monte Carlo method to compute the coverage probabilities and the marginal moments.
4. Numerical studies
We evaluate the frequentist coverage probability of credible intervals obtained from the marginal posterior density of the product of different powers of two normal means, $\theta_1 = \mu_1^{a_1} \mu_2^{a_2}$, under the noninformative priors $\pi$ given in (3.1), for several configurations of $(\mu_1, \sigma_1)$, $(\mu_2, \sigma_2)$ and $(n_1, n_2)$. That is, we investigate whether the frequentist coverage of a $(1 - \alpha)$th posterior quantile is close to $1 - \alpha$. Since the posterior does not have a closed form, the posterior quantiles are computed via a Markov chain Monte Carlo method. The implementation details are as follows.
The conditional posterior distributions under Jeffreys’ prior $\pi_J$ are easily computed. However, the conditional posterior distributions of $\mu_1$, $\mu_2$, $\sigma_1$ and $\sigma_2$ under the matching prior $\pi_m$ do not have closed forms, and so we give them in detail below.
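The joint posterior of $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ given $x$ is
$$\pi(\mu_1, \mu_2, \sigma_1, \sigma_2 \mid x) \propto \sigma_1^{-n_1 - a} \sigma_2^{-n_2 - b} (a_2^2 n_1 \mu_1^2 \sigma_1^{-2} + a_1^2 n_2 \mu_2^2 \sigma_2^{-2})^{c}
\times \exp\left\{ -\frac{1}{2\sigma_1^2} \left( s_1^2 + n_1 (\bar{x}_1 - \mu_1)^2 \right) - \frac{1}{2\sigma_2^2} \left( s_2^2 + n_2 (\bar{x}_2 - \mu_2)^2 \right) \right\},$$
where $s_i^2 = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$ and $\bar{x}_i = \sum_{j=1}^{n_i} x_{ij} / n_i$. This leads to the following full conditionals.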
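$$\pi(\mu_1 \mid \mu_2, \sigma_1, \sigma_2, x) \propto \exp\left\{ -\frac{1}{2\sigma_1^2} (n_1 - 1)(\bar{x}_1 - \mu_1)^2 \right\}
\times (a_2^2 n_1 \mu_1^2 \sigma_1^{-2} + a_1^2 n_2 \mu_2^2 \sigma_2^{-2})^{c} \exp\left\{ -\frac{(\bar{x}_1 - \mu_1)^2}{2\sigma_1^2} \right\}, \tag{4.1}$$
$$\pi(\mu_2 \mid \mu_1, \sigma_1, \sigma_2, x) \propto \exp\left\{ -\frac{1}{2\sigma_2^2} (n_2 - 1)(\bar{x}_2 - \mu_2)^2 \right\}
\times (a_2^2 n_1 \mu_1^2 \sigma_1^{-2} + a_1^2 n_2 \mu_2^2 \sigma_2^{-2})^{c} \exp\left\{ -\frac{(\bar{x}_2 - \mu_2)^2}{2\sigma_2^2} \right\}, \tag{4.2}$$
$$\pi(\sigma_1 \mid \mu_1, \mu_2, \sigma_2, x) \propto \sigma_1^{-n_1 - a} \exp\left\{ -\frac{s_1^2}{2\sigma_1^2} \right\}
\times (a_2^2 n_1 \mu_1^2 \sigma_1^{-2} + a_1^2 n_2 \mu_2^2 \sigma_2^{-2})^{c} \exp\left\{ -\frac{n_1 (\bar{x}_1 - \mu_1)^2}{2\sigma_1^2} \right\}, \tag{4.3}$$
$$\pi(\sigma_2 \mid \mu_1, \mu_2, \sigma_1, x) \propto \sigma_2^{-n_2 - b} \exp\left\{ -\frac{s_2^2}{2\sigma_2^2} \right\}
\times (a_2^2 n_1 \mu_1^2 \sigma_1^{-2} + a_1^2 n_2 \mu_2^2 \sigma_2^{-2})^{c} \exp\left\{ -\frac{n_2 (\bar{x}_2 - \mu_2)^2}{2\sigma_2^2} \right\}. \tag{4.4}$$
Under the matching prior, the above conditionals of $\mu_1$, $\mu_2$, $\sigma_1$ and $\sigma_2$ are not standard distributions. Therefore, to sample from these conditional densities, we use the Metropolis-Hastings algorithm as described by Chib and Greenberg (1995).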
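A sampler of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: it updates one coordinate at a time with a symmetric Gaussian random-walk proposal, which targets the same conditionals (4.1) to (4.4) because each is proportional to the joint posterior. The function names, step sizes, chain length and summary statistics are all hypothetical.

```python
import math
import random

def log_posterior(p, data, a, b, c, a1, a2):
    """Unnormalised log joint posterior of p = (mu1, mu2, sigma1, sigma2) under prior (3.1)."""
    mu1, mu2, s1, s2 = p
    n1, xbar1, ss1, n2, xbar2, ss2 = data
    if s1 <= 0.0 or s2 <= 0.0:
        return -math.inf  # outside the support
    lp = -(n1 + a) * math.log(s1) - (n2 + b) * math.log(s2)
    if c != 0.0:
        inner = a2 ** 2 * n1 * mu1 ** 2 / s1 ** 2 + a1 ** 2 * n2 * mu2 ** 2 / s2 ** 2
        if inner <= 0.0:
            return -math.inf
        lp += c * math.log(inner)
    lp -= (ss1 + n1 * (xbar1 - mu1) ** 2) / (2.0 * s1 ** 2)
    lp -= (ss2 + n2 * (xbar2 - mu2) ** 2) / (2.0 * s2 ** 2)
    return lp

def metropolis_within_gibbs(data, a, b, c, a1, a2, n_iter, burn_in, steps, seed=0):
    """Update each coordinate in turn with a symmetric random-walk Metropolis step."""
    rng = random.Random(seed)
    n1, xbar1, ss1, n2, xbar2, ss2 = data
    p = [xbar1, xbar2, math.sqrt(ss1 / n1), math.sqrt(ss2 / n2)]  # crude starting values
    lp = log_posterior(p, data, a, b, c, a1, a2)
    theta_draws = []
    for it in range(n_iter):
        for k in range(4):
            q = list(p)
            q[k] += rng.gauss(0.0, steps[k])
            lq = log_posterior(q, data, a, b, c, a1, a2)
            # Accept with probability min(1, posterior ratio).
            if rng.random() < math.exp(min(0.0, lq - lp)):
                p, lp = q, lq
        if it >= burn_in:
            theta_draws.append(p[0] ** a1 * p[1] ** a2)
    return theta_draws

# Hypothetical summary statistics (n_i, sample mean, sum of squares); not from the paper.
data = (10, 2.0, 9.0, 10, 3.0, 9.0)
# a = b = 2 and c = 1/2 correspond to the matching prior (2.10).
draws = metropolis_within_gibbs(data, a=2, b=2, c=0.5, a1=1, a2=1,
                                n_iter=3000, burn_in=1000, steps=(0.3, 0.3, 0.2, 0.2))
draws.sort()
q05, q95 = draws[int(0.05 * len(draws))], draws[int(0.95 * len(draws))]
print(q05, q95)
```

Setting `c=0` with `a=b=2` gives Jeffreys’ prior (2.11) in the same code path. In practice the step sizes would need tuning to the data at hand.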
In each case, we consider 5 independent chains with a sample of size 22,000, discarding the first 20,000 as burn-in. From the generated samples, we compute $\theta_1 = \mu_1^{a_1} \mu_2^{a_2}$ for each draw and find numerically the 5% and 95% posterior quantiles of $\mu_1^{a_1} \mu_2^{a_2}$. We repeat the entire process 10,000 times and calculate the proportion of times the true $\mu_1^{a_1} \mu_2^{a_2}$ belongs to this interval. This proportion is the estimated frequentist coverage probability of the Bayesian credible interval. For the 0.05 and 0.95 posterior quantiles under the developed priors, the numerical values of the frequentist coverage probabilities are given in Tables 4.1, 4.2 and 4.3.
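The coverage procedure above can be sketched for Jeffreys’ prior (2.11), where the posterior can be sampled directly. The sketch assumes the following reading of the posterior (3.2) with $a = b = 2$ and $c = 0$: $s_i^2 / \sigma_i^2$ is chi-square with $n_i$ degrees of freedom and $\mu_i \mid \sigma_i \sim N(\bar{x}_i, \sigma_i^2 / n_i)$. The function names, the parameter configuration and the small replication counts are hypothetical choices made to keep the example fast; they are far smaller than the settings reported in this section.

```python
import math
import random

def posterior_draws_jeffreys(x1, x2, a1, a2, n_draws, rng):
    """Draw theta1 = mu1**a1 * mu2**a2 from the posterior under Jeffreys' prior (2.11)."""
    stats = []
    for x in (x1, x2):
        n = len(x)
        xbar = sum(x) / n
        ss = sum((v - xbar) ** 2 for v in x)  # sum of squares s_i^2
        stats.append((n, xbar, ss))
    draws = []
    for _ in range(n_draws):
        mus = []
        for n, xbar, ss in stats:
            var = ss / rng.gammavariate(n / 2.0, 2.0)  # ss divided by a chi-square_n draw
            mus.append(rng.gauss(xbar, math.sqrt(var / n)))
        draws.append(mus[0] ** a1 * mus[1] ** a2)
    return draws

def coverage(mu, sigma, n, a1, a2, reps, n_draws, seed=0):
    """Estimate P[theta1 <= posterior quantile] for the 0.05 and 0.95 quantiles."""
    rng = random.Random(seed)
    true_theta = mu[0] ** a1 * mu[1] ** a2
    below05 = below95 = 0
    for _ in range(reps):
        x1 = [rng.gauss(mu[0], sigma[0]) for _ in range(n[0])]
        x2 = [rng.gauss(mu[1], sigma[1]) for _ in range(n[1])]
        d = sorted(posterior_draws_jeffreys(x1, x2, a1, a2, n_draws, rng))
        below05 += true_theta <= d[int(0.05 * n_draws)]
        below95 += true_theta <= d[int(0.95 * n_draws)]
    return below05 / reps, below95 / reps

# Hypothetical configuration with small replication counts.
c05, c95 = coverage(mu=(2.0, 3.0), sigma=(1.0, 1.0), n=(10, 10),
                    a1=1, a2=1, reps=200, n_draws=500)
print(c05, c95)
```

Replacing the direct sampler with an MCMC sampler for the matching prior gives the corresponding estimate for $\pi_m$; values near 0.05 and 0.95 indicate good matching.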
From the results in Tables 4.1, 4.2 and 4.3, the matching prior gives better coverage probabilities than Jeffreys’ prior. The matching prior gives good coverage results even for small sample sizes. Also, the matching prior is not sensitive to changes in the values of $(\mu_1, \mu_2, \sigma_1, \sigma_2)$. Thus, we recommend the use of the matching prior in Bayesian analysis.
Example 4.1 We consider a bioequivalence study of two formulations of a drug product reported in Jiang and Sarkar (1998). In this study, a standard 2 × 2 crossover experiment was conducted with 25 subjects to compare a new test formulation and a reference formulation. The primary endpoints of interest were the estimates of the pharmacokinetic parameters specifying the bioavailability of each formulation, such as the area under the plasma concentration-time curve (AUC) and the maximum plasma concentration (Cmax).
The current practice of assessing the average bioequivalence of two formulations involves the construction of the 90% confidence intervals for the ratio of the population means of the test and reference formulations for both AUC and Cmax.
For the Cmax data set, the sample means are $\bar{x}_1 = 32.78$ ($n_1 = 12$) and $\bar{x}_2 = 28.70$ ($n_2 = 13$), and the sample variances are $s_{x_1}^2 = 72.24$ and $s_{x_2}^2$