EXPONENTIAL FAMILIES RELATED TO CHERNOFF-TYPE INEQUALITIES
G. R. Mohtashami Borzadaran
Abstract. In this paper, the characterization results related to Chernoff-type inequalities are applied for exponential-type (contin- uous and discrete) families. Upper variance bound is obtained here with a slightly different technique used in Alharbi and Shanbhag [1]
and Mohtashami Borzadaran and Shanbhag [8]. Some results are shown with assuming measures such as non-atomic measure, atomic measure, Lebesgue measure and counting measure as special cases of Lebesgue-Stieltjes measure. Characterization results on power series distributions via Chernoff-type inequalities are corollaries to our results.
1. Introduction
Characterization results related to variance bound established by Chernoff [5] and Cacoullos and Papathanasiou [2, 3, 4]. Goldstein and Reinert [6] and Papadatos and Papathanasiou [9] found variance bounds via Stein identity. Alharbi and Shanbhag [1] and Mohtashami Borzadaran and Shanbhag [8] published an extended version of Chernoff inequality. Hudson [7] and Prakasa Rao [11] characterized the exponen- tial families. Also, Papathanasiou [7] characterized the power series distribution via the Chernoff-type inequality. We characterize the gen- eralized exponential-type family by using a version of the Chernoff in- equality. The following theorem and corrolaries lead us to versions of the exponential families:
Theorem 1.1. With w(x) > 0 for almost all [ν F∗]x ∈ ℜ, F is a distribution function that absolutely continuous with respect to ν F∗and
and
Received July 12, 2000. Revised January 3, 2002.
2000 Mathematics Subject Classification: Primary 62E10; Secondary 60E05, 60E15.
Key words and phrases: Chernoff-type inequalities, covariance identities, charac-
terization, upper variance bound, variance inequality, exponential families.
there exists a point x 0 such that t(x) > θ for almost all [ν F∗]x ∈ ℜ, lying in (x 0 , ∞) and t(x) < θ for almost all [ν F∗]x ∈ ℜ, lying in (−∞, x 0 ) with E(t(X)) = θ and t(x 0 ) ≥ θ, where X is a random variable with a distribution function F. Then F satisfies (10) if and only if for some c ∈ (0, ∞),
]x ∈ ℜ, lying in (−∞, x 0 ) with E(t(X)) = θ and t(x 0 ) ≥ θ, where X is a random variable with a distribution function F. Then F satisfies (10) if and only if for some c ∈ (0, ∞),
dF (x) (1)
=
c{ w(x) 1 { Q
x
r∈D
x(1)(1 − m (1) {x r })}e −Hc(1)(x) }dν F
∗(x) if x ≥ x 0 c{ w∗1 (x) { Q
1 (x) { Q
x
r∈D
(2)x(1 − m (2) {x r })}e −Hc(2)(x) }dν F
∗(x) if x < x 0 , where
m (1) (•)=
Z
[x
0,∞) T •
(t(y)−θ)(w(y)) −1 dν F∗(y), H c (1) (x) = m (1) c ((−∞, x]), and
m (2) (•)=
Z
(−∞,x
0) T •
(θ−t(y))(w ∗ (y)) −1 dν F∗(y), H c (2) (x) = m (2) c ([x, ∞))
with 0 < w ∗ (x) = w(x) + (θ − t(x))ν F∗ {x}, m (1) c and m (2) c as contin- uous parts of m (1) and m (2) respectively, and D (1) x and D x (2) as the sets of discontinuity points of m (1) that lie in (−∞, x) and of m (2) that lie in (x, ∞), respectively.
Corollary 1.2. If we have the assumptions in Theorem 1.1 met with F ∗ continuous, then the conclusion of the theorem holds with the following in place of (2):
(2) dF θ (x) = c(θ)
w(x) exp{−
Z
(x
0,x]
t(y) − θ
w(y) dν F∗(y)}dν F∗(x), where x 0 is as the statement of Theorem 1.1.
(x), where x 0 is as the statement of Theorem 1.1.
Corollary 1.3. If we have the assumptions in Theorem 1.1 met
with F ∗ continuous, then for any distribution F θ (that is, absolutely
continuous with respect to ν F∗) the conclusion of the theorem holds on
taking, in place of (2), that F θ is concentrated on D and it satisfies the
following:
dF θ (x) (3)
= c(θ)
w(x) exp{−
Z
(x
0,x]
t(y)
w(y) dν F∗(y)} exp{θ Z
(x
0,x]
1
w(y) dν F∗(y)}dν F∗(x)
(x)
= c(θ)k 1 (x) exp{θ Z
(x
0,x]
1
w(y) dν F∗(y)}dν F∗(x), x ∈ D, where
(x), x ∈ D, where
(4) k 1 (x) = 1
w(x) exp{−
Z
(x
0,x]
t(y)
w(y) dν F∗(y)}.
Remark 1.4. In Corollary 1.3, F ∗ (x) = x, x ∈ ℜ, implies that we have
dF θ (x) = c(θ)
w(x) exp{−
Z
(x
0,x]
t(y)
w(y) dy} exp{θ Z
(x
0,x]
1
w(y) dy}dx
= c(θ)k 1 (x) exp{θ Z
(x
0,x]
1
w(y) dy}dx, (5)
in place of (4) and
(6) k 1 (x) = 1
w(x) exp{−
Z
(x
0,x]
t(y) w(y) dy}, in place of (4).
Let us now define an exponential family that is a special case of (4). Let X be a random variable with a distribution function F θ that is absolutely continuous with respect to ν F∗ with density f θ (x) of the form:
(7) f θ (x) = c(θ)k(x)e θµ∗(x) , x ∈ ℜ, θ ∈ ℜ, where 0 < k(x) = exp{− R
(x
0,x] t(y)dν F
∗(y)}, and µ ∗ (x) = F ∗ (x)−F ∗ (a) and E θ [t(X)] = θ. If w(x) ≡ 1, then we have the family (7) in place of (4). In (7) if we take, F ∗ (x) = x, x ∈ ℜ, then (7) simplifies to
(8) f θ (x) ∝ k(x)e θx , x ∈ ℜ, θ ∈ ℜ.
2. Exponential families via Chernoff-type inequalities We characterize the generalized exponential-type family introduced in (4) by using a version of the Chernoff inequality using Alharbi and Shanbhag [1] and Mohtashami Borzadaran and Shanbhag [8].
Lemma 2.1. Let F ∗ be a non-constant Lebesgue-Stieltjes measure function on ℜ and ν F∗ be the measure on the Borel σ−field of ℜ deter- mined by it. Let X with a distribution function F be a random variable such that E{t(X)} = θ, where t(.) is increasing and let g ′ be as defined before. Then, for almost all [F ] a ∈ ℜ,
Z
ℜ
(g ′ (y)) 2 { Z
[y,∞)
[t(x) − θ]dF (x)}dν F∗(y)
= Z
ℜ
[t(x) − θ]
Z
(a,x]
(g ′ (y)) 2 dν F∗(y)dF (x), provided the left hand side of the identity is finite.
Proof. For details of the proof of Lemma see Alharbi and Shanbhag [1] and Mohtashami Borzadaran and Shanbhag [8].
Theorem 2.2. Let F ∗ be a non-constant Lebesgue-Stieltjes measure function on ℜ and ν F∗ be the measure on the Borel σ−field of ℜ de- termined by it. Let X be a random variable and t be a function that is absolutely continuous with respect to ν F∗, such that E{t(X)} = θ and E {t(X)} 2 < ∞. Further, let w be a Borel measurable function such that w(X) > 0 a.s. and V ar{t(X)} = E{t ′ (X)w(X)}, where t ′ is the Radon-Nikodym derivative of t with respect to ν F∗. Assume that t ′ > 0 and let τ be the class of real-valued absolutely continuous func- tions g with Radon-Nikodym derivative g ′ with respect to the measure ν F∗ satisfying E {g(X)} 2 < ∞ and 0 < E{w(X) [g t′′(X)] (X)
2} < ∞. Then
, such that E{t(X)} = θ and E {t(X)} 2 < ∞. Further, let w be a Borel measurable function such that w(X) > 0 a.s. and V ar{t(X)} = E{t ′ (X)w(X)}, where t ′ is the Radon-Nikodym derivative of t with respect to ν F∗. Assume that t ′ > 0 and let τ be the class of real-valued absolutely continuous func- tions g with Radon-Nikodym derivative g ′ with respect to the measure ν F∗ satisfying E {g(X)} 2 < ∞ and 0 < E{w(X) [g t′′(X)] (X)
2} < ∞. Then
satisfying E {g(X)} 2 < ∞ and 0 < E{w(X) [g t′′(X)] (X)
2} < ∞. Then
(9) sup
g∈τ
V ar[g(X)]
E{w(X) [g t′′(X)] (X)
2} = 1 if and only if the distribution function of X satisfies (10)
Z
[x,∞)
(t(y) − θ)dF (y)dν F∗(x) = w(x)dF (x), x ∈ ℜ.
Proof. Note that if (9) holds with g = t, we have V ar[g(X)]
E{w(X) [g t′′(X)] (X)
2} = 1,
and hence to have the “ if ” part of the theorem, it is sufficient if we show that
(11)
Z
ℜ
w(y) [g ′ (y)] 2
t ′ (y) dF (y) ≥ V ar{g(X)}, g ∈ τ.
By using Lemma 2.1, and assuming that X and X ∗ are independent random variables with the distribution function F, we have for almost all [F ] a ∈ ℜ,
V ar{g(X)} = 1 2 E{
Z
(X
∗,X]
g ′ (y)dν F∗(y) 2
}
= 1 2 E{
Z
(X
∗,X]
p t ′ (y) g ′ (y)
pt ′ (y) dν F∗(y) 2
}
≤ 1 2 E{
Z
(X
∗,X]
t ′ (y)dν F∗(y) Z
(X
∗,X]
(g ′ (y)) 2
t ′ (y) dν F∗(y)}
= 1
2 E{ t(X) − t(X ∗ ) Z
(X
∗,X]
(g ′ (y)) 2
t ′ (y) dν F∗(y)}
(12)
= E{ t(X) − θ Z
(a,X]
(g ′ (y)) 2
t ′ (y) dν F∗(y)}
= Z
ℜ
t(x) − θ Z
(a,x]
(g ′ (y)) 2
t ′ (y) dν F∗(y)dF (x)
= Z
ℜ
(g ′ (y)) 2 t ′ (y) {
Z
[y,∞)
(t(x) − θ)dF (x)}dν F∗(y)
= Z
ℜ
(g ′ (y)) 2
t ′ (y) w(y)dF (y)
= E{w(X) [g ′ (X)] 2 t ′ (X) }.
(Note that here and in what follows, we take R
(a,b] = − R
(b,a] if a > b.)
To prove the “ only if ” part, we use an approach somewhat different to
that used in Alharbi and Shanbhag [1]. For every real γ and u, let g u be
such that g u ′ (x) = γ cos(ux) + t ′ (x) be the Radon-Nikodym derivative of
it with respect to ν F∗. Then (11) gives that V ar{g u (X)} = V ar{g u (X) − g u (a)}
= V ar{
Z
(a,X]
(t ′ (x) + γ cos(ux))dν F∗(x)}
= V ar{t(X)} + γ 2 V ar{
Z
(a,X]
(cos(ux))dν F∗(x)}
(13)
+2γCov{t(X), Z
(a, X]
cos(ux)dν F∗(x)}
≤ E{ w(X)
t ′ (X) [t ′ (X) + γ cos(uX)] 2 }.
It follows from (14), in view of the assumption E t ′ (X)w(X) = V ar t(X), that
γ 2 {V ar{
Z
(a, X]
cos(ux)dν F∗(x)} − E{ w(X)
t ′ (X) cos 2 (uX)}}
(14)
+2γ[Cov{t(X), Z
(a, X]
cos(ux)dν F∗(x)} − E w(X) cos(uX)] ≤ 0.
In view of (15), we get that E{(t(X) − θ)
Z
(a, X]
cos(ut)dν F∗(t)} = E w(X) cos(uX).
This relation implies by Fubini’s theorem that Z
ℜ
cos(ux)w(x)dF (x) = Z
ℜ
cos(uy){
Z
[y,∞)
t(x) − θdF (x)}dν F∗(y).
The relation above also holds if cos(ux) is replaced by sin(ux) and hence when cos(ux) is replaced by exp{iux}; then from the uniqueness theorem of the Fourier transforms the required result follows.
Remark 2.3. We can establish the “ if ” part of Theorem 2.2 via a slightly different way the same as mentioned in previous chapter.
The following theorems are essential versions of theorems related to
covariance identities.
Theorem 2.4. Let F ∗ be a non-constant Lebesgue-Stieltjes mea- sure function on ℜ and ν F∗ be the measure on the Borel σ− field of ℜ determined by it, and let t and Z be Borel measurable func- tions. Let X be a random variable with a distribution function F θ such that t(X) is integrable with θ = E θ [t(X)] and E θ (|Z(X)|I {X∈(a,b)} ) <
∞ for every −∞ < a < b < ∞ and satisfying the condition that lim inf x→∞ (t(x) − θ) > 0 if the right extremity of F θ equals ∞, and the condition that lim inf x→−∞ (θ − t(x)) > 0 if the left extremity of F θ equals −∞. Further let τ be the class of real-valued absolutely contin- uous functions g with Radon-Nikodym derivative g ′ with respect to the measure ν F∗ (i.e. such that g(b) − g(a) = R
(a,b] g ′ (x)dν F
∗(x) for all a and b with a < b). Then, we have the condition
(15) Cov θ {g(X), t(X)} = E θ {Z(X)g ′ (X)}, met for all g ∈ τ with E θ | Z(X)g ′ (X) | < ∞, if and only if (16) Z(x)dF θ (x) = {
Z
[x,∞)
[t(z) − θ]dF θ (z)}dν F∗(x), x ∈ ℜ.
Theorem 2.5. Let X, g, τ, Z and t be defined as in Theorem 2.4, but additionally with t(.) absolutely continuous with respect to ν F∗ and t(X) as nondegenerate square integrable satisfying
(17) V ar θ {t(X)} = E θ Z (X)t ′ (X)
(with two sides of the identity well defined and finite). Furthermore, let τ ∗ be the set of g ∈ τ for which g(X) is square integrable and E θ {Z(X)g ′ (X)} is defined and nonzero. Then
(18) inf
g∈τ
∗V ar θ [g(X)]V ar θ [t(X)]
E θ 2 {Z(X)g ′ (X)} = 1 if and only if (16) holds.
Let ν F∗ be a non-atomic measure, then the “ only if ” parts of The-
orems 2.2, 2.4 and 2.5 characterize generalized continuous exponential
family with the form (4). We have the following theorem related to char-
acterization of generalized continuous exponential family for the case
that F ∗ is continuous:
Corollary 2.6. Let X, g, τ, w, and t be defined as in Theorem 2.2. Also, let
V ar θ g(X) ≤ E θ {w(X) [g ′ (X)] 2
t ′ (X) }, for all g ∈ τ,
and equality holds when g is linear in t. Then the distribution of the random variable X is given by (4).
Corollary 2.7. Let X, g, τ, Z, and t be defined as in Theo- rem 2.4 but additionally with t absolutely continuous with respect to non-atomic measure ν F∗, τ ∗ as a subset of g ∈ τ for which g(X) is square integrable with E θ (Z(X)g ′ (X)) 6= 0 and t 2 (X) integrable and V ar θ {t(X)} = E θ Z(X)t ′ (X). Then
(19) V ar θ g(X) ≥ E θ 2 {Z(X)g ′ (X)}
E θ Z (X)t ′ (X) , for all g ∈ τ ∗
if and only if for all g ∈ τ ∗ , the distribution of the random variable X is given by (4). Equality holds when g is linear in t.
• Corollary 2.7 with Z(x) ≡ 1, Corollary 2.6 with w(x) ≡ 1, The- orem 2.4 with Z(x) ≡ 1 and Theorem 2.2 with w(x) ≡ 1, yield characterizations of the exponential family of the form (7).
• Let X, g, τ, Z, and t be defined as in Corollary 2.7 but F ∗ (x) = x, x ∈ ℜ and Z(x) ≡ 1. Then
V ar θ g(X) ≥ E θ 2 {g ′ (X)}
E θ t ′ (X) , for all g ∈ τ ∗ ,
in place of (19), if and only if for all g ∈ τ ∗ , the distribution of the random variable X is given by (8). Equality holds when g is linear in t.
• Let X, g, τ, w, and t be defined as in Theorem 2.2 but F ∗ (x) = x, x ∈ ℜ and w(x) ≡ 1. Also, let
V ar θ {g(X)} ≤ E θ { [g ′ (X)] 2
t ′ (X) }, for all g ∈ τ,
and equality holds when g is linear in t. Then the distribution of
the random variable X is given by (8).
Table 1. Characterization Based on t(x), w(x) and Up- per Bound of V ar θ (g(X)) in Continuous Case
Upper bound for w(x) t(x) Range of Name of
V ar θ (g(X)) a the random variable X Distributions
E θ {[g ′ (X)] 2 } 1 x x ∈ ℜ Normal
E θ { X
22 c
2[g ′ (X)] 2 } x 2c −2 ln x x ∈ (0, ∞) Lognormal
E θ { X c [g ′ (X)] 2 } x cx − 1 x ∈ (0, ∞), c > 0 Gamma
E θ { X
21+c (X−1) [g ′ (X)] 2 } x − 1 1−c(x+1) x x ∈ (0, 1), c > 0 Beta
E θ { 4c 1 [g ′ (X)] 2 } x 2cx 2 − 1 x ∈ (0, ∞), c > 0 Generalized Rayleigh
E θ {e −X [g ′ (X)] 2 } 1 e x x ∈ (0, ∞), c > 0 Standard Log-gamma
a
In this table c is a positive constant.
Based on the Corollary 2.6 when F ∗ (x) = x, x ∈ ℜ, we have the following characterizations via t(x) and upper bounds of the variance of g as seen by Table 1.
Remark 2.8. We can find based on Corollary 2.7 via t and lower
bound of the variance of g, characterizations for some distributions anal-
ogous to those in Table 1.
3. Discrete families via Chernoff-type inequalities
Let ν F∗ be concentrated on a countable set C, such that ν F∗ {x} = β > 0, then we have a discrete version of the Theorems 2.2, 2.4 and 2.5 as follows:
{x} = β > 0, then we have a discrete version of the Theorems 2.2, 2.4 and 2.5 as follows:
Corollary 3.1. Let F ∗ be a non-constant Lebesgue-Stieltjes mea- sure such that ν F∗ is concentrated on a countable set C with each point x of C having ν F∗ {x} = β, β > 0. Also, let X be a random variable and t a strictly increasing function with E θ {t(X)} = θ, and E θ {t(X)} 2 < ∞.
{x} = β, β > 0. Also, let X be a random variable and t a strictly increasing function with E θ {t(X)} = θ, and E θ {t(X)} 2 < ∞.
Further, let w be a function such that w(x) > 0 for each x ∈ C, and V ar θ {t(X)} = E θ { t(X) − t(X−)
β w(X)}.
Further, let τ be the class of real-valued functions g satisfying E θ {g(X)} 2 < ∞
and
0 < E θ {w(X) [g(X) − g(X−)] 2
t(X) − t(X−) } < ∞.
Then
(20) sup
g∈τ
βV ar θ [g(X)]
E θ {w(X) [g(X)−g(X−)]2
t(X)−t(X−) } = 1,
if and only if the probability density function of X satisfies in the following:
(21) w(x)f (x) = β X
{y∈C, y≥x}
(t(y) − θ)f (y), x ∈ C.
Proof. The result follows as a corollary to Theorem 2.2 on taking F ∗ such that ν F∗is concentrated on C such that ν F∗ {x} = β, x ∈ C; note that we have now g ′ (x) = g(x)−g(x−) β , x ∈ C and t ′ (x) = t(x)−t(x−) β , x ∈ C.
{x} = β, x ∈ C; note that we have now g ′ (x) = g(x)−g(x−) β , x ∈ C and t ′ (x) = t(x)−t(x−) β , x ∈ C.
Let X be a random variable with values in B ∗ ; we call a discrete exponential family a shifted scaled discrete exponential family if F ∗ (x) = β[ x−α β ], x ∈ B ∗ . Also, we define for each β > 0, ∆ β g(x) as
(22) ∆ β g(x) = g(x + β) − g(x)
β , x ∈ B ∗ ,
where g(.) is a real-valued function. Under the stated assumptions, we have the following specialized theorem leading us to a characterization of the shifted scaled discrete exponential family.
Theorem 3.2. Let g be a real-valued function defined on B ∗ with E θ { [∆ ∆βg(X)]
2
β