Exponential families related to Chernoff-type inequalities

(1)

EXPONENTIAL FAMILIES RELATED TO CHERNOFF-TYPE INEQUALITIES

G. R. Mohtashami Borzadaran

Abstract. In this paper, the characterization results related to Chernoff-type inequalities are applied for exponential-type (contin- uous and discrete) families. Upper variance bound is obtained here with a slightly different technique used in Alharbi and Shanbhag [1]

and Mohtashami Borzadaran and Shanbhag [8]. Some results are shown with assuming measures such as non-atomic measure, atomic measure, Lebesgue measure and counting measure as special cases of Lebesgue-Stieltjes measure. Characterization results on power series distributions via Chernoff-type inequalities are corollaries to our results.

1. Introduction

Characterization results related to variance bound established by Chernoff [5] and Cacoullos and Papathanasiou [2, 3, 4]. Goldstein and Reinert [6] and Papadatos and Papathanasiou [9] found variance bounds via Stein identity. Alharbi and Shanbhag [1] and Mohtashami Borzadaran and Shanbhag [8] published an extended version of Chernoff inequality. Hudson [7] and Prakasa Rao [11] characterized the exponen- tial families. Also, Papathanasiou [7] characterized the power series distribution via the Chernoff-type inequality. We characterize the gen- eralized exponential-type family by using a version of the Chernoff in- equality. The following theorem and corrolaries lead us to versions of the exponential families:

Theorem 1.1. With w(x) > 0 for almost all [ν _F

∗

]x ∈ ℜ, F is a distribution function that absolutely continuous with respect to ν F

^∗

and

Received July 12, 2000. Revised January 3, 2002.

2000 Mathematics Subject Classification: Primary 62E10; Secondary 60E05, 60E15.

Key words and phrases: Chernoff-type inequalities, covariance identities, charac-

terization, upper variance bound, variance inequality, exponential families.

(2)

there exists a point x ₀ such that t(x) > θ for almost all [ν F

^∗

]x ∈ ℜ, lying in (x ₀ , ∞) and t(x) < θ for almost all [ν _F

∗

]x ∈ ℜ, lying in (−∞, x ₀ ) with E(t(X)) = θ and t(x ₀ ) ≥ θ, where X is a random variable with a distribution function F. Then F satisfies (10) if and only if for some c ∈ (0, ∞),

dF (x) (1)

=







c{ _w(x) ¹ { Q

x

r

∈D

x⁽¹⁾

(1 − m ⁽¹⁾ {x _r })}e ^−H

^c⁽¹⁾

^(x) }dν _F

∗

(x) if x ≥ x ₀ c{ _w

_∗

¹ _(x) { Q

x

r

∈D

⁽²⁾x

(1 − m ⁽²⁾ {x _r })}e ^−H

^c⁽²⁾

^(x) }dν _F

∗

(x) if x < x ₀ , where

m ⁽¹⁾ (•)=

Z

[x

0

,∞) T •

(t(y)−θ)(w(y)) ⁻¹ dν _F

∗

(y), H _c ⁽¹⁾ (x) = m ⁽¹⁾ _c ((−∞, x]), and

m ⁽²⁾ (•)=

Z

(−∞,x

0

) T •

(θ−t(y))(w ^∗ (y)) ⁻¹ dν _F

∗

(y), H _c ⁽²⁾ (x) = m ⁽²⁾ _c ([x, ∞))

with 0 < w ^∗ (x) = w(x) + (θ − t(x))ν F

^∗

{x}, m ⁽¹⁾ c and m ⁽²⁾ c as contin- uous parts of m ⁽¹⁾ and m ⁽²⁾ respectively, and D ⁽¹⁾ x and D x ⁽²⁾ as the sets of discontinuity points of m ⁽¹⁾ that lie in (−∞, x) and of m ⁽²⁾ that lie in (x, ∞), respectively.

Corollary 1.2. If we have the assumptions in Theorem 1.1 met with F ^∗ continuous, then the conclusion of the theorem holds with the following in place of (2):

(2) dF _θ (x) = c(θ)

w(x) exp{−

Z

(x

0

,x]

t(y) − θ

w(y) dν _F

∗

(y)}dν F

^∗

(x), where x ₀ is as the statement of Theorem 1.1.

Corollary 1.3. If we have the assumptions in Theorem 1.1 met

with F ^∗ continuous, then for any distribution F θ (that is, absolutely

continuous with respect to ν _F

∗

) the conclusion of the theorem holds on

taking, in place of (2), that F θ is concentrated on D and it satisfies the

(3)

following:

dF _θ (x) (3)

= c(θ)

w(x) exp{−

Z

(x

0

,x]

t(y)

w(y) dν _F

∗

(y)} exp{θ Z

(x

0

,x]

1 w(y) dν _F

∗

(y)}dν F

^∗

(x)

= c(θ)k ₁ (x) exp{θ Z

(x

0

,x]

1 w(y) dν F

^∗

(y)}dν F

^∗

(x), x ∈ D, where

(4) k ₁ (x) = 1

w(x) exp{−

Z

(x

0

,x]

t(y)

w(y) dν _F

∗

(y)}.

Remark 1.4. In Corollary 1.3, F ^∗ (x) = x, x ∈ ℜ, implies that we have

dF _θ (x) = c(θ)

w(x) exp{−

Z

(x

0

,x]

t(y)

w(y) dy} exp{θ Z

(x

0

,x]

1 w(y) dy}dx

= c(θ)k ₁ (x) exp{θ Z

(x

0

,x]

1 w(y) dy}dx, (5)

in place of (4) and

(6) k ₁ (x) = 1

w(x) exp{−

Z

(x

0

,x]

t(y) w(y) dy}, in place of (4).

Let us now define an exponential family that is a special case of (4). Let X be a random variable with a distribution function F θ that is absolutely continuous with respect to ν _F

∗

with density f _θ (x) of the form:

(7) f _θ (x) = c(θ)k(x)e ^θµ

^∗

^(x) , x ∈ ℜ, θ ∈ ℜ, where 0 < k(x) = exp{− R

(x

0

,x] t(y)dν F

^∗

(y)}, and µ ^∗ (x) = F ^∗ (x)−F ^∗ (a) and E _θ [t(X)] = θ. If w(x) ≡ 1, then we have the family (7) in place of (4). In (7) if we take, F ^∗ (x) = x, x ∈ ℜ, then (7) simplifies to

(8) f _θ (x) ∝ k(x)e ^θx , x ∈ ℜ, θ ∈ ℜ.

(4)

2. Exponential families via Chernoff-type inequalities We characterize the generalized exponential-type family introduced in (4) by using a version of the Chernoff inequality using Alharbi and Shanbhag [1] and Mohtashami Borzadaran and Shanbhag [8].

Lemma 2.1. Let F ^∗ be a non-constant Lebesgue-Stieltjes measure function on ℜ and ν F

^∗

be the measure on the Borel σ−field of ℜ deter- mined by it. Let X with a distribution function F be a random variable such that E{t(X)} = θ, where t(.) is increasing and let g ^′ be as defined before. Then, for almost all [F ] a ∈ ℜ,

Z

ℜ

(g ^′ (y)) ² { Z

[y,∞)

[t(x) − θ]dF (x)}dν F

^∗

(y)

= Z

ℜ

[t(x) − θ]

Z

(a,x]

(g ^′ (y)) ² dν _F

∗

(y)dF (x), provided the left hand side of the identity is finite.

Proof. For details of the proof of Lemma see Alharbi and Shanbhag [1] and Mohtashami Borzadaran and Shanbhag [8].

Theorem 2.2. Let F ^∗ be a non-constant Lebesgue-Stieltjes measure function on ℜ and ν _F

∗

be the measure on the Borel σ−field of ℜ de- termined by it. Let X be a random variable and t be a function that is absolutely continuous with respect to ν F

^∗

, such that E{t(X)} = θ and E {t(X)} ² < ∞. Further, let w be a Borel measurable function such that w(X) > 0 a.s. and V ar{t(X)} = E{t ^′ (X)w(X)}, where t ^′ is the Radon-Nikodym derivative of t with respect to ν F

^∗

. Assume that t ^′ > 0 and let τ be the class of real-valued absolutely continuous func- tions g with Radon-Nikodym derivative g ^′ with respect to the measure ν _F

∗

satisfying E {g(X)} ² < ∞ and 0 < E{w(X) ^[g _t

^′′

^(X)] (X)

²

} < ∞. Then

(9) sup

g∈τ

V ar[g(X)]

E{w(X) ^[g _t

^′′

^(X)] (X)

²

} = 1 if and only if the distribution function of X satisfies (10)

Z

[x,∞)

(t(y) − θ)dF (y)dν _F

∗

(x) = w(x)dF (x), x ∈ ℜ.

(5)

Proof. Note that if (9) holds with g = t, we have V ar[g(X)]

E{w(X) ^[g _t

^′_′

^(X)] _(X)

²

} = 1,

and hence to have the “ if ” part of the theorem, it is sufficient if we show that

(11)

Z

ℜ

w(y) [g ^′ (y)] ²

t ^′ (y) dF (y) ≥ V ar{g(X)}, g ∈ τ.

By using Lemma 2.1, and assuming that X and X ^∗ are independent random variables with the distribution function F, we have for almost all [F ] a ∈ ℜ,

V ar{g(X)} = 1 2 E{

Z

(X

^∗

,X]

g ^′ (y)dν F

^∗

(y) 2

}

= 1 2 E{

Z

(X

^∗

,X]

p t ^′ (y) g ^′ (y)

pt ^′ (y) dν _F

∗

(y) 2

}

≤ 1 2 E{

Z

(X

^∗

,X]

t ^′ (y)dν F

^∗

(y) Z

(X

^∗

,X]

(g ^′ (y)) ²

t ^′ (y) dν F

^∗

(y)}

= 1

2 E{ t(X) − t(X ^∗ ) Z

(X

^∗

,X]

(g ^′ (y)) ²

t ^′ (y) dν F

^∗

(y)}

(12)

= E{ t(X) − θ Z

(a,X]

(g ^′ (y)) ²

t ^′ (y) dν F

^∗

(y)}

= Z

ℜ

t(x) − θ Z

(a,x]

(g ^′ (y)) ²

t ^′ (y) dν F

^∗

(y)dF (x)

= Z

ℜ

(g ^′ (y)) ² t ^′ (y) {

Z

[y,∞)

(t(x) − θ)dF (x)}dν F

^∗

(y)

= Z

ℜ

(g ^′ (y)) ²

t ^′ (y) w(y)dF (y)

= E{w(X) [g ^′ (X)] ² t ^′ (X) }.

(Note that here and in what follows, we take R

(a,b] = − R

(b,a] if a > b.)

To prove the “ only if ” part, we use an approach somewhat different to

that used in Alharbi and Shanbhag [1]. For every real γ and u, let g _u be

such that g _u ^′ (x) = γ cos(ux) + t ^′ (x) be the Radon-Nikodym derivative of

(6)

it with respect to ν F

^∗

. Then (11) gives that V ar{g u (X)} = V ar{g u (X) − g u (a)}

= V ar{

Z

(a,X]

(t ^′ (x) + γ cos(ux))dν F

^∗

(x)}

= V ar{t(X)} + γ ² V ar{

Z

(a,X]

(cos(ux))dν F

^∗

(x)}

(13)

+2γCov{t(X), Z

(a, X]

cos(ux)dν _F

∗

(x)}

≤ E{ w(X)

t ^′ (X) [t ^′ (X) + γ cos(uX)] ² }.

It follows from (14), in view of the assumption E t ^′ (X)w(X) = V ar t(X), that

γ ² {V ar{

Z

(a, X]

cos(ux)dν F

^∗

(x)} − E{ w(X)

t ^′ (X) cos ² (uX)}}

(14)

+2γ[Cov{t(X), Z

(a, X]

cos(ux)dν _F

^∗

(x)} − E w(X) cos(uX)] ≤ 0.

In view of (15), we get that E{(t(X) − θ)

Z

(a, X]

cos(ut)dν _F

∗

(t)} = E w(X) cos(uX).

This relation implies by Fubini’s theorem that Z

ℜ

cos(ux)w(x)dF (x) = Z

ℜ

cos(uy){

Z

[y,∞)

t(x) − θdF (x)}dν F

^∗

(y).

The relation above also holds if cos(ux) is replaced by sin(ux) and hence when cos(ux) is replaced by exp{iux}; then from the uniqueness theorem of the Fourier transforms the required result follows.

Remark 2.3. We can establish the “ if ” part of Theorem 2.2 via a slightly different way the same as mentioned in previous chapter.

The following theorems are essential versions of theorems related to

covariance identities.

(7)

Theorem 2.4. Let F ^∗ be a non-constant Lebesgue-Stieltjes mea- sure function on ℜ and ν _F

∗

be the measure on the Borel σ− field of ℜ determined by it, and let t and Z be Borel measurable func- tions. Let X be a random variable with a distribution function F _θ such that t(X) is integrable with θ = E θ [t(X)] and E θ (|Z(X)|I _{X∈(a,b)} ) <

∞ for every −∞ < a < b < ∞ and satisfying the condition that lim inf x→∞ (t(x) − θ) > 0 if the right extremity of F θ equals ∞, and the condition that lim inf _x→−∞ (θ − t(x)) > 0 if the left extremity of F _θ equals −∞. Further let τ be the class of real-valued absolutely contin- uous functions g with Radon-Nikodym derivative g ^′ with respect to the measure ν _F

∗

(i.e. such that g(b) − g(a) = R

(a,b] g ^′ (x)dν _F

∗

(x) for all a and b with a < b). Then, we have the condition

(15) Cov _θ {g(X), t(X)} = E _θ {Z(X)g ^′ (X)}, met for all g ∈ τ with E _θ | Z(X)g ^′ (X) | < ∞, if and only if (16) Z(x)dF θ (x) = {

Z

[x,∞)

[t(z) − θ]dF θ (z)}dν F

^∗

(x), x ∈ ℜ.

Theorem 2.5. Let X, g, τ, Z and t be defined as in Theorem 2.4, but additionally with t(.) absolutely continuous with respect to ν _F

∗

and t(X) as nondegenerate square integrable satisfying

(17) V ar _θ {t(X)} = E _θ Z (X)t ^′ (X)

(with two sides of the identity well defined and finite). Furthermore, let τ ^∗ be the set of g ∈ τ for which g(X) is square integrable and E _θ {Z(X)g ^′ (X)} is defined and nonzero. Then

(18) inf

g∈τ

^∗

V ar _θ [g(X)]V ar _θ [t(X)]

E _θ ² {Z(X)g ^′ (X)} = 1 if and only if (16) holds.

Let ν _F

∗

be a non-atomic measure, then the “ only if ” parts of The-

orems 2.2, 2.4 and 2.5 characterize generalized continuous exponential

family with the form (4). We have the following theorem related to char-

acterization of generalized continuous exponential family for the case

that F ^∗ is continuous:

(8)

Corollary 2.6. Let X, g, τ, w, and t be defined as in Theorem 2.2. Also, let

V ar _θ g(X) ≤ E θ {w(X) [g ^′ (X)] ²

t ^′ (X) }, for all g ∈ τ,

and equality holds when g is linear in t. Then the distribution of the random variable X is given by (4).

Corollary 2.7. Let X, g, τ, Z, and t be defined as in Theo- rem 2.4 but additionally with t absolutely continuous with respect to non-atomic measure ν F

^∗

, τ ^∗ as a subset of g ∈ τ for which g(X) is square integrable with E _θ (Z(X)g ^′ (X)) 6= 0 and t ² (X) integrable and V ar _θ {t(X)} = E θ Z(X)t ^′ (X). Then

(19) V ar _θ g(X) ≥ E _θ ² {Z(X)g ^′ (X)}

E θ Z (X)t ^′ (X) , for all g ∈ τ ^∗

if and only if for all g ∈ τ ^∗ , the distribution of the random variable X is given by (4). Equality holds when g is linear in t.

• Corollary 2.7 with Z(x) ≡ 1, Corollary 2.6 with w(x) ≡ 1, The- orem 2.4 with Z(x) ≡ 1 and Theorem 2.2 with w(x) ≡ 1, yield characterizations of the exponential family of the form (7).

• Let X, g, τ, Z, and t be defined as in Corollary 2.7 but F ^∗ (x) = x, x ∈ ℜ and Z(x) ≡ 1. Then

V ar θ g(X) ≥ E _θ ² {g ^′ (X)}

E _θ t ^′ (X) , for all g ∈ τ ^∗ ,

in place of (19), if and only if for all g ∈ τ ^∗ , the distribution of the random variable X is given by (8). Equality holds when g is linear in t.

• Let X, g, τ, w, and t be defined as in Theorem 2.2 but F ^∗ (x) = x, x ∈ ℜ and w(x) ≡ 1. Also, let

V ar _θ {g(X)} ≤ E _θ { [g ^′ (X)] ²

t ^′ (X) }, for all g ∈ τ,

and equality holds when g is linear in t. Then the distribution of

the random variable X is given by (8).

(9)

Table 1. Characterization Based on t(x), w(x) and Up- per Bound of V ar θ (g(X)) in Continuous Case

Upper bound for w(x) t(x) Range of Name of

V ar ^θ (g(X)) ^a the random variable X Distributions

E θ {[g ^′ (X)] ² } 1 x x ∈ ℜ Normal

E ^θ { ^X

²

₂ ^c

²

[g ^′ (X)] ² } x 2c ⁻² ln x x ∈ (0, ∞) Lognormal

E θ { ^X c [g ^′ (X)] ² } x cx − 1 x ∈ (0, ∞), c > 0 Gamma

E ^θ { ^X

²

_1+c ^(X−1) [g ^′ (X)] ² } x − 1 ^1−c(x+1) x x ∈ (0, 1), c > 0 Beta

E θ { _4c ¹ [g ^′ (X)] ² } x 2cx ² − 1 x ∈ (0, ∞), c > 0 Generalized Rayleigh

E ^θ {e ^−X [g ^′ (X)] ² } 1 e ^x x ∈ (0, ∞), c > 0 Standard Log-gamma

a

In this table c is a positive constant.

Based on the Corollary 2.6 when F ^∗ (x) = x, x ∈ ℜ, we have the following characterizations via t(x) and upper bounds of the variance of g as seen by Table 1.

Remark 2.8. We can find based on Corollary 2.7 via t and lower

bound of the variance of g, characterizations for some distributions anal-

ogous to those in Table 1.

(10)

3. Discrete families via Chernoff-type inequalities

Let ν F

^∗

be concentrated on a countable set C, such that ν F

^∗

{x} = β > 0, then we have a discrete version of the Theorems 2.2, 2.4 and 2.5 as follows:

Corollary 3.1. Let F ^∗ be a non-constant Lebesgue-Stieltjes mea- sure such that ν F

^∗

is concentrated on a countable set C with each point x of C having ν _F

^∗

{x} = β, β > 0. Also, let X be a random variable and t a strictly increasing function with E _θ {t(X)} = θ, and E _θ {t(X)} ² < ∞.

Further, let w be a function such that w(x) > 0 for each x ∈ C, and V ar _θ {t(X)} = E _θ { t(X) − t(X−)

β w(X)}.

Further, let τ be the class of real-valued functions g satisfying E _θ {g(X)} ² < ∞

and

0 < E θ {w(X) [g(X) − g(X−)] ²

t(X) − t(X−) } < ∞.

Then

(20) sup

g∈τ

βV ar _θ [g(X)]

E _θ {w(X) [g(X)−g(X−)]

²

t(X)−t(X−) } = 1,

if and only if the probability density function of X satisfies in the following:

(21) w(x)f (x) = β X

{y∈C, y≥x}

(t(y) − θ)f (y), x ∈ C.

Proof. The result follows as a corollary to Theorem 2.2 on taking F ^∗ such that ν F

^∗

is concentrated on C such that ν F

^∗

{x} = β, x ∈ C; note that we have now g ^′ (x) = ^{g(x)−g(x−)} _β , x ∈ C and t ^′ (x) = ^{t(x)−t(x−)} _β , x ∈ C.

Let X be a random variable with values in B ^∗ ; we call a discrete exponential family a shifted scaled discrete exponential family if F ^∗ (x) = β[ ^x−α _β ], x ∈ B ^∗ . Also, we define for each β > 0, ∆ _β g(x) as

(22) ∆ _β g(x) = g(x + β) − g(x)

β , x ∈ B ^∗ ,

(11)

where g(.) is a real-valued function. Under the stated assumptions, we have the following specialized theorem leading us to a characterization of the shifted scaled discrete exponential family.

Theorem 3.2. Let g be a real-valued function defined on B ^∗ with E θ { ^[∆ _∆

^β

^g(X)]

²

β

t

^∗

(X) } < ∞ for all g and X be distributed as (8) and ∆ β t ^∗ (x) > 0 for all x ∈ B ^∗ . Then we have the following inequality:

(23) V [g(X)] ≤ β ² θE{ [∆ _β g(X)] ²

∆ _β t ^∗ (X) }.

Proof. The result follows on noting that in (8) the function k is such that w(x + β)k(x + β) = βk(x), x ∈ B ^∗ and hence, we have under the validity of (8),

β ² θE{ [∆ β g(X)] ²

∆ _β t ^∗ (X) } = βθE{ [∆ β g(X)] ²

∆ _β t(X) }

= θ X

{x: x∈B

^∗

}

{ [g(x + β) − g(x)] ²

t(x + β) − t(x) c(θ)k(x)θ

^x^β

}

= X

{x: x∈B

^∗

}

{ [g(x) − g(x − β)] ²

t(x) − t(x − β) c(θ)k(x − β)θ

^β^x

} (24)

= 1

β X

{x: x∈B

^∗

}

{w(x) [g(x) − g(x − β)] ²

t(x) − t(x − β) c(θ)k(x)θ

^x^β

}

= 1

β E{w(X) [g(X) − g(X − β)] ² t(X) − t(X − β) }.

From the above equality, we get the inequality (23).

Theorem 3.3. Let the inequality (23) be satisfied for all real-valued functions g defined on B ^∗ for some function t ^∗ with ∆ _β t ^∗ (x) > 0 for all x ∈ B ^∗ . Further suppose E θ [t ^∗ (X)] = θ and the equality holds in (23), when g is linear in t. Then the distribution of X belongs to a family of the form (8).

Proof. On taking w(x + β)k(x + β) = βk(x), x ∈ B ^∗ , lead us to a

characterization of a family of the form (8).

(12)

Remark 3.4. We can prove Theorem 3.2 and Theorem 3.3 directly based on the definition of ∆ _β .

• If B ^∗ = {nβ + α : n ∈ Z}, then in Theorem 3.2, the inequality (23) is valid for the case of the probability density function of the random variable X belonging to a shifted scaled bilateral power series family; also, Theorem 3.3 characterizes a shifted scaled bi- lateral power series family.

• If B ^∗ = {nβ + α : n ∈ N }, then in Theorem 3.2, the inequality (23) is valid for the case of the probability density function of the random variable X belonging to a shifted scaled power series fam- ily; also, Theorem 3.3 characterizes a shifted scaled power series family.

• If B ^∗ = {nβ + α : n ∈ N _n

₀

= {0, 1, 2, .., n ₀ }}, then in Theorem 3.2, the inequality (23) is valid for the case of the probability density function of the random variable X belonging to a shifted scaled binomial family; also, Theorem 3.3 characterizes a shifted scaled binomial family.

• For β = 1,and α = 0, based on Theorem 3.3, we can obtain charac- terization of the bilateral polynomial power series, discrete quar- tic, scaled discrete normal, Poisson, binomial, negative binomial, Heine, Euler, Pseudo-Euler and Polya Eggenberger) distributions respectively. In this case, we obtain the first theorem in Pap- athanasiou [10, Theorem 2.1, p. 165] as a corollary of the above results.

Acknowledgment. I am grateful to Prof. D. N. Shanbhag for his valuable comments.

References

[1] A. A. G. Alharbi and D. N. Shanbhag, General characterization theorem based on versions of the Chernoff inequality and the Cox representation, J. Statist. Plann.

Inference 55 (1996), 139–150.

[2] T. Cacoullos and V. Papathanasiou, On upper bounds for the variance of a func- tion of a random variable, Statist. Probab. Lett. 3 (1985), 175–184.

[3] , A generalization of covariance identity and related characterizations,

Math. Methods Statist. 4 (1995), 106–113.

(13)

[4] , Characterizations of distributions by generalization of variance bounds and simple proof of the CLT, J. Statist. Plann. Inference 63 (1997), 157–171.

[5] H. Chernoff, A note on an inequality involving the normal distribution, Ann.

Probab. 9 (1981), 533–535.

[6] L. Goldstein and G. Reinert, Stein’s Method and zero bias transformation with application to simple random sampling, Ann. Appl. Probab. 7 (1997), no. 4, 935–952.

[7] H. M. Hudson, A natural identity for exponential families with applications in multi-parameter estimation, Ann. Statist. 6 (1978), 473–484.

[8] Mohtashami Borzadaran and D. N. Shanbhag, Further results based on Chernoff- type inequalities, Statist. Probab. Lett. 39 (1998), 109–117.

[9] N. Papadatos and V. Papathanasiou, Unified variance bounds and a Stein-type identity, Probability and statistical Models with applications, edited by Char- alambides, C. A., Koutras, M. V. and Balakrishnan, N. Chapman and Hall, 2000.

Exponential families related to Chernoff-type inequalities

EXPONENTIAL FAMILIES RELATED TO CHERNOFF-TYPE INEQUALITIES

G. R. Mohtashami Borzadaran

Abstract. In this paper, the characterization results related to Chernoff-type inequalities are applied for exponential-type (contin- uous and discrete) families. Upper variance bound is obtained here with a slightly different technique used in Alharbi and Shanbhag [1]

1. Introduction

Theorem 1.1. With w(x) > 0 for almost all [ν F

]x ∈ ℜ, F is a distribution function that absolutely continuous with respect to ν F

and

Received July 12, 2000. Revised January 3, 2002.

2000 Mathematics Subject Classification: Primary 62E10; Secondary 60E05, 60E15.

Key words and phrases: Chernoff-type inequalities, covariance identities, charac-

terization, upper variance bound, variance inequality, exponential families.

there exists a point x 0 such that t(x) > θ for almost all [ν F

]x ∈ ℜ, lying in (x 0 , ∞) and t(x) < θ for almost all [ν F

]x ∈ ℜ, lying in (−∞, x 0 ) with E(t(X)) = θ and t(x 0 ) ≥ θ, where X is a random variable with a distribution function F. Then F satisfies (10) if and only if for some c ∈ (0, ∞),

dF (x) (1)

=







c{ w(x) 1 { Q

x

∈D

(1 − m (1) {x r })}e −H

(x) }dν F

(x) if x ≥ x 0 c{ w

1 (x) { Q

x

∈D

(1 − m (2) {x r })}e −H

(x) }dν F

(x) if x < x 0 , where

m (1) (•)=

Z

[x

,∞) T •

(t(y)−θ)(w(y)) −1 dν F

(y), H c (1) (x) = m (1) c ((−∞, x]), and

m (2) (•)=

Z

(−∞,x

) T •

(θ−t(y))(w ∗ (y)) −1 dν F

(y), H c (2) (x) = m (2) c ([x, ∞))

with 0 < w ∗ (x) = w(x) + (θ − t(x))ν F

{x}, m (1) c and m (2) c as contin- uous parts of m (1) and m (2) respectively, and D (1) x and D x (2) as the sets of discontinuity points of m (1) that lie in (−∞, x) and of m (2) that lie in (x, ∞), respectively.

Corollary 1.2. If we have the assumptions in Theorem 1.1 met with F ∗ continuous, then the conclusion of the theorem holds with the following in place of (2):

(2) dF θ (x) = c(θ)

w(x) exp{−

Z

(x

,x]

t(y) − θ

w(y) dν F

(y)}dν F

(x), where x 0 is as the statement of Theorem 1.1.

Corollary 1.3. If we have the assumptions in Theorem 1.1 met

with F ∗ continuous, then for any distribution F θ (that is, absolutely

continuous with respect to ν F

) the conclusion of the theorem holds on

taking, in place of (2), that F θ is concentrated on D and it satisfies the

following:

dF θ (x) (3)

= c(θ)

w(x) exp{−

Z

(x

,x]

t(y)

w(y) dν F

(y)} exp{θ Z

(x

,x]

1

w(y) dν F

(y)}dν F

(x)

= c(θ)k 1 (x) exp{θ Z

(x

,x]

Theorem 1.1. With w(x) > 0 for almost all [ν _F

there exists a point x ₀ such that t(x) > θ for almost all [ν F

]x ∈ ℜ, lying in (x ₀ , ∞) and t(x) < θ for almost all [ν _F

]x ∈ ℜ, lying in (−∞, x ₀ ) with E(t(X)) = θ and t(x ₀ ) ≥ θ, where X is a random variable with a distribution function F. Then F satisfies (10) if and only if for some c ∈ (0, ∞),

c{ _w(x) ¹ { Q

(1 − m ⁽¹⁾ {x _r })}e ^−H

^(x) }dν _F

(x) if x ≥ x ₀ c{ _w

¹ _(x) { Q

(1 − m ⁽²⁾ {x _r })}e ^−H

^(x) }dν _F

(x) if x < x ₀ , where

m ⁽¹⁾ (•)=

(t(y)−θ)(w(y)) ⁻¹ dν _F

(y), H _c ⁽¹⁾ (x) = m ⁽¹⁾ _c ((−∞, x]), and

m ⁽²⁾ (•)=

(θ−t(y))(w ^∗ (y)) ⁻¹ dν _F

(y), H _c ⁽²⁾ (x) = m ⁽²⁾ _c ([x, ∞))

with 0 < w ^∗ (x) = w(x) + (θ − t(x))ν F

{x}, m ⁽¹⁾ c and m ⁽²⁾ c as contin- uous parts of m ⁽¹⁾ and m ⁽²⁾ respectively, and D ⁽¹⁾ x and D x ⁽²⁾ as the sets of discontinuity points of m ⁽¹⁾ that lie in (−∞, x) and of m ⁽²⁾ that lie in (x, ∞), respectively.

Corollary 1.2. If we have the assumptions in Theorem 1.1 met with F ^∗ continuous, then the conclusion of the theorem holds with the following in place of (2):

(2) dF _θ (x) = c(θ)

w(y) dν _F

(x), where x ₀ is as the statement of Theorem 1.1.

with F ^∗ continuous, then for any distribution F θ (that is, absolutely

continuous with respect to ν _F

dF _θ (x) (3)

w(y) dν _F

w(y) dν _F

= c(θ)k ₁ (x) exp{θ Z

(4) k ₁ (x) = 1

w(y) dν _F

Remark 1.4. In Corollary 1.3, F ^∗ (x) = x, x ∈ ℜ, implies that we have

dF _θ (x) = c(θ)

= c(θ)k ₁ (x) exp{θ Z

(6) k ₁ (x) = 1

Let us now define an exponential family that is a special case of (4). Let X be a random variable with a distribution function F θ that is absolutely continuous with respect to ν _F

with density f _θ (x) of the form:

(7) f _θ (x) = c(θ)k(x)e ^θµ

^(x) , x ∈ ℜ, θ ∈ ℜ, where 0 < k(x) = exp{− R

(y)}, and µ ^∗ (x) = F ^∗ (x)−F ^∗ (a) and E _θ [t(X)] = θ. If w(x) ≡ 1, then we have the family (7) in place of (4). In (7) if we take, F ^∗ (x) = x, x ∈ ℜ, then (7) simplifies to

(8) f _θ (x) ∝ k(x)e ^θx , x ∈ ℜ, θ ∈ ℜ.

Lemma 2.1. Let F ^∗ be a non-constant Lebesgue-Stieltjes measure function on ℜ and ν F

be the measure on the Borel σ−field of ℜ deter- mined by it. Let X with a distribution function F be a random variable such that E{t(X)} = θ, where t(.) is increasing and let g ^′ be as defined before. Then, for almost all [F ] a ∈ ℜ,

(g ^′ (y)) ² { Z

(g ^′ (y)) ² dν _F

Theorem 2.2. Let F ^∗ be a non-constant Lebesgue-Stieltjes measure function on ℜ and ν _F

, such that E{t(X)} = θ and E {t(X)} ² < ∞. Further, let w be a Borel measurable function such that w(X) > 0 a.s. and V ar{t(X)} = E{t ^′ (X)w(X)}, where t ^′ is the Radon-Nikodym derivative of t with respect to ν F

. Assume that t ^′ > 0 and let τ be the class of real-valued absolutely continuous func- tions g with Radon-Nikodym derivative g ^′ with respect to the measure ν _F

satisfying E {g(X)} ² < ∞ and 0 < E{w(X) ^[g _t

^(X)] (X)