Mixture modeling for efficiently estimating the spectral distribution of bivariate regular variation


Moosup Kim ¹

1 Department of Statistics, Keimyung University

Received 22 February 2021, revised 30 March 2021, accepted 7 April 2021

Abstract

This paper considers parametric estimation of the spectral distribution of bivariate regular variation with the aim of efficiency. Since Fisher consistency is the key condition for attaining efficiency, a mixture model is employed as the parametric model class because of its flexibility. The maximum likelihood estimator is shown to be asymptotically efficient under some regularity conditions. Moreover, in the real data analysis, the maximum likelihood method based on a normal mixture produces an estimate of the spectral distribution that fits the data well.

Keywords: mixture model, bivariate regular variation, spectral distribution, efficiency

1. Introduction

The dependence between extreme values is a prominent issue in financial risk management. The price variations of financial assets commonly reveal heavy tails that can cause severe investment losses. Moreover, positive correlation between extreme variations increases the risk of a portfolio. Thus, modeling the dependence among extreme values has received much attention in financial applications. This paper addresses this issue in the case of bivariate regular variation, which is characterized by the tail exponent and the spectral distribution. The former is related to the magnitude of extremes and the latter governs the dependence between the variates. For details of multivariate regular variation and its application to finance, we refer to Hult and Lindskog (2002), Klüppelberg et al. (2007), and Einmahl et al. (2020). Moreover, for a comprehensive description of extreme value analysis, see Kang (2005), Choi (2019), and the papers cited therein.

This paper aims at estimating the spectral distribution efficiently. Specifically, the spectral distribution is assumed to be a member of a parametric model class, and maximum likelihood estimation is carried out. The resulting estimator is shown to be asymptotically efficient under some regularity conditions, i.e., it is consistent and asymptotically normal with covariance matrix equal to the inverse of the Fisher information. In parametric estimation, Fisher consistency is the key condition for attaining efficiency, i.e., the parametric model has to be flexible enough that the true spectral distribution is one of its members. Thus, we employ a mixture model, which is widely used because of its flexibility.

¹ Assistant professor, Department of Statistics, Keimyung University, Daegu 42601, Korea. E-mail: [email protected]

The rest of this paper is organized as follows. Section 2.1 briefly reviews efficient parametric estimation under a framework applicable to the estimation of the spectral distribution, and Section 2.2 proposes the efficient estimation method for the spectral distribution based on mixture modeling. Section 3 presents a simulation study evaluating the performance of the proposed estimation and a real data analysis. Section 4 presents concluding remarks. Finally, Section 5 provides the proofs of the theorems and proposition in Section 2.1.

2. Main result

2.1. Efficient parametric estimation

This subsection reviews efficient parametric estimation under a framework that is applicable to the main issue of this paper. Let $\{\xi_{ni} : n \in \mathbb{N},\ i = 1, \ldots, k_n\}$ be a double array of continuous random variables taking values in the unit interval $[0, 1]$, where $k_n \in \mathbb{N}$ increases to $\infty$ as $n \to \infty$. Each row $\{\xi_{n1}, \ldots, \xi_{nk_n}\}$ represents a sample, but we do not explicitly assume that it is an i.i.d. sequence. Instead, letting

$$\hat\Lambda_n(u) := \frac{1}{k_n} \sum_{i=1}^{k_n} I(\xi_{ni} \le u), \quad u \in [0, 1],$$

be the empirical distribution function, it is assumed that there exists a distribution function $\Lambda$ on $[0, 1]$ such that

$$\sup_{0 \le u \le 1} |\hat\Lambda_n(u) - \Lambda(u)| \xrightarrow{P} 0, \tag{2.1}$$

and

$$\sqrt{k_n}\,\{\hat\Lambda_n(u) - \Lambda(u)\} \xrightarrow{d} G^\circ(u) \quad \text{in } D[0, 1], \tag{2.2}$$

where $G(u)$, $0 \le u \le 1$, is a zero-mean continuous Gaussian process with $E(G(u_1)G(u_2)) = \Lambda(u_1)$, $0 \le u_1 \le u_2 \le 1$, and $G^\circ(u) := G(u) - \Lambda(u)G(1)$.

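We consider estimating $\Lambda$ efficiently based on $\hat\Lambda_n$. Let $\{\Lambda(u; \theta) : \theta \in \Theta\}$ be a parametric model for $\Lambda$ such that each member $\Lambda(u; \theta)$ has a continuous density function $\lambda(u; \theta)$ with respect to $du$ and its support is the unit interval $[0, 1]$. It is assumed that $\Lambda$ is a member of the class, viz.,

$$\Lambda(\cdot) = \Lambda(\,\cdot\,; \theta^\circ) \quad \text{for some } \theta^\circ \in \Theta. \tag{2.3}$$

Moreover, $\{\Lambda(u; \theta) : \theta \in \Theta\}$ is assumed to be identifiable, viz.,

$$\theta = \theta^\circ \quad \text{if and only if} \quad \lambda(\,\cdot\,; \theta) = \lambda(\,\cdot\,; \theta^\circ). \tag{2.4}$$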

Maximum likelihood estimation is carried out: the log-likelihood function is defined as

$$\ell(\theta) := \int_0^1 \log\lambda(u; \theta)\,d\hat\Lambda_n(u) = \frac{1}{k_n} \sum_{i=1}^{k_n} \log\lambda(\xi_{ni}; \theta),$$

and the resulting estimator $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_p)$ is obtained by maximizing $\ell(\theta)$ with respect to $\theta \in \Theta$ ($p \in \mathbb{N}$ denotes the dimension of $\theta$).
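As a toy illustration of the maximization step, the sketch below maximizes $\ell(\theta)$ by grid search for a one-parameter model $\lambda(u;\theta) = \theta + (1-\theta)\,6u(1-u)$ on $[0,1]$. The model, the grid, the sample size, and all function names are assumptions made for illustration only; they are not the models studied in this paper, and a practical implementation would use a numerical optimizer instead of a grid.

```python
import math
import random

def density(u, theta):
    """Hypothetical one-parameter model: uniform mixed with a Beta(2, 2) density."""
    return theta + (1.0 - theta) * 6.0 * u * (1.0 - u)

def log_likelihood(sample, theta):
    """l(theta) = (1/k) * sum_i log lambda(xi_i; theta)."""
    return sum(math.log(density(u, theta)) for u in sample) / len(sample)

def grid_mle(sample, grid_size=99):
    """Maximize l(theta) over the grid {0.01, 0.02, ..., 0.99}."""
    grid = [(j + 1) / (grid_size + 1) for j in range(grid_size)]
    return max(grid, key=lambda th: log_likelihood(sample, th))

def draw_sample(k, theta, seed=0):
    """Draw k points from the toy model by picking a mixture component at random."""
    rng = random.Random(seed)
    return [rng.random() if rng.random() < theta else rng.betavariate(2.0, 2.0)
            for _ in range(k)]

sample = draw_sample(k=2000, theta=0.7)
theta_hat = grid_mle(sample)
print(f"grid MLE of theta: {theta_hat:.2f}")
```

The grid search keeps the example dependency-free; it also respects the compactness requirement on the parameter set, since the grid stays inside $[0.01, 0.99]$.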

For the consistency of $\hat\theta$, we assume the regularity conditions presented below:

A1 $\Theta$ is a compact subset of $\mathbb{R}^p$.

A2 $\log\lambda(u; \theta)$ is bounded and continuous in $(u, \theta) \in (0, 1) \times \Theta$.

Note that A2 implies the continuity of $\theta \mapsto \int_0^1 \log\lambda(u; \theta)\,d\Lambda(u)$.

Theorem 2.1 Assume that (2.1), (2.3), (2.4), and A1–A2 hold. Then, $\hat\theta \xrightarrow{P} \theta^\circ$ as $n \to \infty$.

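The proof of the above theorem is presented in Section 5. For the asymptotic normality of $\hat\theta$, additional regularity conditions are required:

A3 $\theta^\circ \in \mathrm{int}(\Theta)$, where $\mathrm{int}(\Theta)$ denotes the interior of $\Theta$.

A4 For $j = 1, \ldots, p$, $\frac{\partial}{\partial u}\left\{\frac{1}{\lambda(u; \theta^\circ)}\frac{\partial}{\partial\theta_j}\lambda(u; \theta^\circ)\right\}$ is integrable on $(0, 1)$.

A5 Each element of $\frac{1}{\lambda(u; \theta)}\frac{\partial}{\partial\theta}\lambda(u; \theta)$ and $\frac{\partial^2}{\partial\theta\partial\theta'}\log\lambda(u; \theta)$ is bounded and continuous in $(u, \theta) \in (0, 1) \times \Theta$.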

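A5 combined with A2 implies that

$$\frac{\partial}{\partial\theta}\int_0^1 \log\lambda(u; \theta)\,d\Lambda(u) = \int_0^1 \frac{\partial}{\partial\theta}\log\lambda(u; \theta)\,d\Lambda(u) \quad \text{for each } \theta \in \mathrm{int}(\Theta), \tag{2.5}$$

$$\int_0^1 \frac{\partial}{\partial\theta}\log\lambda(u; \theta^\circ)\,\frac{\partial}{\partial\theta'}\log\lambda(u; \theta^\circ)\,d\Lambda(u) = -\int_0^1 \frac{\partial^2}{\partial\theta\partial\theta'}\log\lambda(u; \theta^\circ)\,d\Lambda(u). \tag{2.6}$$

To describe the asymptotic variance, let $\phi(u) := \frac{\partial}{\partial\theta}\log\lambda(u; \theta^\circ) = \frac{1}{\lambda(u; \theta^\circ)}\frac{\partial}{\partial\theta}\lambda(u; \theta^\circ)$ and let $\phi_j$ be the $j$-th element of $\phi$ ($j = 1, \ldots, p$). Let $W = [w_{ij}]_{i,j=1,\ldots,p}$ be the $p \times p$ matrix whose elements are given by

$$w_{ij} = \int_0^1\!\!\int_0^1 \{\Lambda(u_1 \wedge u_2) - \Lambda(u_1)\Lambda(u_2)\}\,\dot\phi_i(u_1)\,\dot\phi_j(u_2)\,du_1\,du_2, \quad i, j = 1, \ldots, p,$$

where $\dot\phi(u) := (\dot\phi_1(u), \ldots, \dot\phi_p(u))' := \left(\frac{\partial}{\partial u}\phi_1(u), \ldots, \frac{\partial}{\partial u}\phi_p(u)\right)'$ denotes the vector of partial derivatives with respect to $u$. Moreover, let

$$I(\theta) := -\int_0^1 \frac{\partial^2}{\partial\theta\partial\theta'}\log\lambda(u; \theta)\,d\Lambda(u) \quad \text{for } \theta \in \Theta,$$

and let $I := I(\theta^\circ)$ be the Fisher information matrix. A5 implies the continuity of $\theta \mapsto I(\theta)$. Below, we present the key proposition for the efficiency of $\hat\theta$: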

Proposition 2.1 Under (2.3), (2.4), and A1–A5, $I = W$.

The proof is presented in Section 5. The following theorem states that $\hat\theta$ is indeed asymptotically efficient under (2.3) together with the other conditions presented above. Its proof is presented in Section 5.

Theorem 2.2 Assume that (2.2), (2.3), (2.4), and A1–A5 hold. If $I$ is invertible, then

$$\sqrt{k_n}\,(\hat\theta - \theta^\circ) \xrightarrow{d} N(0, I^{-1}) \quad \text{and} \quad -\frac{\partial^2}{\partial\theta\partial\theta'}\ell(\hat\theta) \xrightarrow{P} I, \quad \text{as } n \to \infty.$$
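The identity $I = W$ of Proposition 2.1 can be checked numerically on a simple example. The sketch below uses a hypothetical one-parameter model $\lambda(u;\theta) = \theta + (1-\theta)\,6u(1-u)$ (an assumption for illustration, not a model from this paper), computes $I$ as $\int_0^1 \phi^2\,d\Lambda$, which equals the Fisher information by (2.6), and computes $W$ by a midpoint rule on a square grid. The derivative $\dot\phi$ was simplified by hand for this toy model.

```python
def lam(u, th):
    """Toy density: uniform mixed with Beta(2, 2); th is the uniform weight."""
    return th + (1.0 - th) * 6.0 * u * (1.0 - u)

def Lam(u, th):
    """Distribution function of the toy density."""
    return th * u + (1.0 - th) * (3.0 * u**2 - 2.0 * u**3)

def phi(u, th):
    """Score function: d/dtheta log lam(u; theta)."""
    return (1.0 - 6.0 * u * (1.0 - u)) / lam(u, th)

def phi_dot(u, th):
    """d/du of the score; for this toy model it reduces to -6(1 - 2u) / lam^2."""
    return -6.0 * (1.0 - 2.0 * u) / lam(u, th) ** 2

def fisher_information(th, n=400):
    """I = integral of phi^2 dLambda, by the midpoint rule."""
    grid = [(i + 0.5) / n for i in range(n)]
    return sum(phi(u, th) ** 2 * lam(u, th) for u in grid) / n

def w_value(th, n=200):
    """W = double integral of {Lam(u1 ^ u2) - Lam(u1) Lam(u2)} phi_dot(u1) phi_dot(u2)."""
    grid = [(i + 0.5) / n for i in range(n)]
    cdf = [Lam(u, th) for u in grid]
    dphi = [phi_dot(u, th) for u in grid]
    total = 0.0
    for a in range(n):
        for b in range(n):
            # Lam is increasing, so Lam(u_a ^ u_b) is the value at the smaller index.
            kernel = cdf[min(a, b)] - cdf[a] * cdf[b]
            total += kernel * dphi[a] * dphi[b]
    return total / (n * n)

theta0 = 0.7
print(f"I = {fisher_information(theta0):.4f}, W = {w_value(theta0):.4f}")
```

The two printed values should agree up to quadrature error, which is what the proposition predicts for any model satisfying its conditions.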

2.2. Efficient estimation for spectral distribution

2.2.1. Bivariate regular variation and spectral distribution

From now on, we consider the main subject of this paper, which is a specific case of $\Lambda$ and $\hat\Lambda_n$ in (2.1)–(2.2). Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ ($n \in \mathbb{N}$) be i.i.d. 2-dimensional random vectors whose components are all positive and continuous. Their polar coordinates are obtained in the following manner:

$$R_i = \sqrt{X_i^2 + Y_i^2}, \quad U_i = \frac{2}{\pi}\arctan\frac{Y_i}{X_i}, \quad i = 1, \ldots, n,$$

where $R_i$ and $U_i$ denote the radial and angular components of $(X_i, Y_i)$, respectively. We are concerned with the case in which the random vectors exhibit extreme behavior specified by bivariate regular variation: there exist a tail exponent $\alpha > 0$ and a distribution function $\Lambda$ on the unit interval $[0, 1]$ such that

$$\lim_{t \to 0}\frac{P(R_1 > x/t,\ U_1 \le u)}{P(R_1 > 1/t)} = x^{-\alpha}\Lambda(u), \quad x > 0,\ u \in [0, 1] \text{ at which } \Lambda \text{ is continuous.} \tag{2.7}$$

Then, $\Lambda$ is the spectral distribution of the bivariate regular variation, which describes the extremal dependence between $X_1$ and $Y_1$. Concretely, letting $x = 1$, (2.7) reduces to

$$\lim_{t \to 0} P(U_1 \le u \mid R_1 > 1/t) = \Lambda(u), \quad u \in [0, 1] \text{ at which } \Lambda \text{ is continuous,}$$

that is, $\Lambda$ approximates the conditional distribution of the angle $U_1$ given that the radius $R_1$ is sufficiently large. Since multivariate normal distributions frequently fail to model the joint extreme behavior of multiple financial asset prices, heavy tailed elliptical distributions have been studied as an extension of the multivariate normal (cf. Klüppelberg et al., 2007). Note that multivariate regular variation is a general class containing the heavy tailed elliptical distributions (Hult and Lindskog, 2002); thus, it has received attention as a multivariate extreme value model in finance (cf. Einmahl et al., 2020).

Let $k_n \in \mathbb{N}$ satisfy

$$k_n \to \infty \quad \text{and} \quad k_n = o(n) \quad \text{as } n \to \infty, \tag{2.8}$$

and

$$\hat\Lambda_n(u) := \frac{1}{k_n}\sum_{i=1}^{n} I(U_i \le u,\ R_i > R_{(k_n+1)}), \quad u \in [0, 1], \tag{2.9}$$

where $R_{(k_n+1)}$ is the $(k_n+1)$-th largest order statistic of $R_1, \ldots, R_n$, i.e., $\{U_i : R_i > R_{(k_n+1)}\}$ corresponds to $\{\xi_{n1}, \ldots, \xi_{nk_n}\}$ in Section 2.1. Then, $\hat\Lambda_n$ is viewed as the empirical estimator of $\Lambda$: (2.1) and (2.2) hold under some regularity conditions including

$$\sup_{0 \le u \le 1}\left|\frac{P(R_1 > 1/t,\ U_1 \le u)}{P(R_1 > 1/t)} - \Lambda(u)\right| = O(t^\gamma), \quad \text{as } t \to 0, \text{ for some } \gamma > 0, \tag{2.10}$$

together with

$$\lim_{n \to \infty}\sqrt{k_n}\,(k_n/n)^{\gamma/\alpha} = 0 \tag{2.11}$$

(cf. Einmahl et al., 1993). For the details of the regularity conditions, refer to Einmahl et al. (1993).
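The polar transformation and the tail empirical estimator (2.9) can be sketched as follows. The function names and the toy data are assumptions for illustration; the code presumes positive components and no ties among the radii, as in the continuous setting of this section.

```python
import bisect
import math

def polar(x, y):
    """Radius R and rescaled angle U = (2/pi) * arctan(y/x) of a point with x, y > 0."""
    # atan2(y, x) coincides with arctan(y/x) when x > 0.
    return math.hypot(x, y), (2.0 / math.pi) * math.atan2(y, x)

def tail_angles(points, k):
    """Angles U_i of the observations with R_i > R_(k+1), the (k+1)-th largest radius."""
    by_radius = sorted((polar(x, y) for x, y in points), reverse=True)
    threshold = by_radius[k][0]
    return [u for r, u in by_radius if r > threshold]

def empirical_spectral_cdf(points, k):
    """Return the function u -> (1/k) * #{i : U_i <= u, R_i > R_(k+1)} as in (2.9)."""
    angles = sorted(tail_angles(points, k))
    return lambda u: bisect.bisect_right(angles, u) / k

# Toy data: five points in the first quadrant, keeping the k = 3 largest radii.
points = [(1.0, 1.0), (4.0, 4.0), (5.0, 0.5), (0.5, 6.0), (2.0, 1.0)]
cdf = empirical_spectral_cdf(points, k=3)
print(cdf(0.05), cdf(0.6), cdf(1.0))
```

Sorting the tail angles once and using a binary search keeps each evaluation of the estimator logarithmic in $k_n$.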

2.2.2. Mixture model

This paper aims at estimating the spectral distribution $\Lambda$ under a parametric model for efficiency. For the moment, suppose that $(X_1, Y_1), \ldots, (X_n, Y_n)$ are independently sampled from the first quadrant of a bivariate heavy tailed elliptical distribution with tail exponent $\alpha > 0$ and positive definite symmetric scale matrix $\Sigma$ with $\mathrm{vech}(\Sigma) = (\varsigma_1, \varrho\sqrt{\varsigma_1\varsigma_2}, \varsigma_2)'$ ($\varsigma_1, \varsigma_2 > 0$ and $-1 < \varrho < 1$). Then, as seen in Example 5.1 of Hult and Lindskog (2002), the continuous density function $\lambda$ of $\Lambda$ is represented as

$$\lambda(u) = C^{-1}\{\varsigma_1\cos^2(h(u)) + \varsigma_2\sin^2(\arcsin(\varrho) + h(u))\}^{\alpha/2}\,\frac{d}{du}h(u), \quad 0 < u < 1, \tag{2.12}$$

where $h(u) = \arctan\left((1 - \varrho^2)^{-1/2}\left\{\sqrt{\varsigma_1/\varsigma_2}\,\tan(u\pi/2) - \varrho\right\}\right)$ and $C$ is the normalizing constant. Note that (2.12) satisfies the following conditions:

C1 $\lambda(u)$ is bounded above and away from 0 in $u$;

C2 $\lambda(u)$ has finite limits at $u = 0, 1$,

which are the same if and only if $\varsigma_1 = \varsigma_2$. Thus, based on our experience that financial data commonly exhibit C1–C2, we take a mixture model into consideration: for a given $\kappa \in \{0, 1\}$,

$$\lambda(u; \theta) = \theta_1 + \theta_2\,2u^\kappa(1 - u)^{1-\kappa} + (1 - \theta_1 - \theta_2)\,\psi(u; \tilde\theta), \quad 0 < u < 1, \tag{2.13}$$

$$\Lambda(u; \theta) = \int_0^u \lambda(v; \theta)\,dv, \quad \theta = (\theta_1, \theta_2, \tilde\theta),\ \theta_1 > 0,\ \theta_2 \ge 0,\ \theta_1 + \theta_2 \le 1, \tag{2.14}$$

where $\{\psi(u; \tilde\theta)\}$ is an identifiable parametric model whose common support is $[0, 1]$ and which satisfies

$$\lim_{u \to 0}\psi(u; \tilde\theta) = 0 = \lim_{u \to 1}\psi(u; \tilde\theta) \quad \text{for each } \tilde\theta. \tag{2.15}$$

Then, (2.13)–(2.14) also constitutes an identifiable mixture model class. Note that $\theta_1 > 0$ makes $\lambda(u; \theta)$ bounded away from 0 in $u$, and it is also bounded above if $\psi(u; \tilde\theta)$ is so. If $\theta_2 > 0$, then $\kappa$ determines which of the limits of $\lambda(u; \theta)$ at $u = 0, 1$ is larger. Otherwise, (2.13) reduces to

$$\lambda(u; \theta) = \theta_1 + (1 - \theta_1)\,\psi(u; \tilde\theta), \quad 0 < u < 1, \tag{2.16}$$

thus $\lim_{u \to 0}\lambda(u; \theta) = \theta_1 = \lim_{u \to 1}\lambda(u; \theta)$ and $\kappa$ plays no role. Therefore, (2.13) satisfies C1–C2.
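To make the roles of $\theta_1$, $\theta_2$, and $\kappa$ in (2.13) concrete, the following sketch evaluates the mixture density with a beta density playing the role of the third component. The function names and the parameter values are illustrative assumptions; any density vanishing at both endpoints, as required by (2.15), could be passed as `psi`.

```python
import math

def beta_pdf(u, rho1, rho2):
    """Beta density on (0, 1); it vanishes at both endpoints when rho1, rho2 > 1."""
    log_norm = math.lgamma(rho1 + rho2) - math.lgamma(rho1) - math.lgamma(rho2)
    return math.exp(log_norm + (rho1 - 1.0) * math.log(u) + (rho2 - 1.0) * math.log(1.0 - u))

def mixture_density(u, theta1, theta2, kappa, psi):
    """theta1 + theta2 * 2 u^kappa (1-u)^(1-kappa) + (1 - theta1 - theta2) * psi(u)."""
    linear = 2.0 * u if kappa == 1 else 2.0 * (1.0 - u)
    return theta1 + theta2 * linear + (1.0 - theta1 - theta2) * psi(u)

psi = lambda u: beta_pdf(u, 5.2, 4.0)

# With kappa = 1 the limit at u = 0 is theta1 and the limit at u = 1 is theta1 + 2 * theta2.
left = mixture_density(1e-9, 0.3, 0.2, 1, psi)
right = mixture_density(1.0 - 1e-9, 0.3, 0.2, 1, psi)
print(f"near 0: {left:.4f}, near 1: {right:.4f}")

# Midpoint-rule check that the mixture integrates to one.
n = 2000
total = sum(mixture_density((i + 0.5) / n, 0.3, 0.2, 1, psi) for i in range(n)) / n
print(f"integral: {total:.6f}")
```

Setting `kappa = 0` swaps the two endpoint limits, and setting `theta2 = 0` makes them equal, which is the case (2.16).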

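As specific models of $\{\psi(u; \tilde\theta)\}$, the unimodal beta distribution and the logistic transformation of a normal mixture are considered. The former is specified as

$$\psi(u; \tilde\theta) = \frac{\Gamma(\rho_1 + \rho_2)}{\Gamma(\rho_1)\Gamma(\rho_2)}\,u^{\rho_1 - 1}(1 - u)^{\rho_2 - 1}\,I(0 < u < 1), \quad \tilde\theta = (\rho_1, \rho_2),\ \rho_1, \rho_2 > 1. \tag{2.17}$$

The latter is specified as follows: letting $m \in \mathbb{N}$ be given and $\varphi$ denote the standard normal density function,

$$\psi(u; \tilde\theta) = \sum_{j=1}^{m-1}\frac{\tau_j}{\sigma_j u(1 - u)}\,\varphi\!\left(\frac{\log\frac{u}{1-u} - \mu_j}{\sigma_j}\right) + \frac{1 - \sum_{j=1}^{m-1}\tau_j}{\sigma_m u(1 - u)}\,\varphi\!\left(\frac{\log\frac{u}{1-u} - \mu_m}{\sigma_m}\right), \tag{2.18}$$

$$\tilde\theta = (\mu_1, \ldots, \mu_m, \sigma_1, \ldots, \sigma_m, \tau_1, \ldots, \tau_{m-1}), \quad \mu_1, \ldots, \mu_m \in \mathbb{R}, \quad \sigma_1, \ldots, \sigma_m > 0, \quad \tau_1, \ldots, \tau_{m-1} \ge 0, \quad \sum_{j=1}^{m-1}\tau_j \le 1.$$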

$\psi(u; \tilde\theta)$ is the probability density function of the logistic transformation of a random variable following a finite normal mixture of order up to $m$; thus $\{\psi(u; \tilde\theta)\}$ is an identifiable model class (cf. Proposition 1 of Teicher (1963)). In order to examine the flexibility of the above models, we investigate the discrepancy between the spectral density $\lambda(u)$ in (2.12) and $\lambda(u; \theta^\star)$ equipped with (2.17) or (2.18), where $\theta^\star$ denotes the value of $\theta$ minimizing the Kullback-Leibler divergence $\int_0^1 \log(\lambda(u)/\lambda(u; \theta))\,\lambda(u)\,du$. Figure 5.1 shows the similarity in several cases of $(\varsigma_1, \varsigma_2, \varrho, \alpha)$, where the order $m$ of the normal mixture is set to 2. The density $\lambda(u; \theta^\star)$ equipped with (2.18) is almost identical to $\lambda(u)$ in all the cases, while that equipped with (2.17) is reasonably close overall.
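The Kullback-Leibler comparison described above can be sketched numerically as follows. The code evaluates the elliptical spectral density (2.12) on a midpoint grid, normalizing it numerically instead of computing $C$ in closed form, and measures its divergence from one candidate of the form (2.16) with a single logit-normal component. The candidate parameter values are arbitrary assumptions chosen for illustration; they are not the minimizer $\theta^\star$, and no optimization is performed here.

```python
import math

def elliptical_density(s1, s2, rho, alpha, n=2000):
    """Values of (2.12) at the midpoints (i + 0.5)/n, normalized so the grid mean is 1."""
    root = math.sqrt(1.0 - rho**2)
    scale = math.sqrt(s1 / s2)
    values = []
    for i in range(n):
        u = (i + 0.5) / n
        tan_u = math.tan(u * math.pi / 2.0)
        z = (scale * tan_u - rho) / root
        h = math.atan(z)
        # dh/du by the chain rule, using sec^2 = 1 + tan^2.
        dh = (math.pi / 2.0) * scale * (1.0 + tan_u**2) / (root * (1.0 + z**2))
        radius2 = s1 * math.cos(h) ** 2 + s2 * math.sin(math.asin(rho) + h) ** 2
        values.append(radius2 ** (alpha / 2.0) * dh)
    norm = sum(values) / n
    return [v / norm for v in values]

def logit_normal_mixture(u, mus, sigmas, weights):
    """psi of (2.18); weights sum to one, the last playing the role of 1 - sum of tau_j."""
    x = math.log(u / (1.0 - u))
    total = 0.0
    for mu, sigma, w in zip(mus, sigmas, weights):
        z = (x - mu) / sigma
        total += w * math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * u * (1.0 - u))
    return total

def kl_divergence(p_vals, q_vals):
    """Midpoint-rule approximation of the integral of p log(p / q) over [0, 1]."""
    return sum(p * math.log(p / q) for p, q in zip(p_vals, q_vals)) / len(p_vals)

n = 2000
grid = [(i + 0.5) / n for i in range(n)]
target = elliptical_density(1.0, 1.0, 0.5, 4.0, n)
psi = lambda u: logit_normal_mixture(u, [1.0], [0.6], [1.0])
candidate = [0.3 + 0.7 * psi(u) for u in grid]  # model (2.16) with theta1 = 0.3
print(f"KL divergence: {kl_divergence(target, candidate):.4f}")
```

Minimizing `kl_divergence` over the candidate parameters with a numerical optimizer would give an approximation of $\theta^\star$.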

For given $\kappa \in \{0, 1\}$, it is assumed that $\Lambda$ is a member of the class, viz.,

$$\Lambda(\cdot) = \Lambda(\,\cdot\,; \theta^\circ) \quad \text{for some } \theta^\circ = (\theta_1^\circ, \theta_2^\circ, \tilde\theta^\circ) \text{ satisfying (2.14)}, \tag{2.19}$$

(cf. (2.3)) and $\Theta$ is a compact subset of $\mathbb{R}^p$ ($p$ denotes the dimension of $\theta$) containing $\theta^\circ$ as an interior point so that A1 and A3 hold. Specifically, for model (2.17),

$$\Theta = \{(\theta_1, \theta_2, \tilde\theta) : \theta_1 \ge \epsilon_0,\ \theta_2 \ge 0,\ \theta_1 + \theta_2 \le 1,\ \rho_1, \rho_2 \in [1 + \epsilon_0, M]\},$$

and, in the case of model (2.18),

$$\Theta = \{(\theta_1, \theta_2, \tilde\theta) : \theta_1 \ge \epsilon_0,\ \theta_2 \ge 0,\ \theta_1 + \theta_2 \le 1,\ \mu_1, \ldots, \mu_m \in [-M, M],\ \sigma_1, \ldots, \sigma_m \in [\epsilon_0, M],\ \tau_1, \ldots, \tau_{m-1} \ge 0,\ \tau_1 + \cdots + \tau_{m-1} \le 1\},$$

for sufficiently small $\epsilon_0 > 0$ and large $M > 0$ so that $\theta^\circ$ is an interior point of $\Theta$. Moreover, for the two models (2.17) and (2.18), the following are readily verified:

(i) $\log\lambda(u; \theta)$ and all elements of $\frac{1}{\lambda(u; \theta)}\frac{\partial}{\partial\theta}\lambda(u; \theta)$ and $\frac{\partial^2}{\partial\theta\partial\theta'}\log\lambda(u; \theta)$ are bounded and continuous in $(u, \theta) \in (0, 1) \times \Theta$.

(ii) For $j = 1, \ldots, p$, $\frac{\partial}{\partial u}\left\{\frac{1}{\lambda(u; \theta^\circ)}\frac{\partial}{\partial\theta_j}\lambda(u; \theta^\circ)\right\}$ is integrable on $(0, 1)$.

Therefore, A2, A4, and A5 hold. Hence, letting

$$\ell(\theta) := \int_0^1 \log\lambda(u; \theta)\,d\hat\Lambda_n(u) = \frac{1}{k_n}\sum_{i=1}^{n}\log\lambda(U_i; \theta)\,I(R_i > R_{(k_n+1)})$$

be the log-likelihood function, Theorems 2.1–2.2 imply that the resulting maximum likelihood estimator $\hat\theta$ is consistent and asymptotically efficient, and the Fisher information is consistently estimated by $-\frac{\partial^2}{\partial\theta\partial\theta'}\ell(\hat\theta)$. However, it is noteworthy that its asymptotic unbiasedness is due to (2.10)–(2.11). If instead $\lim_{n \to \infty}\sqrt{k_n}\,(k_n/n)^{\gamma/\alpha}$ is equal to a nonzero finite constant, then the estimator may be asymptotically biased.

Remark 2.1 In practice, prior to the estimation, we have to determine $\kappa \in \{0, 1\}$ in (2.13), $m \in \mathbb{N}$ in (2.18) if that model is selected, and whether or not to fix $\theta_2$ to 0. We adopt the Akaike information criterion (AIC) for the selection, which is calculated as $-2k_n\ell(\hat\theta) + 2p$. The model with the smaller AIC value is preferred.
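The AIC formula of Remark 2.1 can be sketched as below. The angles and the two candidate densities are toy assumptions; in an actual analysis each candidate would be evaluated at its own maximum likelihood estimate, which this sketch does not compute.

```python
import math

def tail_log_likelihood(angles, density):
    """l(theta) = (1/k) * sum of log density(U_i) over the k tail angles."""
    return sum(math.log(density(u)) for u in angles) / len(angles)

def aic(angles, density, n_params):
    """AIC = -2 * k * l(theta_hat) + 2 * p, with k the number of tail angles."""
    k = len(angles)
    return -2.0 * k * tail_log_likelihood(angles, density) + 2.0 * n_params

# Toy tail angles concentrated around 0.5 (hypothetical values).
angles = [0.45, 0.50, 0.55, 0.50, 0.48]

uniform = lambda u: 1.0                            # no free parameter
peaked = lambda u: 0.4 + 0.6 * 6.0 * u * (1.0 - u)  # treated as a fitted one-parameter model

print(f"AIC uniform: {aic(angles, uniform, 0):.3f}")
print(f"AIC peaked:  {aic(angles, peaked, 1):.3f}")
```

The candidate with the smaller value would be selected; here the peaked density wins because the toy angles cluster near the middle of the interval.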

3. Simulation study and real data analysis

3.1. Simulation study

This subsection evaluates the finite sample performance of the proposed estimation. The samples are generated in the following manner:

$$R_i \sim t(3), \quad \tilde U_i \sim \lambda(\,\cdot\,; \theta^\circ), \quad \delta_i \sim \mathrm{Ber}(R_i^2/(R_i^2 + 1)), \quad V_i \sim f_{\mathrm{con}},$$

$$U_i = \delta_i\tilde U_i + (1 - \delta_i)V_i, \quad (X_i, Y_i) = |R_i|\,(\cos((\pi/2)U_i),\ \sin((\pi/2)U_i)),$$

where $t(3)$ denotes the $t$-distribution with 3 degrees of freedom and $f_{\mathrm{con}}$ denotes a probability density function on $[0, 1]$ for contamination. We take $(X_i, Y_i)$, $i = 1, \ldots, n$. Note that the angular component of $(X_i, Y_i)$ is less contaminated by $V_i$ as $|R_i|$ becomes larger, i.e., as $(X_i, Y_i)$ moves further into the extreme region.
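The sampling scheme can be sketched with the standard library as follows. The default parameter values are those used in this subsection: the angular model is $0.7 + 0.3\,\mathrm{Beta}(5.2, 4.0)$ and the contamination density $0.3 + 4.2u(1-u)$ is drawn as a mixture of a uniform (weight 0.3) and a Beta(2, 2) density (weight 0.7). The function names and the seed handling are assumptions for illustration.

```python
import math
import random

def draw_t3(rng):
    """Student t with 3 degrees of freedom: Z / sqrt(chi2_3 / 3)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / math.sqrt(chi2 / 3.0)

def draw_uniform_beta_mixture(rng, w_uniform, a, b):
    """Draw from w_uniform * Uniform(0, 1) + (1 - w_uniform) * Beta(a, b)."""
    return rng.random() if rng.random() < w_uniform else rng.betavariate(a, b)

def simulate(n, seed=0):
    """Generate n points (X_i, Y_i) whose angles are contaminated when |R_i| is small."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        r = draw_t3(rng)
        u_clean = draw_uniform_beta_mixture(rng, 0.7, 5.2, 4.0)  # lambda(.; theta)
        v = draw_uniform_beta_mixture(rng, 0.3, 2.0, 2.0)        # f_con
        keep_clean = rng.random() < r * r / (r * r + 1.0)        # delta_i
        u = u_clean if keep_clean else v
        radius = abs(r)
        sample.append((radius * math.cos(math.pi / 2.0 * u),
                       radius * math.sin(math.pi / 2.0 * u)))
    return sample

data = simulate(1000)
print(len(data), max(math.hypot(x, y) for x, y in data))
```

Passing the same `seed` reproduces the same sample, which is convenient when repeating the experiment over many replications with different seeds.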

For brevity, we concentrate on the case that $\psi(u; \tilde\theta^\circ)$ is the unimodal beta distribution (2.17) with $\theta^\circ = (\theta_1^\circ, \theta_2^\circ, \rho_1^\circ, \rho_2^\circ) = (0.7, 0.0, 5.2, 4.0)$. We generate 500 samples of size $n = 5000$ with $f_{\mathrm{con}}(u) = 0.3 + 4.2u(1 - u)$ ($0 < u < 1$) and carry out the proposed estimation for each sample with $\theta_2^\circ = 0$ known. To evaluate the estimation performance, we calculate from the resulting estimates the bias (the median of the estimates minus the true value) and the mean absolute error (MAE) for each parameter component, on different scales: $\theta_1$ on the original scale, and $\rho_1$ and $\rho_2$ on the logarithmic scale. Figure 5.2 shows the absolute bias and MAE for tail fractions $k/n = 0.02, 0.04, \ldots, 0.2$. The biases of the estimators of $\rho_1$ and $\rho_2$ steadily increase in magnitude from $k/n = 0.04$, while that of $\theta_1$ is almost constant irrespective of the tail fraction. Meanwhile, for all the components, the MAEs decrease rapidly until $k/n = 0.1$ but only slowly thereafter because of the contamination. On the other hand, for the case of $\theta_2^\circ = 0$ unknown, the performance of the AIC is considered: we calculate the AIC values of three models ($\theta_2$ fixed to 0; $\theta_2$ free and $\kappa = 0$; $\theta_2$ free and $\kappa = 1$) and then select the model with the smallest value among them. The performance is evaluated as the rate of choosing the model with $\theta_2$ fixed to 0. As seen in Table 5.1, the rate is at least 80% for all tail fractions except $k_n/n = 0.02$, where it is 78.6%.

We compare the proposed estimator with $\hat\Lambda_n$ in (2.9). The mean integrated absolute errors (MIAE)

$$E\int_0^1 |\hat\Lambda_n(u) - \Lambda(u)|\,du \quad \text{and} \quad E\int_0^1 |\Lambda(u; \hat\theta) - \Lambda(u)|\,du$$

are adopted as the performance measures. As before, the two cases of $\theta_2^\circ = 0$ known and unknown are considered. For the latter, we first choose a model using the AIC and then estimate the parameter. Figure 5.3 shows the MIAEs of the estimators for the tail fractions. As seen therein, the proposed estimator outperforms $\hat\Lambda_n$ even when $\theta_2^\circ = 0$ is unknown.
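For a single replication, the integrated absolute error between two distribution functions on $[0, 1]$ can be approximated as in the sketch below; averaging it over replications gives an estimate of the MIAE. The function names and the midpoint grid are assumptions for illustration.

```python
import bisect

def empirical_cdf(angles):
    """Right-continuous empirical distribution function of the given angles."""
    ordered = sorted(angles)
    return lambda u: bisect.bisect_right(ordered, u) / len(ordered)

def integrated_abs_error(cdf_a, cdf_b, n=1000):
    """Midpoint-rule approximation of the integral of |cdf_a(u) - cdf_b(u)| over [0, 1]."""
    return sum(abs(cdf_a((i + 0.5) / n) - cdf_b((i + 0.5) / n)) for i in range(n)) / n

uniform_cdf = lambda u: u

# One observation at 0.5 versus the uniform distribution: the exact value is 0.25.
print(integrated_abs_error(empirical_cdf([0.5]), uniform_cdf))
```

The same function applies to a fitted parametric distribution function in place of the empirical one, so both error terms in the comparison can share one implementation.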

3.2. Real data analysis

In this subsection, we apply the proposed estimation to real data. The data are the bivariate daily time series of closing stock prices of Samsung Electronics and LG Electronics from Jan-01-2003 to Dec-30-2019, depicted in Figure 5.4. A GARCH(1,1) model is fitted to each series of log-returns to capture the conditional heteroskedasticity, and we then obtain the scaled residuals, which are estimates of the GARCH innovations. We focus on the extreme behavior of the GARCH innovations in the third quadrant of the left panel of Figure 5.5, which is related to the investment risk. To estimate the spectral distribution, we apply the proposed method to the negative scaled residuals $(X_1, Y_1), \ldots, (X_n, Y_n)$ ($n = 1526$), which are displayed in the right panel of Figure 5.5. Let $\psi(u; \tilde\theta)$ be taken as (2.17) or (2.18). As mentioned in Remark 2.1, the AIC is used for determining the values of $\kappa \in \{0, 1\}$ and $m \in \{1, 2\}$, and whether to set $\theta_2$ to 0. Tables 5.2–5.3 present the estimation results for $k = 50, 100, 150$. Based on the results, the final estimate is determined as $(\theta_1, \theta_2, \mu_1, \mu_2, \sigma_1, \sigma_2, \tau_1) = (0.18, 0.24, -4.94, 0.30, 0.02, 0.60, 0.04)$ with $\psi(u; \tilde\theta)$ taken as (2.18). Figure 5.6 shows the comparison between the fitted model and the empirical distribution of $\{U_i : R_i > R_{(101)}\}$, which supports the validity of the fitted model.

4. Concluding remark

This paper proposes a parametric method for estimating the spectral distribution of bivariate regular variation efficiently. While existing studies have considered nonparametric methods, the parametric approach is adopted here to attain efficiency. The key condition is Fisher consistency (2.19), i.e., the parametric model has to be flexible enough to contain the true spectral distribution. This paper concentrates on the mixture model, which is widely employed for its flexibility. Indeed, the mixture model appears to capture the spectral distributions of heavy tailed elliptical distributions well. Moreover, the simulation study and real data analysis support the proposed method.

This study concentrates on the bivariate case. However, for wide application to real financial data, multidimensional data need to be dealt with. For higher dimensional data, the parametric method seems more promising than the nonparametric one. Thus, the study of the parametric method for multidimensional data is left as a future subject.

5. Proofs

This section provides the proofs of the theorems and proposition in Section 2.1.

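5.1. The proof of Theorem 2.1

Since the proof is standard, we only present its sketch. It follows from (2.1) and A2 that

$$\int_0^1 \log\lambda(u; \theta)\,d\hat\Lambda_n(u) \xrightarrow{P} \int_0^1 \log\lambda(u; \theta)\,d\Lambda(u) \quad \text{for each } \theta \in \Theta.$$

Moreover, for $\theta_0 \in \Theta$ and $r > 0$, $\bar g(u; \theta_0, r) := \sup\{\log\lambda(u; \theta) : |\theta - \theta_0| \le r\}$ and $\underline g(u; \theta_0, r) := \inf\{\log\lambda(u; \theta) : |\theta - \theta_0| \le r\}$ are bounded and continuous in $u$; thus,

$$\int_0^1 \bar g(u; \theta_0, r)\,d\hat\Lambda_n(u) \xrightarrow{P} \int_0^1 \bar g(u; \theta_0, r)\,d\Lambda(u), \quad \int_0^1 \underline g(u; \theta_0, r)\,d\hat\Lambda_n(u) \xrightarrow{P} \int_0^1 \underline g(u; \theta_0, r)\,d\Lambda(u).$$

Meanwhile, by the dominated convergence theorem, as $r \to 0$,

$$\int_0^1 \bar g(u; \theta_0, r)\,d\Lambda(u) \downarrow \int_0^1 \log\lambda(u; \theta_0)\,d\Lambda(u), \quad \int_0^1 \underline g(u; \theta_0, r)\,d\Lambda(u) \uparrow \int_0^1 \log\lambda(u; \theta_0)\,d\Lambda(u).$$

Using a compactness argument (cf. A1) with these facts, we can derive uniform convergence:

$$\sup_{\theta \in \Theta}\left|\int_0^1 \log\lambda(u; \theta)\,d\hat\Lambda_n(u) - \int_0^1 \log\lambda(u; \theta)\,d\Lambda(u)\right| \xrightarrow{P} 0.$$

Furthermore, (2.4) implies

$$\int_0^1 \log\lambda(u; \theta)\,d\Lambda(u) < \int_0^1 \log\lambda(u; \theta^\circ)\,d\Lambda(u) \quad \text{for } \theta \in \Theta \setminus \{\theta^\circ\}. \tag{5.1}$$

Hence, the uniform convergence combined with the compactness of $\Theta$, the continuity of $\theta \mapsto \int_0^1 \log\lambda(u; \theta)\,d\Lambda(u)$ (cf. A2), and (5.1) yields Theorem 2.1.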

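5.2. The proof of Proposition 2.1

Fix $i, j = 1, \ldots, p$ arbitrarily. We have

$$\int_0^1 \{\Lambda(u_1 \wedge u_2) - \Lambda(u_1)\Lambda(u_2)\}\,\dot\phi_i(u_1)\,du_1 = \int_0^{u_2}\{\Lambda(u_1) - \Lambda(u_1)\Lambda(u_2)\}\,\dot\phi_i(u_1)\,du_1 + \Lambda(u_2)\int_{u_2}^1 \{1 - \Lambda(u_1)\}\,\dot\phi_i(u_1)\,du_1.$$

Applying integration by parts to each term on the right-hand side, we verify that the right-hand side is equal to

$$\begin{aligned}
&\phi_i(u_2)\Lambda(u_2)\{1 - \Lambda(u_2)\} - \{1 - \Lambda(u_2)\}\int_0^{u_2}\phi_i(u_1)\lambda(u_1; \theta^\circ)\,du_1 \\
&\quad + \Lambda(u_2)\left[-(1 - \Lambda(u_2))\phi_i(u_2) + \int_{u_2}^1 \phi_i(u_1)\lambda(u_1; \theta^\circ)\,du_1\right] \\
&= -\int_0^{u_2}\phi_i(u_1)\lambda(u_1; \theta^\circ)\,du_1,
\end{aligned}$$

since $\int_0^1 \phi_i(u)\,d\Lambda(u) = 0$ due to (2.5) and (5.1). Therefore, applying integration by parts and the equation $\int_0^1 \phi_i(u)\,d\Lambda(u) = 0$ again, we have

$$\begin{aligned}
\int_0^1\!\!\int_0^1 \{\Lambda(u_1 \wedge u_2) - \Lambda(u_1)\Lambda(u_2)\}\,\dot\phi_i(u_1)\,\dot\phi_j(u_2)\,du_1\,du_2
&= -\int_0^1 \left\{\int_0^{u_2}\phi_i(u_1)\lambda(u_1; \theta^\circ)\,du_1\right\}\dot\phi_j(u_2)\,du_2 \\
&= \int_0^1 \phi_i(u_2)\phi_j(u_2)\lambda(u_2; \theta^\circ)\,du_2 \\
&= \int_0^1 \frac{\partial}{\partial\theta_i}\log\lambda(u; \theta^\circ)\,\frac{\partial}{\partial\theta_j}\log\lambda(u; \theta^\circ)\,d\Lambda(u) \\
&= -\int_0^1 \frac{\partial^2}{\partial\theta_i\partial\theta_j}\log\lambda(u; \theta^\circ)\,d\Lambda(u),
\end{aligned}$$

where the last equality holds due to (2.6). Hence, the proof is completed.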

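5.3. The proof of Theorem 2.2

By Theorem 2.1 combined with A3, with probability tending to 1, $\hat\theta$ resides in an open ball around $\theta^\circ$ contained in $\mathrm{int}(\Theta)$. In that case, applying the mean value theorem to each component of $\frac{\partial}{\partial\theta}\ell(\theta)$ together with the definition of $\hat\theta$, we obtain

$$0 = \frac{\partial}{\partial\theta}\ell(\hat\theta) = \frac{\partial}{\partial\theta}\ell(\theta^\circ) + \left[\frac{\partial^2}{\partial\theta\partial\theta_1}\ell(\theta^{(1)}), \cdots, \frac{\partial^2}{\partial\theta\partial\theta_p}\ell(\theta^{(p)})\right]'(\hat\theta - \theta^\circ) \tag{5.2}$$

$$=: \frac{\partial}{\partial\theta}\ell(\theta^\circ) - \hat I\,(\hat\theta - \theta^\circ),$$

where $\theta^{(1)}, \ldots, \theta^{(p)}$ lie on the segment between $\hat\theta$ and $\theta^\circ$, and $\hat I$ denotes the negative of the bracketed matrix of second derivatives. The proof is mainly divided into two parts: the first is to prove that the leading term multiplied by $\sqrt{k_n}$ converges weakly to $N(0, W)$, and the second is to prove that $\hat I \xrightarrow{P} I$.

(5.1) combined with (2.5) implies $\int_0^1 \phi_j(u)\,d\Lambda(u) = 0$ for $j = 1, \ldots, p$. Moreover, by integration by parts, we have

$$0 = \int_0^1 \phi_j(u)\,d\Lambda(u) = \phi_j(1) - \int_0^1 \Lambda(u)\frac{\partial}{\partial u}\phi_j(u)\,du,$$

$$\int_0^1 \phi_j(u)\,d\hat\Lambda_n(u) = \phi_j(1) - \int_0^1 \hat\Lambda_n(u)\frac{\partial}{\partial u}\phi_j(u)\,du, \quad j = 1, \ldots, p,$$

due to $\Lambda(0) = \hat\Lambda_n(0) = 0$ and $\Lambda(1) = \hat\Lambda_n(1) = 1$. Thus, the $j$-th element of $\frac{\partial}{\partial\theta}\ell(\theta^\circ)$ ($j = 1, \ldots, p$) is equal to

$$\begin{aligned}
\frac{\partial}{\partial\theta_j}\int_0^1 \log\lambda(u; \theta^\circ)\,d\hat\Lambda_n(u)
&= \int_0^1 \phi_j(u)\,d\hat\Lambda_n(u) = \int_0^1 \phi_j(u)\,d\hat\Lambda_n(u) - \int_0^1 \phi_j(u)\,d\Lambda(u) \\
&= -\int_0^1 \hat\Lambda_n(u)\frac{\partial}{\partial u}\phi_j(u)\,du + \int_0^1 \Lambda(u)\frac{\partial}{\partial u}\phi_j(u)\,du.
\end{aligned}$$

Therefore,

$$\sqrt{k_n}\,\frac{\partial}{\partial\theta}\ell(\theta^\circ) = -\int_0^1 \sqrt{k_n}\,\{\hat\Lambda_n(u) - \Lambda(u)\}\,\dot\phi(u)\,du = \Upsilon\!\left(\sqrt{k_n}\,\{\hat\Lambda_n - \Lambda\}\right),$$

where

$$\Upsilon(g) := -\int_0^1 g(u)\dot\phi(u)\,du = -\left(\int_0^1 g(u)\frac{\partial}{\partial u}\phi_1(u)\,du, \cdots, \int_0^1 g(u)\frac{\partial}{\partial u}\phi_p(u)\,du\right)', \quad \text{for } g \in D[0, 1].$$

A4 implies that $\Upsilon$ is a mapping from $D[0, 1]$ to $\mathbb{R}^p$ which is continuous at every continuous function $g$; thus, by applying the continuous mapping theorem to (2.2), we obtain

$$\sqrt{k_n}\,\frac{\partial}{\partial\theta}\ell(\theta^\circ) \xrightarrow{d} \Upsilon(G^\circ).$$

Moreover, the weak limit $\Upsilon(G^\circ)$ is distributed as multivariate normal with zero mean vector and covariance matrix $W$. Therefore,

$$\sqrt{k_n}\,\frac{\partial}{\partial\theta}\ell(\theta^\circ) \xrightarrow{d} N(0, W). \tag{5.3}$$

Theorem 2.1 implies that $\theta^{(1)}, \ldots, \theta^{(p)}$ in (5.2) converge in probability to $\theta^\circ$. Moreover, in a similar fashion to the proof of Theorem 2.1, we can derive from A5 that $-\frac{\partial^2}{\partial\theta\partial\theta'}\ell(\theta)$ converges in probability to $I(\theta)$ uniformly in a neighborhood of $\theta^\circ$ and that $\theta \mapsto I(\theta)$ is continuous; thus, $\hat I \xrightarrow{P} I$. Hence, the proof is completed by (5.3), the invertibility of $I$, and Proposition 2.1.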

Table 5.1 The AIC performance: the rate of choosing the model with $\theta_2$ fixed to 0

$k_n/n$   0.02    0.04    0.06    0.08    0.10    0.12    0.14    0.16    0.18    0.20
rate      0.786   0.816   0.814   0.818   0.822   0.808   0.812   0.814   0.806   0.800

Table 5.2 The estimation result under the model where $\psi(u; \tilde\theta)$ is taken as (2.17). The standard errors are presented in the parentheses.

$k_n$   $\theta_1$    $\theta_2$    $\rho_1$      $\rho_2$      $\kappa$   AIC
50      0.48(0.14)    0.00(-)       7.76(3.53)    6.91(3.30)    -          −8.96
100     0.15(0.16)    0.31(0.10)    7.30(2.78)    5.36(1.89)    0          −21.74
150     0.47(0.09)    0.00(-)       6.19(1.92)    5.19(1.62)    -          −31.30

Table 5.3 The estimation result under the model where $\psi(u; \tilde\theta)$ is taken as (2.18). The standard errors are presented in the parentheses.

$k_n$   $\theta_1$    $\theta_2$    $\mu_1$        $\mu_2$       $\sigma_1$    $\sigma_2$    $\tau_1$      $\kappa$   AIC
50      0.37(0.14)    0.00(-)       −4.83(0.10)    0.13(0.14)    0.13(0.07)    0.57(0.13)    0.06(0.04)    -          −11.29
100     0.18(0.16)    0.24(0.10)    −4.94(0.01)    0.30(0.12)    0.02(0.01)    0.60(0.11)    0.04(0.02)    0          −27.50
150     0.41(0.01)    0.00(-)       −4.86(0.08)    0.19(0.10)    0.12(0.05)    0.65(0.10)    0.03(0.02)    -          −34.48

[Figure 5.1: four panels plotting $\lambda(u)$ against $u \in [0, 1]$ for $(\varsigma_1, \varsigma_2, \varrho, \alpha) = (1, 1, 0.5, 4)$, $(1, 1.5, 0.5, 4)$, $(1, 1, 0.9, 4)$, and $(1, 1, 0.5, 2)$; legend: Elliptical, Beta, Normal mixture.]

Figure 5.1 The similarity between the spectral density of heavy tailed elliptical distribution and $\lambda(u; \theta^\star)$ for 4 cases of $(\varsigma_1, \varsigma_2, \varrho, \alpha)$

[Figure 5.2: three panels titled "The estimation performance w.r.t. $\theta_1$", "The estimation performance w.r.t. $\rho_1$", and "The estimation performance w.r.t. $\rho_2$"; horizontal axis: tail fraction $k/n$; vertical axis: absolute bias/MAE (log-scale for $\rho_1$ and $\rho_2$); legend: bias, MAE.]

Figure 5.2 The performance of the proposed estimator

[Figure 5.3: one panel titled "Performance Comparison"; horizontal axis: tail fraction $k/n$; vertical axis: MIAE; legend: proposed with $\theta_2 = 0$ known, proposed with $\theta_2 = 0$ unknown, $\hat\Lambda_n$.]

Figure 5.3 The performance comparison with $\hat\Lambda_n$ in (2.9)

[Figure 5.4: one panel titled "Samsung and LG"; horizontal axis: Daily Time (Jan-01-2003 ~ Dec-30-2019); vertical axis: Close Price; legend: LG, Samsung.]

Figure 5.4 The time series of close prices of LG Electronics and Samsung Electronics

[Figure 5.5: two scatter plots titled "The scaled residuals" and "The negative scaled residuals"; axes: LG and Samsung.]

Figure 5.5 The scatter plots of the scaled residuals (left panel) and the negative ones in the third quadrant of the left plot (right panel)

[Figure 5.6: left panel titled "spectral distribution" (horizontal axis: U; vertical axis: Density); right panel titled "Q-Q plot" (axes: U and fitted model).]

Figure 5.6 The comparison between the fitted model and the empirical distribution of $\{U_i : R_i > R_{(101)}\}$. The black solid line in the left panel indicates the density curve of the fitted model.

References

Choi, B. (2019). A study on goodness-of-fit test for extreme value distribution. Journal of Korean Data & Information Science Society, 30, 539-549.

Einmahl, J. H., de Haan, L. and Huang, X. (1993). Estimating a multidimensional extreme-value distribution. J. Multivariate Anal., 47, 35-47.

Einmahl, J. H., Yang, F. and Zhou, C. (2020). Testing the multivariate regular variation model. Journal of Business & Economic Statistics, 1-13.

Hult, H. and Lindskog, F. (2002). Multivariate extremes, aggregation and dependence in elliptical distributions. Adv. Appl. Prob., 34, 587-608.

Kang, S.-B. (2005). Estimation for the extreme value distribution based on multiply Type-II censored samples. Journal of Korean Data & Information Science Society, 16, 629-638.

Klüppelberg, C., Kuhn, G. and Peng, L. (2007). Estimating the tail dependence function of an elliptical distribution. Bernoulli, 13, 229-251.

Teicher, H. (1963). Identifiability of finite mixtures. Ann. Math. Stat., 34, 1265-1269.
