M-estimation of the long-memory parameter by Laplace periodogram†


Yaeji Lim 1

1 Department of Statistics, Pukyong National University

Received 16 January 2018, revised 31 January 2018, accepted 12 February 2018

Abstract

The estimation of the long-memory parameter is a crucial issue in the analysis of long-range dependent processes. The log-regression method proposed by Geweke and Porter-Hudak (1983) is one of the most popular semi-parametric approaches to estimating the long-memory parameter. However, this conventional method is highly sensitive to outliers and heavy-tailed errors. This paper investigates the possibility of using the Laplace periodogram to analyze long-memory processes. The Laplace periodogram, derived by least absolute deviations in the harmonic regression procedure, is a robust alternative to the ordinary periodogram for spectral analysis. Numerical studies, including a simulation study and a real data analysis, are presented for comparison.

Keywords: ARFIMA, Laplace periodogram, log periodogram regression, long-memory process, robustness.

1. Introduction

In many statistical models such as regression and time series, it is typically assumed that a random sequence $\{Y_t\}$ is a white noise process or a stationary process with short-range dependence such as an ARMA process (Hurst, 1951; Baillie, 1996; Baillie and Chung, 2002; Lee et al., 2013; Myoung et al., 2013). However, in several fields of science, including astronomy, economics, hydrology, and signal processing, this assumption is too stringent to represent real data (Beran, 1994; Doukhan et al., 2003). Instead, the process $\{Y_t\}$ exhibits slow decay in autocorrelation (or autocovariance), which is often referred to as long memory or long-range dependence. The spectral density function $f_Y(\omega)$ is approximated in the neighbourhood of zero frequency by

$$f_Y(\omega) \sim c_1(\omega)\,\omega^{-2d}, \quad \text{as } \omega \to 0^+,$$

where $c_1(\omega)$ is a slowly varying function at 0. Here, $d \in (0, 1/2)$ is called the long-memory parameter. Therefore, the spectral density of a long-memory process has a pole at zero frequency. In the time domain, the long-memory process can be described by the correlation function,

$$\rho(k) := \mathrm{Corr}(Y_{t+k}, Y_t) \sim c_2(k)\,k^{2d-1}, \quad \text{as } k \to \infty, \tag{1.1}$$

where $c_2(k)$ is a slowly varying function at infinity that is positive for large $k$, and $\rho(k)$ is not absolutely summable (Bhansali and Kokoszka, 2001; Beran et al., 2013). By contrast, $\{Y_t\}$ is a short-memory process if its correlation function is absolutely summable. Common examples of long-memory processes are fractionally integrated white noise and fractionally integrated ARMA (ARFIMA) models.

† This work was supported by the Research Grant of Pukyong National University (2017 year).

1 Corresponding author: Assistant professor, Department of Statistics, Pukyong National University, 1 Yongsoro, Nam-gu, Busan 48513, Korea. E-mail: [email protected]

More generally, the process is stationary if d < 1/2 and invertible if d > −1/2. A stationary process with d ∈ (0, 1/2) is characterized as a long-memory process. Thus, identifying the parameter d is a crucial issue to characterize the process.

Several methods have been proposed to estimate the long-memory parameter in parametric or semi-parametric frameworks. In the semi-parametric context, the regression approach developed by Geweke and Porter-Hudak (1983) is popular, alongside the parametric maximum likelihood approach. It is based on a regression model with the log periodogram. In this paper, we investigate the possibility of using the Laplace periodogram to analyze long-memory processes in the presence of outliers or heavy-tailed errors. The Laplace periodogram of Li (2008), derived by least absolute deviations (LAD) in the harmonic regression procedure, is a robust alternative to the ordinary periodogram for spectral analysis.

For robust analysis of long-memory processes, we use the Laplace periodogram of Li (2008) and derive the asymptotic distribution of the Laplace periodogram for long-memory processes. Furthermore, along the lines of Geweke and Porter-Hudak (1983), we propose a new method to estimate the parameter $d$ by log Laplace periodogram regression.

The rest of this paper is organized as follows. In Section 2, we consider the Laplace periodogram of long-memory processes and investigate its asymptotic properties. Section 3 presents a new method to estimate $d$ based on a combination of log regression and the Laplace periodogram. In Section 4, numerical examples, including a simulation study and a real data analysis, are discussed to evaluate the practical performance. Finally, concluding remarks are given in Section 5.

2. Laplace periodogram

Suppose that we observe a sequence of time series data $\{Y_1, \ldots, Y_n\}$. Then the ordinary periodogram is defined by

$$I_n(\omega) = \frac{1}{n}\left|\sum_{t=1}^{n} Y_t \exp(it\omega)\right|^2$$

for a frequency parameter $\omega \in (0, \pi)$. Here $i = \sqrt{-1}$. When $\omega$ is a Fourier frequency, i.e., $\omega = 2\pi k/n$ for some integer $k$, the ordinary periodogram can be written in a regression formulation, as in Li (2008),

$$I_n(\omega) = \frac{n}{4}\,\|\tilde\beta_n(\omega)\|^2,$$

where $\|\cdot\|$ denotes the $\ell_2$ norm of vectors, and $\tilde\beta_n(\omega)$ is the solution of the conventional least squares problem

$$\tilde\beta_n(\omega) = \arg\min_{\beta \in \mathbb{R}^2} \sum_{t=1}^{n} \left|Y_t - x_t^T(\omega)\beta\right|^2$$

with the harmonic regressor $x_t(\omega) = [\cos(\omega t), \sin(\omega t)]^T$.

As a robust alternative to the ordinary periodogram, Li (2008) proposed the Laplace periodogram by replacing the least squares criterion with LAD. The new regression coefficient is then defined as

$$\beta(\omega) = \arg\min_{\beta \in \mathbb{R}^2} E\left|Y - x^T(\omega)\beta\right|,$$

and its sample version as

$$\hat\beta_n(\omega) = \arg\min_{\beta \in \mathbb{R}^2} \sum_{t=1}^{n} \left|Y_t - x_t^T(\omega)\beta\right|. \tag{2.1}$$

We now define the Laplace periodogram, as in Li (2008), to be

$$L_n(\omega) = \frac{n}{4}\,\|\hat\beta_n(\omega)\|^2. \tag{2.2}$$
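The two regression formulations in this section can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the function names (`harmonic_design`, `lad_fit`, and so on) are hypothetical, and the LAD problem (2.1) is solved only approximately by iteratively reweighted least squares (IRLS), which is an assumed choice of solver; any exact LAD or median-regression routine could be substituted.

```python
import numpy as np


def harmonic_design(n, omega):
    """Rows are x_t(omega) = [cos(omega t), sin(omega t)] for t = 1, ..., n."""
    t = np.arange(1, n + 1)
    return np.column_stack([np.cos(omega * t), np.sin(omega * t)])


def ordinary_periodogram(y, omega):
    """(n / 4) * ||beta_tilde||^2, with beta_tilde the least squares fit."""
    y = np.asarray(y, dtype=float)
    X = harmonic_design(len(y), omega)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return len(y) / 4.0 * float(beta @ beta)


def lad_fit(X, y, n_iter=500, eps=1e-8, tol=1e-10):
    """Approximate LAD fit by IRLS (assumed solver, not taken from the paper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares starting value
    for _ in range(n_iter):
        resid = y - X @ beta
        # Weights 1/|residual| turn weighted least squares into an L1 surrogate;
        # clipping at eps avoids division by zero for exactly fitted points.
        w = 1.0 / np.maximum(np.abs(resid), eps)
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def laplace_periodogram(y, omega):
    """(n / 4) * ||beta_hat||^2 as in (2.2), with beta_hat from (2.1)."""
    y = np.asarray(y, dtype=float)
    X = harmonic_design(len(y), omega)
    beta = lad_fit(X, y)
    return len(y) / 4.0 * float(beta @ beta)
```

At a Fourier frequency, `ordinary_periodogram` should agree with the direct definition $n^{-1}|\sum_t Y_t \exp(it\omega)|^2$ up to rounding, which gives a quick check of the regression formulation.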

3. Log regression via Laplace periodogram

The autoregressive fractionally integrated moving average (ARFIMA) model, which generalizes the ARIMA model by allowing non-integer values of the differencing parameter, is a long-memory process. The process $\{Y_t\}$ is said to be an ARFIMA$(p, d, q)$ process with $d \in (0, 1/2)$ if $\{Y_t\}$ is stationary and satisfies the difference equations

$$\phi(B)\nabla^d Y_t = \theta(B) Z_t, \tag{3.1}$$

where $\{Z_t\}$ is a white noise process, and $\phi$ and $\theta$ denote polynomials of degrees $p$ and $q$, respectively. Note that the process exhibits short memory when $d = 0$, which corresponds to a stationary and invertible ARMA model.

For the ARFIMA$(p, d, q)$ process $\{Y_t\}$ with $d \in (0, 0.5)$ in (3.1), the spectral density is given by

$$f(\omega) = 4^{-d}\sin^{-2d}\!\left(\frac{\omega}{2}\right) f_e(\omega), \tag{3.2}$$

where $f_e(\omega)$ is the spectral density of the ARMA process. It can be shown that, as $\omega \to 0$, $f(\omega) \sim C_2\,\omega^{-2d}$ for a positive constant $C_2$, which implies that the spectral density has a pole at zero frequency.
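The small-frequency behaviour of (3.2) can be seen in one step. The sketch below assumes only that $f_e$ is continuous and positive at zero, which holds for a stationary and invertible ARMA component; under that assumption the constant is $C_2 = f_e(0)$.

```latex
\begin{align*}
f(\omega) &= \left\{4\sin^2(\omega/2)\right\}^{-d} f_e(\omega), \\
4\sin^2(\omega/2) &= \omega^2\left\{1 + O(\omega^2)\right\} \quad (\omega \to 0), \\
f(\omega) &\sim f_e(0)\,\omega^{-2d} \quad (\omega \to 0).
\end{align*}
```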

Using (3.2), Geweke and Porter-Hudak (1983) proposed a regression approach to estimate the long-memory parameter $d$ based on the first $m$ ordinary periodogram ordinates. For harmonic frequencies $\omega_j = 2\pi j/n$, evaluating (3.2) at $\omega_j$, taking logarithms, adding $\ln I_n(\omega_j)$ to both sides, and rearranging yields

$$\ln I_n(\omega_j) = \ln f_e(\omega_j) - d \ln\left\{4\sin^2\left(\frac{\omega_j}{2}\right)\right\} + \ln\left\{\frac{I_n(\omega_j)}{f(\omega_j)}\right\}, \quad j = 1, \ldots, m.$$

When $m/n \to 0$ as $n \to \infty$, the frequencies $\omega_j$ converge to zero, which makes the first term on the right-hand side asymptotically constant. To represent the dependence of $m$ on $n$, we write $m := m(n)$. We now consider the model

$$z_j = c_0 + d R_{j,n} + u_j, \quad j = 1, \ldots, m(n), \tag{3.3}$$

where $z_j = \ln I_n(\omega_j)$, $c_0 = \ln f_e(0) + \xi(1)$, $R_{j,n} = -\ln\{4\sin^2(\omega_j/2)\}$, and $u_j = \ln\{I_n(\omega_j)/f(\omega_j)\} - \xi(1)$ with $\xi(a) = \partial \log\Gamma(a)/\partial a$, the logarithmic derivative of the gamma function. Geweke and Porter-Hudak (1983) established that, for $-0.5 < d < 0$, $\ln\{I_n(\omega_j)/f(\omega_j)\}$ is a sequence of approximately independent Gumbel random variables with mean $\xi(1) = -0.5772$, and showed that the ordinary least squares estimator and its asymptotic variance are given by

$$\hat d - d = \frac{\sum_{j=1}^{m} (R_{j,n} - \bar R_m)\,u_j}{\sum_{j=1}^{m} (R_{j,n} - \bar R_m)^2}, \quad \text{and} \quad \mathrm{Var}(\hat d) = \frac{\pi^2}{6}\left\{\sum_{j=1}^{m} \left(R_{j,n} - \bar R_m\right)^2\right\}^{-1},$$

where $\bar R_m = m^{-1}\sum_{j=1}^{m} R_{j,n}$. Robinson (1995) further investigated the consistency and asymptotic normality of $\hat d$ under some additional assumptions with Gaussian time series.
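The regression (3.3) amounts to an ordinary least squares slope of the log ordinates on $R_{j,n}$. The following Python sketch shows one way to compute it, together with the asymptotic variance quoted above. The function names are hypothetical, and the periodogram is evaluated with the FFT, which matches the definition in Section 2 at Fourier frequencies because only the modulus of the transform enters.

```python
import numpy as np


def periodogram_at_fourier(y, m):
    """I_n(omega_j) = |sum_t y_t exp(i t omega_j)|^2 / n for j = 1, ..., m."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    dft = np.fft.fft(y)
    # Index 0 is the zero frequency; indices 1..m are omega_j = 2 pi j / n.
    return np.abs(dft[1:m + 1]) ** 2 / n


def log_regression_d(ordinates, n):
    """OLS slope of ln(ordinate_j) on R_{j,n} = -ln(4 sin^2(omega_j / 2))."""
    ordinates = np.asarray(ordinates, dtype=float)
    m = len(ordinates)
    omega = 2.0 * np.pi * np.arange(1, m + 1) / n
    R = -np.log(4.0 * np.sin(omega / 2.0) ** 2)
    Rc = R - R.mean()                      # centred regressor R_{j,n} - R_bar_m
    z = np.log(ordinates)
    d_hat = float(Rc @ z / (Rc @ Rc))      # least squares slope
    var_hat = float(np.pi ** 2 / 6.0 / (Rc @ Rc))  # asymptotic variance formula
    return d_hat, var_hat
```

For a series `y` of length `n`, a call such as `log_regression_d(periodogram_at_fourier(y, m), n)` with a chosen bandwidth `m` returns the slope estimate and its asymptotic variance. Because the function takes the ordinates as input, any other set of positive spectral ordinates at the same frequencies can be passed in its place.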

Here, we consider a new regression model obtained by replacing the ordinary periodogram $I_n(\omega)$ with the Laplace periodogram $L_n(\omega)$ in the model (3.3). We then build the model

$$\ell_j = a + d R_{j,n} + s_j, \quad j = 1, \ldots, m(n), \tag{3.4}$$

where $\ell_j = \ln L_n(\omega_j)$, $a = \ln[f_e(0)] + \xi(1)$, and $s_j = \ln\{L_n(\omega_j)/f(\omega_j)\} - \xi(1)$.

Then the proposed least squares estimator based on the model (3.4) satisfies

$$\tilde d = d + \frac{\sum_{j=1}^{m} (R_{j,n} - \bar R_m)\,s_j}{\sum_{j=1}^{m} (R_{j,n} - \bar R_m)^2}.$$

To derive the asymptotic distribution of $\tilde d$, we need to prove that $\ln\{L_n(\omega_j)/f(\omega_j)\}$ is a sequence of approximately independent random variables with mean $\xi(1) = -0.5772$, as in the conventional approach. Since Li (2008) established the asymptotic distribution of the Laplace periodogram $L_n(\omega_j)$, Li's approach may be extended to prove the asymptotic properties of the proposed estimator. Future research should address the asymptotic theory for the proposed estimator.

4. Numerical examples

4.1. Simulation study

We perform a simulation study in order to compare $\hat d$ from the ordinary log regression with $\tilde d$ from the log Laplace periodogram regression.

Example 4.1 ARFIMA(0, d, 0)

The data are generated from the following model (Molinares et al., 2009):

$$(1 - \alpha) X_1 + \alpha X_2, \tag{4.1}$$

where $X_1$ follows ARFIMA$(0, d, 0)$ with $d = 0.30$ and $0.45$, i.e., $(1 - B)^d X_{1t} = Z_t$ $(t = 1, \ldots, n)$, where $Z_t \sim WN(0, 1)$, and $X_2$ follows $N(0, \sigma^2)$. Three sample sizes are considered: $n = 100$, $300$, and $800$. We set $m(n) = n^{\gamma}$ with $\gamma = 1/2$. We also consider three contamination rates ($\alpha = 0, 0.05, 0.1$) and two noise levels ($\sigma = 5, 7$). For each combination of $d$, $\alpha$, $\sigma$, and $n$, 500 samples were generated. For each dataset, the conventional estimator $\hat d$ based on the model (3.3) and the proposed estimator $\tilde d$ from the model (3.4) are applied to estimate the parameter $d$.
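A data-generating step of this kind could be sketched in Python as follows. Several choices here are assumptions rather than details given in the text: the ARFIMA$(0, d, 0)$ series is approximated by a truncated MA($\infty$) expansion with a burn-in, the white noise is taken to be Gaussian, the contamination in (4.1) is read as a mixture in which each observation is replaced by an $N(0, \sigma^2)$ draw with probability $\alpha$, and $m(n)$ is rounded down to an integer. All function names are hypothetical.

```python
import numpy as np


def arfima_0d0(n, d, rng, burn=1000):
    """Approximate (1 - B)^d X_t = Z_t via a truncated MA(infinity) expansion."""
    total = n + burn
    psi = np.empty(total)
    psi[0] = 1.0
    k = np.arange(1, total)
    # MA coefficients of (1 - B)^(-d): psi_k = psi_{k-1} * (k - 1 + d) / k
    psi[1:] = np.cumprod((k - 1 + d) / k)
    z = rng.standard_normal(total)          # Gaussian white noise, variance 1
    x = np.convolve(z, psi)[:total]         # x_t = sum_{k <= t} psi_k z_{t-k}
    return x[burn:]                         # drop the burn-in segment


def contaminate(x1, alpha, sigma, rng):
    """ASSUMED reading of (4.1): each point becomes N(0, sigma^2) w.p. alpha."""
    x2 = rng.normal(0.0, sigma, size=len(x1))
    is_outlier = rng.random(len(x1)) < alpha
    return np.where(is_outlier, x2, x1)


def bandwidth(n, gamma=0.5):
    """m(n) = n^gamma, rounded down (the rounding rule is an assumption)."""
    return int(np.floor(n ** gamma))
```

If (4.1) is instead read literally as a pointwise weighted sum, `contaminate` would return `(1 - alpha) * x1 + alpha * x2`; the text does not settle which reading was used.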

As measures for comparison, we consider average (mean), standard deviation (s.d) and mean squared errors (MSE) of estimates. From the results in Table 4.1, we observe that the proposed method outperforms the conventional method in every contaminated case, and results from both methods are almost identical for the cases of α = 0. Results for σ = 7 are similar, and hence are omitted.
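The three summary measures are linked by the identity MSE = bias$^2$ + variance, which gives a quick consistency check on reported values. For instance, the first row of Table 4.1 reports a mean of 0.363 and an s.d. of 0.303 for $d = 0.30$, and $(0.363 - 0.30)^2 + 0.303^2 \approx 0.096$ matches the reported MSE. A minimal sketch follows; the function name is hypothetical, and the population form of the standard deviation (divisor equal to the number of replications) is an assumption, since the text does not say which divisor was used.

```python
import numpy as np


def summarize(estimates, d_true):
    """Return (mean, s.d., MSE) of replicated estimates of d."""
    est = np.asarray(estimates, dtype=float)
    mean = float(est.mean())
    sd = float(est.std(ddof=0))                    # divisor = number of replications
    mse = float(np.mean((est - d_true) ** 2))      # equals (mean - d_true)^2 + sd^2
    return mean, sd, mse
```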

To evaluate how the methods handle heavy-tailed innovations in the model (4.1), we consider the case that $(1 - B)^d X_{1t} = U_t$ $(t = 1, \ldots, n)$, where $U_t$ follows a t-distribution with three degrees of freedom.

As shown in Table 4.2, the proposed method is superior to the conventional method.

It is well known that the conventional estimator $\hat d$ suffers from significant negative bias, mainly due to the spectrum of the non-Gaussian noise (Breidt et al., 1998; Deo and Hurvich, 2001). Therefore, the underestimation in the contaminated cases is an expected result. The proposed estimator $\tilde d$ also underestimates the parameter when the data are contaminated. However, when the ARFIMA model is generated by the t-distribution, $\tilde d$ has positive bias (Table 4.2). More details about the bias of the estimator and methods for reducing it are left for future study.

Example 4.2 ARFIMA(p, d, q)

Here we consider ARFIMA$(1, d, 0)$ processes for $X_1$ in the model (4.1). That is, we generate the process $\{X_{1t}\}$ from

$$(1 - \phi B)(1 - B)^d X_{1t} = Z_t,$$

where $Z_t \sim WN(0, 1)$, $d = 0.3$, and $\phi = 0.2$ and $0.7$. With the same setting as in Example 4.1, we compute the mean, s.d., and MSE over 500 replications. As listed in Table 4.3, for uncontaminated cases, the proposed estimator $\tilde d$ is comparable to the conventional estimator $\hat d$, and the proposed procedure outperforms the conventional one for contaminated cases.

4.2. Real data analysis

We apply the proposed method to the paleoclimatic glacial varve thickness series in Figure 4.1, which was collected yearly at one location in Massachusetts for $n = 634$ years. Glacial deposits, called varves, have been used as proxies in paleoclimatology. The same series was used by Shumway and Stoffer (2005) in their analysis of long-memory processes. For detailed information on the data, refer to Shumway and Verosub (1992). For our analysis, we consider both the original data and the log-transformed data in Figure 4.1. As shown, the log transformation clearly brings the series closer to normality. For both series, we apply both the conventional method and the proposed method to estimate the parameter $d$. Figure 4.2 shows the estimated slopes by the ordinary periodogram and the Laplace periodogram with $m(n) = 180$. We observe that the proposed method provides robust results across the two series ($\tilde d = 0.368$ for the original data and $\tilde d = 0.365$ for the log-transformed varve series), compared to the conventional approach.

Table 4.1 Contaminated normal ARFIMA(0, d, 0) with σ = 5

| α | d | n | mean $\hat d$ | mean $\tilde d$ | s.d. $\hat d$ | s.d. $\tilde d$ | MSE $\hat d$ | MSE $\tilde d$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.30 | 100 | 0.363 | 0.328 | 0.303 | 0.308 | 0.096 | 0.096 |
| 0 | 0.30 | 300 | 0.295 | 0.238 | 0.196 | 0.196 | 0.039 | 0.042 |
| 0 | 0.30 | 800 | 0.317 | 0.280 | 0.134 | 0.121 | 0.018 | 0.015 |
| 0 | 0.45 | 100 | 0.460 | 0.380 | 0.301 | 0.281 | 0.091 | 0.084 |
| 0 | 0.45 | 300 | 0.466 | 0.410 | 0.199 | 0.203 | 0.040 | 0.043 |
| 0 | 0.45 | 800 | 0.440 | 0.397 | 0.157 | 0.157 | 0.025 | 0.027 |
| 0.05 | 0.30 | 100 | 0.118 | 0.239 | 0.283 | 0.259 | 0.113 | 0.071 |
| 0.05 | 0.30 | 300 | 0.180 | 0.243 | 0.213 | 0.203 | 0.060 | 0.044 |
| 0.05 | 0.30 | 800 | 0.188 | 0.266 | 0.158 | 0.141 | 0.038 | 0.021 |
| 0.05 | 0.45 | 100 | 0.188 | 0.371 | 0.298 | 0.250 | 0.158 | 0.069 |
| 0.05 | 0.45 | 300 | 0.285 | 0.372 | 0.239 | 0.194 | 0.084 | 0.044 |
| 0.05 | 0.45 | 800 | 0.344 | 0.402 | 0.160 | 0.145 | 0.037 | 0.023 |
| 0.10 | 0.30 | 100 | 0.060 | 0.216 | 0.265 | 0.243 | 0.128 | 0.066 |
| 0.10 | 0.30 | 300 | 0.138 | 0.262 | 0.200 | 0.175 | 0.066 | 0.032 |
| 0.10 | 0.30 | 800 | 0.165 | 0.256 | 0.157 | 0.130 | 0.043 | 0.019 |
| 0.10 | 0.45 | 100 | 0.143 | 0.287 | 0.305 | 0.287 | 0.187 | 0.109 |
| 0.10 | 0.45 | 300 | 0.237 | 0.373 | 0.206 | 0.193 | 0.087 | 0.043 |
| 0.10 | 0.45 | 800 | 0.338 | 0.408 | 0.163 | 0.142 | 0.039 | 0.022 |

Table 4.2 Contaminated ARFIMA(0, d, 0) generated by t-distribution with three degrees of freedom and α = 0.1. Note that the true d = 0.45

| σ | n | mean $\hat d$ | mean $\tilde d$ | s.d. $\hat d$ | s.d. $\tilde d$ | MSE $\hat d$ | MSE $\tilde d$ |
|---|---|---|---|---|---|---|---|
| 5 | 100 | 0.236 | 0.456 | 0.287 | 0.262 | 0.128 | 0.069 |
| 5 | 800 | 0.369 | 0.474 | 0.147 | 0.146 | 0.028 | 0.022 |
| 7 | 100 | 0.274 | 0.507 | 0.286 | 0.267 | 0.113 | 0.075 |
| 7 | 800 | 0.362 | 0.461 | 0.176 | 0.136 | 0.039 | 0.019 |

Table 4.3 ARFIMA(1, d, 0) with σ = 5 and d = 0.30

| α | φ | n | mean $\hat d$ | mean $\tilde d$ | s.d. $\hat d$ | s.d. $\tilde d$ | MSE $\hat d$ | MSE $\tilde d$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.2 | 100 | 0.229 | 0.220 | 0.348 | 0.303 | 0.126 | 0.098 |
| 0 | 0.2 | 300 | 0.221 | 0.256 | 0.218 | 0.227 | 0.054 | 0.053 |
| 0 | 0.2 | 800 | 0.271 | 0.272 | 0.158 | 0.166 | 0.026 | 0.028 |
| 0 | 0.7 | 100 | 0.462 | 0.472 | 0.302 | 0.259 | 0.118 | 0.096 |
| 0 | 0.7 | 300 | 0.374 | 0.346 | 0.193 | 0.186 | 0.043 | 0.037 |
| 0 | 0.7 | 800 | 0.301 | 0.300 | 0.153 | 0.157 | 0.023 | 0.025 |
| 0.10 | 0.2 | 100 | 0.125 | 0.267 | 0.352 | 0.276 | 0.154 | 0.077 |
| 0.10 | 0.2 | 300 | 0.148 | 0.283 | 0.212 | 0.216 | 0.068 | 0.047 |
| 0.10 | 0.2 | 800 | 0.199 | 0.269 | 0.141 | 0.144 | 0.030 | 0.022 |
| 0.10 | 0.7 | 100 | 0.370 | 0.454 | 0.308 | 0.318 | 0.045 | 0.036 |
| 0.10 | 0.7 | 300 | 0.316 | 0.339 | 0.211 | 0.186 | 0.064 | 0.043 |
| 0.10 | 0.7 | 800 | 0.303 | 0.313 | 0.149 | 0.166 | 0.022 | 0.028 |

[Figure: two time-series panels, varve (top) and log(varve) (bottom), each plotted against Time.]

Figure 4.1 Glacial varve thickness series from one location in Massachusetts for n = 634 years (top), and its log-transformed thicknesses (bottom)

5. Conclusion

In this paper, we have proposed a new robust method to estimate the long-memory parameter. By combining log regression with the Laplace periodogram, we have achieved robustness of the estimator. Through numerical studies, we have shown that the proposed estimator outperforms the conventional log periodogram regression method, especially when the data are contaminated. However, theoretical justification of the proposed estimator is missing. We need to derive the asymptotic distribution of $\ln\{L_n(\omega_j)/f(\omega_j)\}$ to justify the log regression using the Laplace periodogram. Also, a theoretical comparison between the conventional estimator and the proposed one needs to be provided. If we can derive the asymptotic distribution of the proposed estimator, we may theoretically show the superiority of the estimator. Until then, our conjecture is supported only by the numerical experiments. We leave this issue for future study.

[Figure: four panels plotting the log periodogram or the log Laplace periodogram against log frequency, with fitted slopes 0.438 and 0.368 (original series, top) and 0.399 and 0.365 (log-transformed series, bottom).]

Figure 4.2 Estimates of $d$ by the ordinary periodogram and the Laplace periodogram for the original (top) and the log-transformed (bottom) varve series. Clockwise from the top-left panel: $\hat d = 0.438$ for the original data, $\tilde d = 0.368$ for the original data, $\tilde d = 0.365$ for the log-transformed varve series, and $\hat d = 0.399$ for the log-transformed varve series. Note that $m(n) = 180$.

References

Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics. Journal of Econometrics, 73, 5-59.

Baillie, R. T. and Chung, S. K. (2002). Modeling and forecasting from trend-stationary long memory models with applications to climatology. International Journal of Forecasting, 18, 215-226.

Beran, J. (1994). Statistics for long-memory processes, Chapman & Hall, New York.

Beran, J., Feng, Y., Ghosh, S. and Kulik, R. (2013). Long memory processes-probabilistic properties and statistical models, Springer, New York.

Bhansali, R. J. and Kokoszka, P. S. (2001). Estimation of the long-memory parameter: A review of recent developments and an extension. Lecture Notes-Monograph Series, 125-150.

Breidt, F., Crato, N. and de Lima, P. (1998). The detection and estimation of long memory in stochastic volatility. Journal of Econometrics, 83, 325-348.

Deo, R. and Hurvich, C. M. (2001). On the log periodogram regression estimator of the memory parameter in long memory stochastic volatility models. Econometric Theory, 17, 686-710.

Doukhan, P., Oppenheim, G. and Taqqu, M. S. (2003). Theory and applications of long-range dependence, Birkhauser, Boston.

Geweke, J. and Porter-Hudak, S. (1983). The estimation and application of long memory time series models. Journal of Time Series Analysis, 4, 221-238.

Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116, 770-799.

Koul, H. L. and Mukherjee, M. (1993). Asymptotics of R-, MD- and LAD-estimators in linear regression models with long range dependent errors. Probability Theory and Related Fields, 95, 535-553.

Lee, Y. S., Kim, J., Jang, M. S. and Kim, H. G. (2013). A study on comparing short-term wind power prediction models in Gunsan wind farm. Journal of the Korean Data & Information Science Society, 24, 585-592.

Li, T.-H. (2008). Laplace periodogram for time series analysis. Journal of the American Statistical Association, 103, 757-768.

Myoung, S., Kim, D., Lee, D. J., Kim, H. S. and Jo, J. (2013). An analysis of time series models for toilet and laundry water-uses. Journal of the Korean Data & Information Science Society, 24, 1141-1148.

Molinares, F. F., Reisen, V. A. and Cribari-Neto, F. (2009). Robust estimation in long-memory processes under additive outliers. Journal of Statistical Planning and Inference, 139, 2511-2525.

Robinson, P. M. (1995). Log-periodogram regression of time series with long range dependence. The Annals of Statistics, 23, 1048-1072.

Shumway, R. H. and Verosub, K. L. (1992). State space modeling of paleoclimatic time series. Proceeding of 5th International Meeting Statistical Climatology, 22-26.

Shumway, R. H. and Stoffer, D. S. (2005). Time series analysis and its applications with R examples, Springer, New York.
