Estimation of the excess relative risk using the piecewise linear model with Gaussian process †
Yeongwoo Park 1 · Yongku Kim 2
1,2 Department of Statistics, Kyungpook National University
Received 15 October 2020, revised 2 November 2020, accepted 3 November 2020
Abstract
As technology develops, many people are exposed to radiation in hospitals and living spaces. It is widely known that radiation-related cancer risk increases as the background rate increases over time. Accurately quantifying the risk of radiation exposure is a major topic in radiation epidemiology. We used the solid cancer incidence data from the Life Span Study (1958-2009) to estimate the risk associated with radiation by Bayesian analysis. First, we considered the piecewise linear model, which estimates a separate slope for each segment of the dose range, as the dose-response function. In the piecewise linear model, the slope tends to change rapidly at the cutpoints. In this paper, we place a Gaussian process, whose covariance matrix has a dose difference structure, on the dose category slopes. Finally, we compare the results with those of other models fitted by Bayesian analysis. The estimated risk was slightly smaller than that of the piecewise linear model under the general assumption.
Keywords: Bayesian analysis, dose-response model, excess relative risk, Gaussian process, piecewise linear model, radiation exposure.
1. Introduction
As technology develops, many people receive X-rays and CT scans for better treatment in hospitals, and some undergo procedures such as the insertion of a stent into the heart. We also know that radon, an invisible but naturally occurring source of radiation, is present in the air. We are thus exposed to various kinds of radiation in everyday life. Generally, a low level of exposure does not produce an immediate response in the body. However, it is widely known that radiation-related cancer risk increases as the background rate increases over time. Accurately quantifying the risk of radiation exposure is a major topic in radiation epidemiology, and many studies have been conducted persistently (BEIR VII, 2006; UNSCEAR, 2019).
Several statistical approaches are used to estimate the risk of cancer incidence associated with radiation exposure. To solve the ambiguity of the quantitative link to doses
† This research was supported by the Research Grants of Korea Forest Service (Korea Forestry Promotion Institute) project (No.2019149B10-2023-0301).
1 Ph.D. candidate, Department of Statistics, Kyungpook National University, Daegu 41566, Korea
2 Corresponding author: Associate professor, Department of Statistics, Kyungpook National University, Daegu 41566, Korea. E-mail: [email protected]
0.2Sv, Sasaki et al. (2014) conducted a cancer risk assessment of A-bomb survivors using nonparametric statistics based on the integrate-and-fire algorithm of an artificial neural network (ANN). Dropkin (2016) analyzed cancer incidence using a generalized additive model (GAM) to assess the low-dose radiation risk among female Japanese atomic bomb survivors. An (2018) presented expectile regression estimates for a linear approximation of the nonlinear model, and Lee (2020) presented estimates using percentile regression in nonlinear Poisson regression models such as the excess relative risk (ERR) model. Furukawa et al. (2016) analyzed a Bayesian semiparametric model with a piecewise linear dose-response function to reduce the uncertainty at low doses. Park et al. (2020) estimated the parameters of several dose-response functions by Bayesian analysis and compared the results with those of the piecewise linear model.
It is important to accurately quantify the potential cancer risk when assessing the risk of radiation exposure. Risk regression for the Life Span Study cohort data at the Radiation Effects Research Foundation (RERF) is based on an ERR model (Ozasa et al., 2012; Preston et al., 2007). The ERR expresses how much the risk of radiation-exposed people exceeds that of unexposed people, and it is fitted to cross-classified data by Poisson nonlinear regression (Breslow and Day, 1987; Frome, 1983). The most important thing is to indicate quantitatively the degree of risk associated with radiation exposure.
To solve the problem that the uncertainty of the risk estimate at low doses is underestimated, we considered piecewise linear models with a different slope in each dose range defined by cutpoints. In particular, a Gaussian process is assumed for the slopes of the dose categories. To compare the results, we considered models that are widely used as dose-response models in radiation studies, as well as a piecewise linear model with simple assumptions about the dose category-specific slopes.
Several statistical approaches are applied to characterize the dose-response relation accurately, and studies using the Bayesian approach have been actively conducted lately. To estimate the radiation risk related to the incidence rate, we considered the piecewise linear model, which divides the dose range and estimates a slope for each segment while assuming similar slopes for adjacent segments. However, this model has the problem that the slope changes rapidly at the cutpoints.
In this paper, to solve this problem, the parameters were estimated by applying a Gaussian process to the slopes in the piecewise linear model setting. We then compared the results with those of other models estimated by Bayesian analysis.
2. Materials and Methods
2.1. Life Span Study data
The Life Span Study (LSS) incidence data of atomic bomb survivors are used to estimate the parameters of the excess relative risk for radiation exposure. The LSS data are cohort data on atomic bomb survivors of Hiroshima and Nagasaki. The eligible cohort comprised 105,444 subjects who were alive and had no record of cancer at the start of follow-up. The follow-up period runs from 1958 to 2009 and provides information on 3,079,484 person-years of follow-up. The data set is cross-classified by gender, radiation dose, exposure age and attained age, and the number of events and person-years are tabulated accordingly. A detailed description of the material is well documented (Ozasa et al., 2012; Grant et al., 2017), and the LSS data are available at the site of the Radiation Effects Research Foundation (https://www.rerf.or.jp/en/library/data-en/lssinc17e/). The distribution of the LSS cancer incidence cohort by sex, exposure age and radiation dose is shown in Table 2.1.
Table 2.1 Distribution of radiation dose by sex and exposure age (%)

                                                Weighted absorbed colon dose (Gy)
                  Subjects       NIC*           0-0.005        0.005-0.5      0.5-1        1≤
Total             105,444(100)   25,239(23.9)   35,978(34.1)   39,031(37.0)   3,136(3.0)   2,060(2.0)
Sex
  male            42,910(100)    10,488(24.4)   14,574(34.0)   15,608(36.4)   1,282(3.0)   958(2.2)
  female          62,534(100)    14,751(23.6)   21,404(34.2)   23,423(37.5)   1,854(3.0)   1,102(1.8)
Age at exposure
  <10             22,708(100)    4,995(22.0)    7,928(34.9)    8,909(39.2)    505(2.2)     371(1.6)
  10-19           23,079(100)    5,878(18.7)    7,973(34.5)    7,750(33.6)    892(3.9)     586(2.5)
  20-29           14,251(100)    3,675(25.8)    4,718(33.1)    5,070(35.6)    478(3.4)     310(2.2)
  30-39           15,838(100)    4,034(25.5)    5,127(32.4)    5,953(37.6)    418(2.6)     306(1.9)
  40-49           16,074(100)    3,727(23.2)    5,472(34.0)    6,067(37.7)    504(3.1)     304(1.9)
  50-59           9,379(100)     1,996(21.3)    3,306(35.2)    3,678(39.2)    258(2.8)     141(1.5)
  60≤             4,115(100)     934(22.7)      1,454(35.3)    1,604(39.0)    81(2.0)      42(1.0)
2.2. Excess Relative Risk model
In radiation epidemiology, the excess relative risk (ERR) is used to quantify the increased risk of people exposed to radiation compared with people who are not exposed. The relative risk (RR) is commonly used to estimate exposure-related risk in general, but the ERR is more common in radiation epidemiology. Because the ERR is obtained by subtracting 1 from the RR, a positive ERR means that exposure increases the risk. The ERR can be expressed as follows.
$$
\lambda_d(x) = \lambda_0(x)\,(1 + \mathrm{ERR}),
$$
where $\lambda_d(x)$ is the incidence rate when exposed to radiation and $\lambda_0(x)$ is the incidence rate when not exposed to radiation, which can generally be interpreted as the background rate.
The LSS data are grouped data because they are cross-classified by several variables. Therefore, the Poisson regression model is the most common way to estimate the risk of radiation exposure, and it can be expressed as follows.
$$
Y_i \sim \mathrm{Pois}\left(PY_i\, e^{\eta_i}\,(1 + \mathrm{ERR}(d_i, s_i, e_i, a_i))\right), \quad i = 1, \cdots, n,
$$
where $Y_i$ is the number of solid cancer cases in stratum $i$, $PY_i$ is the number of person-years of follow-up for stratum $i$, and $e^{\eta_i}$ is the background rate for stratum $i$. The linear predictor, $\eta_i = \alpha_0 + \alpha_1 x_{1i} + \cdots + \alpha_{11} x_{11i}$, depends on city (c), sex (s), attained age (a), birth year (b) and location at the time of bombing (l). Since $\eta_i$ represents a risk that is not related to radiation exposure, it includes no exposure information. Unlike the general log-linear Poisson model (McCullagh and Nelder, 1989), this model is described as a generalized nonlinear model because it combines a log-linear term with a linear term.
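To make the structure of this likelihood concrete, the following Python sketch evaluates the log-likelihood contribution of a single stratum. The function name and all numeric inputs are hypothetical illustrations and are not taken from the LSS data.

```python
import math

def poisson_loglik(y, person_years, eta, err):
    """Log-likelihood of one stratum under Y ~ Pois(PY * exp(eta) * (1 + ERR))."""
    mu = person_years * math.exp(eta) * (1.0 + err)
    # log Pois(y | mu) = y*log(mu) - mu - log(y!)
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Hypothetical stratum: 1,000 person-years, background rate exp(-5), ERR = 0.5
expected_cases = 1000 * math.exp(-5.0) * (1.0 + 0.5)
loglik = poisson_loglik(y=10, person_years=1000, eta=-5.0, err=0.5)
```

Setting `err=0.0` recovers an ordinary log-linear Poisson model, which shows that the term 1 + ERR is the only source of nonlinearity.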
The ERR was described using models of the product form $\rho(d)\,\varepsilon(s, e, a)$, in which $\rho(d)$ is a dose-response function describing the main effect of radiation and $\varepsilon(s, e, a)$ is an effect modification function that depends on sex (s), exposure age (e) and attained age (a). In more detail, the ERR can be expressed as follows.
$$
\mathrm{ERR}(d, s, e, a) = \rho(d)\,\theta_s \exp\left\{ \gamma\,\frac{(e - 30)}{10} + \phi \log(a/70) \right\}.
$$
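As a small illustration of the effect modification term, the Python sketch below evaluates the ERR for given parameter values. The function name and the parameter values are hypothetical. The example shows that at exposure age 30 and attained age 70 the exponential term equals 1, so the ERR reduces to $\rho(d)\theta_s$.

```python
import math

def excess_relative_risk(rho_d, theta_s, gamma, phi, exposure_age, attained_age):
    """ERR(d, s, e, a) = rho(d) * theta_s * exp{gamma*(e-30)/10 + phi*log(a/70)}."""
    modifier = theta_s * math.exp(
        gamma * (exposure_age - 30.0) / 10.0 + phi * math.log(attained_age / 70.0)
    )
    return rho_d * modifier

# Hypothetical values: rho(d) = 0.5, theta_s = 1.2, gamma = -0.2, phi = -1.5
err_reference = excess_relative_risk(0.5, 1.2, -0.2, -1.5, 30, 70)  # 0.5 * 1.2
err_younger = excess_relative_risk(0.5, 1.2, -0.2, -1.5, 10, 70)    # exposed at age 10
```

With a negative value of gamma, as assumed here, exposure at a younger age gives a larger ERR than the reference value.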
2.3. Dose-response model
When the risk is estimated with the linear nonthreshold (LNT) model, the estimated slope is influenced more strongly by observations at high doses than by observations at low doses. As a result, the uncertainty at low doses is underestimated. To solve this problem, we divided the dose range and estimated a separate slope for each segment.
$$
\rho(d) = \sum_{k=1}^{C} \beta_k h_k(d),
$$
$$
h_k(d) = \{\min(d, \delta_k) - \delta_{k-1}\}\, I(d > \delta_{k-1}),
$$
where $I(\cdot)$ is the indicator function, so that $I(d > \delta_{k-1}) = 1$ if $d > \delta_{k-1}$ and 0 otherwise, $\delta_k$ denotes the $k$th cutpoint, $\beta_k$ is the slope for the $k$th dose range, and $h_k(d)$ is a function that keeps the dose-response curve connected even when a different slope is given to each range. In addition, when estimating the dose category slopes, it is natural to assume that the slopes of adjacent ranges are related. We therefore assumed that adjacent slopes influence each other, and the piecewise linear dose-response model reflecting this assumption is as follows.
$$
[\beta_k \mid \beta_{k-1}] \sim N(\beta_{k-1}, \sigma^2), \quad k = 1, \cdots, C.
$$
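The piecewise linear dose-response function defined by $h_k(d)$ can be sketched in Python as follows. The cutpoints and slopes are hypothetical and far fewer than those used in the analysis, and the convention that `delta[0] = 0` is the lower end of the first range is an assumption made for this illustration.

```python
def h(d, k, delta):
    """Basis h_k(d) = {min(d, delta_k) - delta_{k-1}} * I(d > delta_{k-1}), k = 1..C."""
    if d > delta[k - 1]:
        return min(d, delta[k]) - delta[k - 1]
    return 0.0

def rho(d, beta, delta):
    """Piecewise linear dose response: sum over k of beta_k * h_k(d)."""
    return sum(beta[k - 1] * h(d, k, delta) for k in range(1, len(beta) + 1))

# Hypothetical setting with C = 3 dose ranges
delta = [0.0, 0.1, 0.5, 1.0]  # delta_0, ..., delta_3 in Gy
beta = [0.2, 0.5, 0.4]        # slope of each range

rho_mid = rho(0.3, beta, delta)  # 0.2*0.1 + 0.5*0.2
rho_top = rho(1.0, beta, delta)  # 0.2*0.1 + 0.5*0.4 + 0.4*0.5
```

Because each $h_k(d)$ stops growing once the dose passes $\delta_k$, the curve is continuous at every cutpoint and only its slope changes there.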
Under this general assumption, the piecewise linear dose-response model has the problem that the slope changes rapidly at the cutpoints. To solve this problem, we considered a Gaussian process model for the dose category slopes. It can be expressed by assuming a multivariate normal distribution with a dose difference structure on the slope parameters, as follows.
$$
(\beta(d_1), \cdots, \beta(d_C))' \sim MVN(m, \Sigma),
$$
where $m$ is the mean vector of the slopes and $\Sigma$ is the covariance matrix with a dose difference structure. The covariance matrix can be expressed as the product of the variance and the correlation matrix, $\Sigma = \sigma^2 R_{\theta,\nu}(d_1, d_2)$. The most commonly used covariance function is the Matérn covariance function, which has the following form.
$$
R_{\theta,\nu}(d_1, d_2) = \frac{\left( |d_2 - d_1| / \theta \right)^{\nu}}{2^{\nu-1}\,\Gamma(\nu)}\, K_\nu\!\left( \frac{|d_2 - d_1|}{\theta} \right),
$$
where $\theta$ is a scale parameter, $\nu$ is a parameter representing the degree of smoothness, and $K_\nu$ is the modified Bessel function of the second kind.
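The Matérn form can be checked numerically for the special case $\nu = 0.5$, where the Bessel function has the closed form $K_{1/2}(x) = \sqrt{\pi/(2x)}\,e^{-x}$. The Python sketch below uses only this closed form, so it does not cover general $\nu$. The function names and the doses are hypothetical.

```python
import math

def bessel_k_half(x):
    """Closed form of the modified Bessel function K_nu for nu = 1/2."""
    return math.sqrt(math.pi / (2.0 * x)) * math.exp(-x)

def matern_half(d1, d2, theta):
    """Matern correlation with nu = 0.5, written term by term as in the formula."""
    nu = 0.5
    x = abs(d2 - d1) / theta
    if x == 0.0:
        return 1.0  # limiting value at zero dose difference
    return x ** nu / (2.0 ** (nu - 1.0) * math.gamma(nu)) * bessel_k_half(x)

def exponential_corr(d1, d2, theta):
    """Exponential correlation exp(-|d2 - d1| / theta)."""
    return math.exp(-abs(d2 - d1) / theta)

# Hypothetical doses (Gy) and scale parameter
r_matern = matern_half(0.1, 0.5, theta=0.3)
r_exp = exponential_corr(0.1, 0.5, theta=0.3)
```

The two values agree up to floating-point error, and the correlation decays as the dose difference grows, which is the dose difference structure placed on the slopes.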
In particular, when $\nu = 0.5$, the Matérn covariance function reduces to the well-known exponential covariance function, so we fixed $\nu$ at this value in the analysis. The model applied in this study can be organized into the following hierarchical Bayesian model.
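• Data model:
$$
Y_i \sim \mathrm{Pois}\left(PY_i\, e^{\eta_i}\,(1 + \mathrm{ERR}(d_i, s_i, e_i, a_i))\right).
$$
• Process model:
$$
[\beta(d_1), \cdots, \beta(d_C) \mid m, \Sigma(\sigma^2, \theta)] : (\beta(d_1), \cdots, \beta(d_C))' \sim MVN(m, \Sigma(\sigma^2, \theta)).
$$
• Prior model:
$$
[\alpha],\ [\beta],\ [\sigma^2],\ [\theta],\ [\gamma],\ [\phi].
$$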
In the data model, a Poisson nonlinear model was assumed for the number of solid cancers (y), and in the process model a multivariate normal distribution was assumed for the dose category slopes. In the prior model, independent non-informative priors were assumed for the parameters of the background rate, α, the dose category slopes, β, the parameters of the covariance function, σ² and θ, and the exposure age and attained age effects, γ and φ. For convenience, the mean of the slopes, m, was fixed in the analysis.
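The process model can be illustrated with a short Python sketch that builds $\Sigma(\sigma^2, \theta)$ from an exponential correlation and evaluates the multivariate normal log-density of the slopes through a Cholesky factorization. This is a minimal sketch under stated assumptions: the representative doses $d_1, \cdots, d_C$, the parameter values and the function names are hypothetical, and the paper does not state how the log-density was computed.

```python
import math

def exp_cov(doses, sigma2, theta):
    """Sigma(sigma2, theta) with entries sigma2 * exp(-|d_a - d_b| / theta)."""
    return [[sigma2 * math.exp(-abs(a - b) / theta) for b in doses] for a in doses]

def cholesky(A):
    """Lower-triangular L with L L' = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def mvn_logpdf(beta, m, Sigma):
    """Log-density of MVN(m, Sigma) evaluated at beta."""
    n = len(beta)
    L = cholesky(Sigma)
    z = []  # forward substitution: solve L z = beta - m
    for i in range(n):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((beta[i] - m[i] - s) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in z)  # (beta - m)' Sigma^{-1} (beta - m)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

# Hypothetical example with C = 3 dose categories
doses = [0.05, 0.3, 1.0]
Sigma = exp_cov(doses, sigma2=0.5, theta=1.0)
log_prior = mvn_logpdf([0.4, 0.5, 0.45], [0.45, 0.45, 0.45], Sigma)
```

Slopes of neighbouring dose categories receive a higher prior density when they are close to each other, because their correlation is larger for small dose differences.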
3. Application
The LSS solid cancer incidence data (1958-2009) were used to estimate the risk associated with radiation. To fit the piecewise linear dose-response model with a Gaussian process, we used the 22 cutpoints that stratify the person-year table: 0, 0.005, 0.02, 0.04, 0.06, 0.08, 0.1, 0.125, 0.15, 0.175, 0.2, 0.25, 0.3, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.5 and 3. The joint posterior distribution for the Gaussian process piecewise linear dose-response model is as follows.
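$$
\begin{aligned}
\pi(\alpha, \beta, \sigma^2, \theta, \gamma, \phi \mid x) \propto{}& \prod_{i=1}^{n} \left[ PY_i\, e^{\eta_i} \left( 1 + \Big( \sum_{k=1}^{C} \beta_k h_k(d_i) \Big) \exp(\gamma e_i^* + \phi a_i^*) \right) \right]^{Y_i} \\
&\times \exp\left[ -\sum_{i=1}^{n} PY_i\, e^{\eta_i} \left( 1 + \Big( \sum_{k=1}^{C} \beta_k h_k(d_i) \Big) \exp(\gamma e_i^* + \phi a_i^*) \right) \right] \\
&\times \prod_{i=1}^{n} \frac{1}{Y_i!} \times |\Sigma(\sigma^2, \theta)|^{-1/2} \exp\left[ -\frac{1}{2} (\beta - m)' \Sigma(\sigma^2, \theta)^{-1} (\beta - m) \right] \\
&\times \pi(\alpha_0) \times \cdots \times \pi(\alpha_{11}) \times \pi(\beta_1) \times \cdots \times \pi(\beta_{22}) \\
&\times \pi(\sigma^2) \times \pi(\theta) \times \pi(\gamma) \times \pi(\phi).
\end{aligned}
$$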
The joint posterior distribution is difficult to handle directly, so we used a Gibbs sampler, which generates random samples from the conditional posterior distributions. Under appropriate conditions, the limiting distribution of these samples is the joint posterior distribution. To use the Gibbs sampler, the conditional posterior distribution of each parameter was derived. Because most of the conditional posterior distributions do not belong to a well-known family of distributions, we inserted the Metropolis-Hastings algorithm into the Gibbs sampler. The conditional posterior distributions of the parameters of interest, derived from the joint posterior distribution, are as follows.
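• The conditional posterior distribution of β
$$
\begin{aligned}
\beta \mid \text{rest} \propto{}& \exp\left[ \sum_{i=1}^{n} Y_i \log\left( 1 + \Big( \sum_{k=1}^{C} \beta_k h_k(d_i) \Big) \exp(\gamma e_i^* + \phi a_i^*) \right) \right] \\
&\times \exp\left[ -\sum_{i=1}^{n} PY_i\, e^{\eta_i} \left( 1 + \Big( \sum_{k=1}^{C} \beta_k h_k(d_i) \Big) \exp(\gamma e_i^* + \phi a_i^*) \right) \right] \\
&\times |\Sigma(\sigma^2, \theta)|^{-1/2} \exp\left[ -\frac{1}{2} (\beta - m)' \Sigma(\sigma^2, \theta)^{-1} (\beta - m) \right].
\end{aligned}
$$
• The conditional posterior distribution of σ²
$$
\sigma^2 \mid \text{rest} \sim IG\left( \frac{C - 2}{2},\ \frac{1}{2} (\beta - m)' R(\theta)^{-1} (\beta - m) \right).
$$
• The conditional posterior distribution of θ
$$
\theta \mid \text{rest} \propto |\Sigma(\sigma^2, \theta)|^{-1/2} \exp\left[ -\frac{1}{2} (\beta - m)' \Sigma(\sigma^2, \theta)^{-1} (\beta - m) \right].
$$
• The conditional posterior distribution of γ
$$
\begin{aligned}
\gamma \mid \text{rest} \propto{}& \exp\left[ \sum_{i=1}^{n} Y_i \log\left( 1 + \Big( \sum_{k=1}^{C} \beta_k h_k(d_i) \Big) \exp(\gamma e_i^* + \phi a_i^*) \right) \right] \\
&\times \exp\left[ -\sum_{i=1}^{n} PY_i\, e^{\eta_i} \left( 1 + \Big( \sum_{k=1}^{C} \beta_k h_k(d_i) \Big) \exp(\gamma e_i^* + \phi a_i^*) \right) \right].
\end{aligned}
$$
• The conditional posterior distribution of φ
$$
\begin{aligned}
\phi \mid \text{rest} \propto{}& \exp\left[ \sum_{i=1}^{n} Y_i \log\left( 1 + \Big( \sum_{k=1}^{C} \beta_k h_k(d_i) \Big) \exp(\gamma e_i^* + \phi a_i^*) \right) \right] \\
&\times \exp\left[ -\sum_{i=1}^{n} PY_i\, e^{\eta_i} \left( 1 + \Big( \sum_{k=1}^{C} \beta_k h_k(d_i) \Big) \exp(\gamma e_i^* + \phi a_i^*) \right) \right].
\end{aligned}
$$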
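Among the conditional distributions above, only that of σ² belongs to a standard family, so it can be sampled directly in the Gibbs sampler. The Python sketch below draws from it under the assumption that IG(a, b) denotes the inverse gamma distribution with shape a and scale b, that is, with density proportional to $x^{-a-1} e^{-b/x}$. The source does not state the parameterization, and the function name and the value of the quadratic form are hypothetical.

```python
import random

def sample_sigma2(C, quad_form, rng):
    """Draw sigma^2 ~ IG((C - 2)/2, quad_form/2), assuming shape/scale parameters.

    quad_form stands for (beta - m)' R(theta)^{-1} (beta - m), computed elsewhere.
    """
    shape = (C - 2) / 2.0
    scale = 0.5 * quad_form
    # If X ~ Gamma(shape, rate=scale), then 1/X ~ IG(shape, scale).
    # random.gammavariate(alpha, beta) takes beta as a scale, hence 1/scale.
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)

rng = random.Random(0)
# Hypothetical inputs: C = 22 slopes and a quadratic form equal to 18
draws = [sample_sigma2(22, 18.0, rng) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)
```

For these inputs the shape is 10 and the scale is 9, so the theoretical mean is scale/(shape - 1) = 1, which the Monte Carlo average should approximate.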
Most of the conditional posterior distributions of the parameters of interest do not belong to a well-known family of distributions. To generate samples, the Metropolis-Hastings algorithm was inserted, and a random walk chain, $q(y \mid x) = q(|y - x|)$, was used as the candidate-generating density. For parameters whose support is restricted to positive values, the truncated normal distribution was selected as the candidate-generating density. The truncated normal candidate-generating density is as follows.
$$
q(y \mid x) = \frac{\phi(y \mid x, \sigma^2)}{1 - \Phi(0 \mid x, \sigma^2)},
$$
where $\phi(y \mid x, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} (y - x)^2 \right\}$ and $\Phi(0 \mid x, \sigma^2) = \int_{-\infty}^{0} \phi(y \mid x, \sigma^2)\, dy$.
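A Metropolis-Hastings step with this truncated normal candidate can be sketched in Python as follows. Because the normalizing constant $1 - \Phi(0 \mid x, \sigma^2)$ depends on the current value, the candidate is not symmetric and the acceptance ratio includes the ratio of candidate densities. The target used here, a Gamma(3, 1) density, is a hypothetical stand-in for a conditional posterior, and the step size and function names are also assumptions.

```python
import math
import random

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_q(y, x, sd):
    """log q(y | x): normal(x, sd^2) density truncated to (0, infinity)."""
    log_phi = -0.5 * math.log(2.0 * math.pi * sd * sd) - (y - x) ** 2 / (2.0 * sd * sd)
    return log_phi - math.log(norm_cdf(x / sd))  # 1 - Phi(0 | x, sd^2) = Phi(x / sd)

def propose(x, sd, rng):
    """Draw a candidate from the truncated normal by simple rejection."""
    while True:
        y = rng.gauss(x, sd)
        if y > 0.0:
            return y

def mh_step(x, log_target, sd, rng):
    """One Metropolis-Hastings update; returns (new state, accepted flag)."""
    y = propose(x, sd, rng)
    log_ratio = (log_target(y) + log_q(x, y, sd)) - (log_target(x) + log_q(y, x, sd))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return y, True
    return x, False

def log_target(v):
    """Hypothetical target: Gamma(3, 1) log-density up to a constant."""
    return 2.0 * math.log(v) - v

rng = random.Random(1)
x, draws, accepted = 1.0, [], 0
for _ in range(20000):
    x, ok = mh_step(x, log_target, 1.5, rng)
    draws.append(x)
    accepted += ok
rejection_rate = 1.0 - accepted / len(draws)
```

The sample mean of `draws` should be close to 3, the mean of the stand-in target, and `rejection_rate` is the quantity monitored when tuning the candidate variance.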
To check the efficiency and convergence of the Gibbs sampler, we examined the rejection rate and the Gelman-Rubin statistic (Gelman et al., 2013). Two parallel chains were used to check convergence. To eliminate the influence of the initial values, the first 5,000 of the 20,000 generated samples were discarded, and every 30th sample was retained to reduce autocorrelation.
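The burn-in, thinning and convergence check described above can be sketched in Python as follows. The code computes the basic potential scale reduction factor from the between-chain and within-chain variances. Whether the analysis used this version or a split-chain variant is not stated in the source, and the two simulated chains are hypothetical stand-ins for sampler output.

```python
import math
import random

def thin(chain, burn_in=5000, step=30):
    """Discard the burn-in samples, then keep every `step`-th sample."""
    return chain[burn_in::step]

def gelman_rubin(chains):
    """Basic potential scale reduction factor for chains of equal length."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # B: between-chain variance, W: average within-chain variance
    B = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    W = sum(sum((v - mu) ** 2 for v in c) / (n - 1) for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * W + B / n
    return math.sqrt(var_plus / W)

rng = random.Random(2)
# Two hypothetical chains of 20,000 draws from the same distribution
raw_chains = [[rng.gauss(0.5, 0.1) for _ in range(20000)] for _ in range(2)]
kept = [thin(c) for c in raw_chains]
samples_per_chain = len(kept[0])  # (20000 - 5000) / 30 = 500
r_hat = gelman_rubin(kept)
```

Values of `r_hat` close to 1 indicate that the between-chain variance is small relative to the within-chain variance.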
Using the samples generated from the posterior distribution, the parameters of the excess relative risk can be estimated. The posterior mean was used as the parameter estimate in this study. The estimated excess relative risk at age 70 after exposure at age 30 is shown in Figure 1, together with the estimates from the linear nonthreshold (LNT), quadratic, linear-quadratic, threshold and piecewise linear models under the general assumption (Park et al., 2020). The entire dose range (left) shows the overall shape of the dose-response function, and the 0-0.5 Gy range (right) shows detailed changes at low doses. Overall, the curve fitted with the Gaussian process prior was fairly similar to that of the piecewise linear model.
The Gaussian process estimate was slightly smaller than that of the piecewise linear model over the whole dose range. Compared with the LNT model, the piecewise linear model with the Gaussian process prior gives a higher risk between 1 Gy and 2 Gy and a lower risk elsewhere. The ERR at 1 Gy was estimated to be 0.48 under the Gaussian process prior, close to the LNT estimate of 0.48. At low doses, the curve fitted with the Gaussian process prior was close to those of the threshold and linear-quadratic models.
The rejection rate of the Metropolis-Hastings algorithm was about 20-50% (Table 3.1). Two parallel chains were used to check convergence, and the Gelman-Rubin (G-R) statistics ranged from 1.00 to 1.11.
Table 3.1 Gelman-Rubin statistics and rejection rate in MCMC simulation

                              α's         β's         θ      γ      φ
G-R statistics                1.00-1.04   1.10-1.11   1.01   1.01   1.00
Rejection rate (proportion)   0.20-0.48   0.25-0.48   0.48   0.43   0.21
4. Conclusion
In this paper, a piecewise linear model was used as the dose-response function to estimate the risk of radiation exposure. We placed a Gaussian process with a dose difference structure on the dose category slopes. The Life Span Study solid cancer incidence data (1958-2009) were used to fit the parameters of the proposed model. The analysis led to conclusions similar to those of the piecewise linear model under the general assumption, and it confirmed that the change in slope differs across dose ranges. In addition, because the Gaussian process model includes the dose difference structure, it encourages adjacent dose categories to have similar slopes and thereby provides a solution to the problem of sharply changing slopes at the cutpoints.
As a limitation of the study, the basic model for fitting radiation risks is Poisson regression, which can suffer from overdispersion. A disadvantage of the piecewise linear model is that the slopes depend on how the cutpoints are chosen. Further studies should check the sensitivity of the results to different cutpoint settings. In addition, we compared the excess relative risk with the
[Figure 1. Estimated excess relative risk (vertical axis) against weighted absorbed colon dose in Gy (horizontal axis) for the linear nonthreshold, quadratic, linear-quadratic, threshold, piecewise linear and Gaussian process models. Left panel: entire dose range (0-3 Gy). Right panel: low-dose range (0-0.5 Gy).]