The influence of a first-order antedependence model and hyperparameters in BayesCπ for genomic prediction


This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

www.ajas.info Asian-Australas J Anim Sci

Vol. 31, No. 12:1863-1870 December 2018 https://doi.org/10.5713/ajas.18.0102 pISSN 1011-2367 eISSN 1976-5517

Xiujin Li¹, Xiaohong Liu¹, and Yaosheng Chen¹,*

Objective: The Bayesian first-order antedependence models, which specified single nucleotide polymorphism (SNP) effects as being spatially correlated in the conventional BayesA/B, had more accurate genomic prediction than their corresponding classical counterparts. Given the advantages of BayesCπ over BayesA/B, we developed hyper-BayesCπ, ante-BayesCπ, and ante-hyper-BayesCπ to evaluate the influence of the antedependence model and of hyperparameters for $v_g$ and $s_g^2$ on BayesCπ.

Methods: Three public datasets (two simulated datasets and one mouse dataset) were used to validate the proposed methods. The genomic prediction performance of the proposed methods was compared with that of traditional BayesCπ, ante-BayesA, and ante-BayesB.

Results: In both the simulation and the real data analyses, hyper-BayesCπ, ante-BayesCπ, and ante-hyper-BayesCπ were comparable with BayesCπ, ante-BayesB, and ante-BayesA in prediction accuracy and bias, except that ante-BayesB performed significantly worse when using a few SNPs and π = 0.95.

Conclusion: Hyper-BayesCπ is recommended because, unlike BayesCπ, it does not require a pre-estimated total genetic variance of the trait, and it shortens computing time compared with ante-BayesB. Although the antedependence model in BayesCπ did not show advantages in our study, larger real datasets genotyped with high-density chips may be used to validate it again in the future.

Keywords: BayesCπ; Antedependence Model; Hyperparameter

INTRODUCTION

Methods for genomic prediction have been developed widely since Meuwissen et al [1] first proposed genomic selection methods. To date, two classical types of methods are mainly applied to genomic prediction in livestock breeding. The first is BLUP methods such as GBLUP [2], which assume that single nucleotide polymorphism (SNP) effects independently and identically follow a normal distribution with an equal variance. The second is Bayesian hierarchical models with different prior distributions on the variances of SNP effects. For example, BayesA assumes a scaled-inverted chi-square distribution on SNP-specific variances [1].

Habier et al [3] first developed BayesCπ for genomic prediction to address the drawbacks of BayesA and BayesB with respect to the influence of prior hyperparameters and the prior probability π that a SNP has zero effect. BayesCπ assumes that all SNPs have a common effect variance instead of the locus-specific variances of BayesA or BayesB, so that the influence of $s_g^2$ becomes smaller. In BayesB, π is generally set arbitrarily to 0.95 or 0.99. This is improved in BayesCπ, where π is treated as an unknown and inferred from the data. Although the statistical drawbacks of BayesA and BayesB did not impair prediction accuracy, mainly because of linkage disequilibrium (LD) information, BayesCπ can be deemed better suited to routine application than BayesA/B because it reduces computing time and better fits the genetic architecture of a trait.

* Corresponding Author: Yaosheng Chen. Tel: +86-020-39332940, Fax: +86-02039332940, E-mail: [email protected]

¹ State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou, Guangdong 510006, China

ORCID
Xiujin Li: https://orcid.org/0000-0002-8526-5547
Xiaohong Liu: https://orcid.org/0000-0001-8141-4835
Yaosheng Chen: https://orcid.org/0000-0002-3871-7651

Submitted Jan 31, 2018; Revised Apr 19, 2018; Accepted Jun 2, 2018

So far, most Bayesian methods, such as BayesA, BayesB, and BayesCπ, share the assumption that SNP effects are independent. However, the existence of LD between SNPs makes this assumption improper, because the effects of neighboring SNPs should be closely related. To account for LD between SNPs, Yang and Tempelman [4] introduced first-order antedependence models, which model SNP effects as correlated, into the conventional BayesA and BayesB and developed ante-BayesA and ante-BayesB. They reported that the antedependence methods had significantly higher accuracies than their classical counterparts at higher LD levels (r² > 0.24) in the simulation study, and that they were also more accurate in mice data and other benchmark data sets.

However, despite the aforementioned advantages of BayesCπ over BayesA and BayesB, first-order antedependence models have not yet been introduced into BayesCπ. Thus, we developed ante-BayesCπ and investigated whether it can improve the prediction accuracy and bias compared with BayesCπ. Besides, it has not been widely appreciated that $v_g$ and $s_g^2$ can also be estimated from the data [5]. This recognition is important because both hyperparameters can help define the genetic architecture of a trait of interest. The impact of adding hyperparameters to BayesCπ for genomic prediction has seldom been reported and is also an interesting topic for investigation.

Therefore, the objective of this study was to investigate the impact of adding a first-order antedependence model and hyperparameters to BayesCπ on the accuracy and bias of genomic prediction. We demonstrated the predictive ability of BayesCπ with first-order antedependence models and hyperparameters using simulation data from the study of Jiang et al [6], the 15th quantitative trait loci-marker assisted selection (QTL-MAS) workshop data set, and the heterogeneous stock mice data, in comparison with the classical BayesCπ, ante-BayesA, and ante-BayesB.

MATERIALS AND METHODS

Statistical model

The general linear mixed model used for genomic prediction could be written as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e} \tag{1}$$

Here, y was a vector of phenotypes, β was a vector of fixed effects, g was a vector of random SNP effects, and e was a residual vector. X was a known incidence matrix linking β to y. Z was a known matrix linking g to y in which SNP genotypes were coded as 0, 1, or 2 copies of one allele for each SNP (column) and animal (row). Furthermore, we specified $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G})$, where $\mathbf{G} = \mathrm{diag}(\sigma^2_{g_j})$, and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$.
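As an illustration of the model in Equation (1), the following minimal sketch simulates phenotypes with NumPy. It is not the authors' simulation: the sample sizes, allele frequencies, variances, and the use of an overall mean as the only fixed effect are all hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
n_animals, n_snps = 200, 50

# Hypothetical allele frequencies used to draw genotypes.
allele_freq = rng.uniform(0.1, 0.9, size=n_snps)

# Z: genotypes coded as 0, 1, or 2 copies of one allele
# (rows = animals, columns = SNPs).
Z = rng.binomial(2, allele_freq, size=(n_animals, n_snps)).astype(float)

# X: incidence matrix for fixed effects; here only an overall mean (assumption).
X = np.ones((n_animals, 1))
beta = np.array([10.0])

# G = diag(sigma2_g[j]); every SNP gets the same variance here for simplicity.
sigma2_g = np.full(n_snps, 0.05)
g = rng.normal(0.0, np.sqrt(sigma2_g))

# e ~ N(0, I * sigma2_e)
sigma2_e = 1.0
e = rng.normal(0.0, np.sqrt(sigma2_e), size=n_animals)

# Equation (1): y = X beta + Z g + e
y = X @ beta + Z @ g + e
```

Each row of `Z` holds one animal's genotypes, so `Z @ g` is that animal's total additive genetic value summed over SNPs.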

BayesCπ

Habier et al [3] first developed BayesCπ, and the obvious difference between BayesCπ and other methods such as GBLUP, BayesA, and BayesB was the characterization of G. In BayesCπ, $\sigma^2_{g_j}$ had a two-component mixture with one component being 0 and the other component being one fixed value $\sigma_g^2$, i.e.,

$$\sigma^2_{g_j} \mid v_g, s_g^2, \pi \; \begin{cases} = 0 & \text{with probability } \pi \\ = \sigma_g^2 \sim \chi^{-2}(v_g, s_g^2) & \text{with probability } 1 - \pi \end{cases} \tag{2}$$

Here, π represented the proportion of SNPs having no genetic effect on the trait of interest. Following Habier et al [3], π was treated as an unknown with a uniform(0,1) prior. The $v_g$ was set to 4.2, and $s_g^2$ was set as follows:

$$s_g^2 = \frac{\tilde{\sigma}_g^2 (v_g - 2)}{v_g} \tag{3}$$

$$\tilde{\sigma}_g^2 = \frac{\tilde{\sigma}_a^2}{(1 - \pi) \sum_{k=1}^{K} 2 p_k (1 - p_k)} \tag{4}$$

where $\tilde{\sigma}_g^2$ was the additive genetic variance for a SNP, and $\tilde{\sigma}_a^2$ was the additive genetic variance explained by all SNPs.
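The arithmetic of Equations (3) and (4), together with one draw from the mixture prior in Equation (2), can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, $p_k$ is assumed to be the allele frequency of SNP k and K the number of SNPs, and the input values (100 SNPs with frequency 0.5, total additive variance 1.0, π = 0.9) are made up for the example.

```python
import numpy as np

def scale_parameter(sigma2_a, pi, p, v_g=4.2):
    """Return (sigma2_g, s2_g) following Equations (4) and (3).

    sigma2_a: additive genetic variance explained by all SNPs
    pi:       proportion of SNPs with zero effect
    p:        allele frequencies p_k (assumed meaning of p_k)
    v_g:      degrees of freedom, 4.2 as in the text
    """
    p = np.asarray(p, dtype=float)
    het = np.sum(2.0 * p * (1.0 - p))
    sigma2_g = sigma2_a / ((1.0 - pi) * het)   # Equation (4)
    s2_g = sigma2_g * (v_g - 2.0) / v_g        # Equation (3)
    return sigma2_g, s2_g

def draw_snp_variances(n_snps, pi, v_g, s2_g, rng):
    """One draw from the mixture prior of Equation (2).

    The common variance is a scaled inverse chi-square draw,
    v_g * s2_g / chi2(v_g); each SNP variance is 0 with probability pi.
    """
    sigma2_common = v_g * s2_g / rng.chisquare(v_g)
    is_zero = rng.random(n_snps) < pi
    return np.where(is_zero, 0.0, sigma2_common)

# Hypothetical inputs for illustration only.
p = np.full(100, 0.5)
sigma2_g, s2_g = scale_parameter(sigma2_a=1.0, pi=0.9, p=p)

rng = np.random.default_rng(1)
variances = draw_snp_variances(n_snps=100, pi=0.9, v_g=4.2, s2_g=s2_g, rng=rng)
```

With these inputs the heterozygosity sum is 50, so $\tilde{\sigma}_g^2 = 1 / (0.1 \times 50) = 0.2$ and $s_g^2 = 0.2 \times 2.2 / 4.2 \approx 0.105$. All nonzero entries of `variances` share one value, reflecting the common effect variance that distinguishes BayesCπ from BayesA/B.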

Hyper-BayesCπ

We labeled the method hyper-BayesCπ when we sampled $v_g$ using a Metropolis-Hastings (MH) update and $s_g^2$ with a Gibbs update in BayesCπ. Similar to BayesA and BayesB, we specified the following prior distributions on $v_g$ and $s_g^2$ for hyper-BayesCπ: $v_g \sim p(v_g) \propto (v_g + 1)^{-2}$, and $s_g^2 \sim \mathrm{Gamma}(\alpha_s, \beta_s)$ with shape parameter $\alpha_s = 1.0$ and rate parameter $\beta_s = 1.0$. The full conditional density (FCD) of $v_g$ did not have a recognizable form, and a random walk MH step rather than a Gibbs step was used to sample $v_g$ from this FCD. The detailed procedure can be found in Yang et al [7]. The FCD of $s_g^2$