
Fig. 1. Work Flow

From AML RNA-seq datasets, genes are selected to capture the expression pattern of the BCL2 family. First, genes that are transcriptionally, regulatorily, or functionally related to the BCL2 family are collected from a gene set database and a literature survey. Second, backward selection based on Non-Negative Matrix Factorization (NMF) is performed to find the optimal genes, judged by how well BCL2 family expression is recovered. The “signatures”, which consist of the optimal genes, are then calculated. These are used to identify subtypes of AML, to predict venetoclax response, and to guide BCL2 family-targeted treatment strategies in individual samples.
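The factorization underlying this workflow can be illustrated with a minimal sketch. It assumes the standard multiplicative-update rules for the Frobenius objective, which the legend does not specify; the function name `nmf`, the toy matrix `V` (genes x samples), and all values are hypothetical.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]

def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH (the reconstruction error)."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative V (genes x samples) into
    W (genes x rank, gene weights) and H (rank x samples, signature profiles)."""
    rng = random.Random(seed)
    n_genes, n_samples = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_genes)]
    H = [[rng.random() + eps for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

# Toy expression matrix: 4 hypothetical genes x 5 samples (made-up values).
V = [
    [5.0, 4.0, 0.5, 0.2, 0.1],
    [4.5, 4.2, 0.4, 0.3, 0.2],
    [0.2, 0.3, 3.9, 4.1, 0.5],
    [0.1, 0.2, 0.4, 0.6, 5.0],
]
W, H = nmf(V, rank=3)
print(frobenius_error(V, W, H))
```

In this sketch the rows of `H` play the role of the signature profiles and the columns of `W` hold the per-gene weights. A backward-selection loop would repeat the factorization after removing candidate genes.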

Fig. 2. Association of Venetoclax Response

(A) Comparison of anti-apoptotic BCL2 family expression between venetoclax-sensitive and venetoclax-resistant groups in BeatAML (81 sensitive and 72 resistant) and LeuceGene (20 sensitive and 3 resistant).

(B) Comparison of the signatures representing patterns of BCL2 family expression. Note that we used only the subset of LeuceGene for which response information is publicly available. Differentially expressed BCL2 and BFL1 were previously identified using the whole LeuceGene dataset in (12). p-values were calculated by t-test. *<0.05, **<0.01, ns>0.10

Fig. 3. Identification of BCL2 Family-Related AML Subtypes

H matrix presenting BCL2 family expression signature profiles calculated using optimized genes. Four AML datasets show three distinct clusters. Rows were clustered using hierarchical clustering with average distance.

Fig. 4. Contribution of Optimized Genes to Signatures

Weights of the optimized genes in the W matrix. BCL2, MCL1, and BFL1 are marked in cyan, magenta, and yellow, respectively. Some determinant components of the signatures are marked in black. Four AML datasets show concordance in the weights of the optimized genes.

The gene weights are normalized to sum to 1. Correlation coefficients were calculated using Spearman’s rho.
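The normalization and rank correlation described here can be sketched as follows. This is an illustrative pure-Python version, not the authors' code; the function names and the toy weight vectors are hypothetical, and tied values are assumed to receive their average rank.

```python
def normalize_weights(weights):
    """Scale non-negative gene weights so that they sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return [w / total for w in weights]

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up weights of the same four genes in two hypothetical datasets.
weights_a = normalize_weights([0.8, 0.1, 0.3, 0.05])
weights_b = normalize_weights([1.6, 0.3, 0.5, 0.2])
rho = spearman_rho(weights_a, weights_b)
```

Because Spearman's rho depends only on ranks, the normalization does not change the coefficient; it only makes weights comparable in magnitude across datasets.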

Fig. 5. Functional Enrichment Analysis of BCL2 Family-Related AML Subtypes

Gene set enrichment analysis (GSEA) comparing each subtype with the others identifies the gene sets enriched in each subtype. Enrichment patterns are consistent across three AML datasets. NES indicates normalized enrichment score. The gene sets for the MAPK pathway and the JAK/STAT pathway are from the GO and KEGG databases, respectively. The others are from the hallmark database.

Fig. 6. Prediction Performance of Signatures

Comparison of performance between classifiers. The variables and models used are described in the table on the right. BCL2 family(5) and BCL2 family(3) indicate (BCL2+MCL1+BFL1+BCLXL+BCLW) and (BCL2+MCL1+BFL1), respectively. Error bars indicate 95% confidence intervals (CI). p-values were calculated by comparison with the signature model using the DeLong test.

Fig. S1. Histogram of IC50 values from BeatAML dataset

IC50 values were binarized as sensitive if IC50 <= 1 µM and resistant if IC50 >= 10 µM. In total, 87, 72, and 27 samples were allocated to the sensitive, resistant, and intermediate groups, respectively.
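The thresholding rule above can be written as a short sketch. The function name `classify_ic50` and the example values are hypothetical; the cutoffs are those stated in the legend, with IC50 assumed to be given in micromolar.

```python
def classify_ic50(ic50_um, sensitive_max=1.0, resistant_min=10.0):
    """Assign a response group from an IC50 value in micromolar."""
    if ic50_um <= sensitive_max:
        return "sensitive"
    if ic50_um >= resistant_min:
        return "resistant"
    return "intermediate"  # strictly between the two cutoffs

# Made-up IC50 values for illustration only.
example_ic50 = [0.05, 1.0, 3.2, 10.0, 25.0]
groups = [classify_ic50(v) for v in example_ic50]
```

Samples in the intermediate group fall between the two cutoffs and carry neither the sensitive nor the resistant label.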

Fig. S2. Gene Optimization

(A) Decrease in aMAPE (the average of the mean absolute percentage errors) of the anti-apoptotic BCL2 family genes during gene optimization.

Genes with minimal aMAPE were selected as optimized genes.
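As a rough illustration of the aMAPE criterion, the sketch below computes a per-gene MAPE and averages it over genes. It assumes the usual definition of MAPE, expressed in percent, and that the reconstructed values come from the NMF product; the function names and all numbers are hypothetical.

```python
def mape(observed, reconstructed):
    """Mean absolute percentage error (in percent) for one gene across samples."""
    errors = [abs(o - r) / abs(o) for o, r in zip(observed, reconstructed)]
    return 100.0 * sum(errors) / len(errors)

def amape(observed_by_gene, reconstructed_by_gene):
    """Average of the per-gene MAPEs over the genes in observed_by_gene."""
    per_gene = [
        mape(observed_by_gene[g], reconstructed_by_gene[g])
        for g in observed_by_gene
    ]
    return sum(per_gene) / len(per_gene)

# Made-up observed and reconstructed expression for two genes in two samples.
observed = {"BCL2": [10.0, 20.0], "MCL1": [5.0, 8.0]}
reconstructed = {"BCL2": [9.0, 22.0], "MCL1": [5.5, 8.0]}
score = amape(observed, reconstructed)
```

With these toy numbers the per-gene MAPEs are about 10% and 5%, so the aMAPE is about 7.5%. Observed values must be non-zero for the percentage error to be defined.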

(B) Test AUC of the classifier when using the given genes during the optimization process.

(C) A Venn diagram illustrating the optimized genes of the three AML datasets at rank 3. The 37 intersecting genes are listed on the right of the figure.

Fig. S3. Optimal Rank Selection for NMF

Cophenetic correlation coefficients at each rank for the given optimized genes. Rank 3 is the first rank at which the cophenetic correlation coefficient begins to fall in the three AML cohorts.
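One way to read this selection rule is sketched below: pick the first rank after which the cophenetic correlation coefficient decreases. This interpretation, the function name, and the coefficient values are assumptions made for illustration; they are not taken from the figure.

```python
def first_rank_before_drop(coefficients):
    """Return the first rank whose coefficient is higher than that of the next rank.

    coefficients maps each tested rank to its cophenetic correlation coefficient.
    If the coefficient never falls, the largest tested rank is returned.
    """
    ranks = sorted(coefficients)
    for current, following in zip(ranks, ranks[1:]):
        if coefficients[following] < coefficients[current]:
            return current
    return ranks[-1]

# Hypothetical coefficients for ranks 2 to 5.
example = {2: 0.97, 3: 0.98, 4: 0.91, 5: 0.88}
selected = first_rank_before_drop(example)
```

With these made-up values the coefficient first falls when moving from rank 3 to rank 4, so rank 3 is selected.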

Fig. S4. Weight of BCL2 Family in Signatures

Correlation between signatures and inhibitor responses (ln(IC50)) in AML cell lines. RNA data and response data were from CCLE and GDSC2, respectively. The bold box emphasizes the correlation between the signatures and drug responses.

Fig. S5. Original Gene Expression Profile of Optimized Genes

Original expression (RPKM) profile after gene optimization in the three AML datasets. Rows indicate the optimized genes at rank 3 in the given dataset. The order of samples is the same as in Fig. 3.

Clustering is conducted after log2-transformation and row scaling.
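The preprocessing named here can be sketched as a log2 transform followed by per-row standardization. The pseudocount of 1 and the use of z-scores for "row scaling" are assumptions, since the legend does not state them; the function name and the toy RPKM values are hypothetical.

```python
import math
import statistics

def log2_row_scale(matrix, pseudocount=1.0):
    """Log2-transform expression values, then z-score each row (gene).

    The pseudocount avoids log2(0); rows with zero variance are set to 0.
    """
    scaled = []
    for row in matrix:
        logged = [math.log2(v + pseudocount) for v in row]
        mean = statistics.mean(logged)
        sd = statistics.stdev(logged)  # sample standard deviation
        scaled.append([(v - mean) / sd if sd > 0 else 0.0 for v in logged])
    return scaled

# Made-up RPKM values: 2 genes x 4 samples.
rpkm = [
    [0.0, 1.0, 3.0, 7.0],
    [10.0, 10.0, 10.0, 10.0],
]
scaled_rpkm = log2_row_scale(rpkm)
```

After scaling, each gene with non-zero variance has mean 0 and unit standard deviation across samples, so highly expressed genes do not dominate the distances used for clustering.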

Fig. S6. Identification of BCL2 Family-Related Subtypes in Other Hematologic Malignancies

(A) Profile of signatures in other hematologic malignancies (CLL and DLBCL). Weights of the BCL2 family in signatures are described in Fig. S4B. Clustering is conducted after row scaling.

Clusters A, B, and C indicate samples enriched in signatures 1, 2, and 3, respectively.

(B) GSEA from the comparison between one cluster and the others identifies enriched gene sets in each cluster. NES indicates normalized enrichment score. Gene sets are the same as in Fig. 5.

Fig. S7. Batch Effect Correction

PCA of expression profiles before and after batch correction between (A) BeatAML and LeuceGene or (B) BeatAML and CCLE.

Fig. S8. External Validation of the Venetoclax Response Classifier

Comparison of prediction performance between classifiers in LeuceGene (external validation set). Bar graphs show the estimated probability (y hat) for each individual. The variables and models used are the same as in Fig. 6. Error bars indicate 95% confidence intervals (CI).

Fig. S9. Prediction Performance of Signatures Before and After Gene Optimization

Comparison of prediction performance between classifiers trained using signatures calculated from all genes, genes collected on the basis of domain knowledge, or optimized genes. Logistic regression is used. The number of signature components is stated in the table on the right (#Gene). Error bars indicate 95% confidence intervals (CI). p-values were calculated by comparison with the optimized gene model (rank3) using the DeLong test.

Fig. S10. Profile of Signatures in CCLE AML

H matrix presenting a signature profile of CCLE AML. Clustering is conducted after row scaling. Drug IC50 information is from GDSC2.

Fig. S11. Correlation between signatures and drug response in CCLE AML

Correlation between signatures and inhibitor responses (ln(IC50)) in AML cell lines. RNA data and response data were from CCLE and GDSC2, respectively. The bold box emphasizes the correlation between the signatures and drug responses.

Fig. S12. Prediction Performance in Cell Line


Fig. S13. Scheme of Gene Optimization Algorithm


Gene BeatAML LeuceGene TCGA

ABL1 o o o

AEN o x o

AIFM1 x x o

AKT1 x o x

APAF1 o o x

APOPT1 o x x

ARL6IP5 o o o

ATF4 o o x

ATM o x x

ATP2A1 o o o

BAD x x o

BAX o x o

BCL2 o o o

BCL2A1 o o o

BCL2L1 o o o

BCL2L10 o o o

BCL2L11 x o o

BCL2L2 o o o

BCL3 x o o

BIK x o x

BIRC2 o o x

BIRC3 o o o

BNIP3 o x x

BOK o o o

BRCA1 x x o

BRCA2 x x o

BRSK2 o o o

CASP2 o o o

CASP3 x o o

CASP4 o o o

CASP7 o x o

CASP8 x o o

CASP9 x x o

CD40 x x o

CD40LG x x o

CDKN1A x x o

CEBPB x o x

CFLAR o o o

CHAC1 x x o

CHEK2 o x o

CHUK x x o

CLU x o x

CNR1 o o o

CREB1 o x o

CRIP1 x x o

CUL1 x x o

CUL2 x x o

CUL4A o x o

CUL5 o x x

CYCS x x o

CYP1B1 o o x

DAB2IP o o o

DDIT3 o o x

DDIT4 x x o

DDX3X o o x

DDX5 o o x

DFFA x o o

DFFB o o o

DIABLO x o o

DNAJC10 x o o

DYNLL1 o o x

DYNLL2 x o o

DYRK2 o o o

E2F1 o o o

E2F2 x x o


ELK1 x x o

ENDOG x x o

EP300 x x o

EPHA2 o x x

ERCC6 x o x

ERO1L x x o

ETS1 x x o

FHIT o o x

FNIP2 x x o

GSK3B o o x

GZMB x x o

HIF1A o o x

HIPK1 o o o

HIPK2 x o o

HMOX1 o x o

HRAS o x o

HRK x o o

HSPB1 o x o

HTRA2 x x o

IFI16 o o o

IKBKB o o o

IKBKE x o o

IKBKG x x o

IRAK1 x x o

ITPR1 o o x

LGALS12 x x o

MAEL o o o

MARCH5 o o x

MAP3K14 x x o

MAP3K5 o o x

MAPK8 o o x

MAPT o x o

MAT2A o x x

MCL1 o o o

MELK x x o

MLLT11 x o o

MSH2 x o o

MSH6 x o o

MYBBP1A x o o

MYC x o o

NF1 o o o

NFATC4 o x x

NFKB2 x o o

NGFR o x x

NMT1 o x o

NOL3 x x o

NPRL2 x o o

NUPR1 o x o

PDCD10 x x o

PDK1 x x o

PDK2 x o o

PERP x x o

PGAP2 o o x

PHLDA3 o x o

PIK3R1 o o o

PMAIP1 x o o

PML o o o

POLB x o x

PPM1F x o o

PPP1R15A o o o

PPP3R1 o x x

PRKCD o o x

PRKCZ o o o

PRKDC x o x

PRODH o o x

PTBP1 o o o

PTGER2 o o o

PTHLH o o x

PYCARD o x x

RAB25 o x o

RAC2 o o x

RELA x x o

RIPK1 x x o

RPS6KB2 x x o

RRP8 o x o

SCN2A o o x

SELK x x o

SFN o x o

SGPP1 x x o

SHISA5 o x x

SIRT1 x o x

SIVA1 o o o

SNW1 o x x

SRF x x o

ST20 o o o

STAT3 o o o

STAT5A x x o

STAT5B o o o

STK11 x x o

STK24 x x o

SYNGAP1 x o x

TCF4 o o x

TMEM109 o x x

TNF x o x

TNFRSF10B x x o

TNFRSF12A x o x

TNFRSF1A x o o

TNFRSF4 x o x

TNFSF4 o o o

TOPORS x x o

TP53 x o x

TP53BP2 x o x

TP63 x o x

TRADD o o x

TRAF3 x x o

TRAF5 o o o

TRAF6 o o x

TRIB3 o x x

USP28 o x x

WWOX o o x

XBP1 x o x

XPA x o o

XRCC1 x o o

ZMAT1 x o x

ZMAT3 o o x

ZNF385B x x o

ZNF385C o o x

ZNF385D o x o
