The improved distance (or similarity) measure between gene sets, which takes into account both the overlap between gene sets and the network connections between genes, overcomes the limitation of existing pathway analysis methods that consider only the overlap or only the network connections. The conventional approach clusters gene sets based solely on the overlap between them. In this regard, the proposed methods and their computational approach provide new insights into pathway analysis.
Introduction
Biological data
- Transcriptome
- Microarray
- RNA-sequencing
- Single Cell-sequencing
- Interactome
- Pathway
First, databases that provide network information from experimental results, such as MINT, are referred to as primary databases (Licata et al., 2011). Finally, databases that provide computationally predicted network information, such as STRING, are referred to as predictive databases (Szklarczyk et al., 2018). Tools such as ClueGO are an example of ORA (Bindea et al., 2009). To address this limitation, the Functional Class Score (FCS) approach has emerged.

Programmatic approach
- Shiny
For example, it is not difficult to find programs for "Pathway Analysis" in both Python and R ("PyPathway: Python Package for Biological Network Analysis and Visualization"; Rue-Albrecht, Marini, Soneson, & Lun, 2018). Shiny is designed to make it easy to create interactive web applications from R, and it can be extended to dashboards with web techniques such as HTML, CSS, and JavaScript. This framework is used in a broad range of fields such as teaching, biology, data analysis, and social studies.

Research overview
This is one of the standard approaches for group-level pathway analysis in scRNA-seq (with marker genes), but it results in a loss of information compared to cell-level analysis. Two public datasets were used to compare the performance of group-level and cell-level pathway analysis (Han et al., 2018; Zheng et al., 2017).
GScluster: network-weighted gene-set clustering analysis
- Overview
- Introduction
- Methods
- Gene-set clustering and distance measures in GScluster
- Implementation for GScluster
- Results
- GScluster R package
- Comparative analysis for real data
- Discussion
However, these distance measures only take into account the number of genes within each gene set (e.g., the number of genes shared by two gene sets). One of the frequently provided formats of GSA results consists of a list of gene sets and their member genes (it may also contain scores for the genes or the gene sets). In the present study, we applied the PPI-weighted gene-set distance (pMM), which integrates both the overlapping genes and the PPIs between two gene sets.
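To make the limitation of overlap-only measures concrete, the following minimal sketch computes two common overlap-based distances between gene sets. The function names and the toy gene sets are illustrative assumptions, and the network-weighted pMM distance itself is not reproduced here.

```python
def meet_min_distance(set_a, set_b):
    """Meet/Min distance: 1 - |A ∩ B| / min(|A|, |B|)."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        raise ValueError("gene sets must be non-empty")
    return 1.0 - len(a & b) / min(len(a), len(b))


def jaccard_distance(set_a, set_b):
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        raise ValueError("gene sets must be non-empty")
    return 1.0 - len(a & b) / len(union)


# Toy gene sets (illustrative only).
set_a = {"TP53", "BRCA1", "BRCA2", "ATM"}
set_b = {"BRCA1", "BRCA2", "CHEK2"}
set_c = {"INS", "GCK", "CHEK2"}

print(meet_min_distance(set_a, set_b))
print(jaccard_distance(set_a, set_b))
# set_a and set_c share no genes, so an overlap-only distance is maximal
# even if their genes interact strongly in a PPI network.
print(meet_min_distance(set_a, set_c))
```

Because both measures depend only on set sizes and intersections, two gene sets with no shared genes are always maximally distant, which is the gap a network-weighted distance is meant to close.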

Overview
Introduction
However, the conventional overlap-based selection implies that each gene has the same probability of being selected into the target group of closely associated genes. These strategies do not take into account the specific networks between the target genes and the pathway/GO terms. Recent network-based methods (Alexeyenko et al., 2012; Ogris, Guala, Helleday, & Sonnhammer, 2017) test the enrichment of network connections between the target genes and pathway/GO terms without using overlap information.
Method
- Integrated score
- Network adjusted resampling
- Implementation of netGO
This kind of method has further improved the biological relevance of gene-set clustering (Sora Yoon et al., 2019). In the latter expression, we assume that a gene x∈T−A partially belongs to A to the extent of its average interaction score. Our benchmark results indicated that α=20 provides a reasonable compromise between the overlap and network scores.
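As a rough illustration of how a score of this kind could combine overlap and network evidence, the sketch below adds an α-weighted "partial membership" term for target genes outside the gene set to the plain overlap count. The exact form and normalization used by the authors are not given in this text, so the formula, the function name `integrated_score`, and the toy interaction weights are all assumptions.

```python
def integrated_score(target, gene_set, weights, alpha=20.0):
    """Assumed form: |T ∩ A| + alpha * sum over x in T - A of mean_y w(x, y).

    `weights` maps frozenset({gene1, gene2}) to an interaction score in [0, 1];
    missing pairs are treated as 0 (no known interaction).
    Returns (integrated, overlap, network).
    """
    target, gene_set = set(target), set(gene_set)
    if not gene_set:
        raise ValueError("gene set must be non-empty")
    overlap = len(target & gene_set)
    network = 0.0
    for x in target - gene_set:
        # Average interaction of x with all members of the gene set:
        # the extent to which x "partially belongs" to the set.
        total = sum(weights.get(frozenset((x, y)), 0.0) for y in gene_set)
        network += total / len(gene_set)
    return overlap + alpha * network, overlap, network


# Toy example with hypothetical gene names and weights.
target = {"g1", "g2", "g3"}
gene_set = {"g2", "g4"}
weights = {
    frozenset(("g1", "g2")): 0.75,
    frozenset(("g1", "g4")): 0.25,
    frozenset(("g3", "g4")): 0.5,
}
integrated, overlap, network = integrated_score(target, gene_set, weights)
print(integrated, overlap, network)
```

Under this assumed form, setting α to 0 recovers the pure overlap count, while larger α values let network connectivity dominate.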
We refer to the methods that use the integrated score P(T → A) and only the network score P′(T → A) as netGO+ and netGO, respectively. Thus, depending on the context, netGO refers either to our software package or to the network-only approach. All genes included in the networks and the gene-set annotation are treated as the background genes for the enrichment test [e.g.
28 . interactions) network connectivity scores are calculated and each gene is by default divided into K bins out of the 2000 genes. Because Cytoscape is based on Java, we instead used cytoscape.js (Franz et al., 2015), which is a JavaScript library. The results table shows the significant gene sets and their FDR q-values for netGO, netGO+ and Fisher's exact test.
The bubble chart shows more statistics of significant gene sets for netGO+, such as the overlap score, network score, and the significance of each gene set as represented by the node size.
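The network-adjusted resampling step mentioned above bins genes by network connectivity. A minimal sketch of that idea is given here under stated assumptions: genes are sorted by connectivity and split into roughly equal-sized bins, and each random target set draws, for every original target gene, a replacement from the same bin. The bin construction, the function names, and the toy connectivity values are hypothetical and are not taken from the netGO implementation.

```python
import random
from collections import Counter


def make_bins(connectivity, n_bins):
    """Sort genes by connectivity and split them into roughly equal-sized bins."""
    genes = sorted(connectivity, key=lambda g: (connectivity[g], g))
    size = -(-len(genes) // n_bins)  # ceiling division
    bins = [genes[i:i + size] for i in range(0, len(genes), size)]
    gene_to_bin = {g: i for i, members in enumerate(bins) for g in members}
    return bins, gene_to_bin


def resample_target(target, bins, gene_to_bin, rng):
    """Draw a random gene set with the same per-bin composition as the target."""
    per_bin = Counter(gene_to_bin[g] for g in target)
    sampled = set()
    for bin_index, count in per_bin.items():
        # Sampling without replacement keeps the resampled set the same size.
        sampled.update(rng.sample(bins[bin_index], count))
    return sampled


# Toy connectivity scores for 40 hypothetical genes.
connectivity = {f"g{i}": (i * 37) % 50 for i in range(40)}
bins, gene_to_bin = make_bins(connectivity, n_bins=4)
target = {"g0", "g5", "g17"}
random_target = resample_target(target, bins, gene_to_bin, random.Random(0))
print(sorted(random_target))
```

Matching the connectivity profile of the target genes in each resample is intended to prevent highly connected genes from inflating the network score under the null distribution.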
Results
- netGO R-shiny package
- Benchmark test
For these three target gene sample distributions, we compared the false positive controls (Fig and the prioritization of the corresponding gene sets (Fig for the four different background gene connections. Both the combined (netGO+) and network-only (netGO) results, as well as the FET results, are provided, and the networks connecting the target genes and the gene-set annotations are visualized (Figure 3.5 below). EnrichNet (Glaab, Baudot, Krasnogor, Schneider, & Valencia, 2012) was not compared because it does not support the analysis of custom gene sets.
Among them, we used 156 gene sets that contained "breast cancer" or "BRCA" in their names as our standard positive results. Second, we analyzed the GWAS summary data for type II diabetes that were compared in our previous work (S. Yoon et al., 2018). Detailed lists of gene sets and their significance scores are provided in Supplementary Tables S3.1 and S3.2 for breast tumor and type II diabetes, respectively.
Each enrichment analysis method produced a list of gene sets (or pathways) sorted by FDR q-values, and for each rank of gene set, the cumulative counts of standard positives between different methods were compared in Figure 3.6 (below) for the STRING networks. NEA and BinoX showed poor performance; this result was not unexpected because overlap-based methods have an advantage when testing differentially expressed genes for the expression-based (signature) gene sets. These results show that our methods are able to prioritize relevant gene sets better than existing methods especially for a small number of target genes.
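The cumulative-count comparison described above can be sketched as follows. The name-matching rule mirrors the "breast cancer" or "BRCA" criterion stated earlier; treating underscores as spaces is an added assumption, and the ranked gene-set names are hypothetical placeholders rather than actual results.

```python
def is_breast_cancer_set(name):
    """Standard positive if the name contains 'breast cancer' or 'BRCA'."""
    normalized = name.upper().replace("_", " ")
    return "BREAST CANCER" in normalized or "BRCA" in normalized


def cumulative_positives(ranked_names, is_positive):
    """For each rank, count the standard positives found up to that rank."""
    counts, total = [], 0
    for name in ranked_names:
        if is_positive(name):
            total += 1
        counts.append(total)
    return counts


# Hypothetical gene-set names, already sorted by FDR q-value (best first).
ranked = [
    "SET_A_BREAST_CANCER_UP",
    "SET_B_APOPTOSIS",
    "SET_C_BRCA1_TARGETS",
    "SET_D_CELL_CYCLE",
]
print(cumulative_positives(ranked, is_breast_cancer_set))
```

A method whose curve rises faster at the top ranks places more of the standard positives near the head of its list.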
The running times of the methods were also compared, and netGO required significantly less time than the other network-based methods (Figure 3.10 below).

Conclusion
Supplementary information of Chapter 3
CellEnrich
Overview
Introduction
To identify significant marker genes, current methods typically adopt relative statistics from each pairwise comparison of subgroups (e.g., the Wilcoxon rank-sum test). However, relative statistics may have difficulty detecting "shared" significant genes across multiple groups.
Methods
- CellEnrich algorithms
- Pre-process
- Normalization
- Select significant genes in samples
- Select significant genes in groups
- Find enriched pathway for each samples
Investigating which marker genes generate a difference between groups using pathway analysis is an effective strategy. Furthermore, each subgroup consists of a different number of cells. This chapter introduces the R/Shiny package CellEnrich, which can be used for both analysis and visualization of single-cell RNA-sequencing data. CellEnrich uses the findMarkers function of the scran package (Lun, McCarthy, & Marioni, 2016) to perform Wilcoxon rank-sum tests.
CellEnrich uses Fisher's exact test to assess the significance between marker genes and pathways for each sample.
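For reference, a one-sided Fisher's exact test for enrichment can be computed directly from the hypergeometric distribution. The sketch below uses only the Python standard library; the function name, the argument names, and the per-cell overlap counts are illustrative assumptions and do not reflect the CellEnrich code.

```python
from math import comb


def fisher_enrichment_p(n_background, n_pathway, n_markers, n_overlap):
    """One-sided p-value P(X >= n_overlap) under the hypergeometric null.

    n_background: number of background genes
    n_pathway:    pathway genes within the background
    n_markers:    significant (marker) genes for one sample
    n_overlap:    marker genes that belong to the pathway
    """
    max_overlap = min(n_pathway, n_markers)
    total = comb(n_background, n_markers)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_markers - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Applying the test per sample: each (hypothetical) cell has its own marker set,
# summarized here as (number of markers, overlap with the pathway).
cells = {"cell_1": (3, 2), "cell_2": (3, 0)}
p_values = {
    cell: fisher_enrichment_p(10, 4, n_markers, n_overlap)
    for cell, (n_markers, n_overlap) in cells.items()
}
print(p_values)
```

In practice the per-sample p-values would still need multiple-testing correction across pathways, which is omitted here.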
Results
- CellEnrich R-shiny package
- Benchmark test
First, significant marker genes are selected for each group using the findMarkers function of the scran package with the Wilcoxon rank-sum test. Then, the set of marker genes is analyzed using Fisher's exact test (FET) for each dataset and group. The detailed list of marker genes and significant pathways for each method is provided in Table S4.1. For both datasets, CellEnrich detected more pathways than FET.
Because the lists of significant pathways identified by FET and CellEnrich differed considerably in size (Figure 4.4), the literature survey was applied to one cell group per dataset in which the size difference was not significant: platelets for the PBMC dataset and mast cells for the Pancreas dataset. Both CellEnrich and FET identified the Platelet Activation pathway, which is strongly linked to the immune response upon detection of external factors such as microbes and viruses. However, CellEnrich's findings focused on cellular structures involved in physical interactions associated with "platelet activation" (actin cytoskeleton regulation, tight junction, focal adhesion, phagosome) and on specific diseases (dilated cardiomyopathy, shigellosis, pathogenic Escherichia coli infection).
In contrast, FET results were more closely related to “platelet activation” (chemokine signaling pathway, viral protein interaction with cytokine and cytokine receptor). First, CellEnrich detected a larger number of significant pathways because it uses whole-cell metrics. Literature review of both methods also revealed that most of the significant pathways were biologically meaningful.
Second, the main theme of the pathways revealed by each method was slightly different from each other.

Conclusion
Supplementary information of Chapter 4
- PBMC
- Pancreas
Development for interactive graph and network visualization
- Overview
- Introduction
- Development method
- Design and implemented result
- Summary
For example, if the interactome consists of four biological elements ('A', 'B', 'C', 'D'), their relationships can be represented by each method as shown in the figure below. GUI application development can be achieved using the Shiny framework (Chang et al., 2017) in the R programming language. Therefore, there are many R packages for graph and network visualization that can be used in the GUI application development process.
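A small interactome such as the four-element example above can be stored in several equivalent forms. The following sketch shows an edge list, an adjacency matrix, and an adjacency list; the specific edges are hypothetical, since the relationships among 'A' to 'D' are not listed in the text.

```python
def to_adjacency_matrix(nodes, edges):
    """Symmetric 0/1 matrix for an undirected graph; rows follow `nodes` order."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
        matrix[index[target]][index[source]] = 1
    return matrix


def to_adjacency_list(nodes, edges):
    """Map each node to the list of its neighbours."""
    neighbours = {node: [] for node in nodes}
    for source, target in edges:
        neighbours[source].append(target)
        neighbours[target].append(source)
    return neighbours


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D")]  # hypothetical interactions

matrix = to_adjacency_matrix(nodes, edges)
neighbours = to_adjacency_list(nodes, edges)
for node, row in zip(nodes, matrix):
    print(node, row)
print(neighbours)
```

The edge list is the most compact form for sparse networks, while the matrix form makes pairwise lookups trivial at the cost of quadratic memory.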
To solve this problem, another JavaScript library that supports dynamic web rendering, cytoscape.js, was chosen. The difference between cytoscape.js and Cytoscape (used in RCytoscape) is that Cytoscape is based on Java, which is not designed for web frameworks such as Shiny and works only with Java (not R), whereas cytoscape.js is a JavaScript library derived from Cytoscape with the web in mind, so it can be used with R and Shiny. The core library is a JavaScript library that contains the core functions; in shinyCyJS, cytoscape.js is used as the core library.
The rendering unit is a collection of R functions that expose the functions of the core library in R and Shiny. The wrapper library is an R package that combines the renderer and the core library; the 'htmlwidgets' package is commonly used for developing R/Shiny wrapper libraries. Designed with customization in mind, shinyCyJS provides most of the style options that are not available in existing R packages.
This chapter briefly introduced the R environment and presented an example R package for graph visualization in the Shiny framework, called 'shinyCyJS'.
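To illustrate what a wrapper ultimately has to hand to the core library, the sketch below converts a node and edge list into the flat elements array that cytoscape.js accepts (objects with a `group` field and a `data` field holding `id`, `source`, and `target`). It is written in Python purely for illustration and is not the shinyCyJS API; the function name and the example graph are assumptions.

```python
import json


def to_cytoscape_elements(nodes, edges):
    """Build a cytoscape.js-style elements array from nodes and undirected edges."""
    elements = [{"group": "nodes", "data": {"id": node}} for node in nodes]
    for source, target in edges:
        elements.append(
            {
                "group": "edges",
                # Edge ids only need to be unique; this naming is arbitrary.
                "data": {"id": f"{source}-{target}", "source": source, "target": target},
            }
        )
    return elements


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D")]  # hypothetical interactions

elements = to_cytoscape_elements(nodes, edges)
payload = json.dumps(elements)  # JSON string that a web front end could consume
print(payload)
```

Serializing to JSON is the step at which data leaves the R (or Python) session and enters the browser, which is why a JavaScript core library fits naturally behind a Shiny wrapper.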

Conclusions
Human Mast Cell Progenitors Can Be Infected by Macrophagetropic Human Immunodeficiency Virus Type 1 and Retain Virus with Aging In Vitro. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-seq data reveals hidden subpopulations of cells. Interferon-γ protects against chronic viral myocarditis by reducing mast cell degranulation, fibrosis, and the profibrotic cytokines transforming growth factor-β1, interleukin-1β, and interleukin-4 in the heart.
A mechanism for the sustained action of mast cell-derived TNF-alpha during IgE-dependent biological responses. A susceptibility gene set for early-onset colorectal cancer that integrates diverse signaling pathways: implication for tumorigenesis. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor [version 2; peer review: 3 approved, 2 approved with reservations].
Mutation studies of Flt-3 and c-kit in a spectrum of chronic myeloid disorders, including systemic mast cell disease. Ultraviolet radiation-induced cytokines induce mast cell accumulation and matrix metalloproteinase production: possible role in cutaneous lupus erythematosus. Gene dispersion is a major determinant of read count bias in differential expression analysis of RNA-seq data.