DiffuPath¶
DiffuPath is an analytic tool for biological networks that connects the generic label propagation algorithms from DiffuPy to biological networks encoded in several formats such as Simple Interaction Format (SIF) or Biological Expression Language (BEL). For example, in the application scenario presented in the paper, we use three pathway databases (i.e., KEGG, Reactome and WikiPathways) and their integrated network retrieved from PathMe 1 to analyze three multi-omics datasets. However, other biological networks can be imported from the Bio2BEL ecosystem 2.
Installation is as easy as getting the code from PyPI with:
$ python3 -m pip install diffupath
See the installation documentation.
See also
Documented on Read the Docs
Versioned on GitHub
Tested on Travis CI
Distributed by PyPI
Installation¶
The latest stable code can be installed from PyPI with:
$ python3 -m pip install diffupath
The most recent code can be installed from the source on GitHub with:
$ python3 -m pip install git+https://github.com/multipaths/diffupath.git
DiffuPath also requires the latest PathMe version, which must be installed directly from GitHub:
$ python3 -m pip install git+https://github.com/PathwayMerger/PathMe.git
For developers, the repository can be cloned from GitHub and installed in editable mode with:
$ git clone https://github.com/multipaths/diffupath.git
$ cd diffupath
$ python3 -m pip install -e .
Requirements¶
diffupath requires the following libraries:
networkx (>=2.1)
pybel (0.13.2)
biokeen (0.0.14)
click (7.0)
tqdm (4.31.1)
numpy (1.16.3)
scipy (1.2.1)
scikit-learn (0.21.3)
pandas (0.24.2)
openpyxl (3.0.2)
plotly (4.5.3)
matplotlib (3.1.2)
matplotlib_venn (0.11.5)
bio2bel (0.2.1)
pathme
diffupy
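As a rough way to compare an environment against the versions listed above, the following sketch uses the standard library's importlib.metadata. The PINNED mapping copies only a few entries from the list, and the helper names installed_version and check_pins are illustrative; they are not part of DiffuPath.

```python
from importlib import metadata

# A few pins copied from the requirements list above (not exhaustive).
PINNED = {
    "pybel": "0.13.2",
    "click": "7.0",
    "numpy": "1.16.3",
    "pandas": "0.24.2",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each package whose version differs to an (expected, found) pair."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result from check_pins means every listed package is installed at exactly the listed version; it says nothing about packages left out of the mapping.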
Command Line Interface¶
The following commands can be used directly from your terminal:
Download a database for network analysis.
The following command generates a BEL file representing the network of the given database.
$ python3 -m diffupath database network --database=<database-name>
To check the available databases, run the following command:
$ python3 -m diffupath database ls
Run a diffusion analysis
The following command runs a diffusion method on a given network with the given data:
$ python3 -m diffupath diffusion run --network=<path-to-network-file> --input=<path-to-data-file> --method=<method>
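When the analysis is launched from a script rather than typed by hand, the same invocation can be assembled as an argument list. This is a minimal sketch that uses only the three options shown above; the helper name build_diffusion_command, the file names, and the method value "raw" are placeholders chosen for illustration, so substitute values that your installation accepts.

```python
import shlex


def build_diffusion_command(network, data, method):
    """Assemble the `diffusion run` invocation shown above as an argument list."""
    return [
        "python3", "-m", "diffupath", "diffusion", "run",
        f"--network={network}",
        f"--input={data}",
        f"--method={method}",
    ]


# Hypothetical file names and method name, used only for illustration.
command = build_diffusion_command("network.csv", "input.csv", "raw")

# Print a shell-safe rendering of the command for logging or copy-pasting.
print(shlex.join(command))

# To actually run it (requires diffupath to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when a path contains spaces.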
Constants¶
Constants of DiffuPath.
- diffupath.constants.DEFAULT_DIFFUPATH_DIR = '/home/docs/.diffupath'¶
Default DiffuPath directory
- diffupath.constants.OUTPUT_DIFFUPATH_DIR = '/home/docs/.diffupath/output'¶
Default DiffuPath output directory
- diffupath.constants.BY_METHOD = 'method'¶
raw
- diffupath.constants.KEGG_NAME = 'kegg'¶
KEGG
- diffupath.constants.REACTOME_NAME = 'reactome'¶
Reactome
- diffupath.constants.WIKIPATHWAYS_NAME = 'wikipathways'¶
WikiPathways
- diffupath.constants.MIRTARBASE_NAME = 'mirtarbase'¶
MirTarBase
- diffupath.constants.SIDER_NAME = 'sider'¶
SIDER
- diffupath.constants.PHEWAS_NAME = 'phewascatalog'¶
PhewasCatalog
- diffupath.constants.HSDN_NAME = 'hsdn'¶
HSDN
- diffupath.constants.DDR_NAME = 'ddr'¶
DDR
- diffupath.constants.DRUGBANK_NAME = 'drugbank'¶
DrugBank
- diffupath.constants.GENE_ONTOLOGY_NAME = 'go'¶
Gene Ontology
- diffupath.constants.DATABASES = ['kegg', 'reactome', 'wikipathways', 'mirtarbase', 'sider', 'phewascatalog', 'hsdn', 'ddr', 'drugbank', 'go']¶
Databases available for download in DiffuPath
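A user-supplied database name can be checked against this list before it is passed to the --database option. The sketch below copies the list from the documented constant instead of importing it, so it runs without DiffuPath installed; the helper normalize_database_name is hypothetical, and the assumption that the command line expects these lower-case names is ours.

```python
# Names copied from diffupath.constants.DATABASES as documented above.
DATABASES = [
    'kegg', 'reactome', 'wikipathways', 'mirtarbase', 'sider',
    'phewascatalog', 'hsdn', 'ddr', 'drugbank', 'go',
]


def normalize_database_name(name):
    """Strip and lower-case `name`; raise ValueError if it is not listed."""
    key = name.strip().lower()
    if key not in DATABASES:
        raise ValueError(
            f"unknown database {name!r}; expected one of {', '.join(DATABASES)}"
        )
    return key


print(normalize_database_name(" KEGG "))
```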
Databases¶
In this section, we describe the types of networks (databases) over which you can run diffusion methods. The options, each described in detail below *, are:
Select a network representing an individual biological database
Select multiple databases to generate a harmonized network
Select from one of four predefined collections of biological databases representing a harmonized network
Submit your own network † from one of the accepted formats
- *
Please note that all networks available through DiffuPath have been generated using PyBEL v.0.13.2.
- †
If there are duplicated nodes in your network, please take a look at this Jupyter Notebook to address the issue.
Network Dumps¶
Because of the high computational cost of generating the kernel, we provide links to pre-calculated kernels for a set of networks representing biological databases.
Database | Description | Reference | Download |
---|---|---|---|
DDR | Disease-disease associations | 1 | |
DrugBank | Drug and drug target interactions | 2 | |
Gene Ontology | Hierarchy of tens of thousands of biological processes | 3 | |
HSDN | Associations between diseases and symptoms | 4 | |
KEGG | Multi-omics interactions in biological pathways | 5 | |
miRTarBase | Interactions between miRNA and their targets | 6 | |
Reactome | Multi-omics interactions in biological pathways | 7 | |
SIDER | Associations between drugs and side effects | 8 | |
WikiPathways | Multi-omics interactions in biological pathways | 9 | |
If you would like to use one of our predefined collections, you can similarly download pre-calculated kernels for sets of networks representing integrated biological databases.
Collection | Database | Description | Download |
---|---|---|---|
#1 | KEGG, Reactome and WikiPathways | -omics and biological processes/pathways | |
#2 | KEGG, Reactome, WikiPathways and DrugBank | -omics and biological processes/pathways with a strong focus on drug/chemical interactions | |
#3 | KEGG, Reactome, WikiPathways and MirTarBase | -omics and biological processes/pathways enriched with miRNAs | |
Custom-network formats¶
You can also submit your own networks in any of the following formats:
At a minimum, please ensure that the network file you submit includes each of the following columns:
Source
Target
Optionally, you can choose to add a third column, “Relation” in your network (as in the example below). If the relation between the Source and Target nodes is omitted, and/or if the directionality is ambiguous, either node can be assigned as the Source or Target.
Custom-network example¶
Source | Target | Relation |
---|---|---|
A | B | Increase |
B | C | Association |
A | D | Association |
You can also take a look at our sample networks folder for some example networks.
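To make the column requirements concrete, the sketch below reads the example table with Python's csv module and rejects files that lack the Source or Target column. The comma delimiter and the helper name read_edges are assumptions made for this illustration; this is not DiffuPath's own loader.

```python
import csv
import io

# The example table above, written as comma-separated text.
# The comma delimiter is an assumption made for this sketch.
EXAMPLE = """Source,Target,Relation
A,B,Increase
B,C,Association
A,D,Association
"""


def read_edges(handle, default_relation=None):
    """Read (source, target, relation) triples from a delimited file.

    Raises ValueError if the mandatory Source or Target column is absent.
    A missing or empty Relation falls back to `default_relation`.
    """
    reader = csv.DictReader(handle)
    missing = {"Source", "Target"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    return [
        (row["Source"], row["Target"], row.get("Relation") or default_relation)
        for row in reader
    ]


edges = read_edges(io.StringIO(EXAMPLE))
# Collect every label that appears as a source or a target.
nodes = sorted({node for source, target, _ in edges for node in (source, target)})
print(edges)
print(nodes)
```

A file with only the Source and Target columns is read the same way, with every relation set to default_relation.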
References¶
- 1
Menche, J., et al. (2015). Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science, 347(6224), 1257601.
- 2
Wishart, D. S., et al. (2018). DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Research, 46(D1), D1074–D1082.
- 3
Ashburner, M., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1), 25–9.
- 4
Zhou, X., Menche, J., Barabási, A. L., & Sharma, A. (2014). Human symptoms–disease network. Nature Communications, 5(1), 1–10.
- 5
Kanehisa, M., et al. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Research, 45(D1), D353–D361.
- 6
Huang, H. Y., et al. (2020). miRTarBase 2020: updates to the experimentally validated microRNA–target interaction database. Nucleic Acids Research, 48(D1), D148–D154.
- 7
Fabregat, A., et al. (2016). The Reactome Pathway Knowledgebase. Nucleic Acids Research, 44(D1), D481–D487.
- 8
Kuhn, M., et al. (2016). The SIDER database of drugs and side effects. Nucleic Acids Research, 44(D1), D1075–D1079.
- 9
Slenter, D. N., et al. (2017). WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Research, 46(D1), D661–D667.
Visualization¶
Input mapping¶
Although users do not need to handle the mapping themselves, it matters when assessing a diffusion run: the scores that are actually diffused are those of the input entities that map onto the background network, so the coverage of the input determines what is propagated. In other words, only the entities whose labels match an entity in the network are processed further for diffusion.
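The idea of coverage can be illustrated with plain Python sets. In this sketch the labels, the scores, and the function name map_input_to_network are hypothetical, and labels are compared by exact string equality, which may be simpler than the matching DiffuPath performs.

```python
def map_input_to_network(input_scores, network_nodes):
    """Split input scores into entities found in the network and those not found.

    Returns (mapped, unmapped, coverage), where coverage is the fraction of
    input entities whose label matches a network node.
    """
    nodes = set(network_nodes)
    mapped = {label: score for label, score in input_scores.items() if label in nodes}
    unmapped = sorted(set(input_scores) - nodes)
    coverage = len(mapped) / len(input_scores) if input_scores else 0.0
    return mapped, unmapped, coverage


# Hypothetical labels and scores, for illustration only.
scores = {"A": 1.0, "B": -1.0, "X": 1.0, "Y": 0.5}
network = ["A", "B", "C", "D"]

mapped, unmapped, coverage = map_input_to_network(scores, network)
print(mapped)    # entities whose scores would be diffused
print(unmapped)  # entities dropped because no node carries their label
print(coverage)
```

Here two of the four input entities match a node, so only their scores would enter the diffusion.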

To visualize the mapping statistics heatmap, use the following function:
Further data views can be rendered for the input data mapping, such as a Venn diagram to explore the overlap or a boxplot of the distribution:


Validations¶
To visualize the metrics derived from validation experiments, you can plot boxplots of the metrics obtained from repeated holdouts or iterated cross-validation, and bar charts of the associated statistical tests with a threshold line:
Two-dimensional boxplot:

Three-dimensional boxplot:

Statistical test bar chart:

PathMe Harmonization¶
Disclaimer¶
DiffuPath is scientific software that has been developed in an academic capacity, and thus comes with no warranty or guarantee of maintenance, support, or back-up of data.
References¶
- 1
Domingo-Fernandez, D., Mubeen, S., Marin-Llao, J., Hoyt, C., ... Hofmann-Apitius, M. (2019). PathMe: Merging and exploring mechanistic pathway knowledge. BMC Bioinformatics, 20, 243.
- 2
Hoyt, C. T., et al. (2019). Integration of Structured Biological Data Sources using Biological Expression Language. bioRxiv, 631812.