Data sources

Data source inventory.

Sources: 48 (+21 sub-sources)
Categories: 7
By category
  • Genome 13
  • Transcriptome 11
  • Ont & Lit 11
  • Proteome 10
  • Therapeutics 6
  • Visualization 2
  • Phenome 1
Key highlights
Genome: Clinical variants (ClinVar, Orphanet), common & rare-variant GWAS/PheWAS, fine-mapped credible sets, somatic cancer mutations, gnomAD constraint, polygenic risk scores, eQTL/cCRE overlays
Transcriptome: Bulk and single-cell expression across normal tissues, tumors, and cell lines (GTEx, TCGA, HPA, CPTAC, CELLxGENE); CRISPR/DepMap genetic dependencies; cross-species orthology (Bgee)
Proteome: Protein-protein interactions (BioGRID, InnateDB, MatrixDB, STRING), UniProt features & domains, plasma pQTLs (UKB-PPP, SomaScan), tumor proteomics
Phenome: UK Biobank case/control cohorts, phenotype labels, and ICD-10/EFO mappings
Therapeutics: Drug-target-indication evidence (ChEMBL, IDG), drug sensitivity (PharmacoDB), antibody-drug conjugates (ADCdb), antibody sequences (PLAbDab), clinical trials
Ont & Lit: Disease ontologies (MONDO, EFO, HPO, Orphanet, ICD-10, UMLS, OncoTree), pathway/function (GO, Reactome, MSigDB, OmniPath), tissue/anatomy (UBERON), literature (EuropePMC, OpenAlex, NCBI)
Visualization: 3D protein structures (AlphaFold, PDBe, SWISS-MODEL), homology & isoform models, subcellular localization pictograms

Complete list

Data source | Category | Summary statistics
Cancer Hotspots | Genome | 2,675 variants · 224 transcripts
cBioPortal | Genome | 8,815,313 mutations · 495 sources
CIViC | Genome | clinical interpretations of variants
ClinVar | Genome | 553,894 variants · 29,025 phenotypes
EBI eQTL Catalogue | Genome | 32 studies
ENCODE SCREEN cCREs | Genome | ~2.3M cCREs
Ensembl | Genome | 87,893 genes
FinnGen | Genome | 11,814 targets · 2,591 phenotypes
GENIE | Genome | 290,583 variants · 1,738 transcripts
gnomAD | Genome | ~17,500 transcripts
GWAS Catalog | Genome | 255,022 variants · 12,583 phenotypes
Open Targets Scores | Genome, Transcriptome, Proteome, Ont & Lit | 12 sub-sources
Orphanet | Genome, Ont & Lit | 2,300 phenotypes · 7,875 cross-references
PGS Catalog | Genome | 5,332 models · 669 traits · 781 publications
Gene Ontology (GO) | Ont & Lit | 40,440 terms
ICD-10 | Ont & Lit | 4,068 phenotypes
Molecular Signatures Database (MSigDB) | Ont & Lit | 6,269,060 gene-set memberships
NCBI | Ont & Lit | 95M+ gene records
OmniPath | Ont & Lit | 100+ resources
OncoTree | Ont & Lit | 834 OncoTree codes · 257 cancer types
OpenAlex | Ont & Lit | 309M+ scholarly works
Reactome | Ont & Lit | 2,863 pathway annotations
UMLS CUI | Ont & Lit | 726 phenotypes
UK Biobank | Phenome | 20,119 targets · 7,266 phenotypes
BioGRID | Proteome | 18,471 proteins · 5,840,810 interactions
InnateDB | Proteome | 3,737 proteins · 33,359 interactions
MatrixDB | Proteome | 10,423 proteins · 443,458 interactions
SomaScan (Soon) | Proteome | aptamer-based proteomics (Fenland / INTERVAL / deCODE)
STRING (Soon) | Proteome | 12,174 proteins · 8,387,240 interactions
UKB-PPP | Proteome | 2,941 proteins · 54,306 participants · 14,287 pQTLs
UniProt | Proteome | 20,779 protein-coding genes · 43,114 reviewed proteins · 1,425,900 features + domains
ADCdb | Therapeutics | 327 antigens · 6,500+ ADCs
ChEMBL | Therapeutics | 2,898,002 compounds · 1,001 targets · 2,811 indications
ClinicalTrials.gov | Therapeutics | 581,326 trials
Illuminating the Druggable Genome (IDG) | Therapeutics | 256 targets · 15,054 phenotypes
PharmacoDB | Therapeutics | 55,302 compounds · 5,712,751 measurements
PLAbDab | Therapeutics | 150,000+ antibody sequences
Bgee | Transcriptome | 60,490 genes · 3 species
CELLxGENE Discover | Transcriptome | 33M+ cells · 436 datasets · 2,700+ cell types
CPTAC (pancan) | Transcriptome, Proteome | 104,235 transcripts · 1,565 samples · 10 datasets
Gene Expression Omnibus (GEO) (Soon) | Transcriptome, Proteome | 273,310 series · 4,352 datasets
GTEx | Transcriptome | 58,988 transcripts · 68 tissues
Human Protein Atlas | Transcriptome, Proteome | 20,141 transcripts · 59 tissues
PanglaoDB | Transcriptome | 178 cell types
Prostate Cancer Atlas | Transcriptome, Proteome | 92,144 samples
TCGA | Transcriptome | 17,102 transcripts · 13,454 samples · 34 datasets
TCGA/TARGET/GTEx (UCSC XenaBrowser) | Transcriptome | 17,409 transcripts · 28,823 samples · 37 datasets
SWISS-MODEL | Visualization | 3,939,534 models · 234,439 structures · 12 proteomes
SwissBioPic (Soon) | Visualization | ~50 species with subcellular pictograms
3D Beacons | Visualization | 9 sub-sources