Graph Schema
Edges
Edge types and their schema in OptimusKG.
OptimusKG encodes 26 edge types connecting the node types across molecular, clinical, anatomical, and environmental domains.
| Label | Relation(s) | Count |
|---|---|---|
| DIS-GEN | ASSOCIATED_WITH | 9,734,774 |
| ANA-GEN | EXPRESSION_PRESENT, EXPRESSION_ABSENT | 8,787,955 |
| DRG-DRG | SYNERGISTIC_INTERACTION, PARENT | 1,345,376 |
| PHE-GEN | ASSOCIATED_WITH | 793,279 |
| GEN-GEN | INTERACTS_WITH | 327,924 |
| DIS-PHE | PHENOTYPE_PRESENT | 157,144 |
| BPO-GEN | INTERACTS_WITH | 158,410 |
| DRG-DIS | INDICATION, CONTRAINDICATION, OFF_LABEL_USE | 70,442 |
| MFN-GEN | INTERACTS_WITH | 90,933 |
| DRG-PHE | ADVERSE_DRUG_REACTION, ASSOCIATED_WITH, CONTRAINDICATION, INDICATION, OFF_LABEL_USE | 13,758 |
| PWY-GEN | INTERACTS_WITH | 46,977 |
| BPO-BPO | IS_A | 44,494 |
| DIS-DIS | PARENT | 44,215 |
| CCO-GEN | INTERACTS_WITH | 105,309 |
| DRG-GEN | ACTIVATOR, AGONIST, ALLOSTERIC_ANTAGONIST, ANTAGONIST, BINDING_AGENT, BLOCKER, CARRIER, DEGRADER, ENZYME, INHIBITOR, INVERSE_AGONIST, MODULATOR, NEGATIVE_ALLOSTERIC_MODULATOR, NEGATIVE_MODULATOR, OPENER, PARTIAL_AGONIST, POSITIVE_ALLOSTERIC_MODULATOR, POSITIVE_MODULATOR, RELEASING_AGENT, STABILISER, SUBSTRATE, TARGET, TRANSPORTER | 20,694 |
| PHE-PHE | PARENT | 24,862 |
| MFN-MFN | IS_A | 12,587 |
| PWY-PWY | PARENT | 2,819 |
| EXP-GEN | INTERACTS_WITH | 2,989 |
| EXP-DIS | LINKED_TO | 2,391 |
| EXP-EXP | PARENT | 2,443 |
| EXP-BPO | INTERACTS_WITH | 2,260 |
| ANA-ANA | PARENT | 17,082 |
| CCO-CCO | IS_A | 4,639 |
| EXP-MFN | INTERACTS_WITH | 47 |
| EXP-CCO | INTERACTS_WITH | 13 |
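Each label in the table joins two three-letter node-type codes with a hyphen. The following is a minimal sketch of how such labels could be taken apart, assuming every label follows that `XXX-YYY` pattern; `split_label` and `is_within_type` are hypothetical helper names used for illustration and are not part of OptimusKG.

```python
def split_label(label: str) -> tuple[str, str]:
    """Split an edge label such as 'DIS-GEN' into its two node-type codes."""
    source_type, separator, target_type = label.partition("-")
    if not separator or not source_type or not target_type:
        raise ValueError(f"not an edge label: {label!r}")
    return source_type, target_type


def is_within_type(label: str) -> bool:
    """True for labels that connect a node type to itself, e.g. 'DIS-DIS'."""
    source_type, target_type = split_label(label)
    return source_type == target_type


labels = ["DIS-GEN", "DIS-DIS", "EXP-CCO", "BPO-BPO"]
within = [label for label in labels if is_within_type(label)]
print(within)  # ['DIS-DIS', 'BPO-BPO']
```

In the table above, the within-type labels carry the PARENT or IS_A relations (DRG-DRG additionally lists SYNERGISTIC_INTERACTION, and GEN-GEN uses INTERACTS_WITH instead), so a check like this can help separate hierarchy edges from cross-domain associations.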
All edges share the same base schema in the unified edges.parquet and largest_connected_component_edges.parquet tables:
| Field | Type | Description |
|---|---|---|
| from | String | Source node identifier in CURIE format |
| to | String | Target node identifier in CURIE format |
| label | String | Edge type label (e.g. DIS-GEN) |
| relation | String | Relation type (e.g. ASSOCIATED_WITH) |
| undirected | Boolean | True if the edge has no intrinsic directionality |
| properties | String | JSON-encoded edge-specific properties. Expanded to a native Struct in per-type parquet files. |
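The undirected flag matters when the edge list is turned into a traversable graph. As a rough sketch, assuming edge rows have already been loaded into plain dictionaries keyed by the base-schema field names, an adjacency map could mirror only the edges flagged as undirected. `build_adjacency` is a hypothetical helper and the CURIEs are made-up placeholders.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each node to its neighbours, mirroring edges flagged undirected."""
    adjacency = defaultdict(set)
    for edge in edges:
        adjacency[edge["from"]].add(edge["to"])
        if edge["undirected"]:
            # No intrinsic direction: make the edge reachable from both ends.
            adjacency[edge["to"]].add(edge["from"])
    return adjacency


# Made-up rows for illustration; these are not real OptimusKG records.
edges = [
    {"from": "EX:1", "to": "EX:2", "label": "DIS-GEN",
     "relation": "ASSOCIATED_WITH", "undirected": True},
    {"from": "EX:3", "to": "EX:1", "label": "DIS-DIS",
     "relation": "PARENT", "undirected": False},
]
adjacency = build_adjacency(edges)
print(sorted(adjacency["EX:2"]))  # ['EX:1']
print(sorted(adjacency["EX:1"]))  # ['EX:2']
```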
In the stratified per-type parquet files (edges/<label>.parquet), properties is expanded into native typed columns as a Polars Struct.
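Because the unified tables store properties as a JSON string, a consumer of those tables has to decode it before reading individual fields. The sketch below uses only the standard library and assumes a row is available as a dictionary; the exact JSON layout shown is an assumption modelled on the Anatomy-Gene fields, and `decode_properties` is a hypothetical helper.

```python
import json


def decode_properties(row):
    """Return a copy of the row with `properties` parsed from JSON text."""
    decoded = dict(row)
    raw = row.get("properties")
    # Treat a missing or empty payload as "no properties".
    decoded["properties"] = json.loads(raw) if raw else {}
    return decoded


# Made-up row for illustration; the CURIEs and values are placeholders.
row = {
    "from": "EX:1",
    "to": "EX:2",
    "label": "ANA-GEN",
    "relation": "EXPRESSION_PRESENT",
    "undirected": True,
    "properties": '{"expression_rank": 120, "call_quality": "gold"}',
}
edge = decode_properties(row)
print(edge["properties"]["call_quality"])  # gold
```

With the per-type files this step should not be needed, since properties is already a typed Struct there.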
Anatomy-Anatomy
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (ANA-ANA) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
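Every edge type carries the same sources provenance struct, so a single helper can serve all of them. The following is a sketch under the assumption that sources sits inside properties and that an edge is available as a nested dictionary; `contributing_datasets` is a hypothetical name and the dataset names are placeholders.

```python
def contributing_datasets(edge, include_indirect=False):
    """List dataset names recorded in the edge's `sources` provenance struct."""
    sources = edge["properties"].get("sources") or {}
    names = list(sources.get("direct") or [])
    if include_indirect:
        # Append datasets that only referenced the relationship, without duplicates.
        for name in sources.get("indirect") or []:
            if name not in names:
                names.append(name)
    return names


# Placeholder dataset names, not actual OptimusKG sources.
edge = {
    "label": "ANA-ANA",
    "relation": "PARENT",
    "properties": {
        "sources": {"direct": ["dataset_a"], "indirect": ["dataset_b", "dataset_a"]}
    },
}
print(contributing_datasets(edge))                         # ['dataset_a']
print(contributing_datasets(edge, include_indirect=True))  # ['dataset_a', 'dataset_b']
```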
Anatomy-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (ANA-GEN) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.expression_rank | Int32 | Bgee expression rank score (lower = higher expression) |
| properties.call_quality | String | Expression call quality (gold/silver) |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
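Since a lower expression_rank means higher expression, ranking genes within an anatomical entity calls for an ascending sort. A small sketch, assuming edges are nested dictionaries, that the gene sits in the to field, and that call_quality is stored in lower case as written above; `top_expressed` is a hypothetical helper and the rows are made up.

```python
def top_expressed(edges, limit=2):
    """Return gold-quality EXPRESSION_PRESENT edges, strongest expression first."""
    candidates = [
        edge for edge in edges
        if edge["relation"] == "EXPRESSION_PRESENT"
        and edge["properties"].get("call_quality") == "gold"
        and edge["properties"].get("expression_rank") is not None
    ]
    # A lower rank means higher expression, so sort ascending.
    return sorted(candidates, key=lambda edge: edge["properties"]["expression_rank"])[:limit]


# Made-up rows for illustration only.
edges = [
    {"to": "EX:gene1", "relation": "EXPRESSION_PRESENT",
     "properties": {"expression_rank": 900, "call_quality": "gold"}},
    {"to": "EX:gene2", "relation": "EXPRESSION_PRESENT",
     "properties": {"expression_rank": 15, "call_quality": "gold"}},
    {"to": "EX:gene3", "relation": "EXPRESSION_PRESENT",
     "properties": {"expression_rank": 3, "call_quality": "silver"}},
    {"to": "EX:gene4", "relation": "EXPRESSION_ABSENT",
     "properties": {"expression_rank": None, "call_quality": "gold"}},
]
print([edge["to"] for edge in top_expressed(edges)])  # ['EX:gene2', 'EX:gene1']
```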
Biological Process-Biological Process
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (BPO-BPO) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Biological Process-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (BPO-GEN) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.evidence | List[String] | GO evidence codes (e.g. IDA, IMP, TAS) |
| properties.gene_product | List[String] | Gene product IDs annotated to this term |
| properties.eco_ids | List[String] | Evidence & Conclusion Ontology (ECO) IDs |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
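The evidence list makes it possible to keep only annotations backed by experiments. The sketch below checks for the Gene Ontology experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP); it assumes edges are nested dictionaries and that a null list means no recorded evidence. `has_experimental_evidence` is a hypothetical helper and the rows are made up.

```python
# GO evidence codes in the "experimental" category.
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def has_experimental_evidence(edge):
    """True if at least one evidence code on the edge is experimental."""
    codes = edge["properties"].get("evidence") or []
    return any(code in EXPERIMENTAL_CODES for code in codes)


# Made-up rows for illustration only.
edges = [
    {"from": "EX:term1", "to": "EX:gene1", "properties": {"evidence": ["IDA", "TAS"]}},
    {"from": "EX:term1", "to": "EX:gene2", "properties": {"evidence": ["TAS"]}},
    {"from": "EX:term2", "to": "EX:gene3", "properties": {"evidence": None}},
]
supported = [edge["to"] for edge in edges if has_experimental_evidence(edge)]
print(supported)  # ['EX:gene1']
```

The same check would apply to CCO-GEN and MFN-GEN edges, which list the same evidence fields.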
Cellular Component-Cellular Component
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (CCO-CCO) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Cellular Component-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (CCO-GEN) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.evidence | List[String] | GO evidence codes (e.g. IDA, IMP, TAS) |
| properties.gene_product | List[String] | Gene product IDs annotated to this term |
| properties.eco_ids | List[String] | Evidence & Conclusion Ontology (ECO) IDs |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Disease-Disease
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (DIS-DIS) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Disease-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (DIS-GEN) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.evidence_score | Float64 | Aggregated association evidence score |
| properties.evidence_count | Int64 | Number of evidence items supporting the association |
| properties.evidence_index | Float64 | Combined evidence index (Open Targets) |
| properties.disease_specificity_index | Float64 | DSI, specificity of the gene to this disease |
| properties.disease_pleiotropy_index | Float64 | DPI, number of disease classes the gene is associated with |
| properties.disgenet_score | Float64 | DisGeNET gene–disease association score |
| properties.year_initial | String | Year of the earliest supporting publication |
| properties.year_final | String | Year of the most recent supporting publication |
| properties.number_of_pmids | Int16 | Number of supporting PubMed publications |
| properties.number_of_snps | Int16 | Number of supporting SNPs (GWAS evidence) |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
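Two details of this schema are easy to trip over: the score fields may be null, and year_initial and year_final are strings rather than integers. The sketch below shows one way to handle both, assuming edges are nested dictionaries and that the year strings hold plain four-digit years; `rank_by_evidence` and `publication_span` are hypothetical helpers and the rows are made up.

```python
def publication_span(properties):
    """Years between the earliest and latest supporting publication, or None."""
    first = properties.get("year_initial")
    last = properties.get("year_final")
    if not first or not last:
        return None
    # The years are stored as strings, so convert before subtracting.
    return int(last) - int(first)


def rank_by_evidence(edges, min_score=0.0):
    """Keep edges with a non-null evidence_score >= min_score, best first."""
    scored = [
        edge for edge in edges
        if edge["properties"].get("evidence_score") is not None
        and edge["properties"]["evidence_score"] >= min_score
    ]
    return sorted(scored, key=lambda edge: edge["properties"]["evidence_score"], reverse=True)


# Made-up rows for illustration only.
edges = [
    {"to": "EX:gene1", "properties": {"evidence_score": 0.42, "year_initial": "2004", "year_final": "2019"}},
    {"to": "EX:gene2", "properties": {"evidence_score": 0.91, "year_initial": None, "year_final": None}},
    {"to": "EX:gene3", "properties": {"evidence_score": None, "year_initial": "2011", "year_final": "2011"}},
]
ranked = rank_by_evidence(edges, min_score=0.4)
print([edge["to"] for edge in ranked])           # ['EX:gene2', 'EX:gene1']
print(publication_span(edges[0]["properties"]))  # 15
```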
Disease-Phenotype
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (DIS-PHE) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.aspect | List[String] | HPO annotation aspect (P=phenotypic, I=inheritance, etc.) |
| properties.evidence_type | List[String] | Evidence type codes (e.g. IEA, PCS, TAS) |
| properties.frequency | List[String] | Phenotype frequency annotations |
| properties.onset | List[String] | Age of onset annotations |
| properties.modifiers | List[String] | Clinical modifier annotations |
| properties.sexes | List[String] | Sex-specific annotations |
| properties.qualifier_not | Boolean | True if phenotype is explicitly absent |
| properties.bio_curation | List[String] | Biocuration provenance entries |
| properties.references | List[String] | Supporting publication or database references |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
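The qualifier_not flag inverts the meaning of an edge, so downstream analyses may want to set negated annotations aside. A minimal sketch, assuming edges are nested dictionaries and that a null or missing flag can be treated as an asserted phenotype; `split_by_negation` is a hypothetical helper and the rows are made up.

```python
def split_by_negation(edges):
    """Separate asserted phenotype edges from explicitly negated ones."""
    asserted, negated = [], []
    for edge in edges:
        bucket = negated if edge["properties"].get("qualifier_not") else asserted
        bucket.append(edge)
    return asserted, negated


# Made-up rows for illustration only.
edges = [
    {"to": "EX:phe1", "properties": {"qualifier_not": False}},
    {"to": "EX:phe2", "properties": {"qualifier_not": True}},
    {"to": "EX:phe3", "properties": {"qualifier_not": None}},
]
asserted, negated = split_by_negation(edges)
print([edge["to"] for edge in asserted])  # ['EX:phe1', 'EX:phe3']
print([edge["to"] for edge in negated])   # ['EX:phe2']
```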
Drug-Disease
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (DRG-DIS) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.highest_clinical_trial_phase | Float64 | Highest clinical trial phase for this indication |
| properties.structure_id | String | DrugCentral structure ID |
| properties.drug_disease_id | String | DrugCentral drug–disease identifier |
| properties.reference_ids | List[String] | Supporting reference identifiers |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
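Drug-Disease edges combine several relation types with an optional trial phase, so a common first step is to tally relations above a phase threshold. The following sketch assumes edges are nested dictionaries and that a null phase should be excluded whenever a threshold is given; `relation_counts` is a hypothetical helper and the rows are made up.

```python
from collections import Counter


def relation_counts(edges, min_phase=None):
    """Count edges per relation, optionally requiring a minimum trial phase."""
    counts = Counter()
    for edge in edges:
        phase = edge["properties"].get("highest_clinical_trial_phase")
        if min_phase is not None and (phase is None or phase < min_phase):
            continue
        counts[edge["relation"]] += 1
    return counts


# Made-up rows for illustration only.
edges = [
    {"relation": "INDICATION", "properties": {"highest_clinical_trial_phase": 4.0}},
    {"relation": "INDICATION", "properties": {"highest_clinical_trial_phase": 2.0}},
    {"relation": "CONTRAINDICATION", "properties": {"highest_clinical_trial_phase": None}},
    {"relation": "OFF_LABEL_USE", "properties": {"highest_clinical_trial_phase": 3.0}},
]
print(dict(relation_counts(edges)))
# {'INDICATION': 2, 'CONTRAINDICATION': 1, 'OFF_LABEL_USE': 1}
print(dict(relation_counts(edges, min_phase=3)))
# {'INDICATION': 1, 'OFF_LABEL_USE': 1}
```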
Drug-Drug
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (DRG-DRG) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.interaction_description | String | Description of the drug–drug interaction |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Drug-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (DRG-GEN) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.mechanisms_of_action | List[String] | Mechanism of action descriptions |
| properties.source_ids | List[String] | Source-specific interaction identifiers |
| properties.source_urls | List[String] | URLs to source evidence records |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Drug-Phenotype
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (DRG-PHE) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.highest_clinical_trial_phase | Float64 | Highest clinical trial phase |
| properties.structure_id | String | DrugCentral structure ID |
| properties.drug_disease_id | String | DrugCentral drug–disease identifier |
| properties.reference_ids | List[String] | Supporting reference identifiers |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Exposure-Biological Process
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (EXP-BPO) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.evidence_count | UInt32 | Number of evidence entries |
| properties.number_of_receptors | Int64 | Number of receptor/study participants |
| properties.receptors | List[String] | Receptor identifiers (e.g. cell line, organism) |
| properties.receptor_notes | List[String] | Free-text notes on receptors |
| properties.smoking_statuses | List[String] | Smoking status of study subjects |
| properties.sexes | List[String] | Sex of study subjects |
| properties.races | List[String] | Race/ethnicity of study subjects |
| properties.methods | List[String] | Measurement methods used |
| properties.mediums | List[String] | Biological mediums measured (e.g. blood, urine) |
| properties.detection_limit | List[String] | Lower limit of detection values |
| properties.detection_limit_uom | List[String] | Units of detection limit values |
| properties.detection_frequency | List[String] | Detection frequency values |
| properties.age_entries | UInt32 | Number of age-stratified entries |
| properties.age_range_values | List[String] | Age range values for subjects |
| properties.age_mean_values | List[String] | Mean age values |
| properties.age_median_values | List[String] | Median age values |
| properties.age_point_values | List[String] | Point age values |
| properties.age_open_range_values | List[String] | Open-ended age range values |
| properties.study_countries | List[String] | Countries where studies were conducted |
| properties.states_or_provinces | List[String] | States or provinces of study |
| properties.city_town_region_areas | List[String] | City/town/region of study |
| properties.outcome_relationships | List[String] | Observed outcome relationships |
| properties.exposure_event_notes | List[String] | Notes on the exposure event |
| properties.exposure_outcome_notes | List[String] | Notes on the exposure outcome |
| properties.references | List[String] | Supporting literature references |
| properties.associated_study_titles | List[String] | Titles of associated studies |
| properties.enrollment_start_years | List[String] | Study enrollment start years |
| properties.enrollment_end_years | List[String] | Study enrollment end years |
| properties.study_factors | List[String] | Study design factors |
| properties.assay_notes | List[String] | Notes on the assay used |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
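Most exposure properties are list-valued, so summarising them across edges means flattening lists and dropping duplicates. A small sketch, assuming edges are nested dictionaries and that a list may be null; `distinct_values` is a hypothetical helper that works for any List[String] field, and the rows are made up.

```python
def distinct_values(edges, field):
    """Collect the distinct values of a list-valued property, in first-seen order."""
    seen = {}
    for edge in edges:
        for value in edge["properties"].get(field) or []:
            seen.setdefault(value, None)
    return list(seen)


# Made-up rows for illustration only.
edges = [
    {"properties": {"mediums": ["blood", "urine"]}},
    {"properties": {"mediums": ["urine", "serum"]}},
    {"properties": {"mediums": None}},
]
print(distinct_values(edges, "mediums"))  # ['blood', 'urine', 'serum']
```

Because all exposure edge types share the same property fields, the same helper should apply unchanged to the other exposure edge types.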
Exposure-Cellular Component
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (EXP-CCO) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.evidence_count | UInt32 | Number of evidence entries |
| properties.number_of_receptors | Int64 | Number of receptor/study participants |
| properties.receptors | List[String] | Receptor identifiers (e.g. cell line, organism) |
| properties.receptor_notes | List[String] | Free-text notes on receptors |
| properties.smoking_statuses | List[String] | Smoking status of study subjects |
| properties.sexes | List[String] | Sex of study subjects |
| properties.races | List[String] | Race/ethnicity of study subjects |
| properties.methods | List[String] | Measurement methods used |
| properties.mediums | List[String] | Biological mediums measured (e.g. blood, urine) |
| properties.detection_limit | List[String] | Lower limit of detection values |
| properties.detection_limit_uom | List[String] | Units of detection limit values |
| properties.detection_frequency | List[String] | Detection frequency values |
| properties.age_entries | UInt32 | Number of age-stratified entries |
| properties.age_range_values | List[String] | Age range values for subjects |
| properties.age_mean_values | List[String] | Mean age values |
| properties.age_median_values | List[String] | Median age values |
| properties.age_point_values | List[String] | Point age values |
| properties.age_open_range_values | List[String] | Open-ended age range values |
| properties.study_countries | List[String] | Countries where studies were conducted |
| properties.states_or_provinces | List[String] | States or provinces of study |
| properties.city_town_region_areas | List[String] | City/town/region of study |
| properties.outcome_relationships | List[String] | Observed outcome relationships |
| properties.exposure_event_notes | List[String] | Notes on the exposure event |
| properties.exposure_outcome_notes | List[String] | Notes on the exposure outcome |
| properties.references | List[String] | Supporting literature references |
| properties.associated_study_titles | List[String] | Titles of associated studies |
| properties.enrollment_start_years | List[String] | Study enrollment start years |
| properties.enrollment_end_years | List[String] | Study enrollment end years |
| properties.study_factors | List[String] | Study design factors |
| properties.assay_notes | List[String] | Notes on the assay used |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Exposure-Disease
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (EXP-DIS) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.evidence_count | UInt32 | Number of evidence entries |
| properties.number_of_receptors | Int64 | Number of receptor/study participants |
| properties.receptors | List[String] | Receptor identifiers (e.g. cell line, organism) |
| properties.receptor_notes | List[String] | Free-text notes on receptors |
| properties.smoking_statuses | List[String] | Smoking status of study subjects |
| properties.sexes | List[String] | Sex of study subjects |
| properties.races | List[String] | Race/ethnicity of study subjects |
| properties.methods | List[String] | Measurement methods used |
| properties.mediums | List[String] | Biological mediums measured (e.g. blood, urine) |
| properties.detection_limit | List[String] | Lower limit of detection values |
| properties.detection_limit_uom | List[String] | Units of detection limit values |
| properties.detection_frequency | List[String] | Detection frequency values |
| properties.age_entries | UInt32 | Number of age-stratified entries |
| properties.age_range_values | List[String] | Age range values for subjects |
| properties.age_mean_values | List[String] | Mean age values |
| properties.age_median_values | List[String] | Median age values |
| properties.age_point_values | List[String] | Point age values |
| properties.age_open_range_values | List[String] | Open-ended age range values |
| properties.study_countries | List[String] | Countries where studies were conducted |
| properties.states_or_provinces | List[String] | States or provinces of study |
| properties.city_town_region_areas | List[String] | City/town/region of study |
| properties.outcome_relationships | List[String] | Observed outcome relationships |
| properties.exposure_event_notes | List[String] | Notes on the exposure event |
| properties.exposure_outcome_notes | List[String] | Notes on the exposure outcome |
| properties.references | List[String] | Supporting literature references |
| properties.associated_study_titles | List[String] | Titles of associated studies |
| properties.enrollment_start_years | List[String] | Study enrollment start years |
| properties.enrollment_end_years | List[String] | Study enrollment end years |
| properties.study_factors | List[String] | Study design factors |
| properties.assay_notes | List[String] | Notes on the assay used |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Exposure-Exposure
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (EXP-EXP) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.evidence_count | UInt32 | Number of evidence entries |
| properties.number_of_receptors | Int64 | Number of receptor/study participants |
| properties.receptors | List[String] | Receptor identifiers (e.g. cell line, organism) |
| properties.receptor_notes | List[String] | Free-text notes on receptors |
| properties.smoking_statuses | List[String] | Smoking status of study subjects |
| properties.sexes | List[String] | Sex of study subjects |
| properties.races | List[String] | Race/ethnicity of study subjects |
| properties.methods | List[String] | Measurement methods used |
| properties.mediums | List[String] | Biological mediums measured (e.g. blood, urine) |
| properties.detection_limit | List[String] | Lower limit of detection values |
| properties.detection_limit_uom | List[String] | Units of detection limit values |
| properties.detection_frequency | List[String] | Detection frequency values |
| properties.age_entries | UInt32 | Number of age-stratified entries |
| properties.age_range_values | List[String] | Age range values for subjects |
| properties.age_mean_values | List[String] | Mean age values |
| properties.age_median_values | List[String] | Median age values |
| properties.age_point_values | List[String] | Point age values |
| properties.age_open_range_values | List[String] | Open-ended age range values |
| properties.study_countries | List[String] | Countries where studies were conducted |
| properties.states_or_provinces | List[String] | States or provinces of study |
| properties.city_town_region_areas | List[String] | City/town/region of study |
| properties.outcome_relationships | List[String] | Observed outcome relationships |
| properties.exposure_event_notes | List[String] | Notes on the exposure event |
| properties.exposure_outcome_notes | List[String] | Notes on the exposure outcome |
| properties.references | List[String] | Supporting literature references |
| properties.associated_study_titles | List[String] | Titles of associated studies |
| properties.enrollment_start_years | List[String] | Study enrollment start years |
| properties.enrollment_end_years | List[String] | Study enrollment end years |
| properties.study_factors | List[String] | Study design factors |
| properties.assay_notes | List[String] | Notes on the assay used |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Exposure-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (EXP-GEN) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.evidence_count | UInt32 | Number of evidence entries |
| properties.number_of_receptors | Int64 | Number of receptor/study participants |
| properties.receptors | List[String] | Receptor identifiers (e.g. cell line, organism) |
| properties.receptor_notes | List[String] | Free-text notes on receptors |
| properties.smoking_statuses | List[String] | Smoking status of study subjects |
| properties.sexes | List[String] | Sex of study subjects |
| properties.races | List[String] | Race/ethnicity of study subjects |
| properties.methods | List[String] | Measurement methods used |
| properties.mediums | List[String] | Biological mediums measured (e.g. blood, urine) |
| properties.detection_limit | List[String] | Lower limit of detection values |
| properties.detection_limit_uom | List[String] | Units of detection limit values |
| properties.detection_frequency | List[String] | Detection frequency values |
| properties.age_entries | UInt32 | Number of age-stratified entries |
| properties.age_range_values | List[String] | Age range values for subjects |
| properties.age_mean_values | List[String] | Mean age values |
| properties.age_median_values | List[String] | Median age values |
| properties.age_point_values | List[String] | Point age values |
| properties.age_open_range_values | List[String] | Open-ended age range values |
| properties.study_countries | List[String] | Countries where studies were conducted |
| properties.states_or_provinces | List[String] | States or provinces of study |
| properties.city_town_region_areas | List[String] | City/town/region of study |
| properties.outcome_relationships | List[String] | Observed outcome relationships |
| properties.exposure_event_notes | List[String] | Notes on the exposure event |
| properties.exposure_outcome_notes | List[String] | Notes on the exposure outcome |
| properties.references | List[String] | Supporting literature references |
| properties.associated_study_titles | List[String] | Titles of associated studies |
| properties.enrollment_start_years | List[String] | Study enrollment start years |
| properties.enrollment_end_years | List[String] | Study enrollment end years |
| properties.study_factors | List[String] | Study design factors |
| properties.assay_notes | List[String] | Notes on the assay used |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Exposure-Molecular Function
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (EXP-MFN) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.evidence_count | UInt32 | Number of evidence entries |
| properties.number_of_receptors | Int64 | Number of receptor/study participants |
| properties.receptors | List[String] | Receptor identifiers (e.g. cell line, organism) |
| properties.receptor_notes | List[String] | Free-text notes on receptors |
| properties.smoking_statuses | List[String] | Smoking status of study subjects |
| properties.sexes | List[String] | Sex of study subjects |
| properties.races | List[String] | Race/ethnicity of study subjects |
| properties.methods | List[String] | Measurement methods used |
| properties.mediums | List[String] | Biological mediums measured (e.g. blood, urine) |
| properties.detection_limit | List[String] | Lower limit of detection values |
| properties.detection_limit_uom | List[String] | Units of detection limit values |
| properties.detection_frequency | List[String] | Detection frequency values |
| properties.age_entries | UInt32 | Number of age-stratified entries |
| properties.age_range_values | List[String] | Age range values for subjects |
| properties.age_mean_values | List[String] | Mean age values |
| properties.age_median_values | List[String] | Median age values |
| properties.age_point_values | List[String] | Point age values |
| properties.age_open_range_values | List[String] | Open-ended age range values |
| properties.study_countries | List[String] | Countries where studies were conducted |
| properties.states_or_provinces | List[String] | States or provinces of study |
| properties.city_town_region_areas | List[String] | City/town/region of study |
| properties.outcome_relationships | List[String] | Observed outcome relationships |
| properties.exposure_event_notes | List[String] | Notes on the exposure event |
| properties.exposure_outcome_notes | List[String] | Notes on the exposure outcome |
| properties.references | List[String] | Supporting literature references |
| properties.associated_study_titles | List[String] | Titles of associated studies |
| properties.enrollment_start_years | List[String] | Study enrollment start years |
| properties.enrollment_end_years | List[String] | Study enrollment end years |
| properties.study_factors | List[String] | Study design factors |
| properties.assay_notes | List[String] | Notes on the assay used |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Gene-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (GEN-GEN) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Molecular Function-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (MFN-GEN) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.evidence | List[String] | GO evidence codes (e.g. IDA, IMP, TAS) |
| properties.gene_product | List[String] | Gene product IDs annotated to this term |
| properties.eco_ids | List[String] | Evidence & Conclusion Ontology (ECO) IDs |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Molecular Function-Molecular Function
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (MFN-MFN) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Pathway-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (PWY-GEN) |
| relation | String | Relation type |
| undirected | Boolean | True |
| properties | Struct | Edge-specific properties |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Pathway-Pathway
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (PWY-PWY) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Phenotype-Gene
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (PHE-GEN) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.evidence_score | Float64 | Aggregated association evidence score |
| properties.evidence_count | Int64 | Number of evidence items supporting the association |
| properties.evidence_index | Float64 | Combined evidence index (Open Targets) |
| properties.disease_specificity_index | Float64 | DSI, specificity of the gene to this disease |
| properties.disease_pleiotropy_index | Float64 | DPI, number of disease classes the gene is associated with |
| properties.disgenet_score | Float64 | DisGeNET gene–disease association score |
| properties.year_initial | String | Year of the earliest supporting publication |
| properties.year_final | String | Year of the most recent supporting publication |
| properties.number_of_pmids | Int16 | Number of supporting PubMed publications |
| properties.number_of_snps | Int16 | Number of supporting SNPs (GWAS evidence) |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |
Phenotype-Phenotype
| Field | Type | Description |
|---|---|---|
| from | String | Source node ID (CURIE format) |
| to | String | Target node ID (CURIE format) |
| label | String | Edge type label (PHE-PHE) |
| relation | String | Relation type |
| undirected | Boolean | False |
| properties | Struct | Edge-specific properties |
| properties.sources | Struct | Provenance of this edge |
| properties.sources.direct | List[String] | Datasets that directly contributed this relationship |
| properties.sources.indirect | List[String] | Datasets that referenced this relationship |