Unifying biomedical knowledge in a modern multimodal graph

Schema figure
  • 65 sources
  • 18 ontologies
  • 10 entity types
  • 190,531 nodes
  • 27 relation types
  • 21,813,816 edges
  • 110,276,843 properties

Rich strongly-typed properties

Every entity is enriched with structured properties for fine-grained analysis.

Graph Schema

Field                      Type          Description
id                         String        Node identifier in CURIE format (e.g. ENSG00000141510)
label                      String        Node type abbreviation (GEN)
properties                 Struct        Gene-specific properties
symbol                     String        Official HGNC gene symbol (e.g. TP53)
name                       String        Full gene name
biotype                    String        Gene biotype (e.g. protein_coding, lncRNA)
genomic_location           Struct        Chromosomal coordinates
chromosome                 String        Chromosome name
start                      Int64         Start position (0-based)
end                        Int64         End position
strand                     Int32         Strand (+1 forward, -1 reverse)
transcription_start_site   Int64         Transcription start site position
transcript_ids             List[String]  All associated Ensembl transcript IDs
function_descriptions      List[String]  Functional descriptions
xrefs                      List[Struct]  Cross-references to external databases
id                         String        External identifier
source                     String        Database name
sources                    Struct        Provenance of this node
direct                     List[String]  Datasets that directly contributed this entity
indirect                   List[String]  Datasets that referenced this entity
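
To make the nested layout concrete, here is a minimal sketch of a single gene node as a plain Python dictionary, with a small helper for reading nested fields. It does not use the optimuskg client. The helper `get_nested`, the coordinate values, the chromosome naming, the xref entry, and the dataset name are all hypothetical placeholders. Only the field names, the TP53 identifier, and the symbol come from the schema above.

```python
# Toy gene node using field names from the schema above.
# Coordinates, xref, and dataset name are illustrative placeholders,
# not real annotations.
gene = {
    "id": "ENSG00000141510",
    "label": "GEN",
    "properties": {
        "symbol": "TP53",
        "biotype": "protein_coding",
        "genomic_location": {
            "chromosome": "17",  # naming convention is an assumption
            "start": 1000,       # placeholder value
            "end": 2000,         # placeholder value
            "strand": -1,
        },
        "xrefs": [{"id": "EXAMPLE:0001", "source": "example_db"}],
    },
    "sources": {"direct": ["example_dataset"], "indirect": []},
}


def get_nested(record, path, default=None):
    """Follow a dotted path through nested dicts (hypothetical helper)."""
    node = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


symbol = get_nested(gene, "properties.symbol")
strand = get_nested(gene, "properties.genomic_location.strand")
print(symbol, strand)
```

With the real data loaded as a Polars DataFrame, the same fields would be reached through struct columns rather than dictionaries.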

Delightfully simple Python client

Install with one command and load the graph as Polars data frames or a NetworkX graph in a single line.

uv add optimuskg
Python Client
import optimuskg

# Download a specific file and store it locally
local_path = optimuskg.get_file("nodes/gene.parquet")

# Load a single Parquet file as a Polars DataFrame
drugs = optimuskg.load_parquet("nodes/drug.parquet")

# Load nodes and edges as Polars DataFrames
# Set lcc=True to load only the largest connected component
nodes, edges = optimuskg.load_graph(lcc=True)

# Load the graph as a NetworkX MultiDiGraph with metadata
# Set lcc=True to load only the largest connected component
G = optimuskg.load_networkx(lcc=True)
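
The `lcc=True` option keeps only the largest connected component. As a rough illustration of that idea, the sketch below finds the largest component of a toy edge list using only the standard library. It treats edges as undirected (weak connectivity), which is an assumption about how the client defines the component. The function `largest_connected_component` and the node names are hypothetical and not part of optimuskg.

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return the node set of the largest component, ignoring edge direction."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy edges with made-up node names: one 3-node component, one 2-node component.
toy_edges = [
    ("gene:A", "drug:X"),
    ("drug:X", "disease:D"),
    ("gene:B", "gene:C"),
]
lcc = largest_connected_component(toy_edges)
kept_edges = [(s, d) for s, d in toy_edges if s in lcc and d in lcc]
print(sorted(lcc), kept_edges)
```

Filtering to one component removes isolated islands of nodes, which matters for algorithms that assume every node is reachable from every other.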