Getting Started
Quickstart
OptimusKG is designed to generate a full biomedical knowledge graph in one command.
Generate the Graph
uv run kedro run \
--to-nodes gold.export_kg \
--runner=optimuskg.runners.FixedParallelRunner \
--async

This will:
- Download all necessary data into the landing layer
- Extract and standardize data in the bronze layer
- Consolidate entities and relationships in the silver layer
- Export the final knowledge graph in the gold layer
The exported graph is saved to data/gold/formats/.
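To check what a run actually produced, a small script can walk the output directory and group the files by extension. The following is a minimal sketch using only the Python standard library. The helper name `summarize_exports` is hypothetical, and nothing is assumed about how files are laid out beneath data/gold/formats/.

```python
from collections import defaultdict
from pathlib import Path


def summarize_exports(root):
    """Group every file found under `root` by its file extension."""
    by_suffix = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            key = path.suffix or "(none)"
            by_suffix[key].append(path.relative_to(root))
    return dict(by_suffix)
```

Calling `summarize_exports("data/gold/formats")` after a run returns a dictionary that maps each extension to the relative paths found, which makes it easy to confirm that every configured format was written.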
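Use FixedParallelRunner for concurrent node execution, and pass the --async flag so that node inputs and outputs are loaded and saved asynchronously, which reduces I/O time. FixedParallelRunner replaces Kedro's built-in ParallelRunner, which has a bug that prevents validation checks.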
Export Formats
Export formats can be configured in conf/base/parameters.yml under gold.export_formats:
| Format | Description |
|---|---|
| CSV | Partitioned CSV files for each node and edge type. Useful for Neo4j bulk import. |
| Parquet | Apache Parquet files. Recommended for data science workflows with Polars or Spark. |
| Neo4j-JSONL | JSON lines export from a Neo4j instance. Requires Docker. |
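A JSON Lines file holds one JSON value per line, so the Neo4j-JSONL export can be streamed record by record without loading the whole file. The sketch below uses only the standard library. The function name `iter_jsonl` is hypothetical, and the fields of each record are not assumed because this guide does not describe them.

```python
import json


def iter_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a truncated export is easy to locate.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
```

Because the function is a generator, it can be combined with `itertools.islice` to inspect only the first few records of a large export.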
Start Neo4j
To browse the graph interactively:
make neo4j
Access the Neo4j Browser at http://localhost:7474/browser/preview/.
Run a Specific Pipeline
# Run only the bronze layer
uv run kedro run --pipeline bronze
# Run a specific node
uv run kedro run --nodes <node_name>
# Run from a specific node downstream
uv run kedro run --from-nodes=<node_name>
# Dry run (list nodes without executing)
uv run kedro run --from-nodes=<node_name> --runner=optimuskg.runners.DryRunner
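When these runs are scripted, for example from a scheduler, the command line can be assembled programmatically. The following is a minimal sketch: `build_kedro_command` and `run_kedro` are hypothetical helpers, not part of OptimusKG or Kedro, and they use only the flags that appear in this guide.

```python
import subprocess


def build_kedro_command(pipeline=None, nodes=None, from_nodes=None,
                        to_nodes=None, runner=None, use_async=False):
    """Return the argument list for a `uv run kedro run` invocation."""
    command = ["uv", "run", "kedro", "run"]
    if pipeline:
        command += ["--pipeline", pipeline]
    if nodes:
        command += ["--nodes", nodes]
    if from_nodes:
        command += [f"--from-nodes={from_nodes}"]
    if to_nodes:
        command += ["--to-nodes", to_nodes]
    if runner:
        command += [f"--runner={runner}"]
    if use_async:
        command.append("--async")
    return command


def run_kedro(**options):
    """Execute the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_kedro_command(**options), check=True)
```

For instance, `build_kedro_command(pipeline="bronze")` yields the same arguments as the first command in this section, and passing a list to `subprocess.run` avoids shell quoting issues with node names.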