Export Formats
Available export formats for the OptimusKG knowledge graph.
OptimusKG exports the knowledge graph in multiple formats, each tailored for different use cases. Formats are configured in conf/base/parameters.yml.
CSV
Partitioned CSV files for each node and edge type, plus unified nodes.csv and edges.csv files.
Use case: Bulk importing into a Neo4j database.
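Before a bulk import, it can help to check which columns an exported file actually contains. The following is a minimal sketch using only the Python standard library; the helper name `preview_csv` and the `nodes.csv` location are assumptions for illustration, and no particular column layout is assumed.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, limit=5):
    """Return (header, rows) for the first `limit` data rows of a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file has no rows
        rows = list(islice(reader, limit))  # read at most `limit` data rows
    return header, rows


if __name__ == "__main__":
    # Hypothetical path: point this at wherever your pipeline writes nodes.csv.
    export_path = Path("nodes.csv")
    if export_path.exists():
        header, rows = preview_csv(export_path)
        print(header)
        for row in rows:
            print(row)
```

The same helper should work on edges.csv or on any of the per-type partition files, since it only reads the header and a handful of rows.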
Parquet
Partitioned Apache Parquet files for each node and edge type, plus unified nodes.parquet and edges.parquet files.
Use case: Data science and machine learning workflows with Polars or Apache Spark.
Neo4j-JSONL
A direct JSON lines export from a Neo4j instance of OptimusKG.
Use case: Interoperability with other tools in the Neo4j ecosystem.
Neo4j-JSONL requires Docker to run a Neo4j instance. See Neo4j Deployment.
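A JSON Lines file holds one JSON value per line, so it can be streamed without loading the whole export into memory. The sketch below is one way to do that with the standard library. The function names are hypothetical, and the `type` field used for counting is an assumption about the record layout, not something this page specifies; pass a different key if your export uses one.

```python
import json
from collections import Counter
from pathlib import Path


def iter_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_by_key(records, key="type"):
    """Count dict records by the value under `key` (assumed field name)."""
    return Counter(
        str(record.get(key, "<missing>"))
        for record in records
        if isinstance(record, dict)
    )
```

For example, `count_by_key(iter_jsonl("export.jsonl"))` would tally records per `type` value, where `export.jsonl` is a placeholder for the actual export file.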
Configuration
# conf/base/parameters.yml
gold:
  export_formats:
    csv:
      properties: true
    parquet:
      properties: true
    # neo4j:
    #   properties: true

Distributed OptimusKG data files contain only publicly available data. If you have access to private datasets, place them under data/landing/ and the pipeline will automatically use them.
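To illustrate how the export configuration nests, the sketch below uses a plain Python dict that mirrors the sample parameters as a YAML parser would return them. The helper `enabled_export_formats` is hypothetical, and treating a format as active whenever its key is present is an assumption inferred from the commented-out neo4j entry, not confirmed pipeline behavior.

```python
def enabled_export_formats(params):
    """List the format names found under gold.export_formats, sorted by name."""
    gold = params.get("gold") or {}
    export_formats = gold.get("export_formats") or {}
    return sorted(export_formats)


# Mirrors the sample configuration; the neo4j entry stays commented out.
params = {
    "gold": {
        "export_formats": {
            "csv": {"properties": True},
            "parquet": {"properties": True},
            # "neo4j": {"properties": True},
        }
    }
}

print(enabled_export_formats(params))  # prints ['csv', 'parquet']
```

Under that assumption, uncommenting the neo4j entry would add it to the returned list.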