
Export Formats

Available export formats for the OptimusKG knowledge graph.

OptimusKG exports the knowledge graph in multiple formats, each suited to a different use case. The formats to export are configured in conf/base/parameters.yml.

CSV

Partitioned CSV files for each node and edge type, plus unified nodes.csv and edges.csv files.

Use case: Bulk importing into a Neo4j database.
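
Before a bulk import it can help to check what a given export file actually contains. The following is a minimal sketch using only the Python standard library; the function name `summarize_csv` is hypothetical and not part of OptimusKG, and the sketch assumes only that each file starts with a header row. It makes no assumption about which columns the export contains.

```python
import csv


def summarize_csv(path):
    """Return (column names, number of data rows) for one CSV file.

    Assumes the first row of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count
```

Calling `summarize_csv` on the unified nodes.csv or edges.csv (at whatever location your pipeline run writes them) returns the header and row count, which you can compare against what your import step expects.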

Parquet

Partitioned Apache Parquet files for each node and edge type, plus unified nodes.parquet and edges.parquet files.

Use case: Data science and machine learning workflows with Polars or Apache Spark.

Neo4j-JSONL

A direct JSON Lines export from a Neo4j instance of OptimusKG, with one JSON record per line.

Use case: Interoperability with other tools in the Neo4j ecosystem.

Neo4j-JSONL requires Docker to run a Neo4j instance. See Neo4j Deployment.
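
A JSON Lines file can be streamed one record at a time rather than loaded whole. The sketch below shows one way to do that with the standard library. The helper names `iter_jsonl` and `count_by_key` are hypothetical, and the `"type"` key used as the default is an assumption about the record layout, not something this page specifies; pass whichever key your export actually contains.

```python
import json
from collections import Counter


def iter_jsonl(path):
    """Yield one parsed JSON value per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_by_key(path, key="type"):
    """Count records by the value stored under `key`.

    Assumes every record is a JSON object. The default key "type" is an
    assumption; records without the key are counted under "None".
    """
    return Counter(str(record.get(key)) for record in iter_jsonl(path))
```

Because `iter_jsonl` is a generator, memory use stays roughly constant regardless of the size of the export.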

Configuration

# conf/base/parameters.yml
gold:
  export_formats:
    csv:
      properties: true
    parquet:
      properties: true
    # neo4j:
    #   properties: true
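
To make the shape of this configuration concrete, the sketch below walks the same structure after it has been parsed into a Python dictionary. This is illustrative only: `EXAMPLE_PARAMETERS` and `configured_formats` are hypothetical names, the dictionary is typed out by hand to mirror the YAML above, and nothing here reflects how the OptimusKG pipeline itself reads the file.

```python
# Hand-written mirror of the YAML above. The commented-out neo4j entry
# is absent, just as it would be after parsing the file.
EXAMPLE_PARAMETERS = {
    "gold": {
        "export_formats": {
            "csv": {"properties": True},
            "parquet": {"properties": True},
        }
    }
}


def configured_formats(parameters):
    """Map each format under gold.export_formats to its `properties` flag.

    A missing section yields an empty dict; a format with no options
    is reported with the flag set to False.
    """
    export_formats = parameters.get("gold", {}).get("export_formats") or {}
    return {
        name: bool((options or {}).get("properties", False))
        for name, options in export_formats.items()
    }
```

With the dictionary above, `configured_formats(EXAMPLE_PARAMETERS)` lists only csv and parquet, since a commented-out entry never reaches the parsed configuration.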

Distributed OptimusKG data files contain only publicly available data. If you have access to private datasets, place them under data/landing/ and the pipeline will automatically use them.
