Build a packet the Refinery accepts
This page describes how to assemble a packet (or packet set) that the Graunt Refinery accepts: the input shapes, the metadata.csv columns it recognizes, the packet-set manifest, the rights and provenance it expects, and the deterministic artifacts it produces. Following this contract helps your upload ingest cleanly the first time.
Input shapes the Refinery accepts
- folder — a directory of source files plus an optional metadata.csv
- zip — a safe archive of the same (no traversal, no absolute paths, no symlinks)
- metadata.csv — one row per source file, joined to the files by path / basename / sha256
- source files — individual TXT, MD, CSV/TSV, JSON/JSONL, or born-digital PDF files
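The zip rules above (no traversal, no absolute paths, no symlinks) can be checked locally before upload. The following is a minimal sketch using Python's standard zipfile module. The function name unsafe_zip_entries is hypothetical, and the Refinery's own check may be stricter or differ in detail.

```python
import stat
import zipfile
from pathlib import PurePosixPath


def unsafe_zip_entries(zip_path: str) -> list[tuple[str, str]]:
    """Return (entry name, reason) pairs for entries that break the zip rules."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Treat backslashes as separators so Windows-style names are caught too.
            name = info.filename.replace("\\", "/")
            path = PurePosixPath(name)
            # Unix mode bits are stored in the high 16 bits of external_attr.
            mode = info.external_attr >> 16
            if path.is_absolute() or (len(name) > 1 and name[1] == ":"):
                problems.append((info.filename, "absolute path"))
            elif ".." in path.parts:
                problems.append((info.filename, "path traversal"))
            elif stat.S_ISLNK(mode):
                problems.append((info.filename, "symlink"))
    return problems
```

An empty list means the archive passed these three checks only; it says nothing about whether the file types inside are supported.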
What each source needs
- filename / path (Required): the source file each metadata row describes (the primary join key)
- rights / license (Required): the rights basis for the source (e.g. PUBLIC_DOMAIN, GOVERNMENT_WORK, CC0, or your own license)
- display_name / title (Required): the human-facing packet name
- description (Required): what the packet contains and who it is for
- citation / source_url (Optional): where the source came from; recommended for provenance
- packet_category (Optional): COMPLETE, FOUNDATION, or SUB_PACKET when assembling a packet set
- buy_price / monthly_sub / annual_sub (Optional): integer cents you set; a paid FOUNDATION must clear the paid minimum ($0 FOUNDATION is rejected)
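A row-level pre-check for the columns listed above might look like the sketch below. It assumes that the slash-separated names (for example filename / path) are interchangeable aliases, and that a FOUNDATION row with no non-zero price counts as a $0 FOUNDATION. The actual paid minimum is not stated here, so the sketch only rejects zero. The name check_row is hypothetical.

```python
# Assumption: either name in a group satisfies the requirement.
REQUIRED_GROUPS = [
    ("filename", "path"),
    ("rights", "license"),
    ("display_name", "title"),
    ("description",),
]
PRICE_COLUMNS = ("buy_price", "monthly_sub", "annual_sub")


def check_row(row: dict) -> list[str]:
    """Return human-readable problems for one metadata.csv row."""
    problems = []
    for group in REQUIRED_GROUPS:
        if not any((row.get(col) or "").strip() for col in group):
            problems.append("missing required: " + " / ".join(group))
    prices = []
    for col in PRICE_COLUMNS:
        raw = (row.get(col) or "").strip()
        if not raw:
            continue  # price columns are optional
        if raw.isascii() and raw.isdigit():
            prices.append(int(raw))
        else:
            problems.append(f"{col} must be integer cents, got {raw!r}")
    # Assumption: a FOUNDATION row with no non-zero price is a "$0 FOUNDATION".
    category = (row.get("packet_category") or "").strip()
    if category == "FOUNDATION" and not any(prices):
        problems.append("FOUNDATION packet has no non-zero price")
    return problems
```

Rows read with csv.DictReader can be passed to check_row directly, since each row is a plain mapping of column name to string.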
Unknown columns are retained as extra_<column> and never cause a failure. Rows join to files by exact path → basename → normalized basename → sha256.
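The join order above (exact path, then basename, then normalized basename, then sha256) can be sketched as a fallback chain. This is an illustration, not the Refinery's implementation: the meaning of "normalized" is not defined here, so the sketch assumes trimming and case-folding, and when several files share a basename it simply returns the first one. The names match_row and normalize are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Optional


def normalize(name: str) -> str:
    # Assumption: "normalized" means trimmed and case-folded; the real rule may differ.
    return name.strip().casefold()


def match_row(row_path: str, row_sha256: str, files: dict[str, bytes]) -> Optional[str]:
    """Return the key in `files` that a metadata row joins to, or None."""
    # 1. Exact path.
    if row_path in files:
        return row_path
    base = PurePosixPath(row_path).name
    # 2. Basename.
    for path in files:
        if PurePosixPath(path).name == base:
            return path
    # 3. Normalized basename.
    for path in files:
        if normalize(PurePosixPath(path).name) == normalize(base):
            return path
    # 4. sha256 of the file content.
    if row_sha256:
        wanted = row_sha256.strip().lower()
        for path, data in files.items():
            if hashlib.sha256(data).hexdigest() == wanted:
                return path
    return None
```

Trying the strictest key first means a row never falls through to a looser match when an exact one exists.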
Packet sets
The manifest schema is graunt/packet-set/v1. Roles: MASTER, FOUNDATION, MEMBER. Categories: COMPLETE, FOUNDATION, SUB_PACKET.
Tier 0 (deterministic, model-free retrieval files) is required ONLY when you claim a packet is agent-ready. Tier 1 (standard navigation) and Tier 2 (premium/semantic) are optional and never required to publish.
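The role, category, and tier rules above can be expressed as a small check. The manifest's field names are not given on this page, so the keys used here (role, category, agent_ready, tiers) are hypothetical placeholders; consult the machine-readable contract for the real ones.

```python
ROLES = {"MASTER", "FOUNDATION", "MEMBER"}
CATEGORIES = {"COMPLETE", "FOUNDATION", "SUB_PACKET"}


def check_packet_entry(entry: dict) -> list[str]:
    """Check one packet entry; all key names are illustrative assumptions."""
    problems = []
    if entry.get("role") not in ROLES:
        problems.append(f"unknown role: {entry.get('role')!r}")
    if entry.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {entry.get('category')!r}")
    tiers = set(entry.get("tiers", []))
    # Tier 0 is required only when the packet claims to be agent-ready.
    # Tier 1 and Tier 2 are optional, so their absence is never a problem.
    if entry.get("agent_ready") and 0 not in tiers:
        problems.append("agent-ready claim requires Tier 0 files")
    return problems
```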
Rights & provenance
- Declare a rights basis for every source; a public-domain source still produces a refined, machine-optimized packet you may price.
- Provide citation / source_url where you have it; provenance strengthens the packet but is never fabricated for you.
- When required packet metadata is missing, the Refinery proposes values and flags them for your confirmation; it never invents them silently.
What the Refinery produces
- source originals (preserved byte-for-byte with content_sha256)
- normalized content
- content_units (stable ids + byte offsets)
- retrieval_chunks
- chunk_context
- BM25 postings + meta
- router.retrieval guidance
- manifest + checksums
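To compare a local file against the content_sha256 recorded for a preserved original, or to fill a sha256 join key in metadata.csv, a standard SHA-256 digest of the raw bytes should suffice. The sketch below assumes content_sha256 is a lowercase hex digest of the unmodified file; that encoding is an assumption, not something this page states.

```python
import hashlib


def content_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file's raw bytes in chunks so large sources need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```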
After you submit
- Assembled listings enter PENDING_REVIEW.
- A human admin reviews every listing before it goes ACTIVE; assembly never auto-publishes.
- The Refinery runs deterministic, model-free extraction; you bring your own retrieval or model stack to use the agent-ready files.
- Unsupported or scan-only inputs are recorded with an explicit extraction status — they are never silently dropped or claimed as fully extracted.
Test a packet manifest
Paste a packet manifest JSON object below and check it against the deterministic validator before you upload. This runs the same presence, validation, and coherence checks the Refinery runs on ingest — no account needed, no files stored.
Agents can call the same check directly: POST /v1/refinery/validate with { "packet_manifest": { … } } (public, no auth). The contract is machine-readable at /v1/meta/packet-assembly-contract.
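A call to the validation endpoint could be prepared as in the sketch below, using only Python's standard library. The host is not given on this page, so BASE_URL is a placeholder, and the "schema" key in the usage example is a hypothetical manifest field. The response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Placeholder host: replace with the real Refinery base URL.
BASE_URL = "https://refinery.example"


def build_validate_request(packet_manifest: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/refinery/validate request."""
    body = json.dumps({"packet_manifest": packet_manifest}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/refinery/validate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage: the "schema" key is illustrative only.
request = build_validate_request({"schema": "graunt/packet-set/v1"})
```

Passing the request to urllib.request.urlopen sends it; no authentication header is added because the endpoint is described as public.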