An operating spec for a durable, file-first family-history archive with an AI research assistant layered on top.
This project stemmed from one idea: for a hundred years, genealogy lived in a filing cabinet, and anyone could open the drawer. No login, no subscription, no schema migration. A century later a curious descendant could still pull the folder or open the book and read it. Modern genealogy software and workflows have lost that virtue.
Plainfile is that filing cabinet, built to last. Plain files at the foundation, with search, structured claims, and an AI research layer stacked on top of the files, never instead of them. Delete every layer above and the archive still works, the way the drawer still works.
NOTE: This is a specification and scaffold, not finished software.
It is the blueprint for a simple, future-proof system for family research. The goal of this repo is to establish simple standards that can be maintained with or without tooling. It also provides the spec for building tooling from scratch if you wish, plus sample tools to get you started.
Three things live at arm’s length from each other, by design:
- The public repo: the spec (SPEC.md, TOOLING.md, AGENTS.md), the docs, the generic fha tools (once built), an empty archive-template/, and a fictional example-archive/ fixture.
- The tools (tools/): generic. They operate on any conforming archive and hold no family data. Publishing them is the manifestation of the spec. Tools are replaceable glue, regenerable from the spec.
- Your archive: a private repository started from archive-template/, depending on this repo’s spec and tools but never living inside it. Public examples stay fictional; your groceries don’t go in the cookbook.

Public examples must remain fictional. Do not open issues or PRs containing real records about living people, private family documents, raw DNA files, or identifying photos. See PRIVACY.md.
In practice you end up with two separate repositories — one public, one private:
plainfile-family-history/ ← PUBLIC: the spec + the generic tools (this repo)
my-family-archive/ ← PRIVATE: your real family's records
They are not technically linked. The only relationship is that your private archive uses the tools that live in this public repo. There are two ways to get those tools to your archive (decide once the tools are built — you don’t need to now):
- Vendored copy: copy the tools/ folder into your private archive so the tools live beside your data. The archive becomes fully self-contained: it works on any machine, offline, forever, even if this repo disappears. Updating means re-copying tools/ when they improve. Recommended for a personal archive, since it matches the “survives tool churn, usable from a USB stick” goal.
- Installed package: once the fha suite is implemented and packaged, you’ll be able to install it from this repo (pip install git+https://github.com/YOURNAME/plainfile-family-history.git) and call fha from anywhere. Cleaner day-to-day (tools live in one place), but your archive then depends on the tools being installed separately. Until then, use the vendored-copy model above.

Either way, your private family data never enters this public repo. The public repo is the cookbook and the appliances; your private repo is your kitchen with your food in it.
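The vendored-copy option amounts to one directory copy, repeated whenever the tools improve. The following is a minimal sketch of that step, not part of the fha suite: the function name `vendor_tools` and the two path arguments are hypothetical, and it assumes the public repo has a tools/ folder at its top level as shown in the layout.

```python
import shutil
from pathlib import Path


def vendor_tools(public_repo: Path, private_archive: Path) -> Path:
    """Copy tools/ from the public repo into the private archive.

    Hypothetical helper for illustration; the fha suite does not define it.
    """
    source = public_repo / "tools"
    target = private_archive / "tools"
    if not source.is_dir():
        raise FileNotFoundError(f"no tools/ folder in {public_repo}")
    # dirs_exist_ok=True (Python 3.8+) lets the same call serve as the update
    # step. It overwrites files but does not delete ones removed upstream.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Calling `vendor_tools(Path("plainfile-family-history"), Path("my-family-archive"))` would place a copy of the tools beside your data; nothing is ever copied in the other direction.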
Plainfile is an archive-first system. The durable archive — plain text and standard file formats on disk — is the source of truth. Every other moving part (the search index, the AI assistant, any genealogy app or website) is an optional, replaceable helper built from the archive and rebuildable from scratch.
It is designed to be operated with an AI coding agent.
You open the archive in Claude Code (or any agent that reads AGENTS.md), and the agent helps you process records, draft sourced claims, build family trees, and surface research leads — while a set of small deterministic tools (the fha command suite, specified in TOOLING.md) does the mechanical work.
The spec is written so that all of that tooling can be regenerated from the documents, in any language, if it is ever lost.
Four fha tools (lint, index, id, stubs) are now implemented; the full suite (process, site, packet, etc.) is still being built per the roadmap below.

Your existing photos and documents plus five record types, all plain Markdown/YAML on disk:
| Type | What it is |
|---|---|
| Person P- | A human: identity, flags, and prose. |
| Source S- | A piece of evidence: a record, document, photo, interview. |
| Claim C- | A single sourced assertion (a date, place, relationship) living inside its source record, moving through a suggested → accepted review lifecycle. |
| Place L- | A physical location, identified by coordinates, with a dated name/jurisdiction history. |
| Hypothesis H- | An unsourced working theory: a guess, never a fact, until evidence promotes it to a claim. |
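As a rough illustration of the table, the five prefixes and the claim review gate can be sketched in a few lines. This is not the fha implementation: the names `record_type` and `accept_claim`, the dict-based claim, and the made-up ID `C-example` are all assumptions for illustration. The real ID grammar and claim format are defined in SPEC.md.

```python
# Hypothetical sketch. The prefixes come from the table; the suffix format
# after the prefix and the dict representation of a claim are assumptions.
RECORD_TYPES = {
    "P-": "Person",
    "S-": "Source",
    "C-": "Claim",
    "L-": "Place",
    "H-": "Hypothesis",
}


def record_type(record_id: str) -> str:
    """Return the record type for an ID, judging only by its prefix."""
    prefix = record_id[:2]
    if prefix not in RECORD_TYPES or len(record_id) <= 2:
        raise ValueError(f"not a recognised record ID: {record_id!r}")
    return RECORD_TYPES[prefix]


def accept_claim(claim: dict, human_reviewed: bool) -> dict:
    """Move a claim from 'suggested' to 'accepted', only after human review."""
    if claim.get("status") != "suggested":
        raise ValueError("only a suggested claim can be accepted")
    if not human_reviewed:
        raise PermissionError("a claim is accepted only through human review")
    # Return a new dict so the caller's original claim is left untouched.
    return {**claim, "status": "accepted"}


claim = {"id": "C-example", "status": "suggested"}
print(record_type(claim["id"]))                            # Claim
print(accept_claim(claim, human_reviewed=True)["status"])  # accepted
```

A hypothesis has no place in `accept_claim` by design: it must first gain evidence and become a claim before it can be reviewed.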
Around those, a rebuildable index (SQLite, regenerated from the files) powers search, family-tree generation, contradiction detection, and a research report — none of it authoritative, all of it disposable. The operating loop is simple: capture → file → process → review → report, with human review the only gate to an accepted fact.
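To make "rebuildable" concrete, here is a minimal sketch of an index that is dropped and regenerated from the Markdown files on every run. It is an illustration under stated assumptions, not the fha index: the function name `rebuild_index`, the single `records` table, the first-line-as-title rule, and the `people/` folder in the demo are all hypothetical. The real index schema is specified in TOOLING.md.

```python
import sqlite3
import tempfile
from pathlib import Path


def rebuild_index(archive_root: Path, db_path: str = ":memory:") -> sqlite3.Connection:
    """Drop any existing table and rebuild it from the files on disk."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS records")
    conn.execute("CREATE TABLE records (path TEXT PRIMARY KEY, title TEXT, body TEXT)")
    for md_file in sorted(archive_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        # Assumption: treat the first non-empty line as the title.
        title = next(
            (line.strip("# ").strip() for line in text.splitlines() if line.strip()),
            "",
        )
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?)",
            (md_file.relative_to(archive_root).as_posix(), title, text),
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Demo on a throwaway archive holding one fictional record.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "people").mkdir()
        (root / "people" / "P-example.md").write_text(
            "# Ada Example\n\nBorn in Exampleton.\n", encoding="utf-8"
        )
        conn = rebuild_index(root)
        print(conn.execute("SELECT path, title FROM records").fetchall())
        # [('people/P-example.md', 'Ada Example')]
        conn.close()
```

Because the function only reads the archive and never writes to it, deleting the database loses nothing: the next run recreates it from the files.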
plainfile-family-history/
├── README.md ← you are here
├── SPEC.md ← the law: philosophy, data model, physical format, governance
├── TOOLING.md ← implementation design for every supporting tool (the fha suite)
├── AGENTS.md ← canonical operating instructions for the AI agent
├── CLAUDE.md ← Claude Code entry point (defers to AGENTS.md)
├── docs/ ← supporting documentation
│ ├── GETTING_STARTED.md
│ ├── GLOSSARY.md
│ └── FAQ.md
├── archive-template/ ← empty skeleton (+ fha.yaml) to copy when starting your own (private) archive
├── example-archive/ ← a small, fully fictional worked example (+ its own fha.yaml)
├── tools/ ← the generic fha command suite (skeletal in v1; see TOOLING.md)
├── tests/ ← fixtures for the linter (skeletal in v1)
├── PRIVACY.md ← example-data policy
└── .github/ ← issue templates, contributing guide
You need an AI coding agent that can read project instructions and run shell commands. Claude Code is the reference harness. The spec is harness-agnostic: anything that reads AGENTS.md works.
1. Read SPEC.md end to end. It is the contract; everything else serves it.
2. Open the repo in your agent. It will read CLAUDE.md → AGENTS.md and know the rules before you say anything.
3. Build the tools in the sequence given in TOOLING.md §15. The first milestone is the linter running clean on example-archive/.
4. Put a document in inbox/, and ask the agent to process it.

See docs/GETTING_STARTED.md for the full walkthrough.
| Document | Read it for |
|---|---|
| SPEC.md | The complete specification — what exists, how it lives on disk, and the rules that never bend. Start here. |
| TOOLING.md | How every tool is built, in enough detail to rewrite it from scratch. The fha command suite, the index schema, the linter rules. |
| AGENTS.md | What an AI agent may and may not do inside the archive — the contract, the operating modes, the workflows. |
| docs/GETTING_STARTED.md | A practical first-session walkthrough. |
| docs/GLOSSARY.md | Every term and ID type defined. |
| docs/FAQ.md | Why files, why not a database, why AI, how durable is this really. |
Open, standard file formats: .md, .txt, .csv, .jsonl, .yaml, .jpg, .tiff, embedded IPTC/XMP.

Current: spec v1.2, milestone 1 complete.
The first implementation milestone is done: fha lint runs on the example archive with no errors.
The intended build sequence (detailed in TOOLING.md §15):
1. _lib: parsing, dates, ID grammar, path resolution
2. fha id, fha index, fha lint, fha stubs: the substrate (milestone 1: lint clean on the example archive)
3. fha process, view generators, the photo index

A related project worth studying: if your interest is the research half (autonomous AI research loops, archive guides for specific countries, prompt templates for pushing a family tree backward), see autoresearch-genealogy by Matt Prusak. It and Plainfile arrived independently at the same files-first, Claude-Code-driven philosophy from different angles: that project is a research playbook, this one is the filing system the findings live in. They complement each other well.
This is an early-stage spec and feedback is genuinely useful — especially from genealogists and from anyone building the tools against it.
See .github/CONTRIBUTING.md.
Issues and discussion welcome.
MIT. The spec, the documents, and any code built from them are free to use, adapt, and build on.