Plainfile Family History

An operating spec for a durable, file-first family-history archive with an AI research assistant layered on top.

This project stemmed from one idea: for a hundred years, genealogy lived in a filing cabinet, and anyone could open the drawer. No login, no subscription, no schema migration. A century later a curious descendant could still pull the folder or open the book and read it. Modern genealogy software and workflows have lost that virtue.

Plainfile is that filing cabinet, built to last. Plain files at the foundation, with search, structured claims, and an AI research layer stacked on top of the files, never instead of them. Delete every layer above and the archive still works, the way the drawer still works.

NOTE: This is a specification and scaffold, not finished software.

It is the blueprint for a simple, future-proof system for family research. The goal of this repo is to establish simple standards that can be maintained with or without tooling. It also provides the spec to create tooling from scratch if you wish, and sample tools you can use to get started.

Repo, tools, and your archive

Three things live at arm’s length from each other, by design:

This repo (public): the spec (SPEC.md, TOOLING.md, AGENTS.md), the docs, the generic fha tools (once built), an empty archive-template/, and a fictional example-archive/ fixture.
The tools (public, in tools/): generic — they operate on any conforming archive and hold no family data. Publishing them is the manifestation of the spec. Tools are replaceable glue, regenerable from the spec.
Your archive (private, separate repo): your real family’s records, created from archive-template/, depending on this repo’s spec and tools but never living inside it. Public examples stay fictional; your groceries don’t go in the cookbook.

Public examples must remain fictional. Do not open issues or PRs containing real records about living people, private family documents, raw DNA files, or identifying photos. See PRIVACY.md.

How the two repos relate

In practice you end up with two separate repositories — one public, one private:

plainfile-family-history/   ← PUBLIC:  the spec + the generic tools (this repo)
my-family-archive/          ← PRIVATE: your real family's records

They are not technically linked. The only relationship is that your private archive uses the tools that live in this public repo. There are two ways to get those tools to your archive (decide once the tools are built — you don’t need to now):

Vendor (copy them in). Copy this repo’s tools/ folder into your private archive so the tools live beside your data. The archive becomes fully self-contained — it works on any machine, offline, forever, even if this repo disappears. Updating means re-copying tools/ when they improve. Recommended for a personal archive — it matches the “survives tool churn, usable from a USB stick” goal.
Install (once packaging exists). Not available yet: the fha suite is only partly built and not yet packaged. Once it is implemented and packaged, you’ll be able to install it from this repo (pip install git+https://github.com/YOURNAME/plainfile-family-history.git) and call fha from anywhere. Cleaner day-to-day (tools live in one place), but your archive then depends on the tools being installed separately. Until then, use the vendored-copy model above.
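
As a rough illustration of the vendoring option, the copy step could be scripted along these lines. This is a minimal sketch, not part of the fha suite: the function name vendor_tools and both path arguments are assumptions made for the example, and replacing the old copy wholesale is just one way to handle updates.

```python
import shutil
from pathlib import Path


def vendor_tools(spec_repo: Path, archive: Path) -> Path:
    """Copy spec_repo/tools into archive/tools, replacing any older copy.

    Hypothetical helper: the names and layout here are illustrative only.
    """
    src = spec_repo / "tools"
    dst = archive / "tools"
    if not src.is_dir():
        raise FileNotFoundError(f"no tools/ folder in {spec_repo}")
    if dst.exists():
        shutil.rmtree(dst)  # updating means re-copying the whole folder
    shutil.copytree(src, dst)
    return dst
```

Calling vendor_tools(Path("plainfile-family-history"), Path("my-family-archive")) would leave the private archive with its own copy of tools/, with no remaining dependency on the public repo.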

Either way, your private family data never enters this public repo. The public repo is the cookbook and the appliances; your private repo is your kitchen with your food in it.

What this is
What this is not
How it works
Repository layout
Quick start
The documents
Design principles
Status & roadmap
A complementary project
Contributing
License

What this is

Plainfile is an archive-first system. The durable archive — plain text and standard file formats on disk — is the source of truth. Every other moving part (the search index, the AI assistant, any genealogy app or website) is an optional, replaceable helper built from the archive and rebuildable from scratch.

It is designed to be operated with an AI coding agent. You open the archive in Claude Code (or any agent that reads AGENTS.md), and the agent helps you process records, draft sourced claims, build family trees, and surface research leads — while a set of small deterministic tools (the fha command suite, specified in TOOLING.md) does the mechanical work. The spec is written so that all of that tooling can be regenerated from the documents, in any language, if it is ever lost.

What this is not

Not a finished app. The core fha tools (lint, index, id, stubs) are now implemented; the full suite (process, site, packet, etc.) is still being built per the roadmap below.
Not a database. No server, no proprietary store. Files are the truth; the index is a disposable cache.
Not a genealogy app that happens to store documents. It is the inverse: an archive that may feed a genealogy app via export.
Not a hosted service. Your data lives on your disk, in formats you can read with a text editor.

How it works

The archive is your existing photos and documents plus five record types, all plain Markdown/YAML on disk:

Type	What it is
Person `P-`	A human — identity, flags, and prose.
Source `S-`	A piece of evidence: a record, document, photo, interview.
Claim `C-`	A single sourced assertion (a date, place, relationship) living inside its source record, moving through a `suggested → accepted` review lifecycle.
Place `L-`	A physical location, identified by coordinates, with a dated name/jurisdiction history.
Hypothesis `H-`	An unsourced working theory — a guess, never a fact, until evidence promotes it to a claim.
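
The prefixes in the table are enough to tell record types apart by ID alone. The sketch below shows one way a tool might do that. It assumes only the five prefixes listed above; the full ID grammar lives in SPEC.md, so the function name record_type and sample IDs such as "P-0001" are hypothetical.

```python
# Prefix -> record type, taken from the table above.
RECORD_TYPES = {
    "P": "person",
    "S": "source",
    "C": "claim",
    "L": "place",
    "H": "hypothesis",
}


def record_type(record_id: str) -> str:
    """Return the record type for an ID such as "P-0001" (hypothetical format).

    Only the prefix is checked; the rest of the ID grammar is left to SPEC.md.
    """
    prefix, sep, rest = record_id.partition("-")
    if not sep or not rest or prefix not in RECORD_TYPES:
        raise ValueError(f"not a recognised record ID: {record_id!r}")
    return RECORD_TYPES[prefix]
```

A linter could use a check like this to reject a file whose ID prefix does not match the kind of record it contains.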

Around those, a rebuildable index (SQLite, regenerated from the files) powers search, family-tree generation, contradiction detection, and a research report — none of it authoritative, all of it disposable. The operating loop is simple: capture → file → process → review → report, with human review the only gate to an accepted fact.
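
To make "disposable" concrete, here is a minimal sketch of an index that is thrown away and rebuilt from the files on every run. It is not the schema specified in TOOLING.md: the single records table, its two columns, and the function name rebuild_index are assumptions for illustration.

```python
import sqlite3
from pathlib import Path


def rebuild_index(archive: Path, db_path: Path) -> int:
    """Delete any existing index and rebuild it from the Markdown files.

    Illustrative schema only: one row per file, holding its path and text.
    Returns the number of files indexed.
    """
    if db_path.exists():
        db_path.unlink()  # the index is a cache, so deleting it loses nothing
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE records (path TEXT PRIMARY KEY, body TEXT)")
        rows = [
            (p.relative_to(archive).as_posix(), p.read_text(encoding="utf-8"))
            for p in sorted(archive.rglob("*.md"))
        ]
        conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
        conn.commit()
        return len(rows)
    finally:
        conn.close()
```

Because the function starts from nothing each time, a corrupted or outdated index is never a problem: delete it and run the rebuild again.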

Repository layout

plainfile-family-history/
├── README.md            ← you are here
├── SPEC.md              ← the law: philosophy, data model, physical format, governance
├── TOOLING.md           ← implementation design for every supporting tool (the fha suite)
├── AGENTS.md            ← canonical operating instructions for the AI agent
├── CLAUDE.md            ← Claude Code entry point (defers to AGENTS.md)
├── docs/                ← supporting documentation
│   ├── GETTING_STARTED.md
│   ├── GLOSSARY.md
│   └── FAQ.md
├── archive-template/    ← empty skeleton (+ fha.yaml) to copy when starting your own (private) archive
├── example-archive/     ← a small, fully fictional worked example (+ its own fha.yaml)
├── tools/               ← the generic fha command suite (skeletal in v1; see TOOLING.md)
├── tests/               ← fixtures for the linter (skeletal in v1)
├── PRIVACY.md           ← example-data policy
└── .github/             ← issue templates, contributing guide

Quick start

You need an AI coding agent that can read project instructions and run shell commands — Claude Code is the reference harness. The spec is harness-agnostic; anything that reads AGENTS.md works.

Clone this repo and read SPEC.md end to end. It is the contract; everything else serves it.
Open the folder in your agent. It will read CLAUDE.md → AGENTS.md and know the rules before you say anything.
Build the tools. Declare tool-building mode and point the agent at the build order in TOOLING.md §15. The first milestone is the linter running clean on example-archive/.
Start your own archive. Copy the structure, drop your first scan or note into inbox/, and ask the agent to process it.

See docs/GETTING_STARTED.md for the full walkthrough.

The documents

Document	Read it for
SPEC.md	The complete specification — what exists, how it lives on disk, and the rules that never bend. Start here.
TOOLING.md	How every tool is built, in enough detail to rewrite it from scratch. The `fha` command suite, the index schema, the linter rules.
AGENTS.md	What an AI agent may and may not do inside the archive — the contract, the operating modes, the workflows.
docs/GETTING_STARTED.md	A practical first-session walkthrough.
docs/GLOSSARY.md	Every term and ID type defined.
docs/FAQ.md	Why files, why not a database, why AI, how durable is this really.

Design principles

The archive is the source of truth; tools are replaceable.
Durable, plain formats. .md, .txt, .csv, .jsonl, .yaml, .jpg, .tiff, embedded IPTC/XMP.
Every important fact traces to a source. Uncited prose is story or context, never fact.
AI suggestions are not facts. They enter a review queue and stay there until a human accepts them.
Nothing generated is load-bearing. Index, search, trees, the website — all rebuildable from the files.
Folder location is for human browsing; metadata carries meaning.
Stay light. Long-term durability beats short-term convenience.
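
The principle that AI suggestions are not facts can be sketched as a tiny state check. This is an illustration under assumptions, not the spec's data model: the Claim fields, the accept and facts helpers, and the idea of recording a reviewer name are all hypothetical. Only the suggested and accepted states come from the lifecycle described earlier.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Claim:
    claim_id: str                       # hypothetical ID, e.g. "C-0001"
    assertion: str
    status: str = "suggested"           # every new claim starts in the queue
    reviewed_by: Optional[str] = None   # set only when a human accepts it


def accept(claim: Claim, reviewer: str) -> None:
    """Promote a suggested claim to accepted, recording who reviewed it."""
    if not reviewer.strip():
        raise ValueError("acceptance needs a named human reviewer")
    if claim.status != "suggested":
        raise ValueError(f"{claim.claim_id} is already {claim.status}")
    claim.status = "accepted"
    claim.reviewed_by = reviewer


def facts(claims: Iterable[Claim]) -> List[Claim]:
    """Only accepted claims count as facts; suggestions stay in the queue."""
    return [c for c in claims if c.status == "accepted"]
```

Any report or tree built on top of facts() would ignore everything an agent has merely suggested, which is the behaviour the principle asks for.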

Status & roadmap

Current: spec v1.2 — milestone 1 complete.

The first implementation milestone is done: fha lint runs on the example archive with no errors. The intended build sequence (detailed in TOOLING.md §15):

Shared foundations (_lib: parsing, dates, ID grammar, path resolution)
fha id, fha index, fha lint, fha stubs — the substrate (milestone 1: lint clean on the example archive)
fha process, view generators, the photo index
The session report, cross-reference pass, person packets
The static-site generator and GEDCOM export
Web-capture companion for record intake

A complementary project

A related project worth studying: if your interest is the research half — autonomous AI research loops, archive guides for specific countries, prompt templates for pushing a family tree backward — see autoresearch-genealogy by Matt Prusak. It and Plainfile arrived independently at the same files-first, Claude-Code-driven philosophy from different angles: that project is a research playbook, this one is the filing system the findings live in. They complement each other well.

Contributing

This is an early-stage spec and feedback is genuinely useful — especially from genealogists and from anyone building the tools against it. See .github/CONTRIBUTING.md. Issues and discussion welcome.

License

MIT. The spec, the documents, and any code built from them are free to use, adapt, and build on.
