Provenance

Know every record's origin, content, and future.

Archival OCR, HTR, and DACS-compliant description in one workflow — error-tolerant by design, traceable end to end.

The problem

OCR that catalogers can't trust is worse than no OCR.

Provenance is built for archival materials: damaged, multilingual, historical, and unlike anything in a typical OCR training set.

Arabic and Persian historical scripts

Naskh, Maghrebi, Ruqʿah, Nastaʿliq — per-line confidence, preserved diacritics, cataloger review.

Error tolerance by design

The first pass will be wrong in known ways. Records stay revisable, and every edit keeps its provenance.
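One way to picture revisable records that keep the provenance of each edit is an append-only edit log. The sketch below is illustrative only and assumes nothing about how Provenance stores data; the `Record` and `Edit` classes, their fields, and the `revise` method are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Edit:
    """One immutable entry in a record's edit log (hypothetical shape)."""
    field_name: str
    old: str
    new: str
    editor: str
    at: datetime


@dataclass
class Record:
    """A catalog record whose revisions never overwrite their own history."""
    fields: dict
    history: list = field(default_factory=list)

    def revise(self, field_name: str, new: str, editor: str) -> None:
        # Log the previous value before changing it, so nothing is lost.
        old = self.fields.get(field_name, "")
        self.history.append(
            Edit(field_name, old, new, editor, datetime.now(timezone.utc))
        )
        self.fields[field_name] = new


# Made-up example: a cataloger corrects a misread title.
rec = Record({"title": "Leter book"})
rec.revise("title", "Letter book", editor="cataloger-1")
```

After the call, `rec.fields` holds the corrected value while `rec.history` still records what the first pass said, who changed it, and when.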

DACS and EAD, not a lock-in schema

Output in the standards your catalog already speaks.

What it does

Three stages, one trace.

01 · Capture

Ingest TIFF, PDF, and JPEG 2000. Layout analysis. Word-level bounding boxes preserved.

02 · Read

OCR and HTR with per-line confidence. Multi-script. Historical typography. Cataloger review on every confidence dip.
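As a rough illustration of routing confidence dips to a cataloger, the sketch below filters lines whose score falls under a cutoff and puts the least certain first. The `Line` class, the 0.0 to 1.0 confidence scale, and the 0.85 threshold are assumptions made for the example, not documented Provenance behavior, and the sample lines are invented.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One transcribed line with its confidence score (hypothetical shape)."""
    page: int
    index: int
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def review_queue(lines, threshold=0.85):
    """Return lines scoring below the threshold, least confident first."""
    flagged = [ln for ln in lines if ln.confidence < threshold]
    return sorted(flagged, key=lambda ln: ln.confidence)


# Invented sample data for illustration.
lines = [
    Line(1, 0, "In the name of the committee", 0.97),
    Line(1, 1, "recd. 14 Marche 1871", 0.62),
    Line(1, 2, "per the attached inventory", 0.81),
]
queue = review_queue(lines)
```

Here `queue` contains the second and third lines, in that order, and the first line passes without review.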

03 · Describe

DACS-compliant scope notes, biographical sketches, and container lists. Every field traceable to source.

Waitlist

Ready to pilot?

We take a small number of pilot institutions per quarter. Tell us about your collection.