OCR that catalogers can't trust is worse than no OCR.
Provenance is built for archival materials: damaged, multilingual, historical, and unlike the training set.
Arabic and Persian historical scripts
Naskh, Maghrebi, Ruqʿah, Nastaʿliq — per-line confidence, preserved diacritics, cataloger review.
Error tolerance by design
The first pass is wrong in known ways. Records are revisable without losing provenance of every edit.
DACS and EAD, not a lock-in schema
Output the standards your catalog already speaks.
Three stages, one trace.
01 · Capture
Ingest TIFF, PDF, and JPEG 2000. Layout analysis. Word-level bounding boxes preserved.
02 · Read
OCR and HTR with per-line confidence. Multi-script. Historical typography. Cataloger review on every confidence dip.
03 · Describe
DACS-compliant scope notes, biographical sketches, and container lists. Every field traceable to source.
Ready to pilot?
We take a small number of pilot institutions per quarter. Tell us about your collection.