Innovative Archives — Intelligence for living archives

Selected work

Representative engagements.

Regional archive

Arabic manuscript OCR pipeline

Rebuilt a historical Arabic handwriting recognition workflow after the cataloging team lost trust in the first vendor's output. Added per-line confidence scores and human-in-the-loop correction, and preserved original diacritics.

Collection size: ~120K folios
Scripts: Naskh, Maghrebi, Ruqʿah
Outcome: 3x cataloger throughput

University special collections

Oral history transcription at scale

Designed a time-coded transcription pipeline for decades of recorded interviews, with speaker diarization, a custom vocabulary for regional dialect, and integration with DACS-aligned finding aids.

Hours processed: 4,200+
Languages: English, Spanish, code-switching
Delivered: TEI XML + WAV fixity

Museum consortium

Cross-collection discovery layer

Built a unified search layer across five institutions' catalogs without forcing schema migration. Each institution kept its authority files; the layer translated at query time.

Collections: 5 institutions
Records: ~2.3M items
Schemas bridged: MARC, EAD, Dublin Core

Corporate archive

Brand heritage digitization

Digitized fifty years of product packaging, advertisements, and internal memos. Tuned metadata generation for marketing vocabulary while preserving archival description standards.

Items: ~80K
Formats: Print, slide, 16mm
Handoff: DAM + Dublin Core

Government records office

Multilingual records migration

Migrated forty years of bilingual administrative records from a proprietary format to open standards. Used error-tolerant OCR, with cataloger review wherever recognition confidence dropped.

Records: ~1.8M pages
Languages: Arabic + English
Standard: OAIS-aligned

Academic library

Early-modern print layout analysis

Analyzed the layout of early-modern printed books with marginalia, running heads, and irregular pagination. Paleographer-reviewed output fed a critical-edition workflow.

Volumes: ~3,400
Period: 1480–1750
Output: TEI-P5 XML

What's next

Let's talk about your collection.

Start a project

Anonymized projects from twenty years of archival practice.