Work

Anonymized projects from twenty years of archival practice.

Every engagement starts with the records as they are — damaged, multilingual, and unlike anything in the training set.

Selected work

Representative engagements.

Regional archive

Arabic manuscript OCR pipeline

Rebuilt a historical Arabic handwriting recognition workflow after the first vendor's output proved untrusted by the cataloging team. Added per-line confidence, human-in-the-loop correction, and preserved original diacritics.

Collection size
~120K folios
Scripts
Naskh, Maghrebi, Ruqʿah
Outcome
3x cataloger throughput
University special collections

Oral history transcription at scale

Designed a time-coded transcription pipeline for decades of recorded interviews. Speaker diarization, custom vocabulary for regional dialect, and DACS-aligned finding aid integration.

Hours processed
4,200+
Languages
English, Spanish, code-switching
Delivered
TEI XML + WAV fixity
Museum consortium

Cross-collection discovery layer

Built a unified search layer across five institutions' catalogs without forcing schema migration. Each institution kept its authority files; the layer translated at query time.

Collections
5 institutions
Records
~2.3M items
Schema
MARC, EAD, Dublin Core bridged
Corporate archive

Brand heritage digitization

Digitized fifty years of product packaging, advertisements, and internal memos. Metadata generation tuned for marketing vocabulary while preserving archival description standards.

Items
~80K
Formats
Print, slide, 16mm
Handoff
DAM + Dublin Core
Government records office

Multilingual records migration

Migrated forty years of bilingual administrative records from proprietary format to open standards. Error-tolerant OCR with cataloger review on every confidence dip.

Records
~1.8M pages
Languages
Arabic + English
Standard
OAIS-aligned
Academic library

Early-modern print layout analysis

Layout analysis for early-modern printed books with marginalia, running heads, and irregular pagination. Paleographer-reviewed output fed a critical-edition workflow.

Volumes
~3,400
Period
1480–1750
Output
TEI-P5 XML
What's next

Let's talk about your collection.