
This is a project for generating an edition-specific OCR training file for Kraken for Theodorus Gaza’s Attic paraphrase of the Iliad. The facing pages of this edition carry the Iliad itself, printed in the same font as the paraphrase. Because the Iliad text is readily available, we can quickly generate ground truth from those pages, which can then (it is hoped) be used to train a model that accurately OCRs the Attic paraphrase.
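As a rough illustration of the ground-truth step, the sketch below pairs segmented line images from an Iliad page with the verses printed on that page and writes one `.gt.txt` file next to each image, which is the line-image-plus-text layout Kraken's `ketos` training tools have accepted (check the Kraken documentation for the current input formats). Everything here is an assumption rather than this project's actual code: the function name `write_ground_truth` and the file names are hypothetical, the line images are assumed to come from a prior segmentation step in reading order, and the verse list is assumed to match exactly what this edition prints on the page, which may differ from a standard digital Iliad text.

```python
import unicodedata
from pathlib import Path


def write_ground_truth(line_images, verses, normalization="NFC"):
    """Hypothetical helper: write a `.gt.txt` file beside each line image.

    Assumes `line_images` is in reading order (top to bottom) and holds
    exactly one image per printed verse on the page.
    """
    line_images = [Path(p) for p in line_images]
    if len(line_images) != len(verses):
        # A mismatch usually means the segmenter merged or split a line;
        # refuse to guess rather than misalign every following line.
        raise ValueError(
            f"{len(line_images)} line images but {len(verses)} verses"
        )
    written = []
    for image, verse in zip(line_images, verses):
        # Normalize so polytonic Greek accents are encoded the same way
        # (precomposed vs. combining characters) in every training line.
        text = unicodedata.normalize(normalization, verse.strip())
        # "page_l000.png" -> "page_l000.gt.txt"
        gt_path = image.with_suffix(".gt.txt")
        gt_path.write_text(text, encoding="utf-8")
        written.append(gt_path)
    return written
```

For example, `write_ground_truth(["iliad_p001_l000.png", "iliad_p001_l001.png"], verses)` (hypothetical file names) would write `iliad_p001_l000.gt.txt` and `iliad_p001_l001.gt.txt`. Failing loudly on a count mismatch is a deliberate choice: a single dropped line would otherwise shift every later verse onto the wrong image and silently corrupt the training data.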



Iliad Pages

Paraphrase Pages

Transcribing some additional pages of the paraphrase itself will take more time, but will likely help the trained model generalize to the rest of the paraphrase pages: