Learning Robust Diachronic Representations of Ancient Greek Letterforms

John Pavlopoulos, Spyros Barbakos, Lavinia Ferretti, Dionysis Voulgarakis, Asimina Paparrigopoulou, Maria Konstantinidou, Giuseppe De Gregorio, Isabelle Marthot-Santaniello, Paraskevi Platanou, Holger Essler · June 25, 2026

Summary

This research introduces methods and datasets for learning robust representations of ancient Greek letterforms that account for centuries of handwriting variation. It proposes a similarity-weighted supervised contrastive loss and lacuna-driven augmentation, enabling CNNs and ResNets to achieve strong recognition and interpretable embeddings for historical text analysis.

Handwriting styles change substantially over centuries, which makes ancient texts difficult to analyze. This paper addresses the problem through ancient Greek, one of the longest continuously used writing systems, as a case study in robust diachronic representation learning. The researchers introduce three new datasets: Hell-Char for training (3rd-1st centuries BCE) and two evaluation sets, PaLit-Char (2nd-5th c. CE) and Med-Char (9th-14th c. CE), which together capture this temporal evolution.

Two strategies address the symbolic variation, scarce data, and systematic degradation common in historical manuscripts. First, a similarity-weighted supervised contrastive loss biases the embeddings by dynamically estimating inter-class similarities, which improves the separation between characters. Second, a lacuna-driven augmentation scheme simulates realistic manuscript corruptions, making the models more resilient to noise.

Trained with these strategies, both a lightweight Convolutional Neural Network (CNN) and a pre-trained ResNet demonstrated strong recognition performance and produced embeddings that separated character classes more effectively than PCA or generic pre-trained models. These interpretable embeddings support tasks such as clustering, identifying stylistic subgroups, and visualizing the diachronic evolution of letterforms, offering a transferable paradigm for representation learning under challenging historical data conditions.
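The similarity-weighted supervised contrastive loss described above can be illustrated with a small sketch. The summary does not give the paper's formula, so everything specific here is an assumption: the starting point is a standard supervised contrastive loss, inter-class similarity is estimated from the cosine similarity of per-class mean embeddings within the batch, and each negative pair is up-weighted by that similarity so that look-alike letters are pushed apart harder. The function names (weighted_supcon_loss, class_similarity) and the weighting rule 1 + similarity are hypothetical. The code is dependency-free Python for readability; a real training loop would use a tensor library with automatic differentiation.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def class_similarity(embs, labels):
    """Assumed estimator: cosine similarity between per-class mean
    embeddings in the current batch, clipped to [0, 1]."""
    groups = defaultdict(list)
    for z, y in zip(embs, labels):
        groups[y].append(z)
    cents = {
        y: normalize([sum(col) / len(zs) for col in zip(*zs)])
        for y, zs in groups.items()
    }
    return {(a, b): max(0.0, dot(cents[a], cents[b])) for a in cents for b in cents}


def weighted_supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss whose negative terms are weighted by
    1 + (estimated similarity between the anchor's class and the negative's class).
    The weighting rule is an illustrative assumption, not the paper's definition."""
    z = [normalize(v) for v in embeddings]
    sim = class_similarity(z, labels)
    total, n_anchors = 0.0, 0
    for i in range(len(z)):
        positives = [p for p in range(len(z)) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        denom = 0.0
        for a in range(len(z)):
            if a == i:
                continue
            same = labels[a] == labels[i]
            weight = 1.0 if same else 1.0 + sim[(labels[i], labels[a])]
            denom += weight * math.exp(dot(z[i], z[a]) / temperature)
        log_denom = math.log(denom)
        total += -sum(dot(z[i], z[p]) / temperature - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


# Usage: two toy classes in a 2-D embedding space.
labels = ["alpha", "alpha", "beta", "beta"]
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss = weighted_supcon_loss(separated, labels)
```

Under this weighting, two classes whose batch centroids point in similar directions contribute larger penalty terms, which is one way to read "dynamically estimating inter-class similarities". With all weights fixed at 1 the function reduces to the ordinary supervised contrastive loss.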

Why it matters

For digital humanities, historical research, and AI professionals working with rare or ancient texts, this research provides advanced tools to accurately digitize, analyze, and understand historical documents, unlocking new insights from previously inaccessible data.

How to implement this in your domain

1. Apply similarity-weighted supervised contrastive loss to train models for character recognition in other historical or variable handwriting datasets.
2. Develop lacuna-driven augmentation schemes tailored to specific types of document degradation in historical archives.
3. Utilize the proposed embedding techniques for clustering and identifying stylistic subgroups in large collections of historical manuscripts.
4. Collaborate with digital humanities experts to integrate these representation learning methods into tools for paleography and textual criticism.
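As a starting point for the augmentation step in the list above, here is a minimal sketch of a lacuna-style corruption. The summary does not describe how the paper generates its corruptions, so this is an assumed stand-in: a random walk carves one irregular hole into a glyph image and fills it with the background value. The function name add_lacuna, the max_fraction parameter, and the convention that 1.0 means background are all hypothetical. Images are plain lists of rows so the example runs without any imaging library.

```python
import random


def add_lacuna(image, max_fraction=0.2, fill=1.0, rng=None):
    """Return a copy of `image` with one irregular hole set to `fill`.

    `image` is a list of rows of pixel values. The hole covers at most
    `max_fraction` of the pixels; its size is drawn uniformly at random.
    Assumption: `fill` is the background value (1.0 = blank support).
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # never modify the caller's image
    target = int(rng.uniform(0.0, max_fraction) * height * width)

    # Random walk from a random seed pixel; visited pixels form the hole.
    r, c = rng.randrange(height), rng.randrange(width)
    erased = set()
    steps = 0
    while len(erased) < target and steps < 20 * height * width:
        erased.add((r, c))
        dr, dc = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        r = min(max(r + dr, 0), height - 1)  # stay inside the image
        c = min(max(c + dc, 0), width - 1)
        steps += 1

    for rr, cc in erased:
        out[rr][cc] = fill
    return out


# Usage: corrupt a 16x16 all-ink glyph with a reproducible hole.
glyph = [[0.0] * 16 for _ in range(16)]
damaged = add_lacuna(glyph, max_fraction=0.25, rng=random.Random(0))
```

A random walk produces ragged, connected holes rather than clean rectangles, which is closer to how physical damage tends to look. For a specific archive, the hole shapes and size distribution would be better fitted to the degradation actually observed in that collection.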

Who benefits

Digital Humanities, Archival Science, Education, AI Development, Cultural Heritage

Key takeaways

Diachronic representation learning is crucial for analyzing ancient texts with varying handwriting.
New datasets for ancient Greek letterforms span centuries of variation.
Similarity-weighted contrastive loss and lacuna-driven augmentation improve robustness.
Resulting embeddings enable clustering, stylistic analysis, and visualization of letterform evolution.

Original post by John Pavlopoulos, Spyros Barbakos, Lavinia Ferretti, Dionysis Voulgarakis, Asimina Paparrigopoulou, Maria Konstantinidou, Giuseppe De Gregorio, Isabelle Marthot-Santaniello, Paraskevi Platanou, Holger Essler

"arXiv:2606.24984v1 Announce Type: new Abstract: Learning representations that remain robust across centuries of variation in handwriting is a key challenge in diachronic representation learning. Taking one of the longest continuously used writing systems, ancient Greek, as a case…"
