HBA dataset

The dataset associated with this competition, called the HBA dataset, was collected from the French digital library Gallica.

The HBA dataset comprises 4436 real scanned, ground-truthed, single-page historical document images from 11 books (5 manuscripts and 6 printed books), written in different languages and scripts and published between the 13th and 19th centuries.

The following figures show sample pages from each book in the HBA dataset.

The link attached to each “Book Id.” points to the corresponding historical book in the French digital library Gallica (only low-resolution images are publicly available online).