HBA Competition

In conjunction with the 15th IAPR International Conference on Document Analysis and Recognition (ICDAR’19), the second edition of the HISTORICAL BOOK ANALYSIS COMPETITION (HBA) will be organized. The HBA competition will address a thriving topic of major interest to many researchers in different fields, including (historical) document image analysis, image processing, pattern recognition and classification.

The HBA competition will provide a large experimental corpus and a thorough evaluation protocol to ensure a consistent comparison of image processing methods for historical document image analysis.

A challenging dataset, the HBA 1.0 dataset, will be used on this occasion. The HBA 1.0 dataset is composed of 4,436 real scanned, ground-truthed historical document images from 11 books (5 manuscripts and 6 printed books) in different languages and scripts, published between the 13th and 19th centuries. It contains 2,435 manuscript pages and 2,001 printed pages. It has been ground-truthed by annotating each foreground pixel, so the ground truth information is currently available at pixel level. The ground truth of the HBA 1.0 dataset contains more than 7.58 billion annotated pixels.

Two nested challenges are proposed in the HBA competition. Firstly, the HBA competition will aim at evaluating how well image analysis methods can discriminate textual content from graphical content at pixel level. Secondly, it will aim at assessing the capabilities of the participating methods to separate the textual content according to different text fonts (e.g. lowercase, uppercase, italic, …) at pixel level.
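
To make the nesting of the two challenges concrete, here is a minimal Python sketch of how a predicted pixel label map might be scored against a ground-truth label map. It is an illustration only and not the official HBA evaluation protocol: the label codes (0 for background, 1 for graphics, 2 and above for individual text fonts), the function names, and the choice of a per-class F-measure are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical label codes (assumption): 0 = background, 1 = graphic,
# any code >= 2 = one text font (e.g. 2 = lowercase, 3 = uppercase).
BACKGROUND, GRAPHIC, TEXT = 0, 1, 2


def to_coarse(label):
    """Collapse every font label into one 'text' class (first challenge view)."""
    return label if label in (BACKGROUND, GRAPHIC) else TEXT


def per_class_f1(truth, pred, mapping=lambda label: label):
    """Per-class F-measure, scored on ground-truth foreground pixels only."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            if t == BACKGROUND:
                continue  # only annotated foreground pixels are scored
            t, p = mapping(t), mapping(p)
            if t == p:
                tp[t] += 1
            else:
                fn[t] += 1  # the true class was missed
                fp[p] += 1  # the predicted class was wrongly assigned
    scores = {}
    for c in set(tp) | set(fn):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


# Tiny made-up 2x4 "page": rows of pixel labels.
truth = [[0, 1, 1, 2],
         [0, 2, 3, 3]]
pred = [[0, 1, 2, 2],
        [0, 3, 3, 3]]

coarse_scores = per_class_f1(truth, pred, mapping=to_coarse)  # text vs. graphic
fine_scores = per_class_f1(truth, pred)                       # individual fonts
```

The same pair of label maps is scored twice: once with all font labels merged into a single text class, and once with the font labels kept apart. A method can therefore do well on the first challenge while still confusing fonts on the second.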