Two nested challenges are proposed in the HBA competition:
1- Text/graphic separation
2- Font discrimination
The first challenge focuses solely on how well image analysis methods discriminate textual content from graphical content. The second challenge evaluates the capability of the participating methods first to distinguish between text and graphics, and then to separate the textual content according to different text fonts (e.g. lowercase, uppercase, italic, …). Indeed, textual content can be formatted with many different typefaces and sizes, which are associated with the logical structure level of the analyzed document. This logical level aims to interpret and recognize the different parts that compose a document image and to specify the logical relationships between them (e.g. body text, legend, annotation, chapter title, …).
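Because the two challenges are nested, a labelling produced for the second challenge can in principle be collapsed into a labelling for the first. The sketch below illustrates that relationship only. The label names (`graphic`, `lowercase`, `uppercase`, `italic`) and the helper functions are hypothetical, chosen from the examples given above, and do not reproduce the actual ground-truth encoding of the competition.

```python
# Hypothetical label names: the real ground-truth classes are not listed here.
GRAPHIC = "graphic"
FONT_CLASSES = ("lowercase", "uppercase", "italic")


def to_challenge1_label(label: str) -> str:
    """Collapse a fine-grained (challenge 2) label into text/graphic (challenge 1)."""
    if label == GRAPHIC:
        return "graphic"
    if label in FONT_CLASSES:
        return "text"
    raise ValueError(f"unknown label: {label!r}")


def collapse(labels):
    """Apply the collapse to a sequence of per-pixel (or per-region) labels."""
    return [to_challenge1_label(label) for label in labels]


# Toy sequence of fine-grained labels, made up for illustration.
fine_labels = ["lowercase", "graphic", "italic", "uppercase", "graphic"]
coarse_labels = collapse(fine_labels)
print(coarse_labels)
```

The reverse mapping does not exist: a text/graphic result carries no font information, which is why the second challenge is the stricter of the two.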
It is worth pointing out, however, that the classes defined in the ground truth have very different sample counts, i.e. they are highly imbalanced. Indeed, textual content is predominant in monographs compared to graphical content. Moreover, the great majority of the textual content is body text, while the other character fonts are more marginal. This is compounded by the difficulty that participating methods have in handling the different types of content found in historical books published in different eras, such as printed books from the 19th century or manuscripts from the 13th century.
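The practical consequence of this imbalance can be seen on a toy example: a method that labels everything as body text reaches a high overall accuracy while failing completely on the marginal classes. The sketch below is not the evaluation protocol of the competition, which is not described in this section. The class names, the counts, and the functions `accuracy` and `per_class_f1` are assumptions made up for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 score per class, computed as 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


# Toy, made-up distribution: body text dominates, other classes are marginal.
y_true = ["body"] * 90 + ["italic"] * 8 + ["graphic"] * 2
# A degenerate method that always predicts the majority class.
y_pred = ["body"] * 100

acc = accuracy(y_true, y_pred)
f1 = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1.values()) / len(f1)

print(f"accuracy = {acc:.3f}")
print(f"macro F1 = {macro_f1:.3f}")
for name, score in f1.items():
    print(f"  F1[{name}] = {score:.3f}")
```

On this toy data the overall accuracy is 0.9 although the F1 score of both minority classes is zero, which is why a per-class measure gives a more faithful picture when the classes are this unevenly populated.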