Volume 18, 2019
A Web-based Semantic Navigation System for Migne’s Patrologia Graeca based on OCR extracted Page and Volume Numbers from the Table of Contents of Dorotheos Scholarios
Authors: Evagelos Varthis, Marios Poulos, Ilias Giarenis, Sozon Papavlasopoulos
Abstract: This paper presents the prototype of a new tool for navigating a 19th-century collection of Greek authors. The collection was published by Jacques Paul Migne and is known today as the Patrologia Graeca (PG). The project aims to interconnect its roughly 120,000 scanned pages with the scanned Table of Contents (TOC) published by D. Scholarios in 1879. Scholarios's work contains summaries of the chapters and sub-chapters of PG, each accompanied by the volume and page number of the corresponding location in PG. Using Optical Character Recognition (OCR) and pattern recognition techniques, we extract this information from Scholarios's work in order to create links to the specific pages of PG. Our aim is to provide a Web interface in which Scholarios's work serves as a semantic compass for PG with respect to the subjects it covers. The complete system consists of three main components: a REST API backbone service for the scanned images of PG; OCR and pattern recognition techniques for extracting the volume and page information from the scanned pages of Scholarios; and a Web interface presenting Scholarios's TOC with the appropriate functionality. The originality of our system lies in the interconnection of two different scanned texts for semantic enrichment and browsing convenience, particularly given their difference in scale: one runs to nearly 120,000 pages and the other to about 600.
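
The extraction and linking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that each OCR'd TOC line ends with a reference of the form "volume, page" in Arabic digits, and that page images are served under a URL layout such as /volumes/{volume}/pages/{page}. The names PgRef, REF_PATTERN, extract_ref, page_url and the base URL are all hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PgRef(NamedTuple):
    """A reference to a location in PG (hypothetical structure)."""
    volume: int
    page: int


# Assumed reference format: a trailing "<volume>, <page>" in Arabic digits.
# The actual notation in Scholarios's TOC may differ.
REF_PATTERN = re.compile(r"(\d{1,3})\s*[,.]\s*(\d{1,4})\s*\.?\s*$")


def extract_ref(line: str) -> Optional[PgRef]:
    """Return the trailing volume/page reference of an OCR'd line, if any."""
    match = REF_PATTERN.search(line)
    if match is None:
        return None
    return PgRef(volume=int(match.group(1)), page=int(match.group(2)))


def page_url(ref: PgRef, base: str = "https://example.org/pg") -> str:
    """Build a link to a scanned page; the URL layout is an assumption."""
    return f"{base}/volumes/{ref.volume}/pages/{ref.page}"


if __name__ == "__main__":
    sample = "On the nature of prayer 35, 1124"  # invented sample line
    ref = extract_ref(sample)
    if ref is not None:
        print(page_url(ref))
```

In practice, OCR output is noisy, so a real pipeline would likely need to tolerate misread digits and punctuation and to validate each extracted volume and page against the known range of the collection.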
