Modern stack for old books: Engineering a digital library for Southeast Asian history
Spreading knowledge and facts in the current era of post-truth, short-form content and deepfakes is more important than ever. History and archives help us deal with the flood of sloppy and misleading information. But working with real archives comes with specific difficulties: large volumes of data, useful information hidden in noise, poor scan quality, and handwritten texts that are hard to read.
In this talk, I will explain how we use modern technologies to tackle these problems while building GRAC LAI, a digital library focused on documents related to the history of Southeast Asia.
We will cover original ways of getting a high-level view of thousands of images, automatic metadata generation for text and images, recent progress in handwritten text recognition models, the use of cloud GPUs to speed up processing, and the platform for publishing and sharing these archives with the general public.