Project Details

PERO - Pokročilá extrakce a rozpoznávání obsahu tištěných a rukou psaných digitalizátů pro zvýšení jejich přístupnosti a využitelnosti

Project Period: 1 March 2018 - 31 December 2022

Project Type: grant

Code: DG18P02OVV055

Agency: Ministry of Culture of the Czech Republic

Program: Programme to Support Applied Research and Experimental Development of National and Cultural Identity for the Years 2016 to 2022 (NAKI II)

English title
Advanced content extraction and recognition for printed and handwritten documents for better accessibility and usability
Keywords

Optical character recognition, handwriting recognition, natural language processing, quality enhancement, language model, convolutional neural networks, recurrent neural networks

Abstract

The project aims to create technology and tools that improve the accessibility of digitized historic documents. These tools, based on state-of-the-art methods from computer vision, machine learning, and language modeling, will enable existing digital archives and libraries to provide full-text search and content extraction for low-quality historic printed documents and for all handwritten documents, which cannot be processed automatically by currently available tools. The project extends the automation and capabilities of the digitization pipeline by providing tools for automated quality assessment and control, quality improvement, automated transcription of historic printed documents, semi-automated transcription of handwritten text, and automatic extraction of semantic information from semi-structured documents (e.g. library catalogs and birth records). The resulting tools and techniques will be validated by processing selected collections of digitized materials and through a pilot operation in cooperation with the Moravian Library.

Team members
Smrž Pavel, doc. RNDr., Ph.D. (DCGM FIT BUT), research leader
Bařina David, Ing., Ph.D. (DCGM FIT BUT), team leader
Hradiš Michal, Ing., Ph.D. (DCGM FIT BUT), team leader
Juránek Roman, Ing., Ph.D. (DCGM FIT BUT), team leader
Zemčík Pavel, prof. Dr. Ing. (DCGM FIT BUT), team leader
Beneš Karel, Ing. (DCGM FIT BUT)
Hájková Gabriela, Mgr. (DEAN FIT BUT)
Hříbek David, Ing. (DCGM FIT BUT)
Kodym Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Kopeczinski Daniela, Mgr. (DEAN FIT BUT)