Project Details
DARPA Low Resource Languages for Emergent Incidents (LORELEI) - Exploiting Language Information for Situational Awareness (ELISA)
Project Period: 1. 9. 2015 - 31. 3. 2020
Project Type: contract
Code: HR001115C0115
Partner: University of Southern California
Speech processing, language, speech mining
Speech processing in our proposal will be addressed by low-resource or language-agnostic technologies. Rather than concentrating on mining the content (for which standard resources such as an acoustic model, a language model, or a pronunciation dictionary will obviously be lacking), speech data will be handled by a multitude of "speech miners" that make minimal use of target-language resources. The processing will begin with reliable voice activity detection (VAD) capable of segmenting the signal into useful and useless portions. Although often regarded as "not rocket science", a good VAD is crucial for the correct functioning of the subsequent blocks and for human processing of speech input. Our work will improve on the existing DNN-based VAD that proved its efficiency in the difficult RATS setting [Ng2012]. Processing with several phone posterior estimators, using either monolingual or multilingual phoneme sets [Schwarz2009], will follow to provide the "miners" with a coherent low-dimensional representation. The first real "miner" will be language identification (LID) with a significant set of target languages (>60). Even if it is not certain that the target language will be in this set, LID will allow us to detect segments in English or possibly in other languages for which we have ASR technology. We will follow our recent development of LID based on features derived from phone posteriors [Plchot2013] as well as on DNNs. We will also work on the enrollment of a new language with very little data (down to one utterance). Another "miner" will perform basic speaking style recognition, making it possible to separate read speech from spontaneous speech. Finally, speaker recognition (SRE) or clustering will make it possible to gather information about speakers (if they were previously enrolled) or at least to perform coarse speaker clustering, since for the analyst the information on who is speaking can be as important as what is said. Here, we will build on our significant track record in iVector-based SRE and will mainly work on automatic adaptation and calibration on unlabeled datasets [Brummer2014].
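As a rough illustration of the first processing step, the sketch below segments a signal into speech and non-speech portions. It is not the DNN-based VAD referred to in the description: it swaps in a plain frame-energy threshold only to show what the segmentation output handed to the later "miners" might look like. The function names, the 160-sample frame length, and the 0.01 threshold are all assumptions made for this example.

```python
import math


def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def energy_vad(signal, frame_len=160, threshold=0.01):
    """Return (start_frame, end_frame) speech segments, end exclusive.

    Energy thresholding is a simple stand-in for a trained VAD model;
    frame_len and threshold are illustrative values, not project settings.
    """
    energies = frame_energies(signal, frame_len)
    segments = []
    start = None
    for i, energy in enumerate(energies):
        is_speech = energy >= threshold
        if is_speech and start is None:
            start = i                      # a speech segment opens here
        elif not is_speech and start is not None:
            segments.append((start, i))    # the segment closes before frame i
            start = None
    if start is not None:                  # signal ended while still in speech
        segments.append((start, len(energies)))
    return segments


def tone(n_samples, amplitude=0.5, freq=440.0, rate=16000):
    """Synthetic sine tone used here as a placeholder for speech."""
    return [amplitude * math.sin(2 * math.pi * freq * t / rate)
            for t in range(n_samples)]


if __name__ == "__main__":
    # Two silent frames, three "speech" frames, one silent frame.
    signal = [0.0] * 320 + tone(480) + [0.0] * 160
    print(energy_vad(signal))
```

Only the frames marked as speech would be passed on to the phone posterior estimators and the downstream miners; a trained model would replace the threshold test while keeping the same segment-list interface.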
Černocký Jan, prof. Dr. Ing. (UPGM FIT VUT), team leader
Matějka Pavel, Ing., Ph.D. (UPGM FIT VUT), team leader
Szőke Igor, Ing., Ph.D. (UPGM FIT VUT), team leader
Beneš Karel, Ing. (UPGM FIT VUT)
Fér Radek, Ing. (UPGM FIT VUT)
Glembek Ondřej, Ing., Ph.D. (UPGM FIT VUT)
Kocour Martin, Ing. (UPGM FIT VUT)
Ondel Yang Lucas Antoine Francois, Mgr., Ph.D. (UPGM FIT VUT)
Skácel Miroslav, Ing. (UPGM FIT VUT)
Žmolíková Kateřina, Ing., Ph.D. (UPGM FIT VUT)
2019
- ALAM Jahangir, BOULIANNE Gilles, GLEMBEK Ondřej, LOZANO Díez Alicia, MATĚJKA Pavel, MIZERA Petr, MONTEIRO Joao, MOŠNER Ladislav, NOVOTNÝ Ondřej, PLCHOT Oldřich, ROHDIN Johan A., SILNOVA Anna, SLAVÍČEK Josef, STAFYLAKIS Themos, WANG Shuai and ZEINALI Hossein. ABC NIST SRE 2019 CTS System Description. In: Proceedings of NIST. Sentosa, Singapore: National Institute of Standards and Technology, 2019, pp. 1-6.
- MATĚJKA Pavel, PLCHOT Oldřich, ZEINALI Hossein, MOŠNER Ladislav, SILNOVA Anna, BURGET Lukáš, NOVOTNÝ Ondřej and GLEMBEK Ondřej. Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 2448-2452. ISSN 1990-9772.
- BASKAR Murali K., WATANABE Shinji, ASTUDILLO Ramon, HORI Takaaki, BURGET Lukáš and ČERNOCKÝ Jan. Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 3790-3794. ISSN 1990-9772.
2018
- ALAM Jahangir, BHATTACHARYA Gautam, BRUMMER Johan Nikolaas Langenhoven, BURGET Lukáš, DIEZ Sánchez Mireia, GLEMBEK Ondřej, KENNY Patrick, KLČO Michal, LANDINI Federico Nicolás, LOZANO Díez Alicia, MATĚJKA Pavel, MONTEIRO Joao, MOŠNER Ladislav, NOVOTNÝ Ondřej, PLCHOT Oldřich, PROFANT Ján, ROHDIN Johan A., SILNOVA Anna, SLAVÍČEK Josef, STAFYLAKIS Themos and ZEINALI Hossein. ABC NIST SRE 2018 SYSTEM DESCRIPTION. In: Proceedings of 2018 NIST SRE Workshop. Athens: National Institute of Standards and Technology, 2018, pp. 1-10.
- WIESNER Matthew, LIU Chunxi, ONDEL Yang Lucas Antoine Francois, HARMAN Craig, MANOHAR Vimal, TRMAL Jan, HUANG Zhongqiang, DEHAK Najim and KHUDANPUR Sanjeev. Automatic Speech Recognition and Topic Identification for Almost-Zero-Resource Languages. In: Proceedings of Interspeech. Hyderabad: International Speech Communication Association, 2018, pp. 2052-2056. ISSN 1990-9772.
- PULUGUNDLA Bhargav, BASKAR Murali K., KESIRAJU Santosh, EGOROVA Ekaterina, KARAFIÁT Martin, BURGET Lukáš and ČERNOCKÝ Jan. BUT system for low resource Indian language ASR. In: Proceedings of Interspeech 2018. Hyderabad: International Speech Communication Association, 2018, pp. 3182-3186. ISSN 1990-9772.
- BENEŠ Karel, KESIRAJU Santosh and BURGET Lukáš. i-vectors in language modeling: An efficient way of domain adaptation for feed-forward models. In: Proceedings of Interspeech 2018. Hyderabad: International Speech Communication Association, 2018, pp. 3383-3387. ISSN 1990-9772.
2017
- LIU Chunxi, YANG Jinyi, SUN Ming, KESIRAJU Santosh, ROTT Alena, ONDEL Yang Lucas Antoine Francois, GHAHREMANI Pegah, DEHAK Najim, BURGET Lukáš and KHUDANPUR Sanjeev. An Empirical evaluation of zero resource acoustic unit discovery. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 5305-5309. ISBN 978-1-5090-4117-6.
- HANNEMANN Mirko, TRMAL Jan, ONDEL Yang Lucas Antoine Francois, KESIRAJU Santosh and BURGET Lukáš. Bayesian joint-sequence models for grapheme-to-phoneme conversion. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 2836-2840. ISBN 978-1-5090-4117-6.
- ONDEL Yang Lucas Antoine Francois, BURGET Lukáš, ČERNOCKÝ Jan and KESIRAJU Santosh. Bayesian phonotactic language model for acoustic unit discovery. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 5750-5754. ISBN 978-1-5090-4117-6.
- GLEMBEK Ondřej. Summary report for project Exploiting Language Information for Situational Awareness (ELISA) For year 2017. Brno: University of Southern California, 2017.
- PAPADOPOULOS Pavlos, TRAVADI Ruchir, VAZ Colin, MALANDRAKIS Nikolaos, HERMJAKOB Ulf, POURDAMGHANI Nima, PUST Michael, ZHANG Boliang, PAN Xiaoman, LU Di, LIN Ying, GLEMBEK Ondřej, BASKAR Murali K., KARAFIÁT Martin, BURGET Lukáš, HASEGAWA-JOHNSON Mark, JI Heng, MAY Jonathan, KNIGHT Kevin and NARAYANAN Shrikanth. Team ELISA System for DARPA LORELEI Speech Evaluation 2016. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 2053-2057. ISSN 1990-9772.
- KESIRAJU Santosh, PAPPAGARI Raghavendra, ONDEL Yang Lucas Antoine Francois, BURGET Lukáš, DEHAK Najim, KHUDANPUR Sanjeev, ČERNOCKÝ Jan and GANGASHETTY Suryakanth V. Topic identification of spoken documents using unsupervised acoustic unit discovery. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 5745-5749. ISBN 978-1-5090-4117-6.
2016
- KESIRAJU Santosh, BURGET Lukáš, SZŐKE Igor and ČERNOCKÝ Jan. Learning document representations using subspace multinomial model. In: Proceedings of Interspeech 2016. San Francisco: International Speech Communication Association, 2016, pp. 700-704. ISBN 978-1-5108-3313-5.
- GLEMBEK Ondřej. Summary report for project Exploiting Language Information for Situational Awareness (ELISA) For year 2016. Brno: University of Southern California, 2016.
2015
- GLEMBEK Ondřej, KESIRAJU Santosh and ONDEL Yang Lucas Antoine Francois. Summary report for project "ELISA" in Year 2015. Brno: University of Southern California, 2015.