Natural-language, Voice-controlled Assistive System for Intelligent Buildings (ANVSIB)

Key facts

The ANVSIB project is funded by the Romanian Government through the Executive Agency for Higher Education, Research, Development and Innovation Funding (UEFISCDI), programme “Partnerships in priority areas”, “Collaborative Applied Research Projects”, project ID: PN-II-PT-PCCA-2013-4-0789, contract number 32/2014.

The project consortium consists of three partners: (1) University Politehnica of Bucharest (UPB), through the Speech and Dialogue Research Laboratory, as Project Coordinator, (2) iWave Solutions (IWAVE), a Romanian IT&C company, as Partner #1 and (3) the Research Institute for Artificial Intelligence “Mihai Drăgănescu” (RACAI) of the Romanian Academy, as Partner #2.

The UPB implementation team consists of Lect. Horia Cucu (Project manager), Prof. Corneliu Burileanu, Lect. Andi Buzo, Lucian Georgescu, Alexandru Caranica, Lect. Lucian Petrică, Radu-Sebastian Marinescu, Mihai Dogariu, Alina Bănică and Florentina Mincă.

The IWAVE implementation team consists of Marian Bădulescu (Project responsible for Partner #1), Adrian Preda, Mihai Carnu, Eduard Stanca, Ion Benegui, Bogdan Chira, Ştefan Moruzzi, Cristina Enache, Andreea Bădulescu and Ramona Bădulescu.

The RACAI implementation team consists of Ştefan Dumitrescu (Project responsible for Partner #2), Tiberiu Boroş and Dan Tufiş.

The project started in July 2014 and was completed in September 2017.

Summary

The main goal of this project is to create a Natural-language, Voice-controlled Assistive System for Intelligent Buildings. The resulting prototype will be the proof-of-concept starting point for implementing voice-enabled assistive systems in homes, schools, hospitals and other buildings.

As opposed to the standard smart room interface, which relies on buttons or on software running on static and mobile devices, the bidirectional voice-based interaction between a user and the smart room system that will be developed in this project brings a significant improvement in the quality of life in several respects. First, it is much more comfortable and natural for users to speak to and listen to the system, since speech is the most natural means of human communication. Hardware-based interaction through static devices (e.g. wall switches) and mobile devices (e.g. specialized software on smartphones and tablets) will not be replaced, but will become the secondary means of control, needed to predefine complex scenarios. For elderly and disabled people, a voice interface may be the only way to interact with such a system, which makes it far more valuable than traditional interaction methods. Furthermore, in hygiene-sensitive environments such as hospitals, a voice interface brings significant health advantages, as users do not have to physically touch devices. Finally, a voice synthesis system is very important in emergencies. A simple alarm does not inform users about the cause of the alarm (a fire in the vicinity, flooding due to a broken pipe, a gas leak, etc.) or about the required precautions, whereas a voice synthesis system can provide short, concise information on what happened together with precise safety instructions, which may make a big difference in disaster scenarios.

To deploy a highly scalable voice-enabled smart room, three main technical and scientific directions must be pursued: multilingual automatic speech recognition (ASR) in a smart room scenario, multilingual text-to-speech (TTS) synthesis, and scalable, flexible hardware-software system integration. These three challenges will be divided among the three partners of the consortium according to their expertise:

University Politehnica of Bucharest, through the Speech and Dialogue Research Laboratory, with extensive expertise in spoken language technology,
iWave Solutions, a Romanian company which has been deploying IT&C hardware and software solutions since 2002, and
the Research Institute for Artificial Intelligence “Mihai Drăgănescu”, with extensive expertise in TTS and natural language processing.

From the ASR point of view, several scientific bottlenecks were identified and will be addressed in this project: (1) robustness against noise, (2) distant speech recognition and (3) the accuracy of keyword spotting. An additional challenge for voice-controlled smart rooms is (4) language dependence: today’s speech recognition systems support only one language, and moving to another language requires at least new acoustic, phonetic and language resources, which are expensive to obtain. Hence, a single-language solution does not scale to other languages.

Multilingual speech synthesis presents scientific bottlenecks of its own. The following were identified and will be addressed in this project: (1) the synthesized voices must be natural, intelligible and pleasant (prosody performance will be addressed), (2) the system should be easily adaptable to other languages (multilingualism will be addressed) and (3) speech resources for low-resourced languages (such as Romanian) have to be developed.

The envisaged end products of this project are (1) a voice-controlled smart room prototype, (2) a functional model of multilingual ASR for spoken term (command) detection and (3) a functional model of multilingual TTS.

Project status

The first work-package was successfully finalized and the full technical report can be accessed here.

The second work-package was successfully finalized and the full technical report can be accessed here.

The third work-package was successfully finalized and the full technical report can be accessed here.

The fourth work-package was successfully finalized and the full technical report can be accessed here.

The project was successfully completed in September 2017.

Publications

Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu, “Perception study inspired method for automatically detecting the number of competing speakers,” in University “Politehnica” of Bucharest Scientific Bulletin, Series C, issue 4, pp. 131-142, Bucharest, Dec 2015, ISSN: 2286-3540.
Alexandru Caranica, Horia Cucu, Andi Buzo, Corneliu Burileanu, “Exploring Spoken Term Detection with a Robust Multi-Language Phone Recognition System,” in Rev. Téc. Ing. Univ. Zulia, vol. 38, no. 2, pp. 97-104, 2015, ISSN: 0254-0770.
Valentin Andrei, Horia Cucu, Lucian Petrică, “Considerations on Developing a Chainsaw Intrusion Detection and Localization System for Preventing Unauthorized Logging,” in Journal of Electrical and Electronic Engineering, vol. 3, issue 6, pp. 202-207, Dec 2015, ISSN: 2329-1613, doi:10.11648/j.jeee.20150306.15.
Horia Cucu, Andi Buzo, Corneliu Burileanu, “The SpeeD Grammar-based ASR System for the Romanian Language,” in Romanian Journal of Information Science and Technology, vol. 18, no. 1, pp. 33-53, Jan 2015, ISSN: 1453-8245.
Mihai Dogariu, Horia Cucu, Andi Buzo, Dragoş Burileanu, Octavian Fratu, “Speech Database Acquisition for Assisted Living Environment Applications,” in the Proceedings of the 8th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, 2015, pp. 191-196, ISBN 978-1-4673-7559-7.
Mihai Dogariu, Horia Cucu, Andi Buzo, Dragoş Burileanu, Octavian Fratu, “Speech Applications in the eWALL Project,” in the Proceedings of the 8th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, 2015, pp. 197-204, ISBN 978-1-4673-7559-7.
Tiberiu Boroş, Ştefan Daniel Dumitrescu, “Robust deep-learning models for text-to-speech synthesis support on embedded devices,” in the Proceedings of the 7th International Conference on Management of computational and collective intElligence in Digital EcoSystems (MEDES), Caraguatatuba/Sao Paulo, Brazil, 2015, pp. 98-102.
Alexandru Caranica, Horia Cucu, Andi Buzo, Corneliu Burileanu, “Survey on Multilingual Spoken Term Detection,” submitted to Romanian Journal of Information Science and Technology.
Alexandru Caranica, Horia Cucu, Corneliu Burileanu, François Portet, Michel Vacher, “Speech Recognition Results for Voice-controlled Assistive Applications,” in the Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, 2017, 8p, ISBN 978-1-5090-6496-0.
Tiberiu Boroş, Ştefan Daniel Dumitrescu, “A “Small-Data” – Driven Approach to Dialogue Systems for Natural Language Human Computer Interaction,” in the Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, 2017, ISBN 978-1-5090-6496-0.
Ştefan Daniel Dumitrescu, “Cassandra Smart-Home System Description,” in the Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, 2017, ISBN 978-1-5090-6496-0.
Ştefan Daniel Dumitrescu, Tiberiu Boroş and Dan Tufiş, “RACAI’s Natural Language Processing pipeline for Universal Dependencies,” in the Proceedings of the SIGNLL Conference on Computational Natural Language Learning, Vancouver, Canada, August 2017.