Natural-language, Voice-controlled Assistive System for Intelligent Buildings (ANVSIB)

Key facts

The ANVSIB project is funded by the Romanian Government through the Executive Agency for Higher Education, Research, Development and Innovation Funding (UEFISCDI), programme “Partnerships in priority areas”, “Collaborative Applied Research Projects”, project ID: PN-II-PT-PCCA-2013-4-0789, contract number 32/2014.

The project consortium consists of three partners: (1) University Politehnica of Bucharest (UPB), through the Speech and Dialogue Research Laboratory, as Project Coordinator; (2) iWave Solutions (IWAVE), a Romanian IT&C company, as Partner #1; and (3) the Research Institute for Artificial Intelligence “Mihai Drăgănescu” (RACAI) of the Romanian Academy, as Partner #2.

The UPB implementation team is formed by Lect. Horia Cucu (Project manager), Prof. Corneliu Burileanu, Lect. Andi Buzo, Lect. Lucian Petrică, Radu-Sebastian Marinescu, Mihai Dogariu, Alina Bănică and Florentina Mincă.

The IWAVE implementation team is formed by Marian Bădulescu (Project responsible for Partner #1), Adrian Preda, Mihai Carnu, Eduard Stanca, Ion Benegui, Bogdan Chira, Ştefan Moruzzi, Cristina Enache, Andreea Bădulescu and Ramona Bădulescu.

The RACAI implementation team is formed by Ştefan Dumitrescu (Project responsible for Partner #2), Tiberiu Boroş and Dan Tufiş.

The project started in July 2014 and is expected to be completed by June 2016.


Summary

The main goal of this project is to create a Natural-language, Voice-controlled Assistive System for Intelligent Buildings. The resulting prototype will be the proof-of-concept starting point for the implementation of voice-enabled assistive systems in homes, schools, hospitals and other buildings.

Standard smart room interfaces rely on buttons or on software running on static and mobile devices. The bidirectional voice-based interaction between a user and the smart room system that will be developed in this project improves quality of life in several respects. First, it is much more comfortable and natural for users to speak with and listen to the system (speech is the most natural way of communication for humans). Hardware-based interaction through static devices (e.g. wall switches) and mobile devices (e.g. specialized software on smartphones and tablets) will not be replaced, but will become the secondary means of control, needed to predefine complex scenarios. For elderly and disabled people, a voice interface may be the only way to interact with such a system, which makes it far more valuable than traditional interaction methods. Furthermore, in sanitary environments such as hospitals, a voice interface brings significant health advantages, as users do not have to physically touch devices. Finally, a voice synthesis system is very important in case of emergencies. A simple alarm does not inform users about the cause of the alarm (a fire in the vicinity, flooding due to a broken pipe, a gas leak, etc.) or about the required precautions, whereas a voice synthesis system could provide short, concise information on what happened together with precise safety instructions, which may make a big difference in disaster scenarios.

To deploy a highly scalable voice-enabled smart room, three main technical and scientific directions must be pursued: multilingual automatic speech recognition (ASR) in a smart room scenario, multilingual text-to-speech (TTS) synthesis, and scalable, flexible hardware-software system integration. These three challenges will be divided among the three partners of the consortium:

  • University Politehnica of Bucharest, through the Speech and Dialogue Research Laboratory, with extensive expertise in spoken language technology,
  • iWave Solutions, a Romanian company which has been deploying IT&C hardware and software solutions since 2002, and
  • the Research Institute for Artificial Intelligence “Mihai Drăgănescu”, with extensive expertise in TTS and natural language processing.

From the ASR point of view, several scientific bottlenecks were identified and will be addressed in this project: (1) robustness against noise, (2) distant speech recognition and (3) the accuracy of keyword spotting. An additional challenge for voice-controlled smart rooms is (4) language dependence: today’s speech recognition systems support only one language, and moving to another language requires at least new acoustic, phonetic and language resources, which are expensive to obtain. As a result, a single-language solution does not scale to other languages.

Multilingual speech synthesis presents scientific bottlenecks of its own. The following were identified and will be addressed in this project: (1) the synthesized voices must be natural, intelligible and pleasant (prosody performance will be addressed), (2) the system should be easily adaptable to other languages (multilingualism will be addressed) and (3) speech resources for low-resourced languages (such as Romanian) have to be developed.

The envisaged end-products of this project are (1) a voice-controlled smart room prototype, (2) a functional model of multilingual ASR for spoken term (command) detection and (3) a functional model of multilingual TTS.
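To give a rough idea of what spoken term (command) detection means at the text level, the following minimal Python sketch scans a recognized transcript for a small set of command phrases and maps them to actions. It is an illustration only, not the method used in the project: the command vocabulary, the action names and the function spot_commands are hypothetical, and a real system would work on noisy acoustic or phone-level recognition output rather than on a clean transcript.

```python
from typing import Dict, List, Tuple

# Hypothetical command vocabulary: spoken phrase -> smart-room action name.
COMMANDS: Dict[str, str] = {
    "lights on": "LIGHTS_ON",
    "lights off": "LIGHTS_OFF",
    "open blinds": "BLINDS_OPEN",
}


def spot_commands(
    transcript: str, commands: Dict[str, str] = COMMANDS
) -> List[Tuple[int, str]]:
    """Return (word_index, action) pairs for every command phrase found.

    The transcript is assumed to be plain text produced by a recognizer;
    matching is exact on lowercased, whitespace-separated words.
    """
    words = transcript.lower().split()
    hits: List[Tuple[int, str]] = []
    for phrase, action in commands.items():
        target = phrase.split()
        n = len(target)
        # Slide a window of n words over the transcript.
        for i in range(len(words) - n + 1):
            if words[i:i + n] == target:
                hits.append((i, action))
    # Order the detected commands by their position in the utterance.
    return sorted(hits)


if __name__ == "__main__":
    print(spot_commands("Please turn the lights on and open blinds"))
```

Exact word matching is the simplest possible choice here; it ignores recognition errors, which is precisely why keyword spotting accuracy is listed among the bottlenecks to be addressed.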


Project status

The first work-package was successfully finalized and the full technical report can be accessed here.

The second work-package was successfully finalized and the full technical report can be accessed here.


Publications

  • Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu, “Perception study inspired method for automatically detecting the number of competing speakers,” in University “Politehnica” of Bucharest Scientific Bulletin, Series C, accepted, in press.
  • Alexandru Caranica, Horia Cucu, Andi Buzo, Corneliu Burileanu, “Exploring Spoken Term Detection with a Robust Multi-Language Phone Recognition System,” in Rev. Téc. Ing. Univ. Zulia, vol. 38, no. 2, pp. 97-104, 2015, ISSN: 0254-0770.