Research Projects

Current Research Projects


Natural-language, Voice-controlled Assistive System for Intelligent Buildings (ANVSIB)

The ANVSIB project is funded by the Romanian Government through UEFISCDI, under the “Partnerships in priority areas” programme. The project consortium consists of three partners: (1) University Politehnica of Bucharest, through the Speech and Dialogue Research Laboratory, as Project Coordinator, (2) iWave Solutions, a Romanian IT&C company, as Partner #1, and (3) the Research Institute for Artificial Intelligence “Mihai Drăgănescu” of the Romanian Academy, as Partner #2. The project started in July 2014 and is expected to be completed by June 2016.

The main goal of this project is to create a natural-language, voice-controlled assistive system for intelligent buildings. The resulting prototype will serve as a proof of concept and a starting point for deploying voice-enabled assistive systems in homes, schools, hospitals and other settings.


Phonetic Analysis of the Romanian Language: study and software applications (AFLR)

The AFLR project is funded by the Romanian Government through UEFISCDI, under the “Partnerships in priority areas” programme. The project consortium consists of three partners: (1) Softwin Research, as Project Coordinator, (2) University Politehnica of Bucharest, through the Speech and Dialogue Research Laboratory, as Partner #1, and (3) the Institute of Linguistics “Iorgu Iordan – Al. Rosetti” of the Romanian Academy, as Partner #2. The project started in October 2014 and is expected to be completed by June 2016.

The AFLR project will integrate the results of previous work (existing linguistic knowledge bases, linguistic tools, applications for exploiting linguistic data, etc.) in order to develop several products of scientific and commercial value: (1) a Phonetic Study of the Romanian Language, starting from the existing linguistic data written in GRAALAN, (2) a Romanian Morphological and Phonetic Dictionary, (3) a Phonetic Dictionary of Romanian Syllables, and (4) a Speech Recognition Application for the Romanian Language.


Automatic Baby-Language Recognition System (SPLANN)

The SPLANN project is funded by the Romanian Government through UEFISCDI, under the “Partnerships in priority areas” programme. The project consortium consists of three partners: (1) Softwin Research, as Project Coordinator, (2) University Politehnica of Bucharest, through the Speech and Dialogue Research Laboratory, as Partner #1, and (3) the Emergency Clinical Hospital “Sf. Pantelimon”, as Partner #2. The project started in October 2014 and is expected to be completed by June 2016.

The SPLANN project aims to design and develop an automatic infant cry recognition system, linking neonatal knowledge with signal processing and pattern recognition methods. The goal is to obtain patent-protected technologies that are widely applicable in health, child care and computer science and that have a real chance of being successfully exploited on the market.


Speech Processing in a Smart-Car Environment

The smart car of the future will be equipped with many audio and speech processing systems performing spoken-command recognition, voice-based driver authentication, text-to-speech synthesis, etc. Building on the SpeeD group’s expertise in the field, this project aims to develop such artificial intelligence systems for smart cars.


Enhanced Text to Speech (TTS) synthesis in Romanian

Text-to-speech synthesis has been an important area of research for SpeeD for the last 15 years. Several versions of a Romanian TTS system have been built in succession, each improving the performance of individual modules and hence the overall quality of the system. The work has followed two main directions, the Natural Language Processing (NLP) stage and speech generation techniques, with a series of achievements in both.

Currently, the system’s most important NLP sub-stages are: diacritic restoration, preprocessing and normalization (including acronym/abbreviation, proper name, and sentence boundary detection), syllabification, letter-to-phone conversion, lexical stress positioning, and prosody prediction. Our team has worked continuously to improve all the NLP modules, choosing the methods that give the best results, and to expand the Romanian linguistic resources available for TTS. We are also currently developing a new, efficient prosody model for the NLP stage.

We are also developing two different speech engines. The first uses a classic TD-PSOLA algorithm and is based on the concatenation of acoustic segments, with multiple instances of non-uniform speech units (diphones and polyphones, used to handle a number of difficult vowel-semivowel transitions). The units are labeled off-line according to contextual and phonetico-prosodic information from the recorded speech corpus, and the system uses a two-stage unit selection procedure to generate the speech signal. The second engine uses a statistical parametric synthesis technique based on Hidden Markov Models (HMMs).


Enhanced Large Vocabulary Continuous Speech Recognition (LVCSR) for Romanian

Although Automatic Speech Recognition (ASR), i.e. transforming a speech signal into text, has been an important research direction since the 1970s, current academic and commercial systems are truly effective only in specific conditions: medium vocabularies, small speech/speaker variability, lack of background noise, etc. For high-resource languages such as English, French and Mandarin, the performance of LVCSR systems in ideal conditions is much higher than for languages such as Romanian, which have been disadvantaged by a lack of resources and a small number of speech researchers.

In 2008, SpeeD started an intense research effort aiming to develop the first LVCSR system for Romanian. A prototype of this system was released in October 2011 and has been available online ever since. Our current objective is to enhance this speech recognition system in order to improve its recognition rate and make it more robust to speech/speaker variability, background noise, etc.


Spoken Term Detection (STD) for under-resourced languages

Spoken Term Detection is a relatively new research direction (introduced in 2006) that aims at finding spoken content within a speech database by using a spoken query. STD systems are especially useful for under-resourced languages for which no phonetic dictionaries are available. In 2012 SpeeD participated in the Spoken Web Search competition (part of the MediaEval Benchmarking Initiative) and created its first STD system. Since then, we have continued research in this direction, with the intention of improving the performance of our STD system and participating in the 2013 competition.

Our current approach involves adapting our Romanian ASR system to another under-resourced language and then performing ASR on both the query and the speech database. Once the speech data is converted into text, the problem becomes one of text search. The main remaining difficulty is the inaccuracy of the ASR system: one has to search for an approximate text query in an approximate text database. A more accurate ASR system therefore yields better search performance. The project aims at building accurate acoustic models for under-resourced languages and providing efficient algorithms for searching approximate text databases.
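
The approximate search step described above can be illustrated with a minimal sketch. It assumes that both the query and the database have already been decoded into sequences of symbols (phones or words); the function name `best_match` and the toy phone strings are hypothetical and are not taken from the SpeeD system. The sketch uses a standard edit-distance dynamic programme in which a match may start anywhere in the database, so a query can still be located when the ASR output contains a few errors.

```python
def best_match(query, database):
    """Return (min_edits, end_index) for the database span closest to query.

    Approximate substring matching: the first row of the edit-distance
    table is all zeros, so a match may start anywhere in the database
    at no cost. end_index is the number of database symbols consumed
    up to the end of the best match.
    """
    if not query:
        return 0, 0
    # prev[j]: fewest edits aligning the query prefix seen so far
    # with some span of the database that ends at position j.
    prev = [0] * (len(database) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i] + [0] * len(database)  # query prefix vs. empty span
        for j, d in enumerate(database, start=1):
            cost = 0 if q == d else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          prev[j] + 1,         # query symbol absent from database
                          curr[j - 1] + 1)     # extra symbol in database
        prev = curr
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


# Toy example: the query "k a s a" was decoded as "k a z a" in the database.
query = "k a s a".split()
database = "u n k a z a m a r e".split()
edits, end = best_match(query, database)
```

A detection would then be reported whenever the number of edits falls below a threshold chosen relative to the query length. The table costs time proportional to the product of the two lengths, which is why more efficient search algorithms are of interest for large databases.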


Speaker recognition system

Speaker recognition is a generic name used for two distinct applications: speaker verification and speaker identification. Speaker verification systems have to decide whether or not a speech utterance belongs to a claimed speaker. Speaker identification systems have to find out which speaker uttered a given speech signal. In both cases the speakers’ characteristics are usually represented by statistical models, most commonly Gaussian Mixture Models (GMMs). Common speech features, such as MFCC or PLP coefficients, are used to model the acoustic characteristics of the speakers. The objective of the project is to build a GMM-based speaker identification system with state-of-the-art performance.
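
The identification decision described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes diagonal-covariance GMMs whose parameters are already known, and the speaker names, the two-dimensional toy features and the function names are all hypothetical. In practice the models would be trained on MFCC or PLP vectors, typically with the EM algorithm.

```python
import math


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) components.
    """
    log_terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        log_terms.append(log_p)
    top = max(log_terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def identify_speaker(frames, speaker_models):
    """Pick the speaker whose GMM gives the highest average frame log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Hand-set toy models (hypothetical values, two components each).
speaker_models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [6.0, 6.0], [1.0, 1.0]), (0.5, [8.0, 8.0], [1.0, 1.0])],
}
frames = [[0.2, -0.1], [1.9, 2.2], [0.5, 0.4]]  # toy feature vectors
best, scores = identify_speaker(frames, speaker_models)
```

A verification variant would instead compare the claimed speaker's score against a threshold, or against the score of a background model, rather than taking the maximum over all speakers.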