Speech Processing in a Smart-Car Environment

The smart-car of the future will be equipped with many audio and speech processing systems performing spoken command recognition, voice-based driver authentication, text-to-speech synthesis, etc. Building on the SpeeD group’s expertise in the field, this project aims to develop such artificial intelligence systems for smart-cars.


Objective #1. Spoken Command Recognition System. One particularity of driving is that the driver must stay focused on the road and should not be distracted by interacting with auxiliary in-car systems such as the climate control, the GPS navigation system, the radio or a smartphone. In this context, voice is the ideal way to interact with these systems: it is hands-free and eyes-free, so the driver can keep their attention on driving. Even in smart, self-driving cars, voice remains the most natural way for the driver to communicate with the smart-car. Voice-based interaction can be achieved by combining speech command recognition, speech understanding, response generation and text-to-speech synthesis.

Objective #2. Voice-based Driver Authentication System. Two scenarios are very common: (i) the driver is accompanied by several passengers, and (ii) the car is shared by several drivers. In both cases, a driver authentication system for the smart-car is indispensable. In the first scenario, it enables the smart-car to respond only to the driver’s commands and to ignore any commands (or at least certain critical commands) uttered by the passengers. In the second scenario, it enables the smart-car to offer every driver (i) a customized driving experience (favorite music or radio stations, personalized seat and mirror adjustments, previous GPS destinations, etc.) and (ii) access to personal data (emails, social media accounts, etc.).
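
The command-gating behaviour described above could be sketched in Python as follows. This is a minimal illustration, not the project's method: it assumes that some speaker-embedding model (not shown) turns each utterance into a fixed-length vector, and it scores utterances against enrolled voices with cosine similarity, one common choice among several. The 0.7 threshold, the CRITICAL_COMMANDS set and the driver names are hypothetical placeholders.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical set of commands that only the authenticated driver may issue.
CRITICAL_COMMANDS = {"unlock doors", "open trunk", "start engine"}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_driver(utterance_emb, driver_emb, threshold: float = 0.7) -> bool:
    """Verification: does the utterance match the current driver's enrolled voice?"""
    return cosine_similarity(utterance_emb, driver_emb) >= threshold


def accept_command(command: str, utterance_emb, driver_emb, threshold: float = 0.7) -> bool:
    """Scenario (i): accept every command from the driver, only non-critical ones from others."""
    if is_driver(utterance_emb, driver_emb, threshold):
        return True
    return command not in CRITICAL_COMMANDS


def identify_driver(utterance_emb, enrolled: Dict[str, Sequence[float]],
                    threshold: float = 0.7) -> Optional[str]:
    """Scenario (ii): return the best-matching enrolled driver, or None if no one matches."""
    best_name, best_score = None, threshold
    for name, emb in enrolled.items():
        score = cosine_similarity(utterance_emb, emb)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

For example, with hypothetical enrolled vectors ana = [1.0, 0.0, 0.2] and dan = [0.0, 1.0, 0.1] and Ana driving, Dan's request to open the trunk is rejected while his request to turn on the radio is accepted; identify_driver is what a shared car would use to decide whose profile to load.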

Objective #3. Text-to-Speech Synthesis (TTS) System. Natural communication between the driver and the smart-car requires both parties to speak. The text-to-speech synthesis system is responsible for voicing the smart-car’s answers to the driver’s requests, as well as any warnings it issues (seat belt unfastened, headlights on, etc.).

Objective #4. Interactive Voice Response (IVR) System. A smart-car should also be accessible from outside the vehicle. Since a smart-car is typically always connected to the Internet, it can be controlled remotely through web interfaces, smartphone apps or phone calls. A natural way of interacting with it is to telephone the car and utter requests, just as the driver issues commands inside the vehicle. This kind of remote control can be achieved through an interactive voice response system that lets the caller navigate an option menu by voice and choose the desired command (e.g. “start the heating system” or “come and pick me up from work”). The IVR system would rely on the voice-based authentication, spoken command recognition and TTS systems.
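
The menu navigation could be sketched as a small tree walked by recognized utterances. In this hypothetical Python sketch the menu contents, keywords and action names are invented for illustration; caller_authenticated stands in for the result of the voice-based authentication system, the utterances are assumed to be the text already produced by the spoken command recognizer, and each prompt is the text the TTS system would speak.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MenuNode:
    prompt: str                                   # text the TTS system would read to the caller
    action: Optional[str] = None                  # hypothetical command sent to the car at a leaf
    children: Dict[str, "MenuNode"] = field(default_factory=dict)  # spoken keyword -> submenu


# Hypothetical two-level menu; keywords stand in for the recognizer's output.
MENU = MenuNode(
    prompt="Say 'climate' or 'pickup'.",
    children={
        "climate": MenuNode(
            prompt="Say 'heating on' or 'heating off'.",
            children={
                "heating on": MenuNode("Starting the heating system.", action="HEATING_ON"),
                "heating off": MenuNode("Stopping the heating system.", action="HEATING_OFF"),
            },
        ),
        "pickup": MenuNode(
            prompt="Say 'work' or 'home'.",
            children={
                "work": MenuNode("Coming to pick you up from work.", action="DRIVE_TO_WORK"),
                "home": MenuNode("Coming to pick you up from home.", action="DRIVE_TO_HOME"),
            },
        ),
    },
)


def run_call(root: MenuNode, recognized_utterances: List[str],
             caller_authenticated: bool) -> Tuple[List[str], Optional[str]]:
    """Walk the menu with recognized utterances; return (prompts spoken, action or None)."""
    if not caller_authenticated:
        return ["Sorry, your voice could not be verified."], None
    prompts = [root.prompt]
    node = root
    for utterance in recognized_utterances:
        child = node.children.get(utterance.strip().lower())
        if child is None:
            # Unknown option: repeat the current menu instead of failing the call.
            prompts.append("Option not recognized. " + node.prompt)
            continue
        node = child
        prompts.append(node.prompt)
        if node.action is not None:
            return prompts, node.action
    return prompts, None
```

Calling run_call(MENU, ["pickup", "work"], True) returns the three prompts spoken during the call together with the action "DRIVE_TO_WORK". A tree keeps each menu level small, which can help by limiting how many options the recognizer has to distinguish at any one step.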

Objective #5. Speech Acquisition in a Smart Car Environment. Speech acquisition is an important issue in the design of an in-vehicle voice interaction system: the choice of microphones and their placement inside the car directly influence the performance of all the speech-based systems. Multiple microphones or microphone arrays are required to separate the driver’s speech from the passengers’ speech (when several people speak simultaneously), from known, permanent noise sources (engine, outside traffic, radio, etc.), and finally from other noise sources (honks, passenger noise, etc.).
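
One classical way to exploit a microphone array is delay-and-sum beamforming: the channels are time-aligned for a chosen source position (here, the driver's seat) and averaged, so the driver's speech adds coherently while uncorrelated noise and sound from other directions partly cancel. The NumPy sketch below is only an illustration; it assumes free-field propagation, known microphone and driver positions, and delays rounded to whole samples, all simplifications a real in-car system would need to revisit. The speed-of-sound value is an approximation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def steering_delays(mic_positions, source_position, fs):
    """Arrival delay of the source at each mic, in whole samples, relative to the closest mic.

    Assumes straight-line (free-field) propagation, ignoring reflections inside the cabin.
    """
    mics = np.asarray(mic_positions, dtype=float)
    src = np.asarray(source_position, dtype=float)
    dists = np.linalg.norm(mics - src, axis=1)            # metres from the source to each mic
    delays = (dists - dists.min()) / SPEED_OF_SOUND * fs  # seconds converted to samples
    return np.round(delays).astype(int)                   # fractional delays are ignored


def delay_and_sum(mic_signals, delays):
    """Align each channel on the source and average the channels.

    mic_signals: shape (num_mics, num_samples); channel m is assumed to receive the
    source delayed by delays[m] samples.
    """
    mic_signals = np.asarray(mic_signals, dtype=float)
    num_mics, num_samples = mic_signals.shape
    out = np.zeros(num_samples)
    for m in range(num_mics):
        d = int(delays[m])
        # Advance channel m by d samples; the last d output samples get no contribution from it.
        out[: num_samples - d] += mic_signals[m, d:]
    return out / num_mics
```

A typical call would be delay_and_sum(signals, steering_delays(mic_xyz, driver_xyz, fs)). For uncorrelated noise of equal power at each of M microphones, averaging the aligned channels divides the noise power by M while leaving the driver's aligned speech unchanged; suppression of a passenger's voice depends on the array geometry and on frequency and can be modest, so delay-and-sum alone is unlikely to be sufficient.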

Objective #6. Speech Signal Enhancement vs. Noise Reduction Techniques. The in-car environment is inherently noisy. Known, permanent noise sources, such as the engine, outside traffic and the radio, coexist with non-permanent noises, such as honks and passenger noise. These unwanted signals must be filtered out before the driver’s spoken commands can be processed further. Moreover, the passengers’ speech, a type of “noise” that is very similar to the driver’s speech, must also be filtered out, because the smart-car should respond to the driver’s commands even when passengers are speaking simultaneously.
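
A classical technique for the permanent, roughly stationary noises mentioned above is spectral subtraction: an average noise magnitude spectrum, estimated from a noise-only recording (for instance, captured while nobody is speaking), is subtracted from each frame of the noisy signal, and the noisy phase is reused for resynthesis. The NumPy sketch below is a minimal version under those assumptions; the frame length, hop and spectral floor are illustrative values, not tuned ones. It does not address non-permanent noises such as honks, nor the passengers' speech, which call for other methods.

```python
import numpy as np


def spectral_subtraction(noisy, noise_sample, frame_len=512, hop=256, floor=0.02):
    """Magnitude spectral subtraction with overlap-add resynthesis.

    noisy: 1-D signal to enhance. noise_sample: 1-D noise-only recording, assumed to share
    the noise statistics of `noisy`. Both must be at least frame_len samples long, and
    hop must equal frame_len // 2 for the window to sum to one.
    """
    noisy = np.asarray(noisy, dtype=float)
    noise_sample = np.asarray(noise_sample, dtype=float)
    # Periodic Hann window: copies shifted by frame_len / 2 sum exactly to 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)

    def windowed_frames(x):
        n_frames = 1 + (len(x) - frame_len) // hop
        return np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])

    # Average noise magnitude per frequency bin, estimated from the noise-only recording.
    noise_mag = np.abs(np.fft.rfft(windowed_frames(noise_sample), axis=1)).mean(axis=0)

    spec = np.fft.rfft(windowed_frames(noisy), axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Subtract the noise estimate, but keep at least floor * mag so no magnitude goes negative.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)

    # Overlap-add the processed frames back into a signal of the original length.
    out = np.zeros(len(noisy))
    for i, frame in enumerate(clean_frames):
        out[i * hop : i * hop + frame_len] += frame
    return out
```

Because the periodic Hann window sums to one at 50% overlap, a zero noise estimate reproduces the interior of the input exactly. The floor parameter keeps a small fraction of each magnitude instead of zeroing it, which helps mask the isolated spectral peaks (often called musical noise) that plain subtraction tends to leave behind.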