Paralinguistic Datasets

ROMANIAN DEVA CRIMINAL INVESTIGATION AUDIO RECORDINGS (RODeCAR) DATASET

Version

Version 1.5 / 19-December-2024.

License

Licensed under Creative Commons BY-NC-ND 4.0.

Description

The Romanian Deva Criminal Investigation Audio Recordings (RODeCAR) dataset consists of approximately 7.5 hours of recorded audio data, with approximately 5 hours representing actual speech content, acquired from 20 speakers (4 female, 16 male) during interviews and questionings conducted by Romanian law enforcement agencies, in which all participants were persons of interest (guilty parties, suspects, witnesses, etc.).
Of the total speech content (excluding the prosecutors), 39.5% is deceptive (false). Truthfulness was determined objectively through a thorough review of the recordings and associated case notes, carried out together with the original prosecutors, using subsequent confessions and timeline reconstructions as evidence for each participant’s statements.
Transcriptions (per segment) of all the recordings are also included.

Content summary:
No. of files: Nfiles = 26 (duration between 04:39 and 57:40)
No. of speakers: Nspkrs = 20 (16 male, 4 female) + 2 (male; the prosecutors)
Total content duration: Tdur0 = 07:32:40 <- all speech segments (including pauses)
Total speech content duration: Tdur0_s = 04:45:51 <- all speech segments (excluding pauses)
Participant speech content duration: Tdur_s = 03:27:29 <- participants' speech segments (excluding the prosecutors)
Truthful speech content duration: Tdur_T = 02:05:36 <- participants' truthful speech segments (excluding the prosecutors); 60.5% of Tdur_s
Deceptive speech content duration: Tdur_F = 01:21:53 <- participants' deceptive speech segments (excluding the prosecutors); 39.5% of Tdur_s
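The percentages in the content summary follow directly from the listed durations. As a quick sanity check, the arithmetic can be reproduced with a minimal Python sketch; the helper name `to_seconds` and the variable names are illustrative only and are not part of the dataset or its tooling.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Durations copied from the content summary above.
tdur_s = to_seconds("03:27:29")  # participants' speech (Tdur_s)
tdur_t = to_seconds("02:05:36")  # truthful speech (Tdur_T)
tdur_f = to_seconds("01:21:53")  # deceptive speech (Tdur_F)

# The truthful and deceptive parts should add up to the participants' total.
assert tdur_t + tdur_f == tdur_s

# Shares of the participants' speech, rounded to one decimal place.
truthful_pct = round(100 * tdur_t / tdur_s, 1)
deceptive_pct = round(100 * tdur_f / tdur_s, 1)

print(f"truthful: {truthful_pct}%, deceptive: {deceptive_pct}%")
# prints "truthful: 60.5%, deceptive: 39.5%"
```

The same helper can be applied to Tdur0 and Tdur0_s if the share of pauses or of prosecutor speech is of interest.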

For further details, please refer to the README file included with the dataset and also provided here.

Citation

All publications reporting on research using this corpus must acknowledge this by citing the following paper:

S. Mihalache, G. Pop, and D. Burileanu, “Introducing the RODeCAR Database for Deceptive Speech Detection,” Proceedings of the 10th International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Timisoara, Romania, pp. 1-6, 10-12 Oct. 2019, ISBN: 978-1-7281-0983-1, DOI:10.1109/SPED.2019.8906542

Download

To obtain the RODeCAR dataset, please fill out the End User License Agreement (EULA) available here and send it to serban.mihalache@upb.ro.