Paralinguistic Datasets


ROMANIAN DEVA CRIMINAL INVESTIGATION AUDIO RECORDINGS (RODeCAR) DATASET

License

Licensed under Creative Commons BY-NC-ND 4.0.

Description

The Romanian Deva Criminal Investigation Audio Recordings (RODeCAR) dataset consists of approximately 7.5 hours of recorded audio, of which approximately 5 hours are actual speech content. The recordings were acquired from 20 speakers (4 female, 16 male) during interviews and questionings conducted by Romanian law enforcement agencies; all participants were persons of interest (guilty parties, suspects, witnesses, etc.). Of the participants’ speech content (excluding the prosecutors), 39.5% is deceptive (false). The truthfulness of each participant’s statements was determined objectively through a thorough review of the recordings and associated case notes, carried out together with the original prosecutors, using subsequent confessions and timeline reconstructions as evidence.

Content summary:
No. of files:                        Nfiles  = 26 (duration between 04:39 and 57:40)
No. of speakers:                     Nspkrs  = 20 (16 male, 4 female) + 2 (male; the prosecutors)
Total content duration:              Tdur0   = 07:32:40 <- all speech segments (including pauses)
Total speech content duration:       Tdur0_s = 04:45:51 <- all speech segments (excluding pauses)
Participant speech content duration: Tdur_s  = 03:27:29 <- participants' speech segments (excluding the prosecutors)
Truthful speech content duration:    Tdur_T  = 02:05:36 <- participants' truthful speech segments (excluding the prosecutors); 60.5% of Tdur_s
Deceptive speech content duration:   Tdur_F  = 01:21:53 <- participants' deceptive speech segments (excluding the prosecutors); 39.5% of Tdur_s
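
The figures in the summary can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the dataset distribution; the helper name to_seconds and the variable names are assumptions introduced here. It converts the HH:MM:SS durations to seconds and checks that the truthful and deceptive durations add up to Tdur_s and correspond to the stated percentages.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Durations copied from the content summary.
tdur_s = to_seconds("03:27:29")  # participants' speech (excluding the prosecutors)
tdur_t = to_seconds("02:05:36")  # truthful speech
tdur_f = to_seconds("01:21:53")  # deceptive speech

# Shares of the participants' speech, rounded to one decimal place.
truthful_pct = round(100 * tdur_t / tdur_s, 1)
deceptive_pct = round(100 * tdur_f / tdur_s, 1)

print("Truthful + deceptive equals Tdur_s:", tdur_t + tdur_f == tdur_s)
print("Truthful share of Tdur_s (%):", truthful_pct)
print("Deceptive share of Tdur_s (%):", deceptive_pct)
```

Under these assumptions, the two durations sum exactly to Tdur_s, and the computed shares are 60.5% and 39.5%, in agreement with the summary.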

All publications reporting on research that uses this corpus must acknowledge it by citing the following paper:

  • S. Mihalache, G. Pop, and D. Burileanu, “Introducing the RODeCAR Database for Deceptive Speech Detection,” Proceedings of the 10th International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Timisoara, Romania, pp. 1-6, 10-12 Oct. 2019, ISBN: 978-1-7281-0983-1, DOI:10.1109/SPED.2019.8906542

To obtain the RODeCAR dataset, please fill out the End User License Agreement (EULA) available here and send it to serban.mihalache@upb.ro.