==========================================================================================
====== The Romanian Deva Criminal Investigation Audio Recordings (RODeCAR) Dataset ======
==========================================================================================

Serban MIHALACHE, Dragos BURILEANU
Speech and Dialogue Research Laboratory (SpeeD)
National University of Science and Technology POLITEHNICA Bucharest
serban.mihalache@upb.ro
dragos.burileanu@upb.ro

Version 1.5 / 19-December-2024

==========================================================================================

1. INTRODUCTION

The Romanian Deva Criminal Investigation Audio Recordings (RODeCAR) dataset consists of
approximately 7.5 hours of recorded audio data, of which approximately 5 hours represent
actual speech content, acquired from 20 speakers (4 female, 16 male) during interviews
and questionings conducted by Romanian law enforcement agencies, in which all
participants were persons of interest (guilty parties, suspects, witnesses, etc.). Of the
total speech content (excluding the prosecutors), 39.5% is deceptive (false), as
objectively determined through a thorough review of the recordings and associated case
notes, conducted together with the original prosecutors, using subsequent confessions and
timeline reconstructions as evidence of the truthfulness of each participant's
statements. Transcriptions (per segment) of all the recordings are also included.

==========================================================================================

2. DESCRIPTION AND DETAILS

------------------------------------------------------------------------------------------

2.1. DESCRIPTION

The RODeCAR dataset was constructed using media files, stored in DVD format, acquired
from 9 older criminal cases investigating murder, sexual assault, and fraud charges.
After an initial filtering of the content, to select only the actual interactions with
the interviewees, all 20 involved speakers were manually identified and associated with
an ID number, with special values reserved for the prosecutors (to facilitate the
exclusion of their associated content from the dataset). Gender was also recorded: 4
female and 16 male speakers. The audio tracks were then extracted using the FFmpeg
framework and saved in 16-bit PCM format, at a 16 kHz sampling rate.

The 26 resulting audio recordings are sorted into three distinct categories, depending on
the content type and how the participants are involved:
- Questionings (Q) = interrogations of participants by the prosecutor in a formal
  environment, following a strict procedure; this is the most stressful scenario for the
  participants.
- Interviews (I) = interactions between the prosecutor and the participant in an informal
  environment (often more familiar to the questioned party); this is the least stressful
  scenario for the participants.
- Testimonies (M) = uninterrupted, free-form recountings or confessions given by the
  participants, often as a follow-up to a previous interrogation.
There are 14 Q-recordings, 10 M-recordings, and 2 I-recordings in the dataset.

For each file, semi-automatic segmentation was employed. First, a simple speech detection
algorithm, using local energy and pause duration features, was used to provide a rough
estimate of the start and stop times of each speech segment, without considering
diarization. Afterwards, all segment timestamps were manually corrected and associated
with the correct speaker. A segment is thus defined as a portion of speech from a single
speaker, either 1) separated by a pause of at least 400 ms from other portions of speech
from the same speaker, or 2) separated from portions of speech from a different speaker,
regardless of the onset delay duration. For ambiguous cases (e.g.
the prosecutor and the participant cutting each other off), the stop time of the
interviewer's segment is set at the end of the last phone in their speech, coinciding
with the start time of the participant's segment, i.e. assuming a null speech onset
latency.

In order to determine the truthfulness of the participants' statements as clearly and
accurately as possible, a meticulous manual review of the recordings and associated notes
was conducted together with the prosecutor who originally investigated these cases. The
binary annotation (truthful [T] / deceptive [i.e., false, F]) is made per speech segment,
but in a global sense; e.g. a short segment containing factually accurate information,
found within a longer speaker turn engaging in deceptive behavior, will also be labeled
as deceptive. This is further supported by the psychological argument that the state of
mind the participants find themselves in when lying is sustained by the long-term goal of
deceiving the prosecutor, the deception cues still being present in the participants'
speech. The prosecutor's speech segments were automatically labeled as truthful and
should be excluded in order to balance the class distribution.

Finally, the inevitable uncertainty concerning the finer details, or arising from
unavailable information regarding the investigations, is addressed by associating a
confidence level with each interaction (file), varying from 70% to 100%. These levels can
be used in experiments either to filter the content, or to replace the binary labels with
a continuous truth value, converting the task to a regression problem; e.g. speech
segments associated with a 70% confidence level would be given a score of 0.7, etc.

The dataset consists of 7 hours and 32 minutes of total content (Tdur0), of which 4 hours
and 46 minutes represent actual speech content (Tdur0_s).
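The semi-automatic segmentation described above (frame-level energy thresholding followed by the 400 ms minimum-pause rule) can be sketched as follows. This is an illustrative sketch only: the frame length, energy threshold, and function name are assumptions, not the parameters or code actually used to build the dataset.

```python
def detect_speech_segments(samples, sr=16000, frame_ms=25,
                           energy_thresh=1e-4, min_pause_ms=400):
    """Rough energy-based speech detection with a minimum-pause rule.

    Returns (start, stop) times in seconds. As described above, the output
    of such a rough detector still needs manual timestamp correction and
    speaker assignment; frame_ms and energy_thresh are assumed values.
    """
    frame_len = sr * frame_ms // 1000
    n_frames = len(samples) // frame_len

    # Mark a frame as speech when its mean energy exceeds the threshold.
    speech = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        speech.append(energy > energy_thresh)

    # Collect raw speech runs as (start_frame, stop_frame), stop exclusive.
    runs, start = [], None
    for i, active in enumerate(speech):
        if active and start is None:
            start = i
        elif not active and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, n_frames))

    # Merge runs separated by a pause shorter than min_pause_ms, mirroring
    # the dataset's 400 ms same-speaker segmentation rule.
    min_pause_frames = min_pause_ms // frame_ms
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_pause_frames:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)

    return [(s * frame_ms / 1000.0, e * frame_ms / 1000.0) for s, e in merged]
```

For example, two bursts of speech separated by a 200 ms pause are merged into a single segment, while a 600 ms pause yields two separate segments.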
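The confidence-based regression setup described above can be sketched as follows. Note that this README only specifies the mapping for truthful segments (70% confidence gives a score of 0.7); mapping deceptive segments symmetrically to 1 - confidence is an assumption made here for illustration.

```python
def truth_score(label, confidence):
    """Convert a per-segment T/F label plus the file-level confidence
    (0.70..1.00) into a continuous truth value for regression experiments.

    Assumption (illustration only): truthful segments map to the confidence
    itself; deceptive segments map symmetrically to 1 - confidence.
    """
    if label == "T":
        return confidence
    if label == "F":
        return 1.0 - confidence
    raise ValueError(f"unexpected label: {label!r}")
```

Alternatively, the confidence levels can simply be used as a filter, e.g. keeping only files annotated with 90% confidence or higher.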
Out of this, 3 hours and 27 minutes represent the participants' speech segments
(excluding the prosecutors; Tdur_s); 2 hours and 6 minutes represent the truthful speech
content (Tdur_T), while 1 hour and 21 minutes represent the deceptive speech content
(Tdur_F). The truthful (T) content comprises 60.5% of the total speech content (excluding
the prosecutors), with the remaining 39.5% labeled as deceptive (i.e., false, F).

------------------------------------------------------------------------------------------

2.2. GDPR NOTICE

In accordance with the European Union General Data Protection Regulation (EU GDPR
2016/679), all speech intervals including names or other non-disclosable personal
information have been replaced with silence intervals (signal samples with values equal
to zero).

------------------------------------------------------------------------------------------

2.3. CONTENT SUMMARY

No. of files:                        Nfiles  = 26 (durations between 04:39 and 57:40)
No. of speakers:                     Nspkrs  = 20 (16 male, 4 female)
                                              + 2 (male; the prosecutors)
Total content duration:              Tdur0   = 07:32:40
  <- all speech segments (including pauses)
Total speech content duration:       Tdur0_s = 04:45:51
  <- all speech segments (excluding pauses)
Participant speech content duration: Tdur_s  = 03:27:29
  <- participants' speech segments (excluding the prosecutors)
Truthful speech content duration:    Tdur_T  = 02:05:36
  <- participants' truthful speech segments (excluding the prosecutors); 60.5% of Tdur_s
Deceptive speech content duration:   Tdur_F  = 01:21:53
  <- participants' deceptive speech segments (excluding the prosecutors); 39.5% of Tdur_s

==========================================================================================

3. PACKAGE STRUCTURE

./Annotation
./Annotation/RODeCAR_bff.xlsx
----- This file contains the global dataset annotation.
- The "Master" sheet gives details (and hyperlinks) for each audio file (content
  duration, content type, file names, speaker IDs and genders).
- The individual file sheets provide timestamps (start and stop times) for each speech
  segment, as well as segment durations, inter-segmental pause durations, speaker IDs and
  genders, and the truthful/deceptive (T/F) annotation.
- Additionally, the transcriptions of the speech segments are provided (unintelligible
  words have been transcribed with the keyword "NEINTELIGIBIL" and utterances of people's
  names have been transcribed with the keyword "NUME").
- The individual file sheets also include confidence scores representing the annotation
  reliability (with 100% meaning complete confidence), which can be incorporated within
  more advanced training strategies.
./Annotation/silence_list.xlsx
----- This file describes the silence intervals used to replace utterances of people's
      names.
- The individual file sheets provide timestamps for each silence interval.
./Annotation/speaker_list.csv
----- An additional CSV file that contains a list of all speaker IDs and genders.
      Provided for ease of use.
./Annotation/F1_2_Q1.csv
----- (26 files) Additional CSV files that contain the information included in the global
      annotation file ("RODeCAR_bff.xlsx"). Provided for ease of use.
./Annotation/F1_2_Q2.csv
./Annotation/F1_3_Q1.csv
./Annotation/F1_4_M1.csv
./Annotation/F1_4_Q1.csv
./Annotation/F1_5_Q1.csv
./Annotation/F2_1_M1.csv
./Annotation/F2_2_Q1.csv
./Annotation/F3_1_M1.csv
./Annotation/F3_3_M1.csv
./Annotation/F3_4_Q1.csv
./Annotation/F3_5_Q1.csv
./Annotation/F3_8_M1.csv
./Annotation/F3_15_M1.csv
./Annotation/F3_15_Q1.csv
./Annotation/F3_21_Q1.csv
./Annotation/F6_1_Q1.csv
./Annotation/F6_2_I1.csv
./Annotation/F6_5_M1.csv
./Annotation/F6_6_I1.csv
./Annotation/F6_7_Q1.csv
./Annotation/F6_10_Q1.csv
./Annotation/F7_3_M1.csv
./Annotation/F8_3_M1.csv
./Annotation/F10_1_Q1.csv
./Annotation/F11_6_M1.csv
./Files_WAV
----- This folder contains the 26 audio recordings, as detailed in the DESCRIPTION
      section.
./Files_WAV/F1_2_Q1.wav
./Files_WAV/F1_2_Q2.wav
./Files_WAV/F1_3_Q1.wav
./Files_WAV/F1_4_M1.wav
./Files_WAV/F1_4_Q1.wav
./Files_WAV/F1_5_Q1.wav
./Files_WAV/F2_1_M1.wav
./Files_WAV/F2_2_Q1.wav
./Files_WAV/F3_1_M1.wav
./Files_WAV/F3_3_M1.wav
./Files_WAV/F3_4_Q1.wav
./Files_WAV/F3_5_Q1.wav
./Files_WAV/F3_8_M1.wav
./Files_WAV/F3_15_M1.wav
./Files_WAV/F3_15_Q1.wav
./Files_WAV/F3_21_Q1.wav
./Files_WAV/F6_1_Q1.wav
./Files_WAV/F6_2_I1.wav
./Files_WAV/F6_5_M1.wav
./Files_WAV/F6_6_I1.wav
./Files_WAV/F6_7_Q1.wav
./Files_WAV/F6_10_Q1.wav
./Files_WAV/F7_3_M1.wav
./Files_WAV/F8_3_M1.wav
./Files_WAV/F10_1_Q1.wav
./Files_WAV/F11_6_M1.wav
./README.txt
----- The current file.

==========================================================================================

4. CHANGE LIST

Version 1.4 -> Version 1.5
- Updated README file and EULA.

Version 1.3 -> Version 1.4
- Added speech transcriptions.
--- The transcriptions are included in the global annotation file ("RODeCAR_bff.xlsx")
    and in the corresponding additional CSV files provided for ease of use.
--- Unintelligible words have been transcribed with the keyword "NEINTELIGIBIL" and
    utterances of people's names have been transcribed with the keyword "NUME".
- The enhanced versions of the audio recordings (described in the original paper) are no
  longer included.
--- This explains a discrepancy between the information found within this README file and
    the original paper ("Introducing the RODeCAR Database for Deceptive Speech Detection"
    by Mihalache et al., 2019).
- Updated README file and EULA.

Version 1.2 -> Version 1.3
- Updated README file and EULA.

Version 1.1 -> Version 1.2
- Added (and processed) one additional audio recording (F11_6_M1).
--- Consequently, the number of speakers increased from 19 to 20 (1 additional male
    speaker included) and all content durations increased accordingly.
--- This explains a discrepancy between the information found within this README file and
    the original paper ("Introducing the RODeCAR Database for Deceptive Speech Detection"
    by Mihalache et al., 2019).
- Updated README file and EULA.

Version 1.0 -> Version 1.1
- All speech intervals including names or other non-disclosable personal information have
  been replaced with silence intervals, in accordance with the European Union General
  Data Protection Regulation (EU GDPR 2016/679).
--- An additional annotation file (./Annotation/silence_list.xlsx) has been added, which
    includes individual file sheets that provide timestamps for each silence interval
    used in each audio recording.

==========================================================================================

5. CITATION

All publications reporting on research using this corpus must acknowledge this by citing
the following paper:

S. Mihalache, G. Pop, and D. Burileanu, "Introducing the RODeCAR Database for Deceptive
Speech Detection," Proceedings of the 10th International Conference on Speech Technology
and Human-Computer Dialogue (SpeD), Timisoara, Romania, pp. 1-6, 10-12 Oct. 2019,
ISBN: 978-1-7281-0983-1, DOI: 10.1109/SPED.2019.8906542

==========================================================================================

6. ACKNOWLEDGMENTS

This work was supported in part by the Romanian Ministry of Research and Innovation,
UEFISCDI, project SPIA-VA, agreement 2SOL/2017, grant PN-III-P2-2.1-SOL-2016-02-0002.

We also give special thanks to Prof. Tiberiu Medeanu, PhD, Professor at the Faculty of
Law, West University of Timisoara, Romania, who made this work possible.

==========================================================================================