==========================================================================================
====== The Romanian Deva Criminal Investigation Audio Recordings (RODeCAR) Dataset ======
==========================================================================================

Serban MIHALACHE, Dragos BURILEANU
Speech and Dialogue Research Laboratory (SpeeD)
National University of Science and Technology POLITEHNICA Bucharest
serban.mihalache@upb.ro
dragos.burileanu@upb.ro

Version 1.5 / 19-December-2024

==========================================================================================

1. INTRODUCTION

The Romanian Deva Criminal Investigation Audio Recordings (RODeCAR) dataset consists of
approximately 7.5 hours of recorded audio data, of which approximately 5 hours represent
actual speech content, acquired from 20 speakers (4 female, 16 male) during interviews
and questionings conducted by Romanian law enforcement agencies, in which all
participants were persons of interest (guilty parties, suspects, witnesses, etc.). Of the
total speech content (excluding the prosecutors), 39.5% is deceptive (false), as
objectively determined through a thorough review of the recordings and associated case
notes, conducted together with the original prosecutors, using subsequent confessions and
timeline reconstructions as evidence of the truthfulness of each participant's
statements. Transcriptions (per segment) of all the recordings are also included.

==========================================================================================

2. DESCRIPTION AND DETAILS

------------------------------------------------------------------------------------------

2.1. DESCRIPTION

The RODeCAR dataset was constructed using media files, stored in DVD format, acquired
from 9 older criminal cases investigating murder, sexual assault, and fraud charges.
After an initial filtering of the content, to select only the actual interactions with
the interviewees, all 20 involved speakers were manually identified and associated with
an ID number, with special values reserved for the prosecutors (to facilitate the
exclusion of their associated content from the dataset). Gender was also recorded: 4
female and 16 male speakers. The audio tracks were then extracted using the FFmpeg
framework and saved in 16-bit PCM format, at a 16 kHz sampling rate.

The 26 resulting audio recordings are sorted into three distinct categories, depending on
the content type and how the participants are involved:
- Questionings (Q) = interrogations of participants by the prosecutor in a formal
  environment, following a strict procedure; this is the most stressful scenario for the
  participants.
- Interviews (I) = interactions between the prosecutor and the participant in an informal
  environment (often more familiar to the questioned party); this is the least stressful
  scenario for the participants.
- Testimonies (M) = uninterrupted, free-form recountings or confessions given by the
  participants, often as a follow-up to a previous interrogation.
There are 14 Q-recordings, 10 M-recordings, and 2 I-recordings in the dataset.

For each file, semi-automatic segmentation was employed. First, a simple speech detection
algorithm, using local energy and pause duration features, was used to provide a rough
estimate of the start and stop times of each speech segment, without considering
diarization. Afterwards, all segment timestamps were manually corrected and associated
with the correct speaker. A segment is thus defined as a portion of speech from a single
speaker, either 1) separated by a pause of at least 400 ms from other portions of speech
from the same speaker, or 2) separated from portions of speech from a different speaker,
regardless of the onset delay duration. For ambiguous cases (e.g.
the prosecutor and the participant cutting each other off), the stop time of the
interviewer's segment is set at the end of the last phone in their speech, coinciding
with the start time of the participant's segment, i.e. assuming a null speech onset
latency.

In order to determine the truthfulness of the participants' statements as clearly and
accurately as possible, a meticulous manual review of the recordings and associated notes
was conducted together with the prosecutor who originally investigated these cases. The
binary annotation (truthful [T] / deceptive [i.e., false, F]) is made per speech segment,
but in a global sense; e.g. a short segment containing factually accurate information,
found within a longer speaker turn engaging in deceptive behavior, will also be labeled
as deceptive. This is further supported by the psychological argument that the state of
mind the participants find themselves in when lying is sustained by the long-term goal of
deceiving the prosecutor, the deception cues still being present in the participants'
speech. The prosecutor's speech segments were automatically labeled as truthful and
should be excluded in order to balance the class distribution.

Finally, the inevitable uncertainty concerning the finer details, or arising from
unavailable information regarding the investigations, is addressed by associating a
confidence level with each interaction (file), varying from 70% to 100%. These levels can
be used in experiments either to filter the content, or to replace the binary labels with
a continuous truth value, converting the task to a regression problem; e.g. speech
segments associated with a 70% confidence level would be given a score of 0.7, etc.

The dataset consists of 7 hours and 32 minutes of total content (Tdur0), of which 4 hours
and 46 minutes represent actual speech content (Tdur0_s).
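The semi-automatic segmentation described above (frame-level energy thresholding followed by the 400 ms minimum-pause rule) can be sketched as follows. This is an illustrative sketch only: the frame length, energy threshold, and function name are assumptions, not the parameters or code actually used to build the dataset.

```python
def detect_speech_segments(samples, sr=16000, frame_ms=25,
                           energy_thresh=1e-4, min_pause_ms=400):
    """Rough energy-based speech detection with a minimum-pause rule.

    Returns (start, stop) times in seconds. As described above, the output
    of such a rough detector still needs manual timestamp correction and
    speaker assignment; frame_ms and energy_thresh are assumed values.
    """
    frame_len = sr * frame_ms // 1000
    n_frames = len(samples) // frame_len

    # Mark a frame as speech when its mean energy exceeds the threshold.
    speech = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        speech.append(energy > energy_thresh)

    # Collect raw speech runs as (start_frame, stop_frame), stop exclusive.
    runs, start = [], None
    for i, active in enumerate(speech):
        if active and start is None:
            start = i
        elif not active and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, n_frames))

    # Merge runs separated by a pause shorter than min_pause_ms, mirroring
    # the dataset's 400 ms same-speaker segmentation rule.
    min_pause_frames = min_pause_ms // frame_ms
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_pause_frames:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)

    return [(s * frame_ms / 1000.0, e * frame_ms / 1000.0) for s, e in merged]
```

For example, two bursts of speech separated by a 200 ms pause are merged into a single segment, while a 600 ms pause yields two separate segments.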
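The confidence-based regression setup described above can be sketched as follows. Note that this README only specifies the mapping for truthful segments (70% confidence gives a score of 0.7); mapping deceptive segments symmetrically to 1 - confidence is an assumption made here for illustration.

```python
def truth_score(label, confidence):
    """Convert a per-segment T/F label plus the file-level confidence
    (0.70..1.00) into a continuous truth value for regression experiments.

    Assumption (illustration only): truthful segments map to the confidence
    itself; deceptive segments map symmetrically to 1 - confidence.
    """
    if label == "T":
        return confidence
    if label == "F":
        return 1.0 - confidence
    raise ValueError(f"unexpected label: {label!r}")
```

Alternatively, the confidence levels can simply be used as a filter, e.g. keeping only files annotated with 90% confidence or higher.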
Out of this, 3 hours and 27 minutes represent the participants' speech segments
(excluding the prosecutors; Tdur_s); 2 hours and 6 minutes represent the truthful speech
content (Tdur_T), while 1 hour and 21 minutes represent the deceptive speech content
(Tdur_F). The truthful (T) content comprises 60.5% of the total speech content (excluding
the prosecutors), with the remaining 39.5% labeled as deceptive (i.e., false, F).

------------------------------------------------------------------------------------------

2.2. GDPR NOTICE

In accordance with the European Union General Data Protection Regulation (EU GDPR
2016/679), all speech intervals including names or other non-disclosable personal
information have been replaced with silence intervals (signal samples with values equal
to zero).

------------------------------------------------------------------------------------------

2.3. CONTENT SUMMARY

No. of files:                        Nfiles  = 26 (durations between 04:39 and 57:40)
No. of speakers:                     Nspkrs  = 20 (16 male, 4 female)
                                              + 2 (male; the prosecutors)
Total content duration:              Tdur0   = 07:32:40
  <- all speech segments (including pauses)
Total speech content duration:       Tdur0_s = 04:45:51
  <- all speech segments (excluding pauses)
Participant speech content duration: Tdur_s  = 03:27:29
  <- participants' speech segments (excluding the prosecutors)
Truthful speech content duration:    Tdur_T  = 02:05:36
  <- participants' truthful speech segments (excluding the prosecutors); 60.5% of Tdur_s
Deceptive speech content duration:   Tdur_F  = 01:21:53
  <- participants' deceptive speech segments (excluding the prosecutors); 39.5% of Tdur_s

==========================================================================================

3. PACKAGE STRUCTURE

./Annotation
./Annotation/RODeCAR_bff.xlsx
----- This file contains the global dataset annotation.
- The "Master" sheet gives details (and hyperlinks) for each audio file (content
  duration, content type, file names, speaker IDs and genders).
- The individual file sheets provide timestamps (start and stop times) for each speech
  segment, as well as segment durations, inter-segmental pause durations, speaker IDs and
  genders, and the truthful/deceptive (T/F) annotation.
- Additionally, the transcriptions of the speech segments are provided (unintelligible
  words have been transcribed with the keyword "NEINTELIGIBIL" and utterances of people's
  names have been transcribed with the keyword "NUME").
- The individual file sheets also include confidence scores representing the annotation
  reliability (with 100% meaning complete confidence), which can be incorporated within
  more advanced training strategies.
./Annotation/silence_list.xlsx
----- This file describes the silence intervals used to replace utterances of people's
      names.
- The individual file sheets provide timestamps for each silence interval.
./Annotation/speaker_list.csv
----- An additional CSV file that contains a list of all speaker IDs and genders.
      Provided for ease of use.
./Annotation/F1_2_Q1.csv
----- (26 files) Additional CSV files that contain the information included in the global
      annotation file ("RODeCAR_bff.xlsx"). Provided for ease of use.
./Annotation/F1_2_Q2.csv
./Annotation/F1_3_Q1.csv
./Annotation/F1_4_M1.csv
./Annotation/F1_4_Q1.csv
./Annotation/F1_5_Q1.csv
./Annotation/F2_1_M1.csv
./Annotation/F2_2_Q1.csv
./Annotation/F3_1_M1.csv
./Annotation/F3_3_M1.csv
./Annotation/F3_4_Q1.csv
./Annotation/F3_5_Q1.csv
./Annotation/F3_8_M1.csv
./Annotation/F3_15_M1.csv
./Annotation/F3_15_Q1.csv
./Annotation/F3_21_Q1.csv
./Annotation/F6_1_Q1.csv
./Annotation/F6_2_I1.csv
./Annotation/F6_5_M1.csv
./Annotation/F6_6_I1.csv
./Annotation/F6_7_Q1.csv
./Annotation/F6_10_Q1.csv
./Annotation/F7_3_M1.csv
./Annotation/F8_3_M1.csv
./Annotation/F10_1_Q1.csv
./Annotation/F11_6_M1.csv
./Files_WAV
----- This folder contains the 26 audio recordings, as detailed in the DESCRIPTION
      section.
./Files_WAV/F1_2_Q1.wav
./Files_WAV/F1_2_Q2.wav
./Files_WAV/F1_3_Q1.wav
./Files_WAV/F1_4_M1.wav
./Files_WAV/F1_4_Q1.wav
./Files_WAV/F1_5_Q1.wav
./Files_WAV/F2_1_M1.wav
./Files_WAV/F2_2_Q1.wav
./Files_WAV/F3_1_M1.wav
./Files_WAV/F3_3_M1.wav
./Files_WAV/F3_4_Q1.wav
./Files_WAV/F3_5_Q1.wav
./Files_WAV/F3_8_M1.wav
./Files_WAV/F3_15_M1.wav
./Files_WAV/F3_15_Q1.wav
./Files_WAV/F3_21_Q1.wav
./Files_WAV/F6_1_Q1.wav
./Files_WAV/F6_2_I1.wav
./Files_WAV/F6_5_M1.wav
./Files_WAV/F6_6_I1.wav
./Files_WAV/F6_7_Q1.wav
./Files_WAV/F6_10_Q1.wav
./Files_WAV/F7_3_M1.wav
./Files_WAV/F8_3_M1.wav
./Files_WAV/F10_1_Q1.wav
./Files_WAV/F11_6_M1.wav
./README.txt
----- The current file.

==========================================================================================

4. CHANGE LIST

Version 1.4 -> Version 1.5
- Updated README file and EULA.

Version 1.3 -> Version 1.4
- Added speech transcriptions.
--- The transcriptions are included in the global annotation file ("RODeCAR_bff.xlsx")
    and in the corresponding additional CSV files provided for ease of use.
--- Unintelligible words have been transcribed with the keyword "NEINTELIGIBIL" and
    utterances of people's names have been transcribed with the keyword "NUME".
- The enhanced versions of the audio recordings (described in the original paper) are no
  longer included.
--- This explains a discrepancy between the information found within this README file and
    the original paper ("Introducing the RODeCAR Database for Deceptive Speech Detection"
    by Mihalache et al., 2019).
- Updated README file and EULA.

Version 1.2 -> Version 1.3
- Updated README file and EULA.

Version 1.1 -> Version 1.2
- Added (and processed) one additional audio recording (F11_6_M1).
--- Consequently, the number of speakers increased from 19 to 20 (1 additional male
    speaker included) and all content durations increased accordingly.
--- This explains a discrepancy between the information found within this README file and
    the original paper ("Introducing the RODeCAR Database for Deceptive Speech Detection"
    by Mihalache et al., 2019).
- Updated README file and EULA.

Version 1.0 -> Version 1.1
- All speech intervals including names or other non-disclosable personal information have
  been replaced with silence intervals, in accordance with the European Union General
  Data Protection Regulation (EU GDPR 2016/679).
--- An additional annotation file (./Annotation/silence_list.xlsx) has been added, which
    includes individual file sheets that provide timestamps for each silence interval
    used in each audio recording.

==========================================================================================

5. CITATION

All publications reporting on research using this corpus must acknowledge this by citing
the following paper:

S. Mihalache, G. Pop, and D. Burileanu, "Introducing the RODeCAR Database for Deceptive
Speech Detection," Proceedings of the 10th International Conference on Speech Technology
and Human-Computer Dialogue (SpeD), Timisoara, Romania, pp. 1-6, 10-12 Oct. 2019,
ISBN: 978-1-7281-0983-1, DOI: 10.1109/SPED.2019.8906542

==========================================================================================

6. ACKNOWLEDGMENTS

This work was supported in part by the Romanian Ministry of Research and Innovation,
UEFISCDI, project SPIA-VA, agreement 2SOL/2017, grant PN-III-P2-2.1-SOL-2016-02-0002.

We also give special thanks to Prof. Tiberiu Medeanu, PhD, Professor at the Faculty of
Law, West University of Timisoara, Romania, who made this work possible.

==========================================================================================