Downloads


ROMANIAN READ-SPEECH CORPUS (RSC)

License

Licensed under Creative Commons BY-NC-ND 3.0.

Description

“RSC” is a read-speech corpus collected by the Speech and Dialogue Research Laboratory. The recordings were made under different conditions (various microphones and audio recording systems), using an online audio recording application developed by the same research group. The speakers were mainly students and staff of the Faculty of Electronics, Telecommunications and Information Technology at the University “Politehnica” of Bucharest.

The corpus consists of 136,120 audio files collected from 164 native Romanian speakers. The audio files contain Romanian utterances drawn from literature and online news, as well as isolated words. In general, each speaker has between 130 and 11,000 audio files. The total duration of the database is around 100 hours, and the average utterance length is 2.6 seconds.

“RSC” is split into training and evaluation sets as follows:

  • training set: 133,616 files from 156 speakers
  • evaluation set: 2,504 files from 21 speakers (out of which 13 speakers are also part of the training set)

Note: the above overlap occurs only in terms of speakers (voices), not in terms of utterances.
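The figures above are mutually consistent, which can be checked with simple arithmetic: the two splits add up to the full file count, the speaker counts add up once the 13 shared speakers are counted only once, and the file count multiplied by the average utterance length gives roughly 98 hours, in line with the stated total of about 100 hours. The Python sketch below only restates this arithmetic (the constant and function names are illustrative, not part of the corpus release); it does not read the corpus itself, and since the 2.6-second average is rounded, the duration is an estimate.

```python
# Figures quoted in the RSC description (constant names are illustrative).
TOTAL_FILES = 136_120
TOTAL_SPEAKERS = 164
AVG_UTTERANCE_S = 2.6  # rounded average, so any duration derived from it is approximate

TRAIN_FILES, TRAIN_SPEAKERS = 133_616, 156
EVAL_FILES, EVAL_SPEAKERS = 2_504, 21
SHARED_SPEAKERS = 13  # evaluation speakers who also appear in the training set


def approx_hours(n_files: int, avg_seconds: float) -> float:
    """Estimate total audio duration in hours from a file count and a mean length."""
    return n_files * avg_seconds / 3600.0


# The two splits partition the files...
assert TRAIN_FILES + EVAL_FILES == TOTAL_FILES
# ...but share some speakers, so the shared ones are counted only once.
assert TRAIN_SPEAKERS + EVAL_SPEAKERS - SHARED_SPEAKERS == TOTAL_SPEAKERS

print(f"{approx_hours(TOTAL_FILES, AVG_UTTERANCE_S):.1f} h")  # 98.3 h
```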

If you use this corpus in your research, please cite one of the following papers:

  • Alexandru-Lucian Georgescu, Horia Cucu, Andi Buzo, Corneliu Burileanu, “RSC: A Romanian Read Speech Corpus for Automatic Speech Recognition,” submitted to the 12th International Conference on Language Resources and Evaluation, 2020.
  • Alexandru-Lucian Georgescu, Horia Cucu, Corneliu Burileanu, “SpeeD’s DNN Approach to Romanian Speech Recognition,” in Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, 2017, 8 pp., ISBN 978-1-5090-6496-0.
  • Horia Cucu, Andi Buzo, Lucian Petrică, Dragoş Burileanu, Corneliu Burileanu, “Recent Improvements of the SpeeD Romanian LVCSR System,” in Proceedings of the 10th International Conference on Communications (COMM), Bucharest, 2014, pp. 111-114.

Download Romanian Read-Speech Corpus (pass: rsc2019!)

 


KITE – A SPEECH DATABASE FOR UAV CONTROL

“Kite” is a multi-modal dataset for the control of unmanned aerial vehicles (UAVs). Please see the Kite website for details and download information.

 


RAW EMG CORPUS

License

Licensed under Creative Commons BY-NC-ND 3.0.

Description

This corpus comprises raw EMG data for 13 of the most commonly used gestures. The data were collected from able-bodied young men and women aged 20 to 25. The acquisition hardware was the Myo Armband from Thalmic Labs, which has 8 surface EMG sensors arranged in a ring around the forearm.

There are 2 types of subjects:

  • “First-time” subjects: they had no prior experience with the setup or the experiment.
  • “Experimented” subjects: they were already familiar with the setup and the experiment.

The “first-time” subjects performed 2 rounds of gestures. In the first round, each gesture was shown to them and they were instructed to repeat it freely; in the second round, they were shown how to execute the gesture correctly and asked to repeat the experiment. “Experimented” subjects only performed the second round, because they already knew how to perform each gesture.

All the gestures in the corpus are depicted in the following figures:

All the files in the corpus follow the same naming format, XXXX_YY_G_T, where:

  • XXXX is the ID of the user. If it starts with “0”, the user is a “first-time” subject; if it starts with “2”, the user is an “experimented” subject.
  • YY is the label of the gesture.
  • G is the gender of the user: M for male and F for female.
  • T is the type of gesture: L for a free gesture (first round of “first-time” subjects) and A for an assisted gesture (second round of “first-time” subjects).
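The XXXX_YY_G_T naming scheme can be decoded mechanically. The Python sketch below is a minimal parser under stated assumptions: it assumes XXXX is exactly 4 alphanumeric characters and YY exactly 2 (the widths implied by the placeholder letters), and it ignores any file extension, since the extension used in the archive is not stated here. The `parse_emg_name` and `EmgFileInfo` names and the example file name are hypothetical.

```python
import os
import re
from dataclasses import dataclass
from typing import Optional

# Assumed field widths: XXXX = 4 alphanumeric characters, YY = 2 alphanumeric characters.
_NAME_RE = re.compile(r"([A-Za-z0-9]{4})_([A-Za-z0-9]{2})_([MF])_([LA])")


@dataclass(frozen=True)
class EmgFileInfo:
    user_id: str                 # XXXX
    gesture: str                 # YY, the gesture label
    gender: str                  # G: "M" or "F"
    gesture_type: str            # T: "L" (free) or "A" (assisted)
    subject_kind: Optional[str]  # from the first ID character; None if it is not "0" or "2"


def parse_emg_name(path: str) -> EmgFileInfo:
    """Split an XXXX_YY_G_T file name into its fields (directory and extension are ignored)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    match = _NAME_RE.fullmatch(stem)
    if match is None:
        raise ValueError(f"not an XXXX_YY_G_T name: {path!r}")
    user_id, gesture, gender, gesture_type = match.groups()
    kind = {"0": "first-time", "2": "experimented"}.get(user_id[0])
    return EmgFileInfo(user_id, gesture, gender, gesture_type, kind)


info = parse_emg_name("0012_03_M_L.csv")  # hypothetical file name
print(info.subject_kind, info.gesture, info.gesture_type)  # first-time 03 L
```

Using `fullmatch` rather than a prefix match rejects names with extra trailing fields instead of silently truncating them.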

Download EMG Raw Corpus (pass: emg)

 


MUSIC CORPUS

License

Licensed under Creative Commons BY-NC-ND 3.0.

Description

This corpus consists of 100 short songs and musical exercises, recorded in MIDI format. The recordings were made using a Roland organ with 5 octaves (range C2-C7), connected directly to a laptop. Each octave has 12 semitones; together with the top C7, this gives a total of 61 notes (5 × 12 + 1), all of which are used in the recordings. All 100 songs are monophonic, meaning that only one note is played at a time, with no overlapping notes. Some songs are repeated in different octaves to ensure a minimum number of occurrences for every note.
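The 61-note count can be checked against MIDI note numbers. The description does not say which octave-numbering convention the organ uses, so the Python sketch below assumes the common convention in which C4 = 60; under it, C2-C7 maps to MIDI numbers 36-96. Under another convention (for example C3 = 60) every number shifts by 12, but the count stays 61. The helper names are illustrative.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(midi: int) -> str:
    """Name a MIDI note number, assuming the C4 = 60 convention."""
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


def name_to_midi(name: str) -> int:
    """Inverse of midi_to_name for names like 'C2' or 'F#5' (single-digit octaves only)."""
    pitch, octave = name[:-1], int(name[-1])
    return 12 * (octave + 1) + NOTE_NAMES.index(pitch)


low, high = name_to_midi("C2"), name_to_midi("C7")
keyboard = list(range(low, high + 1))  # 5 octaves x 12 semitones + the top C
print(low, high, len(keyboard))  # 36 96 61
```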

The dataset is split into training and evaluation sets, as follows:

  • training set: 90 files representing 1 hour and 7 minutes of audio recordings, with lengths between 6 s and 150 s
  • evaluation set: 10 files representing around 10 minutes of audio recordings, with lengths between 12 s and 120 s

The recordings were made under normal conditions, without noise. The songs were recorded using MidiEditor as follows: the recording mode in MidiEditor was activated before each song was played and stopped after the song ended, so each recording has silence at its beginning and end. Each MIDI file has an associated WAV file and a TXT file. The text file lists each played note as a MIDI number, together with its onset and offset times in seconds, in the following format:
Onset [s] Offset [s] Note [MIDI number]
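The annotation files can be loaded with a few lines of code. The Python sketch below assumes one note per line with whitespace-separated columns in the order shown above, an integer MIDI number, and possibly a header line; the exact delimiter and whether a header is present are assumptions, not stated in the description. It also checks the monophonic property: once sorted by onset, no note should start before the previous one ends. The function names and sample text are hypothetical.

```python
from typing import List, NamedTuple


class NoteEvent(NamedTuple):
    onset: float   # seconds
    offset: float  # seconds
    note: int      # MIDI note number


def parse_annotation(text: str) -> List[NoteEvent]:
    """Parse 'onset offset note' rows, skipping blank lines and header rows."""
    events = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank line, or a header such as "Onset [s] Offset [s] Note [MIDI number]"
        try:
            onset, offset, note = float(fields[0]), float(fields[1]), int(fields[2])
        except ValueError:
            continue  # a non-numeric three-field row, e.g. a shorter header
        events.append(NoteEvent(onset, offset, note))
    return events


def is_monophonic(events: List[NoteEvent]) -> bool:
    """True if, sorted by onset, every note ends before (or as) the next one starts."""
    ordered = sorted(events)
    return all(prev.offset <= cur.onset for prev, cur in zip(ordered, ordered[1:]))


sample = "Onset [s] Offset [s] Note [MIDI number]\n0.50 1.00 60\n1.00 1.75 62\n"
events = parse_annotation(sample)
print(len(events), is_monophonic(events))  # 2 True
```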

Download Music corpus (pass: music)


RODIGITS SPEECH CORPUS

License

Licensed under Creative Commons BY-NC-ND 3.0.

Description

The “RoDigits” speech corpus was collected by the Speech and Dialogue Research Laboratory. The recordings were made under different conditions (various microphones and audio recording systems), using an online audio recording application developed by the same research group. The speakers were mainly students of the Faculty of Electronics, Telecommunications and Information Technology at the University “Politehnica” of Bucharest.

The corpus consists of 15,389 audio files collected from 154 native Romanian speakers. Each audio file contains an utterance of 12 random digits (0-9) in Romanian. In general, there are 100 audio files per speaker, with one exception: for 11 speakers, the corpus comprises only 99 audio files each (154 × 100 − 11 = 15,389). The total duration of the database is around 38 hours, and the average utterance length is 8.7 seconds.

The “RoDigits” speech corpus is split into training, development and evaluation sets as follows:

  • training set: 11,120 files – 80 files from each of 139 speakers (file IDs 1-50 and 71-100)
  • development set: 2,780 files – 20 files from each of the same 139 speakers (file IDs 51-70)
  • evaluation set: 1,489 files – about 100 files from each of 15 speakers
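The split rule above can be expressed as a small function. The Python sketch below is illustrative only: this page does not describe how speaker IDs or per-speaker file IDs are encoded in the released file names, so both inputs (the file ID and whether the speaker belongs to the 15-speaker evaluation group) are hypothetical parameters. It then confirms that the per-speaker 80/20 split reproduces the stated set sizes.

```python
def rodigits_split(file_id: int, eval_speaker: bool) -> str:
    """Return 'train', 'dev' or 'eval' for a RoDigits recording.

    Hypothetical inputs: file_id is the per-speaker file index (1-100) and
    eval_speaker says whether the speaker is one of the 15 evaluation speakers.
    """
    if eval_speaker:
        return "eval"
    if not 1 <= file_id <= 100:
        raise ValueError(f"file ID out of range: {file_id}")
    return "dev" if 51 <= file_id <= 70 else "train"


# For one of the 139 non-evaluation speakers, 100 files split 80 / 20.
counts = {"train": 0, "dev": 0}
for fid in range(1, 101):
    counts[rodigits_split(fid, eval_speaker=False)] += 1
print(counts)  # {'train': 80, 'dev': 20}

# Scaled to 139 speakers, plus the evaluation set, this matches the totals above.
assert counts["train"] * 139 == 11_120 and counts["dev"] * 139 == 2_780
assert 11_120 + 2_780 + 1_489 == 15_389
```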

If you use this corpus in your research, please cite one of the following papers:

  • Alexandru Lucian Georgescu, Alexandru Caranica, Horia Cucu, Corneliu Burileanu, “RoDigits – a Romanian connected-digits speech corpus for automatic speech and speaker recognition,” in University “Politehnica” of Bucharest Scientific Bulletin, Series C, vol. 80, issue 3, pp. 45-62, Bucharest, 2018, ISSN: 2286-3540.

Download RoDigits Speech Corpus (pass: rodigits)

Note: a first version of the corpus, available online before December 8, 2017, contained some corrupted files. If you downloaded the corpus before this date, please download the corrected version, which is now available.