A Public Speech Emotion Recognition Dataset
Databases of emotional speech fall into two main categories: those that contain utterances of acted emotional speech and those that contain spontaneous emotional speech. Both categories have benefits and limitations.
The Acted Emotional Speech Dynamic Database (AESDD) is a publicly available speech emotion recognition (SER) dataset. It contains utterances of acted emotional speech in the Greek language.
The database was created because no publicly available, high-quality SER database existed for Greek, a gap identified during research on an emotion-triggered lighting framework for theatrical performance [1]. The database contains utterances expressing five emotions: anger, disgust, fear, happiness, and sadness.
The first version of the dataset was created in collaboration with a group of professional actors, who showed a keen interest in the proposed framework. "Dynamic" (in AESDD) refers to the intention of constantly expanding the database through contributions from actors and performers who are involved in, or interested in, the project. Although the call for contributions is addressed to actors, SER models trained on the AESDD are not exclusively performance-oriented.
The first version of the AESDD was presented in [2].
In [4], subjective evaluation experiments were carried out on the database to assess how accurately human listeners recognize the intended emotion in AESDD utterances. Human accuracy was estimated at around 74%.
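In its simplest form, a listener accuracy figure like this is the share of listener judgments whose chosen emotion matches the intended one. The following is a minimal Python sketch of that computation, assuming each listening trial is recorded as an (intended, perceived) label pair. The responses and the function names `recognition_accuracy` and `per_emotion_accuracy` are hypothetical illustrations, not data or code from the study, and the protocol in [4] may have aggregated results differently.

```python
from collections import Counter

# Hypothetical listener responses: (intended emotion, emotion the listener chose).
# These pairs are illustrative only; they are NOT results from the AESDD evaluation.
responses = [
    ("anger", "anger"),
    ("anger", "disgust"),
    ("fear", "fear"),
    ("happiness", "happiness"),
    ("sadness", "sadness"),
    ("sadness", "fear"),
    ("disgust", "disgust"),
    ("fear", "sadness"),
]


def recognition_accuracy(pairs):
    """Overall share of responses whose perceived label matches the intended one."""
    if not pairs:
        raise ValueError("no responses to evaluate")
    correct = sum(1 for intended, perceived in pairs if intended == perceived)
    return correct / len(pairs)


def per_emotion_accuracy(pairs):
    """Accuracy broken down by intended emotion."""
    totals = Counter(intended for intended, _ in pairs)
    hits = Counter(intended for intended, perceived in pairs if intended == perceived)
    return {emotion: hits[emotion] / totals[emotion] for emotion in totals}


if __name__ == "__main__":
    print(f"overall: {recognition_accuracy(responses):.1%}")
    for emotion, acc in sorted(per_emotion_accuracy(responses).items()):
        print(f"{emotion}: {acc:.1%}")
```

Reporting the per-emotion breakdown alongside the overall figure shows which emotions listeners tend to confuse, which a single percentage hides.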
Publications
- [1] Vryzas, N., Liatsou, A., Kotsakis, R., Dimoulas, C., & Kalliris, G. (2017, August). Augmenting Drama: A Speech Emotion-Controlled Stage Lighting Framework. In Proceedings of the 12th International Audio Mostly Conference on Augmented and Participatory Sound and Music Experiences (p. 8). ACM.
- [2] Vryzas, N., Kotsakis, R., Liatsou, A., Dimoulas, C. A., & Kalliris, G. (2018). Speech emotion recognition for performance interaction. Journal of the Audio Engineering Society, 66(6), 457-467.
- [3] Vryzas, N., Vrysis, L., Kotsakis, R., & Dimoulas, C. (2018, September). Speech emotion recognition adapted to multimodal semantic repositories. In 2018 13th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP) (pp. 31-35). IEEE.
- [4] Vryzas, N., Matsiola, M., Kotsakis, R., Dimoulas, C., & Kalliris, G. (2018, September). Subjective Evaluation of a Speech Emotion Recognition Interaction Framework. In Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion (p. 34). ACM.
Download
The latest version of the AESDD, along with tools and documentation describing how the database is organized, can be found at the following link:
Acted Emotional Speech Dynamic Database
If you use the AESDD for scientific research, please cite [2] and [4].
Contribute to the AESDD Speech Emotion Recognition Datasets
The SpeechEmotionRecognition.xyz project aims to create an open-source, anonymized, multilingual speech emotion recognition dataset. Recordings of emotional speech submitted to the site will be validated and made publicly available for non-commercial research purposes.
The resulting dataset will be an extension of the Acted Emotional Speech Dynamic Database (AESDD).
Volunteers can contribute by recording their voice online at
https://speechemotionrecognition.xyz
Facebook: https://www.facebook.com/Speech-Emotion-Recognition-109562454044394/
Contact
If you have any questions or would like to contribute to the project, please contact Nikolaos Vryzas (nvryzas@auth.gr).