This dataset follows a three-class classification scheme based on the Speech/Music/Other (SMO) taxonomy. Because Speech/Music classification is an easy task, a more demanding scenario is needed. A third class (Other) is therefore introduced, containing a large number of audio signals not listed as speech or music (e.g. environmental sounds, human and animal bioacoustics, weather phenomena, engines, motors, other machinery and many types of noise), so that the dataset better simulates real-world classification scenarios. To make results comparable across algorithms from different authors, linear data splitting with 3-fold cross-validation is recommended. The split points are 0.33 and 0.66.
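The recommended splitting can be sketched as follows. This is only one possible reading of "linear data splitting": the items are kept in their original order (no shuffling), cut into three contiguous segments at 0.33 and 0.66 of their length, and each segment serves once as the test set. The function name `linear_folds` and the idea of applying it separately to each class are assumptions made for illustration and are not part of the dataset.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Split points stated in the dataset description.
SPLIT_POINTS = (0.33, 0.66)


def linear_folds(
    items: Sequence[T],
    split_points: Tuple[float, ...] = SPLIT_POINTS,
) -> List[Tuple[List[T], List[T]]]:
    """Return one (train, test) pair per contiguous segment.

    Hypothetical helper: items are NOT shuffled, so each fold's test
    set is a contiguous block of the original ordering.
    """
    n = len(items)
    # Convert fractional split points to integer indices (rounded down).
    bounds = [0] + [int(n * p) for p in split_points] + [n]
    segments = [
        list(items[bounds[i]:bounds[i + 1]]) for i in range(len(bounds) - 1)
    ]
    folds = []
    for k, test in enumerate(segments):
        # Training data is everything outside the current test segment.
        train = [x for j, seg in enumerate(segments) if j != k for x in seg]
        folds.append((train, test))
    return folds


# Example with ten placeholder items standing in for one class's files.
folds = linear_folds(list(range(10)))
for train, test in folds:
    print("train:", train, "test:", test)
```

Under this reading, the function would be called once per class (Speech, Music, Other) and the per-class folds merged, so that every fold keeps the class proportions of the full dataset. Whether the original authors split per class or over the whole file list is not stated here.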
Publications
Vrysis, L., Tsipas, N., Dimoulas, C., & Papanikolaou, G. (2017, May). Extending Temporal Feature Integration for Semantic Audio Analysis. In Audio Engineering Society Convention 142. Audio Engineering Society.
License
If you wish to use the dataset for academic purposes, please cite the publication listed above. Data from the GTZAN Music-Speech dataset is also included, so you should also add the reference required by that dataset.