LVLib SMO v1 for General Audio Classification

This dataset follows a three-class classification scheme based on the Speech/Music/Other (SMO) taxonomy. Because plain Speech/Music classification is a comparatively easy task, a more demanding scenario is needed. A third class (Other) is therefore introduced. It contains a large number of audio signals that are neither speech nor music (e.g. environmental sounds, human and animal bioacoustics, weather phenomena, engines, motors, other machinery, and many types of noise), so that the dataset better simulates real-world classification scenarios. To keep results comparable across algorithms from different authors, linear data splitting with 3-fold cross-validation is recommended, with split points at 0.33 and 0.66.
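The recommended splitting can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' reference implementation: "linear" is read here as cutting an ordered, unshuffled list of items into three contiguous parts at 33% and 66% of its length, and each part serves once as the test fold. The function names and the example file names are hypothetical. Whether the split is applied per class or over the whole collection is not specified in this description.

```python
from typing import List, Sequence, Tuple

# Split points taken from the dataset description.
SPLIT_POINTS: Tuple[float, float] = (0.33, 0.66)


def linear_folds(items: Sequence[str],
                 split_points: Tuple[float, float] = SPLIT_POINTS) -> List[List[str]]:
    """Cut an ordered sequence into three contiguous folds, without shuffling.

    Assumption: fold boundaries are the split points multiplied by the
    number of items, truncated to an integer index.
    """
    n = len(items)
    first_cut = int(n * split_points[0])
    second_cut = int(n * split_points[1])
    return [
        list(items[:first_cut]),
        list(items[first_cut:second_cut]),
        list(items[second_cut:]),
    ]


def cross_validation_rounds(items: Sequence[str]) -> List[Tuple[List[str], List[str]]]:
    """Return three (train, test) pairs; each fold is the test set exactly once."""
    folds = linear_folds(items)
    rounds = []
    for k in range(len(folds)):
        test = folds[k]
        # Training data is everything outside the current test fold.
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        rounds.append((train, test))
    return rounds


# Hypothetical, already-sorted file list standing in for one class of the dataset.
example_items = [f"clip_{i:03d}.wav" for i in range(100)]
example_rounds = cross_validation_rounds(example_items)
```

Keeping the items in a fixed order and avoiding shuffling is what makes the folds reproducible, so that results reported by different authors refer to the same partitions.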


Vrysis, L., Tsipas, N., Dimoulas, C., & Papanikolaou, G. (2017, May). Extending Temporal Feature Integration for Semantic Audio Analysis. In Audio Engineering Society Convention 142. Audio Engineering Society.


If you use the dataset for academic purposes, please cite the publication listed above. The dataset also includes data from the GTZAN Music-Speech dataset, so please add the corresponding reference for that dataset as well.