LVLib SMO v3 for General Audio Classification

This dataset is designed to highlight data normalization issues in CNN topologies. It is an exact copy of LVLib-SMO-v1, but with modified gain: LVLib-SMO-v3 was generated by applying a random gain, drawn from {0, -10, -20, -30} dBFS, to each fold and class, following a 3-fold setup. For this reason, and to keep results comparable across algorithms from different authors, linear data splitting and 3-fold cross-validation are mandatory. To form the folds, split the data of each class linearly at split points 0.33 and 0.66.
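The linear split described above can be sketched in Python as follows. This is a minimal illustration, not the authors' reference code. The function names, the `files_by_class` mapping, and the choice to truncate fold boundaries to integer indices are assumptions made here. The sketch also assumes that each class's items are passed in the dataset's original order, since the split is positional.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Split points stated in the dataset description.
SPLIT_POINTS = (0.33, 0.66)

Sample = Tuple[str, str]  # (item, class label)


def linear_folds(items: Sequence[str]) -> List[List[str]]:
    """Cut an ordered sequence into three contiguous folds at 0.33 and 0.66."""
    n = len(items)
    # Assumption: boundaries are truncated to integer indices.
    first, second = (int(n * p) for p in SPLIT_POINTS)
    return [list(items[:first]), list(items[first:second]), list(items[second:])]


def cross_validation_splits(
    files_by_class: Dict[str, Sequence[str]],
) -> Iterator[Tuple[List[Sample], List[Sample]]]:
    """Yield one (train, test) pair per fold; fold k of every class is the test set."""
    # Items are kept in the order given; no shuffling, because the split is linear.
    folds = {label: linear_folds(items) for label, items in files_by_class.items()}
    for k in range(3):
        train: List[Sample] = []
        test: List[Sample] = []
        for label, class_folds in folds.items():
            for i, fold in enumerate(class_folds):
                target = test if i == k else train
                target.extend((item, label) for item in fold)
        yield train, test
```

Each class is split independently, so every fold holds roughly one third of each class. Shuffling before the split would mix recordings with different gain across folds and make results incomparable with the mandated setup.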


Vrysis, L., Tsipas, N., Thoidis, I., & Dimoulas, C. (2020). 1D/2D Deep CNNs vs. Temporal Feature Integration for General Audio Classification. Journal of the Audio Engineering Society, 68(1/2), 66-77.


If you use the dataset for academic purposes, please cite the publication above. Data from the GTZAN Music-Speech dataset is also included, so please add the required reference for that dataset as well.