LVLib SMO v4 for General Audio Classification

LVLib-v4 follows the same 3-class scheme (Music, Speech, Other) as the LVLib SMO family but features two data deformations: variable gain and additive noise. There is evidence that, compared with conventional learning methods, deep learning architectures require larger amounts of data for meaningful training and are more prone to dataset bias. CNNs are also sensitive to data deformations such as loudness variations and additive noise. LVLib-v4 was designed to simulate real-world conditions that expose these weaknesses.

Specifically, the data are distorted with additive noise according to the following 3-fold protocol:

- Fold 1: Music is contaminated with Speech, Speech with Music, and Other with machinery noise.
- Fold 2: Music is contaminated with Other, Speech with machinery noise, and Other with Music.
- Fold 3: Music is contaminated with machinery noise, Speech with Other, and Other with Speech.

The signal-to-noise ratio (SNR) of the additive noise follows a Gaussian distribution (9 ± 3 dB). The dataset was further deformed by applying a random gain, drawn from {0, -10, -20, -30} dBFS, for each fold and class.
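The deformation protocol above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the script used to build LVLib-v4: the names `CONTAMINATION`, `mix_at_snr`, `apply_gain`, `draw_gains` and `deform` are hypothetical; "9 ± 3 dB" is read as mean 9 dB and standard deviation 3 dB; SNR is computed from RMS levels; and each listed dBFS value is treated as a gain in dB applied to the noisy mixture (the two readings coincide if the originals were peak-normalized to 0 dBFS). Waveforms are assumed to be mono float arrays, with the noise clip at least as long as the target clip.

```python
import numpy as np

# Fold protocol from the description: fold -> {target class: noise source}.
CONTAMINATION = {
    1: {"Music": "Speech", "Speech": "Music", "Other": "machinery"},
    2: {"Music": "Other", "Speech": "machinery", "Other": "Music"},
    3: {"Music": "machinery", "Speech": "Other", "Other": "Speech"},
}
# Gain values listed in the description (assumed here to be gains in dB).
GAINS_DB = (0.0, -10.0, -20.0, -30.0)


def rms(x):
    """Root-mean-square level of a waveform."""
    return np.sqrt(np.mean(x ** 2))


def mix_at_snr(signal, noise, snr_db):
    """Add noise scaled so that 20*log10(rms(signal) / rms(scaled noise)) == snr_db."""
    noise = noise[: len(signal)]  # assumption: noise is at least as long as signal
    scale = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    return signal + scale * noise


def apply_gain(x, gain_db):
    """Multiply a waveform by a gain given in dB."""
    return x * 10 ** (gain_db / 20)


def draw_gains(rng):
    """Draw one gain per (fold, class) pair from GAINS_DB."""
    return {
        (fold, cls): float(rng.choice(GAINS_DB))
        for fold, mapping in CONTAMINATION.items()
        for cls in mapping
    }


def deform(signal, noise, gain_db, rng, snr_mean=9.0, snr_std=3.0):
    """Mix in noise at a Gaussian-distributed SNR, then apply the gain."""
    snr_db = rng.normal(snr_mean, snr_std)
    return apply_gain(mix_at_snr(signal, noise, snr_db), gain_db), snr_db
```

For example, a Music clip in fold 2 would be mixed with a clip from the source named by `CONTAMINATION[2]["Music"]` (Other) and then scaled by `gains[(2, "Music")]`, where `gains = draw_gains(rng)`. Note that the 0 dB gain case can push the mixture above full scale, so a real pipeline may need to check for clipping.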


If you wish to use the dataset for academic purposes, please cite the aforementioned publications. Data from the GTZAN Music-Speech dataset is also included, so please add its required reference as well.