LVLib v2 for General Audio Detection

LVLib v2 consists of continuous real-world recordings covering the speech, music, silence and ambient noise classes. The audio is uncompressed (44,100 Hz, 16-bit, mono), and most of the captures were made with mobile terminals (smartphones) rather than professional recording equipment. The content is varied, combining speech, radio and TV broadcasts, music, environmental and household sounds, noise from electronic devices, ringing, etc., and it comes with semantic annotation that defines class-labeled segments. The dataset is split into two subsets. The first is a sixty-minute compilation of sounds collected from libraries and recordings and is recommended for training. The second can be used to measure the performance of a system; it is annotated to highlight audio events in terms of their transition points.
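The stated audio format and the idea of transition points can be illustrated with a minimal Python sketch. It uses only the standard-library `wave` module. The function names, and the representation of an annotation as `(start_sec, end_sec, label)` tuples, are assumptions made for illustration; the dataset's actual annotation file format is not described here.

```python
import io
import wave

# Format stated in the dataset description: 44,100 Hz, 16-bit, mono.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits
EXPECTED_CHANNELS = 1


def matches_expected_format(source):
    """Return True if a WAV file (path or binary file object) is 44.1 kHz, 16-bit, mono."""
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def transition_points(segments):
    """Return the times at which the class label changes between consecutive segments.

    `segments` is a hypothetical in-memory form of the annotation:
    a list of (start_sec, end_sec, label) tuples sorted by start time.
    """
    return [
        nxt[0]
        for cur, nxt in zip(segments, segments[1:])
        if cur[2] != nxt[2]
    ]


def make_silent_wav(rate_hz, seconds=0.1):
    """Build a short silent 16-bit mono WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    buffer.seek(0)
    return buffer
```

In practice, `matches_expected_format` would be called with the path of a downloaded recording, and `transition_points` with segments parsed from whatever annotation files ship with the dataset. Adjacent segments that share a label do not produce a transition point in this sketch.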


Vrysis, L., Tsipas, N., Dimoulas, C., & Papanikolaou, G. (2016). Crowdsourcing audio semantics by means of hybrid bimodal segmentation with hierarchical classification. Journal of the Audio Engineering Society, 64(12), 1042-1054.


If you wish to use the dataset for academic purposes, please cite the aforementioned publication.


You can get the dataset from here.