Download the metadata file from here.
Get the data
- First, download the raw data by following the instructions for each of these datasets:
  - MUSIC21 (https://github.com/roudimit/MUSIC_dataset)
  - AVSSBench (https://github.com/OpenNLPLab/AVSBench)
  - Solos (https://github.com/JuanFMontesinos/Solos)
  - URMP (https://labsites.rochester.edu/air/projects/URMP.html)
- Use the training/load_sms.py script (from the https://github.com/ilpoviertola/SAGANet repo) to download the sounding object segmentation masks and process the raw videos.
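
Before running the script, it can help to confirm that each raw dataset is actually in place. The following is a minimal sketch, assuming a hypothetical layout with one folder per dataset under a common root. The folder names and the `missing_datasets` helper are illustrative only and are not prescribed by the SAGANet repo.

```python
from pathlib import Path

# Hypothetical folder names, one per dataset listed above; adjust to your own layout.
DATASET_DIRS = ["MUSIC21", "AVSSBench", "Solos", "URMP"]


def missing_datasets(root):
    """Return the dataset folder names that are absent or empty under `root`."""
    root = Path(root)
    missing = []
    for name in DATASET_DIRS:
        folder = root / name
        # A folder counts as present only if it exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

Calling `missing_datasets("data")` returns a list of the folders that still need to be downloaded. An empty list means every raw dataset was found under the assumed root.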