Download the metadata file from here.

Get the data

  1. Download the raw data by following the instructions provided for each of these datasets:
    1. MUSIC21 (https://github.com/roudimit/MUSIC_dataset)
    2. AVSSBench (https://github.com/OpenNLPLab/AVSBench)
    3. Solos (https://github.com/JuanFMontesinos/Solos)
    4. URMP (https://labsites.rochester.edu/air/projects/URMP.html)
  2. Use the training/load_sms.py script (from the https://github.com/ilpoviertola/SAGANet repository) to download the sounding object segmentation masks and process the raw videos.
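
As a convenience, the repositories referenced in the steps above could be fetched in one go. The following is only a minimal sketch under stated assumptions: the `external/` target directory, the helper names `clone_command` and `clone_all`, and the shallow-clone choice are all hypothetical and not part of SAGANet. Cloning only retrieves each dataset's instructions and helper files; the raw videos still have to be obtained as each repository describes, and URMP must be requested through its web page. The arguments of training/load_sms.py are not shown here because they are not documented in this section.

```python
import subprocess
from pathlib import Path

# URLs copied from the list above. URMP is a web page rather than a Git
# repository, so it is intentionally left out and must be handled manually.
INSTRUCTION_REPOS = {
    "MUSIC21": "https://github.com/roudimit/MUSIC_dataset",
    "AVSSBench": "https://github.com/OpenNLPLab/AVSBench",
    "Solos": "https://github.com/JuanFMontesinos/Solos",
    "SAGANet": "https://github.com/ilpoviertola/SAGANet",
}


def clone_command(url: str, dest: Path) -> list[str]:
    """Build a shallow `git clone` command for one repository."""
    return ["git", "clone", "--depth", "1", url, str(dest)]


def clone_all(root: Path, dry_run: bool = True) -> list[list[str]]:
    """Clone each repository into root/<name> (hypothetical layout).

    Repositories whose target directory already exists are skipped.
    With dry_run=True the commands are only returned, not executed.
    """
    commands = []
    for name, url in INSTRUCTION_REPOS.items():
        dest = root / name
        if dest.exists():
            continue
        cmd = clone_command(url, dest)
        commands.append(cmd)
        if not dry_run:
            root.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Print the commands first; switch to dry_run=False to actually clone.
    for cmd in clone_all(Path("external"), dry_run=True):
        print(" ".join(cmd))
```

Running with `dry_run=True` first makes it easy to check the target paths before anything is written to disk.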