Base network: https://github.com/f90/Wave-U-Net-Pytorch. We used this as a baseline and restructured it to load the appropriate dataset.
Saraga Carnatic Dataset:
It has five stems: Mixed, Vocal, Violin, Mridangam Right, and Mridangam Left. Mridangam left and right are converted into a single audio file (mridangam), and four output stems are expected: vocal, violin, mridangam, and other.
The model is trained on this dataset to extract four stems: Mridangam Left, Mridangam Right, Vocal, and Violin.
Metrics | With Bleeding Effects
---|---
SDR | -0.191
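The SDR values reported in these tables can be computed per track as the ratio of reference-signal power to estimation-error power, in dB. A minimal sketch follows; the function name and this exact formulation are assumptions (the evaluation may instead use a BSS Eval implementation such as museval):

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Signal-to-distortion ratio in dB between a reference stem and its estimate."""
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero for silent or perfect estimates.
    return float(10 * np.log10((signal_power + eps) / (error_power + eps)))
```

A perfect estimate yields a very large SDR, while an all-zero estimate of a non-silent reference yields roughly 0 dB, which matches the scale of the numbers in the tables.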
The model is trained to extract three stems: Mridangam, Vocal, and Violin. This requires minor changes to the data-loading code: the mridangam left and right channels are summed together, any secondary vocal is added to the primary vocal, and ghatam files are removed.
Metrics | With Bleeding Effects
---|---
SDR | 1.167
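The data-loading changes described above can be sketched as follows. The stem key names and the dictionary interface are assumptions about how the Saraga stems are presented after loading, not the repository's actual API:

```python
import numpy as np

def build_three_stems(stems: dict) -> dict:
    """Collapse raw Saraga stems into mridangam / vocal / violin targets."""
    out = {}
    # Mridangam: left and right channels summed into one track.
    out["mridangam"] = stems["mridangam_left"] + stems["mridangam_right"]
    # Vocal: fold the secondary vocal into the primary one when present.
    vocal = stems["vocal"]
    if "vocal_secondary" in stems:
        vocal = vocal + stems["vocal_secondary"]
    out["vocal"] = vocal
    out["violin"] = stems["violin"]
    # Ghatam stems are removed, i.e. simply not included in the targets.
    return out
```

Summing stems directly assumes the tracks are aligned and equally scaled; a real loader may need to normalize or clip after addition.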
The model is trained to extract the stems Mridangam (left + right), Vocal(s), and Violin. Bleeding between the sources is reduced considerably to achieve higher performance.
Metrics | With Bleed | Without Bleed
---|---|---
SDR | 1.167 | 
MUSDB18HQ is artificially bled internally to evaluate the effect of leakage: each source is kept dominant, and the other sources are bled into it with a 10 dB volume reduction. Training uses the PyTorch Wave-U-Net network.
Metrics | Actual | With Bleed | After Bleed Removal
---|---|---|---
SDR | 2.309 | 0.966 | 1.730
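The artificial bleeding step can be sketched as below: every other stem is added to the dominant one at -10 dB (an amplitude factor of about 0.316). The -10 dB figure is from the text; the function name and dictionary interface are assumptions:

```python
import numpy as np

def bleed_stems(stems: dict, reduction_db: float = 10.0) -> dict:
    """Mix every other source into each stem, attenuated by reduction_db."""
    gain = 10 ** (-reduction_db / 20)  # dB-to-amplitude conversion
    bled = {}
    for name, audio in stems.items():
        # Sum of all the other sources becomes the simulated leakage.
        leakage = sum(other for key, other in stems.items() if key != name)
        bled[name] = audio + gain * leakage
    return bled
```

Because the attenuation is applied to amplitude, a 10 dB reduction corresponds to 10^(-10/20) ≈ 0.316, keeping the original source clearly dominant in each bled stem.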