# AvaTr

Official implementation of the paper "AvaTr: One-Shot Speaker Extraction with Transformers".

Avatar-based speech separation by Upload AI LLC.

The idea builds on the Dual-Path Transformer. We add a modulator to incorporate speaker information and thus obtain personalized speech models. An overview of the model architecture is shown below.

*(Figure: AvaDPT model architecture.)*
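This README does not spell out the modulator itself, but one common way to condition a separator on a speaker embedding is a FiLM-style affine modulation. Below is a minimal sketch of that idea in PyTorch; `SpeakerModulator`, `emb_dim`, and `feat_dim` are hypothetical names for illustration, not the repo's actual API:

```python
import torch
import torch.nn as nn

class SpeakerModulator(nn.Module):
    """Hypothetical FiLM-style modulator: scales and shifts separator
    features with an affine transform predicted from a speaker
    embedding (the "avatar"). Illustrative only, not the repo's module."""

    def __init__(self, emb_dim: int, feat_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(emb_dim, feat_dim)
        self.to_shift = nn.Linear(emb_dim, feat_dim)

    def forward(self, feats: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); spk_emb: (batch, emb_dim)
        scale = self.to_scale(spk_emb).unsqueeze(1)  # (batch, 1, feat_dim)
        shift = self.to_shift(spk_emb).unsqueeze(1)
        return feats * scale + shift

# Toy usage: modulate dual-path transformer features for one speaker.
mod = SpeakerModulator(emb_dim=128, feat_dim=256)
feats = torch.randn(2, 100, 256)   # separator features
spk = torch.randn(2, 128)          # one-shot speaker embedding
out = mod(feats, spk)              # personalized features, same shape
```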

## Contents

- [Prerequisites](#prerequisites)
- [How to run the code](#how-to-run-the-code)
- [Check results](#check-results)

## Prerequisites

([↑ up to contents](#contents))

1. Download the Avatar10Mix2 dataset, which contains audio recorded from 10 speakers (a quick sanity check is sketched after this list):

   ```bash
   cd datasets
   sh download_avatar10mix2.sh
   cd ..
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
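After step 1, you can quickly verify that the audio unpacked. A minimal sketch, assuming the script places the data under `datasets/Avatar10Mix2` (the actual directory layout may differ):

```python
from pathlib import Path

# Hypothetical location; adjust if download_avatar10mix2.sh unpacks elsewhere.
root = Path("datasets/Avatar10Mix2")
wavs = sorted(root.rglob("*.wav"))
print(f"Found {len(wavs)} wav files under {root}")
for p in wavs[:5]:
    print(" ", p)
```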

## How to run the code

([↑ up to contents](#contents))

The training and testing code for separating speech from ambient noise is provided in `speech_vs_ambient`. Change into that directory and run the following commands:

1. Training:

   ```bash
   python train.py --exp_dir exp/speech_vs_ambient
   ```

2. Testing:

   ```bash
   python eval.py --exp_dir exp/speech_vs_ambient
   ```

## Check results

([↑ up to contents](#contents))

We provide a simple webpage for reviewing good test examples; it can be found at

    exp/speech_vs_ambient/vis/examples/

The training curves are logged with TensorBoard. To view them, run

```bash
tensorboard --logdir exp/speech_vs_ambient/lightning_logs/
```
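If you want to score an individual (estimate, reference) pair yourself, scale-invariant SNR (SI-SNR) is the usual metric for this kind of extraction task. A generic NumPy implementation of the metric, not the repo's evaluation code:

```python
import numpy as np

def si_snr(est: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between an estimate and a reference.
    Generic implementation for illustration."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the target component.
    target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    noise = est - target
    return float(10 * np.log10((np.dot(target, target) + eps) /
                               (np.dot(noise, noise) + eps)))

# Toy check: a clean copy scores very high, unrelated noise scores very low.
t = np.random.randn(16000)
print(si_snr(t, t))                       # large positive value
print(si_snr(np.random.randn(16000), t))  # strongly negative value
```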
