# AvaTr

Official implementation of the paper "AvaTr: One-Shot Speaker Extraction with Transformers".

Avatar-based speech separation by Upload AI LLC.

The idea builds on the Dual-Path Transformer. We add a modulator to incorporate speaker information and thus obtain personalized speech models. An overview of the model architecture is shown below.

*(Figure: AvaDPT model architecture.)*
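This README does not spell out the modulator itself, but one common way to condition a separator on a speaker embedding is a FiLM-style affine modulation. Below is a minimal sketch of that idea in PyTorch; `SpeakerModulator`, `emb_dim`, and `feat_dim` are hypothetical names for illustration, not the repo's actual API:

```python
import torch
import torch.nn as nn

class SpeakerModulator(nn.Module):
    """Hypothetical FiLM-style modulator: scales and shifts separator
    features with an affine transform predicted from a speaker
    embedding (the "avatar"). Illustrative only, not the repo's module."""

    def __init__(self, emb_dim: int, feat_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(emb_dim, feat_dim)
        self.to_shift = nn.Linear(emb_dim, feat_dim)

    def forward(self, feats: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); spk_emb: (batch, emb_dim)
        scale = self.to_scale(spk_emb).unsqueeze(1)  # (batch, 1, feat_dim)
        shift = self.to_shift(spk_emb).unsqueeze(1)
        return feats * scale + shift

# Toy usage: modulate dual-path transformer features for one speaker.
mod = SpeakerModulator(emb_dim=128, feat_dim=256)
feats = torch.randn(2, 100, 256)   # separator features
spk = torch.randn(2, 128)          # one-shot speaker embedding
out = mod(feats, spk)              # personalized features, same shape
```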

## Contents

- [Prerequisites](#prerequisites)
- [How to run the code](#how-to-run-the-code)
- [Check results](#check-results)

## Prerequisites

([↑ up to contents](#contents))

1. Download the Avatar10Mix2 dataset, which contains audio recorded from 10 speakers (a quick sanity check is sketched after this list):

   ```bash
   cd datasets
   sh download_avatar10mix2.sh
   cd ..
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
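After step 1, you can quickly verify that the audio unpacked. A minimal sketch, assuming the script places the data under `datasets/Avatar10Mix2` (the actual directory layout may differ):

```python
from pathlib import Path

# Hypothetical location; adjust if download_avatar10mix2.sh unpacks elsewhere.
root = Path("datasets/Avatar10Mix2")
wavs = sorted(root.rglob("*.wav"))
print(f"Found {len(wavs)} wav files under {root}")
for p in wavs[:5]:
    print(" ", p)
```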

## How to run the code

([↑ up to contents](#contents))

The training and testing code for separating speech from ambient noise is provided in `speech_vs_ambient`. Change into that directory and run the following commands:

1. Training:

   ```bash
   python train.py --exp_dir exp/speech_vs_ambient
   ```

2. Testing:

   ```bash
   python eval.py --exp_dir exp/speech_vs_ambient
   ```

## Check results

([↑ up to contents](#contents))

We provide a simple webpage for reviewing good test examples; it can be found at

    exp/speech_vs_ambient/vis/examples/

The training curves are logged with TensorBoard. To view them, run

```bash
tensorboard --logdir exp/speech_vs_ambient/lightning_logs/
```
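If you want to score an individual (estimate, reference) pair yourself, scale-invariant SNR (SI-SNR) is the usual metric for this kind of extraction task. A generic NumPy implementation of the metric, not the repo's evaluation code:

```python
import numpy as np

def si_snr(est: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between an estimate and a reference.
    Generic implementation for illustration."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the target component.
    target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    noise = est - target
    return float(10 * np.log10((np.dot(target, target) + eps) /
                               (np.dot(noise, noise) + eps)))

# Toy check: a clean copy scores very high, unrelated noise scores very low.
t = np.random.randn(16000)
print(si_snr(t, t))                       # large positive value
print(si_snr(np.random.randn(16000), t))  # strongly negative value
```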
