
# Voice to text and translation

This demo is a simple voice-to-text and translation application. It uses the Whisper model to transcribe speech and then translates the text into the target language.

## Whisper model

This demo can run on an older GPU such as an NVIDIA GeForce GTX 960, but we use an AWS G5 instance as the standard baseline. Choose whichever suits you.

## Environment setup

### Install CUDA

First, make sure you've installed the NVIDIA driver and CUDA Toolkit according to the *Prepare the CUDA environment in AWS G5 instances under Ubuntu 24.04* article.

### Prepare the environment

```shell
sudo apt install portaudio19-dev virtualenv

git clone https://github.com/hardenedlinux/hard-voice.git
cd hard-voice
virtualenv .local
source .local/bin/activate
pip install -r requirements.txt
```
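After installing the requirements, a quick sanity check can confirm the packages import cleanly from inside the virtualenv. The helper below is a hypothetical sketch, not part of the repo; `gradio` is an assumption based on the demo serving on Gradio's default port (7860):

```python
import importlib.util

def installed(packages):
    """Return a dict mapping each package name to whether it can be imported."""
    return {p: importlib.util.find_spec(p) is not None for p in packages}

# "whisper" comes from requirements.txt; "gradio" is an assumption based on
# the demo's default port (7860).
print(installed(["whisper", "gradio"]))
```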

## Configure

You can configure the demo by modifying the following lines:

### Select model size

```python
# Select from the following models: "tiny", "base", "small", "medium", "large"
model = whisper.load_model("small")
```

In our tests, `small` is good enough.
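As a rough guide for picking a size, the openai/whisper README lists approximate VRAM requirements per model. The helper below is a hypothetical sketch built on those published figures, not part of the demo:

```python
# Approximate VRAM needs (GB) per model, from the openai/whisper README.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def largest_model_that_fits(vram_gb):
    """Return the largest Whisper model whose approximate VRAM need fits."""
    fitting = [m for m, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    return fitting[-1] if fitting else "tiny"

print(largest_model_that_fits(4))  # a 4 GB card such as the GTX 960 runs "small"
```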

### Select target transcription language

No, you don't need to specify the transcription language: the Whisper model detects it automatically. That is, Whisper knows what language you are speaking. Amazing, huh?
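Under the hood, Whisper's `detect_language` returns a probability per language, and the detected language is simply the most probable one. A toy illustration with made-up probabilities (the real dict comes from `model.detect_language(mel)` in the openai/whisper API):

```python
# Illustrative probabilities only, not real model output.
probs = {"en": 0.02, "zh": 0.91, "ja": 0.07}
detected = max(probs, key=probs.get)
print(detected)  # → zh
```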

### Select target translation language

```python
options = {"fp16": False, "language": None, "task": "translate"}
```

If you leave `language` set to `None`, Whisper auto-detects the source language, and the `translate` task outputs English by default.
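For context, the `options` dict is expanded into keyword arguments of the transcription call. The sketch below uses a stand-in function with the same keyword shape (`fp16`, `language`, `task`) rather than the real `model.transcribe`:

```python
# Stand-in for whisper's model.transcribe, keeping only the keyword shape.
def transcribe(audio, fp16=True, language=None, task="transcribe"):
    return {"audio": audio, "fp16": fp16, "language": language, "task": task}

options = {"fp16": False, "language": None, "task": "translate"}
print(transcribe("speech.wav", **options))
```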

## Run the demo

```shell
python run.py
```

Then open your browser and visit `http://localhost:7860`.

## How to use

First, click the record button and say something. You can speak in various languages (Chinese, Japanese, etc.), and Whisper will detect and translate them automatically.

Then, click the transcribe button to get the text you've just said.

*(Screenshot: Whisper transcribe)*

Finally, click the translate button to get the translation.

*(Screenshot: Whisper translate)*