Use Whisper / Whisper.cpp for voice recognition #416

MathiasSchindler · 2023-01-26T11:33:29Z

It would be nice to use whisper instead of vosk for speech recognition on the server side, as it currently seems to outperform other models in terms of recognition quality.

pajowu · 2023-01-26T11:37:46Z

I agree, integrating whisper would be nice. I have already played around with a few options for that. To get word-level timestamps, we would need something like https://github.com/jianfch/stable-ts or a Python wrapper for https://github.com/ggerganov/whisper.cpp (ggerganov/whisper.cpp#9 or https://github.com/o4dev/whispercpp.py).

Note: I think we should add this, but not replace vosk with it, as vosk has much lower inference times and is therefore especially useful on slower machines.
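The "add, don't replace" idea above could be sketched as a small backend registry where vosk stays the default and whisper is opt-in. This is only an illustration: every name here (`Word`, `BACKENDS`, `register`, `transcribe`) is hypothetical, and the two engine functions are stubs that return fixed words instead of calling the real vosk or whisper libraries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Word:
    """One recognized word with its timestamps in seconds."""
    text: str
    start: float
    end: float


# A transcriber takes a path to an audio file and returns timed words.
Transcriber = Callable[[str], List[Word]]

# Hypothetical registry mapping backend names to transcriber functions.
BACKENDS: Dict[str, Transcriber] = {}


def register(name: str) -> Callable[[Transcriber], Transcriber]:
    """Decorator that adds a transcriber to the registry under `name`."""
    def decorator(fn: Transcriber) -> Transcriber:
        BACKENDS[name] = fn
        return fn
    return decorator


@register("vosk")
def transcribe_vosk(path: str) -> List[Word]:
    # Stub: a real implementation would call into vosk here.
    return [Word("hello", 0.0, 0.4)]


@register("whisper")
def transcribe_whisper(path: str) -> List[Word]:
    # Stub: a real implementation would call a whisper wrapper that
    # provides word-level timestamps.
    return [Word("hello", 0.0, 0.5)]


def transcribe(path: str, backend: str = "vosk") -> List[Word]:
    """Run the selected backend; vosk remains the default for slow machines."""
    try:
        engine = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return engine(path)
```

With this shape, callers that pass no `backend` argument keep getting vosk, and whisper can be selected explicitly with `transcribe(path, backend="whisper")`.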

anuejn · 2023-01-26T14:00:03Z

Some more thoughts on this: how do we do this in a performant and cross-platform way? Sure, whisper.cpp would be one option, but it would also be cool to use something like TVM for general-purpose GPU support and Core ML for Apple platform accelerators. Is there any ready-made abstraction over these? Would we need to invent something new (and would that be too much work, etc.)?

clstaudt · 2023-03-06T14:57:05Z

I recently used whisperX to transcribe some interviews. I believe the large model and perhaps even the medium model would perform significantly better than the current transcription. Inference times are a factor though - with GPU support I was able to transcribe at 7x speed with the large model.
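To put the 7x figure in perspective, here is a back-of-the-envelope calculation. It assumes "7x speed" means audio is processed seven times faster than real time; the function name is made up for illustration.

```python
def estimated_processing_minutes(audio_minutes: float, speed_factor: float) -> float:
    """Estimate wall-clock transcription time for a recording.

    Assumes `speed_factor` is a multiple of real time, so a factor of 7
    means 7 minutes of audio are processed per minute of compute.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


# A one-hour interview at 7x would take roughly 8.6 minutes.
print(round(estimated_processing_minutes(60, 7), 1))
```

Under the same assumption, a backend that runs slower than real time (a factor below 1) would take longer than the recording itself, which is the case that matters on slower machines.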

pajowu · 2023-03-06T15:20:46Z

We are currently working on something similar for the transcribee project. I'm not sure we have the time right now to integrate it into audapolis, but once we have a working solution for transcribee, it should be relatively simple to integrate it into audapolis (however, we might run into some problems with packaging this in a reliable cross-platform way).

clstaudt · 2023-03-06T17:37:22Z

@pajowu What is the mission of transcribee? How is it different from audapolis? Please add a Readme. :)

pajowu · 2023-03-06T17:39:51Z

While with audapolis the focus was on editing multimedia and transcription was only a by-product, transcribee focuses fully on transcription. We only started working on it last week and will add a proper readme soon. Until then, you can have a look at the project description on the prototypefund website.

clstaudt · 2023-03-06T17:44:17Z

@pajowu Nice. I'm very interested as an ML engineer and podcaster. Please add some "help wanted" and "good first issue" tickets soon, I'd love to contribute.

anuejn added the "enhancement" (New feature or request) label on Feb 12, 2023

ThiloteE mentioned this issue Mar 6, 2024

🪩 add whisper support #467

Draft
