How to increase the maximum number of speakers? #42

Open
Razatator opened this issue Apr 27, 2024 · 14 comments

Razatator commented Apr 27, 2024

I tried to increase the number of speakers to at least 8, but I end up with error messages such as: `NameError: name 'model_voice_path08' is not defined. Did you mean: 'model_voice_path00'?`
I modified three folders, but this is obviously not enough. How can I get 8 speakers?
Modif.zip
Thanks

PS: In app_rvc.py, see lines 307 and 1409: there is an "auto" compute mode. I found this in the WhisperX documentation.

R3gm (owner) commented Apr 28, 2024

Hi there! It seems like model_voice_path08 is missing in line 2007 of app_rvc.py, which might be causing the problem. I'm currently making some changes to improve how these variables are handled. Feel free to check out the latest version in the dev_24_3 branch https://github.com/R3gm/SoniTranslate/tree/dev_24_3

Razatator (author) commented Apr 28, 2024

Hi Roger. Thanks for your support. 👍 I will try it. It's not working after I updated to the new version: I get an "out of range" error. The new app_rvc.py has been modified and specifies `for i in range(6):  # Loop from 00 to 05` at line 2184. I increased all of these by 3, and did the same for the max TTS speakers, but it still isn't working.
Screenshot from 2024-04-28 18-06-26

Screenshot from 2024-04-28 18-19-39

I will retry later with the old configuration to see if it works. Having 8 speakers is useful so I don't have to cut videos into multiple parts.

I also ran into a problem with the new, unmodified app_rvc.py at line 295. I fixed it with `self.vci = ClassVoices() #(only_cpu=cpu_mode)`, but afterwards it doesn't work in Gradio. I will go back to the old configuration and retry later. Maybe I need to restart afterwards? Asking GPT to check this was no help ("disinformation").
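For readers hitting the same `NameError` / out-of-range errors: the underlying pattern is that the speaker count is hard-coded in several places (numbered variables such as `model_voice_path00` through `model_voice_path08`, plus loops like `for i in range(6)`), so raising it means editing every one of them. A minimal sketch of the alternative, assuming a single constant and a list of voice paths; `MAX_SPEAKERS`, `speaker_label`, and `build_voice_map` are hypothetical names, not SoniTranslate's actual code:

```python
from __future__ import annotations

# Hypothetical sketch: one constant controls the speaker count, and
# per-speaker values live in a list instead of separate variables named
# model_voice_path00, model_voice_path01 and so on.
MAX_SPEAKERS = 12  # assumption: the only place the speaker count is defined


def speaker_label(index: int) -> str:
    """Zero-padded label, e.g. 8 -> 'SPEAKER_08', matching the 00..NN naming."""
    return f"SPEAKER_{index:02d}"


def build_voice_map(voice_paths: list[str]) -> dict[str, str]:
    """Map each speaker label to its voice model path.

    Fails with a clear message instead of a NameError/IndexError when the
    number of paths does not match MAX_SPEAKERS.
    """
    if len(voice_paths) != MAX_SPEAKERS:
        raise ValueError(
            f"expected {MAX_SPEAKERS} voice paths, got {len(voice_paths)}"
        )
    return {speaker_label(i): path for i, path in enumerate(voice_paths)}


# Loops derive their bound from the same constant:
# for i in range(MAX_SPEAKERS): ...
```

With this layout, going from 6 to 8 or 12 speakers is a one-line change, and a mismatch is reported explicitly instead of surfacing as an undefined variable.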

R3gm added a commit that referenced this issue Apr 30, 2024
R3gm (owner) commented Apr 30, 2024

Hi @Razatator,
Thanks for your insights. I've made the necessary changes based on your suggestions. I appreciate your help in identifying those missing variables. With your guidance, I've updated the code to include 12 speakers in the dev branch.

Razatator (author) commented
Hi Roger, thanks again, it's working now. 💯
At first I forgot to upgrade all the packages for the dev branch. I will try it on a long video with more speakers. I set "auto" instead of float16, which I think chooses the optimal mode. I had lots of errors because I hadn't noticed there were additional packages; this showed me that you changed a lot of things.
CUDA fails again with large-v3, but it worked (sometimes) on an older Linux kernel.
Your requirements specify WhisperX from Mr. Bain's git branch, but it now works with a plain pip install.
In the previous version of the software I ran into a very annoying problem with pyannote.audio 3.1: unlike version 2.1, it systematically reverses the order of the speakers. I am often forced to re-encode everything and reassign the speakers. I haven't tried it on the new branch yet.
When translating into a language, it would be good if the TTS voice selection automatically switched to the voices for the target language's country. (I saw that they are defined by country in language and app_rvc.)
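If the problem is that speaker numbering follows the diarization model's internal label order, one workaround is to renumber labels by order of first appearance, so whoever speaks first always becomes `SPEAKER_00`. A sketch, assuming segments are plain `(start, end, label)` tuples (not pyannote's `Annotation` object); `relabel_by_first_appearance` is a hypothetical helper:

```python
from __future__ import annotations


def relabel_by_first_appearance(segments):
    """Rename speaker labels so the first person to speak becomes SPEAKER_00.

    segments: iterable of (start_seconds, end_seconds, label) tuples
    (assumed format). Returns a new list sorted by start time.
    """
    ordered = sorted(segments, key=lambda seg: seg[0])
    mapping: dict[str, str] = {}
    for _, _, label in ordered:
        if label not in mapping:
            # Next free number, in order of first appearance.
            mapping[label] = f"SPEAKER_{len(mapping):02d}"
    return [(start, end, mapping[label]) for start, end, label in ordered]
```

This only makes the numbering deterministic; it cannot fix cases where diarization assigns an intro jingle or noise to a "speaker" that then comes first.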
Document-to-text translation: the translation works, but the output is plain text. It would be great to keep the layout, images, and page numbers.
Document-to-audio translation: it should be possible to split the output into segments by specifying the chapter pages. (I submitted a 200-page document and got an 8-hour file.)
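Splitting by chapter pages amounts to turning a list of chapter start pages into page ranges that could each be synthesized into a separate audio file. A sketch under that assumption; `split_by_chapters` is hypothetical and not part of SoniTranslate:

```python
from __future__ import annotations


def split_by_chapters(num_pages: int, chapter_starts: list[int]) -> list[range]:
    """Split pages 1..num_pages into one range per chapter.

    chapter_starts holds the 1-based first page of each chapter (hypothetical
    user input). Out-of-range and duplicate entries are ignored; pages before
    the first chapter form their own leading segment.
    """
    starts = sorted({p for p in chapter_starts if 1 <= p <= num_pages})
    if not starts or starts[0] != 1:
        starts.insert(0, 1)
    bounds = starts + [num_pages + 1]  # sentinel: one past the last page
    return [range(bounds[i], bounds[i + 1]) for i in range(len(starts))]
```

For a 200-page book with chapters starting on pages 1, 45, and 120, this yields three ranges, each of which would become its own, shorter audio file.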
Best of all would be an option that turns the document's translation into a video file showing the text laid out on screen. This would be great for presenting illustrated books, with a choice between text plus illustrations or illustrations only. (We can always dream, can't we?)
For output file names, it would be better to reuse the original name (e.g. 'sample'): for subtitles it becomes quite difficult to find the right file when they are all called sub.trad or sub.ori. Use the original document's name plus the sub.ori or trad suffix.
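A sketch of the naming scheme suggested here, deriving the output name from the source file's stem; `output_name` and the `sub.ori` / `trad` tags are illustrative, not SoniTranslate's actual naming code:

```python
from pathlib import Path


def output_name(source: str, tag: str, extension: str) -> str:
    """Build an output file name from the source file's stem.

    Example: ('videos/lecture.mp4', 'sub.ori', '.srt') -> 'lecture.sub.ori.srt'.
    The tag and extension values are illustrative only.
    """
    return f"{Path(source).stem}.{tag}{extension}"
```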
Thank you for your work. I will keep testing and get back to you if any errors show up in the console.

R3gm (owner) commented May 3, 2024

Thanks, I'll be working on improving some of the features you mentioned

R3gm self-assigned this May 3, 2024
Razatator (author) commented
I tested the software, and it is much faster than the previous version.
Since the last update I have a problem burning subtitles, so I use Avidemux or VLC for that instead. I checked my installed packages and nothing seems to be missing; only the gcc version is different.
Screenshot from 2024-05-03 23-10-28
Screenshot from 2024-05-04 10-41-56

With XTTS voice imitation, an error appears in the terminal but not in the Gradio interface:
`[ERROR] Error: 'list' object has no attribute 'shape'`
`[WARNING] TTS auxiliary will be utilized rather than TTS: XTTS/AUTOMATIC.wav`
Screenshot from 2024-05-04 10-44-43
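The `'list' object has no attribute 'shape'` message usually means a plain Python list reached code that expects a NumPy array. Assuming that is what happens here (the XTTS call returning a list of samples), a defensive conversion like the following would avoid it; `ensure_waveform_array` is a hypothetical helper, not the fix that was applied in SoniTranslate:

```python
import numpy as np


def ensure_waveform_array(wav) -> np.ndarray:
    """Convert TTS output to a float32 NumPy array so `.shape` is available.

    Assumption: the TTS step may return a plain list of samples, and
    downstream code reads `wav.shape`.
    """
    arr = np.asarray(wav, dtype=np.float32)
    if arr.ndim == 2 and arr.shape[0] == 1:
        arr = arr[0]  # drop a leading axis of size 1, e.g. (1, n) -> (n,)
    return arr
```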

Everything runs faster, it's wonderful. Thanks.
sonitr-razatator.zip

R3gm (owner) commented May 4, 2024

Could you please run this command in the terminal to verify whether half-precision is supported: `python -c "import torch; torch.tensor([1.0, 2.0, 3.0], dtype=torch.float16, device='cuda')"`?

Razatator (author) commented
Hi Roger,
My card does not support float16 at full rate; I read about it on the NVIDIA forums: https://forums.developer.nvidia.com/t/fp16-support-on-gtx-1060-and-1080/53256
This card has compute capability 6.1.
When I ran the command, it printed nothing.
Screenshot from 2024-05-05 12-24-23
Screenshot from 2024-05-05 13-25-01
While the GPU is busy diarizing, it uses around 2.4 to 2.9 GB of the 6144 MB of VRAM.
When I try float16, the Gradio output is:
"error
Requested float16 compute type, but the target device or backend do not support efficient float16 computation."

For me, only int8 and int8_float32 are supported, and they work well. I spent around a month trying to run in float16, changing kernels, packages, and CUDA versions. But it's just CUDA and NVIDIA limiting this so that you pay for a new card! My card supports float16, but NVIDIA doesn't support my card. It took a month to find the answer.
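The float16 error above comes from CTranslate2, the inference engine used by faster-whisper and therefore WhisperX. A sketch of choosing a compute type from the GPU's compute capability; the thresholds (int8 from 6.1, float16 from 7.0) are our reading of the CTranslate2 quantization documentation and should be checked there. Passing `"auto"`, as done earlier in this thread, lets CTranslate2 make this choice itself. `pick_compute_type` and `detect_capability` are hypothetical helpers, and PyTorch is treated as optional:

```python
from __future__ import annotations


def pick_compute_type(capability: tuple[int, int] | None) -> str:
    """Choose a CTranslate2 compute type from a CUDA compute capability.

    Thresholds are an assumption based on the CTranslate2 docs:
    int8 needs >= 6.1, float16 needs >= 7.0. None means no usable CUDA GPU.
    """
    if capability is None:
        return "int8"  # common choice for CPU inference with faster-whisper
    if capability >= (7, 0):
        return "float16"
    if capability >= (6, 1):
        return "int8"  # e.g. GTX 1060 (6.1): float16 is rejected, int8 works
    return "float32"


def detect_capability() -> tuple[int, int] | None:
    """Return (major, minor) for GPU 0, or None if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return tuple(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    print(pick_compute_type(detect_capability()))
```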

For a 17 MB .mp4 (57 it, 4.50 min it), it was done in 4 min 49 s!
Screenshot from 2024-05-05 13-26-28
Now, with the new version, "Google batch" works every time!
Pyannote 3.1 makes the same mistake again: it always reverses the speaker list, in all versions of SoniTranslate.

Yesterday, Pyannote 2.1 did not recognize the woman as the first speaker and also reversed the order. This happens sometimes. The only solution is to cut the video into segments and translate them separately. It happens when there is an intro sound or music, or poor audio: diarization makes a mistake and shifts the speaker list. Normally I cut the intro, but this time that wasn't possible.
The only real solution would be an integrated player in the application that lets you place a marker at the time each speaker starts talking, so that each speaker is recognized and assigned their SK number.
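The marker idea could work roughly like this: the user places a marker at a time when a given speaker is talking, and each diarization label takes the name of the first marker that falls inside one of its segments. A sketch, assuming `(start, end, label)` segments and a `{seconds: name}` dict of markers; both formats and `assign_by_markers` are hypothetical:

```python
from __future__ import annotations


def assign_by_markers(segments, markers):
    """Map diarization labels to user-chosen speaker names.

    segments: list of (start, end, label) from diarization (assumed format).
    markers:  {time_in_seconds: wanted_name}, placed where a speaker talks
              (hypothetical UI input).
    Labels with no marker inside any of their segments keep their name.
    """
    mapping: dict[str, str] = {}
    for t, name in sorted(markers.items()):
        for start, end, label in segments:
            if start <= t < end and label not in mapping:
                mapping[label] = name
                break
    return [(s, e, mapping.get(lbl, lbl)) for s, e, lbl in segments]
```

Because the user's markers decide the names, a mislabelled intro or a swapped speaker order no longer changes which voice each person gets.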

For "accelerate audio max", change the maximum in app_rvc.py to 1.3; at 1.9 the output sounds crazy. I haven't tested Overlap Reduction yet, but it's a good idea.
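Capping the acceleration can be expressed as a clamp on the ratio between the TTS clip length and the original segment length. `speedup_factor` and its parameters are hypothetical; 1.3 is the cap proposed in this comment:

```python
def speedup_factor(original_s: float, tts_s: float, max_speed: float = 1.3) -> float:
    """Factor by which to speed up a TTS clip so it fits the original slot.

    Returns 1.0 when the clip already fits (never slows down) and never
    exceeds max_speed, so overly long clips are only partially compressed.
    """
    if original_s <= 0 or tts_s <= 0:
        return 1.0  # nothing sensible to compute; leave the clip unchanged
    needed = tts_s / original_s
    return min(max(needed, 1.0), max_speed)
```

With a 1.3 cap, a clip that would need 2x acceleration stays intelligible but overruns its slot, which is where an overlap-reduction step would take over.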

I will replace my computer, but a very good laptop with 16 GB of VRAM is overpriced. Entry-level machines have an RTX 4060 with 8 GB, but I don't think that's enough for the future. So I'm saving up and optimizing in the meantime, and I'm happy to be able to use your software on my computer and to test the 'minimum configuration'.

See you later

Razatator (author) commented
Hi Roger, I tested the new package update and it works fine. Thanks for the subtitle naming and the "auto" mode. I ordered a new MSI laptop with an RTX 4090 and an i9; ten days to wait. I am now testing dubbing with xtts_automatic and OpenVoice, because FreeVC often hallucinates. The result in French is very good. I also like XTTS because it lets me save voice models. I found French voice models on Hugging Face, but I haven't yet fully understood how to install them, nor found a way to test them first. I will look into it when I have time. See you, and thanks for your support. Let me know if you want me to test any particular features.

Razatator (author) commented
Hi Roger, the PDF-to-video conversion returns an error, and so does burning subtitles in video translation (but I don't mind that one, I can burn them with another tool):
" Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Conversion failed! "
I verified all my installed packages. ffmpeg is fine too, with all the extra libraries. But I still get this error.
I don't know how to fix it. The forums explain a lot, but nothing fixed it. I cleared the cache and restarted again and again.
I verified all the requirements and they are fine too; I went from Gradio 0.25 back to 0.19, but nothing changed. I've updated to the latest dev_24_3 and its requirements, resized to 420p, and changed the border parameters, but I get the same issue: an error after TTS.
Screenshot from 2024-05-11 17-07-24
Screenshot from 2024-05-11 19-02-37
Thanks in advance

R3gm (owner) commented May 11, 2024

I've made adjustments to avoid using ffmpeg; hopefully this resolves further issues. I haven't tested videobooks extensively yet.

Razatator (author) commented May 12, 2024

Thanks, I will try it. I raised this issue in f and another place; they say a font file needs to be specified or some libraries are not installed.
I ran `conda activate base` then `conda update ffmpeg`,
and `conda activate sonitr` then `conda update ffmpeg`, which installed some additional peripheral packages that pip had not installed. Burning subtitles works again too.
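Burning subtitles with ffmpeg's `subtitles` filter requires an ffmpeg build with libass. Whether a missing libass was the cause here is an assumption, but it is quick to check which filters the ffmpeg on PATH provides (equivalent to `ffmpeg -hide_banner -filters | grep subtitles`). `has_filter` and `ffmpeg_can_burn_subtitles` are hypothetical helpers:

```python
import shutil
import subprocess


def has_filter(filters_output: str, name: str) -> bool:
    """Return True if `ffmpeg -filters` output lists a filter called `name`."""
    for line in filters_output.splitlines():
        parts = line.split()
        # Filter rows look like: " ... subtitles   V->V   Render text subtitles ..."
        # (flags column first, filter name second).
        if len(parts) >= 2 and parts[1] == name:
            return True
    return False


def ffmpeg_can_burn_subtitles() -> bool:
    """Check whether the ffmpeg on PATH has the libass-based `subtitles` filter."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return False
    result = subprocess.run(
        [exe, "-hide_banner", "-filters"],
        capture_output=True, text=True, check=False,
    )
    return has_filter(result.stdout, "subtitles")
```

Running this inside each conda environment shows whether that environment's ffmpeg, rather than a system one, can burn subtitles.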
Screenshot from 2024-05-12 07-52-25
Screenshot from 2024-05-12 07-58-41
It's working 👍, but not in a reader-style layout.
Original:
Screenshot from 2024-05-12 08-21-27
Output:
Screenshot from 2024-05-12 08-10-43
This will be for a future update, thank you for your support. See you later

R3gm (owner) commented May 16, 2024

For now, videobook is capable of showcasing the PDF images on the screen for each page.

Razatator (author) commented
Hello Roger, I found that my problem was an unstable Anaconda installation. It had bugs, so I reinstalled conda and my environment. The ffmpeg problem was also an internal conda problem.
I received my new Raider GE68, and the 4090 is a game changer: with int8_float16, batch size 32, and the large-v3 model, there are no 'CUDA out of memory' problems on Pop!_OS. The program also runs much faster.
I haven't tried videobooks yet, but I again have a problem burning subtitles. I don't know which library I'm missing; I've read more about it and I'll get there eventually.
I installed dev_24_3 directly without going through the main branch (`git clone --single-branch --branch dev_24_3 https://github.com/r3gm/SoniTranslate.git`). After a few adjustments it worked well on CUDA 12.1.
sonitr-12_1.zip
See you later
They released a new version of pyannote.audio (3.2.0); 3.1.1 works well.
