v.0.0.6 XTTS Generation works as it should but VoiceCraft has several issues #7

Open
Bookwald opened this issue Apr 21, 2024 · 7 comments

Comments

@Bookwald

As of v.0.0.6, VoiceCraft produces audio that repeats lines and uses two voices. It also outputs only the first few sentences rather than the whole text, and generation is very slow: I've seen it use 24GB of VRAM plus an additional 70GB of RAM to output just 37 seconds of audio after 7 minutes of generation.

Sample: https://sndup.net/v4x5/

@lukaszliniewicz
Owner

lukaszliniewicz commented Apr 21, 2024

Yes, VoiceCraft is far from perfect. Its voice cloning is superior to XTTS, but it's not reliable: sometimes it doesn't generate the whole text it was given, or it drastically changes the pitch. Basically, it hallucinates. I think it was made primarily with speech editing in mind, not TTS per se. Their TTS model is very new and they will hopefully publish a better one soon. It's usable for generating relatively short fragments and manually regenerating sentences until it produces something decent. I will expose more settings in the GUI; perhaps you can improve the results by playing with them. As for VRAM/RAM consumption and speed, I'm afraid I can't do anything about it until the authors release a new model or implement something like DeepSpeed. How many seconds of the voice sample were you using? Try playing with this: anywhere from 3 to 12 seconds. Different lengths can work well for different voice samples.
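
For anyone who wants to try several reference lengths outside the GUI, here is a minimal sketch that cuts a voice sample down to a given number of seconds. It assumes the sample is an uncompressed PCM WAV file and uses only Python's standard `wave` module; `trim_wav` is a hypothetical helper name, not part of the app or of VoiceCraft.

```python
import wave


def trim_wav(src_path, dst_path, seconds):
    """Copy the first `seconds` seconds of a PCM WAV file to dst_path.

    Hypothetical helper for experimenting with reference-sample length.
    Returns the duration (in seconds) actually written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never ask for more frames than the file contains.
        n_frames = min(src.getnframes(), int(seconds * src.getframerate()))
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width and frame rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return n_frames / params.framerate
```

Calling it with, say, 3, 6, 9 and 12 seconds on the same recording gives a set of candidate samples to compare.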

@lukaszliniewicz
Owner

I can see they uploaded new TTS models yesterday, but I can't find the actual files. Will add them when I do. I should be able to expose the additional API settings in the GUI today. But I think for longer generations I'd recommend using XTTS with RVC for best voice cloning results.

@lukaszliniewicz
Owner

lukaszliniewicz commented Apr 22, 2024

I added the other parameters to the GUI under "Advanced settings". Please try disabling the cache (change the value from 1 to 0; this drastically reduced VRAM usage for me) and play a little with the others (try setting "stop repetition" to 2, for example, or possibly "sample batch size" to 2). I will update the API to use the newer models tomorrow and enable model selection in the GUI.
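
As a rough illustration, the suggested values can be written as overrides on top of whatever settings are currently in use. The key names below (`kvcache`, `stop_repetition`, `sample_batch_size`) are assumptions chosen to mirror the GUI labels; the names the API actually expects may differ, and `low_vram_overrides` is a hypothetical helper.

```python
def low_vram_overrides(settings):
    """Return a copy of `settings` with the values suggested in this thread.

    Key names are assumptions mirroring the "Advanced settings" labels.
    """
    updated = dict(settings)          # leave the caller's dict untouched
    updated["kvcache"] = 0            # cache disabled (1 -> 0) to cut VRAM usage
    updated["stop_repetition"] = 2    # suggested value to try against repeated lines
    updated["sample_batch_size"] = 2  # suggested value to try
    return updated


# Example: start from a settings dict that has the cache enabled.
current = {"kvcache": 1}
print(low_vram_overrides(current))
```

Changing one value at a time makes it easier to tell which setting is responsible for any difference in the output.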

@Bookwald
Author

Thanks for working on this. I'm looking forward to trying out the newer models. I'll test disabled cache and repetition.

@lukaszliniewicz
Owner

lukaszliniewicz commented Apr 23, 2024

I've added model selection to the GUI. Please try both the 330M model and the larger 830M model. Also, the cache is now cleared after every generation, which was an update to the original code that I overlooked. Perhaps this will solve the issue with excessive VRAM usage without disabling cache.
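
For reference, a common way to release cached GPU memory between generations in PyTorch looks roughly like the sketch below. This is an assumption about the general technique, not a copy of the actual change; `release_gpu_memory` is a hypothetical helper. Note that `torch.cuda.empty_cache()` only returns memory that PyTorch has cached but is no longer using, so it will not shrink memory held by live tensors such as the loaded model.

```python
import gc


def release_gpu_memory():
    """Free unreferenced Python objects, then release PyTorch's cached GPU memory.

    Returns True if a CUDA cache was cleared, False otherwise
    (PyTorch not installed, or no CUDA device available).
    """
    gc.collect()  # drop unreachable objects that may still hold tensors
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver
        return True
    return False
```

It would typically be called once after each generation finishes and the output has been saved.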

@Bookwald
Author

With VoiceCraft I still had to disable the cache to reduce VRAM usage. I'm also getting a repeat of the first line of my reference audio at the beginning of each sentence of my input text. However, I'm finding that XTTS works excellently with RVC on top. XTTS captures the way the person speaks and RVC gives the texture of the voice back.

@lukaszliniewicz
Owner

lukaszliniewicz commented Apr 24, 2024

I tested it today on a rented VM with a 3090 and it used about 12GB of VRAM (with cache on). There were no instances of reference audio in the generations and generally the quality was... decent, though I still think it's easier to generate long texts with XTTS (fewer regenerations are needed). Have you updated / reinstalled the VoiceCraft API? Anyway, I'm glad that XTTS + RVC works well for you :)
PS. I generated this today using VoiceCraft (no regenerated sentences, took about 9m on a 3090, I don't remember which model it was): http://sndup.net/p4q9

2 participants