Where to find French punctuation models #415

Open
Sircam19 opened this issue Jan 20, 2023 · 5 comments

Comments

@Sircam19

Hello. I love what your team has created; it is an amazingly impressive tool. I noticed that when exporting subtitles (SRT), all of the text ends up in a single time range. I then realised that there is no punctuation model for French. Where could I find one? Or how else could I fix the punctuation? Your tool has so much potential, and I am glad I came across it.
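To illustrate why missing punctuation leads to a single subtitle block, here is a minimal sketch of a sentence-based SRT export. It is not Audapolis code: the `Word` type, `fmt_ts`, and `words_to_srt` are hypothetical names, and it assumes that cues are split at sentence-ending punctuation. Under that assumption, a transcript without punctuation never triggers a split, so everything lands in one cue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical transcript token with timing in seconds."""
    text: str
    start: float
    end: float


def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: List[Word], enders: str = ".!?") -> str:
    """Group words into cues, closing a cue after sentence-ending punctuation."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        current.append(w)
        if w.text and w.text[-1] in enders:
            cues.append(current)
            current = []
    if current:  # trailing words without a final punctuation mark
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        blocks.append(f"{i}\n{fmt_ts(cue[0].start)} --> {fmt_ts(cue[-1].end)}\n{text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    punctuated = [
        Word("Bonjour.", 0.0, 0.5),
        Word("Ça", 0.6, 0.8),
        Word("va", 0.8, 1.0),
        Word("?", 1.0, 1.1),
    ]
    print(words_to_srt(punctuated))
```

With the punctuated sample above, the sketch yields two cues. Stripping the punctuation from the same words yields one cue covering the whole time range, which matches the behaviour described in this comment.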


@Sircam19
Author

If I have found additional punctuation models, how can I add them directly to Audapolis? Where in the app's file structure should additional punctuation models be placed?

@anuejn
Member

anuejn commented Feb 12, 2023

Sadly, using out-of-tree punctuation models is currently not supported. However, if you link us to the French punctuation model, we could add it, or you could make a pull request.

@Sircam19
Author

Hello anuejn. I am so happy to see progress on this tool, as I think it is amazing and SO full of potential. I am not a coder, but I am trying to learn, so perhaps what I found, and will link below, is not the way to go. However, from what I can determine, a lot of punctuation models are based on the Europarl project. I found a multilanguage model among the oliverguhr language models available on Hugging Face. I wondered whether it could be used to support the languages in Audapolis that are missing punctuation models. Here's the link: https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large

Again, perhaps I am off base as I am not a coder, but I am very interested. Merci.

@anuejn
Member

anuejn commented Feb 12, 2023

We are currently using the punctuator2 Python library for punctuation reconstruction, so a drop-in addition would need to be a model for that library. The model you linked uses a different library, which would require additional work to integrate.

@Sircam19
Author

Thanks anuejn. I knew it couldn't be that simple :-) I'll look around for what I can find; I would like to help AND learn at the same time. Again, I am happy to help and really enjoy Audapolis. It's amazing. Merci.
