
merge with konele: change to RecognitionService #1

Open
Felicis opened this issue Feb 23, 2021 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Felicis
Owner

Felicis commented Feb 23, 2021

The current UI is crap. Changing it to (or adding) a RecognitionService would make it possible to use it from Kõnele (alphacep#126 (comment)).

Felicis changed the title from "change to" to "change to RecognitionService" Feb 23, 2021
Felicis added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Feb 23, 2021
Felicis self-assigned this Feb 23, 2021
Felicis changed the title from "change to RecognitionService" to "merge with konele: change to RecognitionService" Feb 23, 2021
@Felicis
Owner Author

Felicis commented Feb 23, 2021

Starting info on how to implement this:

Kaljurand/K6nele#68 (comment):

The 2 services included with Kõnele use a cloud server, i.e. you need to be online. However, you can use Kõnele as a front-end to any service that implements Android's SpeechRecognizer API (https://developer.android.com/reference/android/speech/SpeechRecognizer), incl. offline services. Other than that there is no special support for offline speech recognition.

Kaljurand/K6nele#38 (reply in thread):

An example of an app that implements the RecognitionService interface is https://github.com/Kaljurand/K6nele-service. You can use it as is and just override the "Server URL" to point to your server (based on https://github.com/alumae/kaldi-gstreamer-server), and specify the supported locales in the "Server locales" field.

Or you can modify this app to wrap your own Android recognizer, whether offline or cloud-based.

Kaljurand/K6nele#63 (comment):

The app needs to implement the service and declare it in the manifest (see e.g. https://github.com/Kaljurand/K6nele-service/blob/8a5ad02e19eb943736e02b44131a2c2bb34351b0/app/src/main/AndroidManifest.xml#L75), then Kõnele (or any other app) can find it.

Kaljurand/K6nele#38 (comment):

implement the RecognitionService-interface for an existing speech recognition server (or even offline service). This implementation would be a separate app, i.e. no changes to Kõnele are necessary: the service and its supported languages simply show up in the "Recognition languages & services" list allowing you to use them in the Kõnele speech keyboard and voice search panel.

Felicis added this to the "merge with konele" milestone Feb 23, 2021
@khlsvr

khlsvr commented Mar 11, 2021

Is the pre-release from 7 days ago a merge with Kõnele, or an improvement to your earlier implementation?

I saw you mention a 5-second delay when opening the keyboard. Is this temporary? I have been using Kõnele a little with Estonian and there is no load time, but then again the model is not on the phone. Can't the model be kept in memory, so that it doesn't have to be reloaded if I open the keyboard again after a while? I use speech recognition quite a lot, but for the past few months I have been stuck typing with my fingers more. The 5-second delay could be a deal breaker, I'm not sure. My use case is that I am out a lot, and especially now during the winter I prefer to send short messages with speech-to-text. I have actually had to learn a couple of simple Estonian phrases and teach them to a couple of acquaintances so I can ask them things quickly in Estonian, lol. Not sure if they are proper Estonian, but a few common ones I have used are "kas sa oled väljas?" (are you out?), "kuhu läksid?" (where did you go?), "kuhu sa lähed?" (where are you going?) and "kus sa oled?" (where are you?). It recognizes them quite well, except that it often confuses "kus" with "kuus" (six) and returns "6" :) Disclaimer: Estonian is quite close to Finnish (which we use).

By the way, is there a way to ping Kaljurand and nshmyrev, in case they know something about the 5-second delay? It seems I can only @-mention you, Felicis, here, since I guess you are the only one who has commented on this repo.

@Felicis
Owner Author

Felicis commented Mar 11, 2021

The pre-release is just a binary of the earlier implementation, so nothing new has been added yet (I uploaded it mostly as a proof of concept for those who don't want to compile it themselves). I have yet to start the merge with Kõnele, but that's the next planned step.

I totally agree that this 5-second delay might be a deal breaker and has to be reduced somehow. I also think it's the time it takes to load the model into RAM and activate it, but I don't know how to fix that yet. I will have to dig into the Vosk code a little and see what I find there... 🕳️

Honestly, I don't know how to ping them here, but you can open an issue about the 5-second delay on the main vosk-android-demo project, where nshmyrev should see it and could provide some insight. I wouldn't ping Kaljurand about this, since it's an offline ASR problem and Kõnele is currently only about online speech recognition.
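One common way to avoid paying the model-loading cost every time the keyboard opens is to load the model once per process and reuse it. The sketch below is a hypothetical Java helper, not code from this repo or from Vosk: the class name `LazyModelHolder` and its methods are made up for illustration, and the loader is just a `java.util.function.Supplier`, so in the app it would wrap whatever actually constructs the model (assumed here to be something like Vosk's `Model` constructor taking a model path).

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder that loads an expensive object (e.g. a speech model)
 * at most once per process and returns the same instance afterwards.
 */
final class LazyModelHolder<T> {
    private final Supplier<T> loader;
    // volatile so that a fully constructed instance is visible to all threads
    private volatile T instance;

    LazyModelHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the cached instance, running the loader on first use only. */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {          // re-check under the lock
                    local = loader.get();     // the slow part, e.g. reading the model from storage
                    instance = local;
                }
            }
        }
        return local;
    }

    /** Starts loading on a background thread so a later get() is less likely to block. */
    Thread preload() {
        Thread t = new Thread(this::get, "model-preload");
        t.start();
        return t;
    }
}
```

Caveat: such a cache only lives as long as the app process. Android may kill a background process to reclaim memory, after which the next use pays the full load time again; calling `preload()` when the keyboard or service is created would at least move that cost away from the moment the user presses the mic button.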

@khlsvr

khlsvr commented Mar 26, 2021

I was too lazy to even try your proof-of-concept version until now, and man, I should have tried it earlier! With the English version I didn't notice any 5-second lag; it was maybe 1 second. I will do a field test tomorrow and see if there is more lag. I will be testing on a Xiaomi Redmi Note 7 with 6 GB RAM, running LineageOS 17.1 without GApps. Also, the UI may actually be better than Kõnele's, as there is quick access to a couple of useful characters (or at least two: . and ?). A <space> key would be useful too: if you end a sentence and start a new one, the new one continues without a space in between. I would add a <space> key somewhere.

I wonder how hard it is to train a language model. I can use speech-to-text with some of my acquaintances in English, but for others I would want Finnish.
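The missing-space problem described above (a new utterance gets glued onto the end of the previous sentence) could also be handled without a dedicated key, by checking the character before the cursor before committing each result. Below is a minimal, hypothetical Java sketch of that rule; the class and method names are made up. In an Android input method, `before` could come from `InputConnection.getTextBeforeCursor(1, 0)` (which may return null) and the result could be passed to `InputConnection.commitText(text, 1)`, but how this app actually commits text is an assumption here.

```java
/**
 * Hypothetical helper: decides what to commit when a new recognition result
 * arrives, adding a separating space only where one is needed.
 */
final class UtteranceJoiner {
    // Punctuation that should attach directly to the previous word
    private static final String NO_SPACE_BEFORE = ".,!?;:";

    private UtteranceJoiner() {}

    /**
     * @param before    text immediately left of the cursor (may be null or empty)
     * @param utterance the recognizer's transcript for the new utterance
     * @return the string to insert at the cursor
     */
    static String textToCommit(CharSequence before, String utterance) {
        String u = utterance.trim();
        if (u.isEmpty()) {
            return "";                          // nothing recognized, insert nothing
        }
        if (before == null || before.length() == 0) {
            return u;                           // start of the field: no leading space
        }
        char last = before.charAt(before.length() - 1);
        boolean needsSpace = !Character.isWhitespace(last)
                && NO_SPACE_BEFORE.indexOf(u.charAt(0)) < 0;  // no space before "?" etc.
        return needsSpace ? " " + u : u;
    }
}
```

Putting the space before the new utterance, rather than after every utterance, avoids leaving a stray trailing space at the end of the text field.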

@Kaljurand

Hi, some comments to the previous discussion:

  • maybe it's possible to ping me now, now that I've written this comment.

  • I don't know how to reduce the loading delay, but e.g. the Catalan models from https://github.com/ccoreilly/LocalSTT/releases/tag/2020-12-03 also load in just 1 sec (on Pixel 2). Note that the Kõnele mic button does not light up properly when using the https://github.com/ccoreilly/LocalSTT service, but this will be improved in the next version (of Kõnele).

  • the Kõnele IME actually tries to offer a configurable touch/buttons solution (only documented in Estonian: http://kaljurand.github.io/K6nele/docs/et/user_guide.html#lausung-kui-nupuvajutus)

    • some buttons/touches are hard-coded (newline button, last char delete button, double tap adds a space)
    • the mic button swipes can be defined by the user, e.g. right swipe adds a period if you want
    • you can define your own buttons panel, for e.g. punctuation, but also for arbitrary outputs like "where are you?" (which is perhaps simpler than having to learn to say "kus sa oled?", and teaching its meaning to all your friends ;))

The buttons panel is a bit clumsy to use though: it's fixed to being a 3-column grid (in portrait mode), it takes too much vertical space, and speech is entered by long-pressing the panel switching button (lower right corner).

@khlsvr

khlsvr commented Apr 5, 2021

Thanks for bringing LocalSTT to my attention. I ended up in the coqui-ai/STT Matrix room, where I received a ton of help getting Finnish DeepSpeech models for LocalSTT. Big thanks especially to @ftyers, who laid everything out in front of me so I could learn and proceed at each step. He has an interesting project going on, by the way: he's creating acoustic models for something like 40 different languages right now. Anyway, I (or rather we) finally got the Finnish model done, but the results are not satisfactory. :) Apparently DeepSpeech may be a better option than Kaldi/Vosk for a highly inflected language like Finnish, and the process of getting the models with the help of the Coqui folks seemed straightforward, so I went for it. However, the Mozilla Common Voice project has only 1 hour of Finnish data, so the acoustic model is probably the bottleneck and the reason it doesn't work nearly as well as the English Kaldi/Vosk model available here.

I'm not gonna quit here. I will try recording my own voice for another hour or so (we'll see how tedious it gets) and try to create an improved acoustic model a bit later. For the language model I used Finnish OpenSubtitles subtitles plus the Finnish set from the European Parliament. That was probably a bit of overkill: the language model file kenlm.scorer ended up 216 MB in size, while the acoustic model file model.tflite is 46 MB. In case anyone is interested, the files are available on @ftyers's repo, which he gave me permission to share. There you can find all the acoustic models he's setting up from Common Voice data: https://tepozcatl.omnilingo.cc/

Even with an almost 300 MB APK, there was no significant loading time/lag when using it from the Kõnele app.

Now that you mentioned them, I noticed the swipe-gesture options in Kõnele, which I may try to tweak if I get the Finnish model to understand me better one day.

As for this app, after using it more, I have a few improvements in mind, if it's going to be developed further :) Mainly, deleting letters with the backspace key is very slow, and it's needed quite often after all. It would be nice to be able to delete something like 5 characters at a time with another button; you can delete like that with Google's STT, by the way. And yeah, a space key could also be useful. There might have been something else, but I forgot.
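The main pitfall of a "delete several characters" button like the one suggested above is cutting a character in half: in Java strings, an emoji or other character outside the Basic Multilingual Plane takes two UTF-16 `char`s. The following minimal sketch, with a made-up class name, removes up to `n` code points from the end of a string. On Android API level 24+ an input method can get the same effect directly with `InputConnection.deleteSurroundingTextInCodePoints(n, 0)`; the helper only illustrates the logic, and whether this app's keyboard works through an `InputConnection` this way is an assumption.

```java
/** Hypothetical helper behind a "delete N characters" button. */
final class BulkDelete {
    private BulkDelete() {}

    /**
     * Removes up to {@code n} code points from the end of {@code text}.
     * Counting code points (not chars) keeps surrogate pairs such as emoji intact.
     */
    static String deleteLastCodePoints(String text, int n) {
        if (n <= 0 || text.isEmpty()) {
            return text;
        }
        int total = text.codePointCount(0, text.length());
        int keep = Math.max(0, total - n);             // how many code points survive
        int end = text.offsetByCodePoints(0, keep);    // char index where they end
        return text.substring(0, end);
    }
}
```

Note that code points are still not user-perceived characters: a letter written as a base letter plus a combining mark (e.g. a decomposed "ä") counts as two, so a stricter version would step back by grapheme using `java.text.BreakIterator`.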

@Felicis
Owner Author

Felicis commented Jun 30, 2021

Hi all,
I've been wanting to get started on this, but things keep getting in the way. I don't know when I'll find the time and energy to work on it. If anyone wants to start on it, feel free to do so. I will be happy to build on it / continue with it.
Cheers to y'all

@Tombstone2K

Any update on this?
@nshmyrev @Felicis
You can make VOSK into a RecognitionService, so that any app that supports RecognitionService can use VOSK.

Projects
None yet
Development

No branches or pull requests

4 participants