Generate an Anki deck from Wiktionary articles, Google translations, and GPT-generated example sentences.

sowcow/serbian-anki-deck

What

Auto-generated Anki deck with:

  • Serbian words from the OpenSubtitles 2018 frequency list (a huge list, but with duplication due to word forms)
  • translations (from the translate-shell tool, specifically its Google service)
  • Wiktionary articles for both scripts where present
  • fallback example sentences generated by GPT AI models

Translations come before the Wiktionary articles because they are shorter and more concentrated. For new words it makes sense not to rely on the translation alone and to check the Wiktionary article as well. AI-generated example sentences should not be treated as a source of truth either.

AI has been useful for filtering out questionable words, though (I processed only the first 5k words).
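
As a rough illustration of working with only the top of the frequency list, the sketch below assumes each line holds a word followed by its count, separated by whitespace. That format is an assumption, and the repo's own scripts may read the list differently. The helper name `top_words` and the sample data are hypothetical.

```ruby
# Hypothetical helper: returns the first `limit` words of a frequency list.
# Assumes one "word count" pair per line, most frequent first.
def top_words(text, limit)
  text.each_line
      .map { |line| line.split.first } # keep the word, drop the count
      .compact                         # blank lines yield nil, skip them
      .first(limit)
end

# Made-up sample in the assumed format.
SAMPLE_LIST = <<~LIST
  je 1000
  da 900

  se 800
  i 700
LIST

puts top_words(SAMPLE_LIST, 3).inspect
```

With the real list, the same call with a limit of 5000 would match the "first 5k words" cut mentioned above.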

The deck is inspired by the existing "Serbo-Croatian Frequency Word List" deck, but I aim to make it much more usable and as complete as possible.

The deck tries to be very informative, so a tablet may be the best way to use it.

But

The deck is provided as is. I didn't verify the content manually since I'm a learner myself. For this reason, AI-produced example sentences are placed below the other content. Feel free to create an issue at https://github.com/sowcow/serbian-anki-deck/issues, but keep in mind that I don't check these often.

Quality-wise, unrelated Wiktionary content occasionally shows up in a card, but it is rare enough that it is not worth looking into.

How

It should be easy to adapt to other languages. If you only need Wiktionary articles for a word list, the first two and the last two dependencies should be enough. The same holds if you are reusing the cache (the same Serbian word list, translations, and examples, but freshly fetched Wiktionary articles). The last two could be replaced by some other tooling or library.

Install dependencies:

  • Ruby (the language itself)
  • run bundle install in the root of the repo (if this step fails, it probably needs the ruby-dev package)
  • translate-shell terminal tool
  • esc2html terminal tool
  • cligpt terminal tool (it needs OpenAI credentials to run; a free account lasts for a couple of thousand requests, and paid accounts do not accept payment from every country) or the chatgpt-wrapper tool (its current limitation is that every 150 requests are followed by a one-hour break); config.rb has configuration examples for both options
  • anki-cli-unofficial terminal tool
  • sqlite3 (terminal tool is used)
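
To make the cache idea concrete, here is a minimal sketch of a translation lookup that calls translate-shell only on a cache miss. It is not the repo's implementation: the `TranslationCache` class, the JSON cache file, and the `translate_with_shell` helper are hypothetical. The only external call assumed is translate-shell's `trans` binary in brief mode (`-b`) with the `sr:en` language pair.

```ruby
require "json"
require "open3"

# Hypothetical cache wrapper (not the repo's actual code).
# Stores word => translation pairs in a JSON file.
class TranslationCache
  def initialize(path, &translator)
    @path = path
    @translator = translator
    @data = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # Returns the cached translation; calls the translator only on a miss
  # and writes the updated cache back to disk.
  def fetch(word)
    return @data[word] if @data.key?(word)

    @data[word] = @translator.call(word)
    File.write(@path, JSON.pretty_generate(@data))
    @data[word]
  end
end

# Assumed invocation: `trans -b sr:en WORD` (brief output, Serbian to English).
def translate_with_shell(word)
  output, status = Open3.capture2("trans", "-b", "sr:en", word)
  raise "trans failed for #{word}" unless status.success?

  output.strip
end
```

A call such as `TranslationCache.new("translations.json") { |w| translate_with_shell(w) }.fetch("kuća")` would hit the network once and read from the file afterwards. Passing the translator as a block keeps the cache usable with any other translation source.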

Configure most things:

  • config.rb - constants. NOTE: the current configuration does not fetch any new example sentences (only the cache is used)
  • 2_compose_deck.rb - what you use and in what order

Run the numbered Ruby files in order.
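
If you prefer not to run them by hand, something like the sketch below could do it. It assumes the scripts follow the naming pattern seen in 2_compose_deck.rb (a number, an underscore, then a name ending in .rb); the helper names `numbered_scripts` and `run_in_order` are hypothetical and not part of the repo.

```ruby
# Hypothetical helper: picks files named like "2_compose_deck.rb" and
# sorts them by numeric prefix, so a "10_" file comes after a "2_" file.
def numbered_scripts(filenames)
  filenames
    .select { |name| File.basename(name).match?(/\A\d+_.*\.rb\z/) }
    .sort_by { |name| File.basename(name).to_i }
end

# Runs each numbered script in `dir` with ruby, stopping at the first failure.
def run_in_order(dir = ".")
  numbered_scripts(Dir.children(dir)).each do |script|
    puts "Running #{script}"
    system("ruby", File.join(dir, script)) or abort("#{script} failed")
  end
end
```

Calling `run_in_order` from the repo root would then run every step. Sorting by the integer prefix rather than by string avoids the usual "10 sorts before 2" surprise.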

License

  • MIT License for code.
  • CC BY-SA 4.0 for used and generated content.