finstem - simple tool for command-line Finnish stemming

Stems Finnish words. Takes any kind of word you can throw at it. Even has its own tiny REPL!

📹 Video - silent install, 2023.12.07

output-fast.webm

The above is sped up 10x to give a feel for how the commands work.

Normal-speed video: https://youtu.be/85qwsrGdwZs

Quickstart

On Ubuntu 22.04

Tested on a totally fresh Vagrant install of Ubuntu 22.04. You probably already have some or all of these installed.

# Install the prerequisites
yes | sudo apt update
yes | sudo apt install pip python-is-python3
yes | sudo apt install voikko-fi python-libvoikko python3-click

# Clone the repo and run the command!
git clone https://github.com/hiAndrewQuinn/finstem
cd finstem

python finstem.py --help
python finstem.py 'Näin' 'tervetuloa' 'kiltti' 'kissa' 'Nimeni' 'on' 'Jeff'

For scripters

finstem supports (experimental) CSV, TSV and JSON formats.

CSV format example

python finstem.py 'Näin' 'tervetuloa' 'kiltti' 'kissa' '.' 'Nimeni' 'on' 'Jeff' --format CSV | csvlook

TSV format example

python finstem.py 'Näin' 'tervetuloa' 'kiltti' 'kissa' '.' 'Nimeni' 'on' 'Jeff' --format TSV | awk '{print $3 " <~> " $2 " <~> " $1}'

JSON format example

python finstem.py 'hyvää' 'huomenta' --format JSON | \
while IFS= read -r line; do
    echo "$line" | jq .
done

Use with caution. I haven't used proper libraries for these yet.
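
Since the output is not produced by a proper serialization library, a consumer may want to parse it defensively. Below is a minimal sketch in Python, assuming (as the shell loop above suggests) that the JSON format emits one JSON document per line. The function name `parse_json_lines` and the field names in the sample are hypothetical and are not taken from finstem itself.

```python
import json


def parse_json_lines(text):
    """Parse text holding one JSON document per line.

    Returns (records, bad_lines): the successfully parsed values, and the
    raw lines that could not be parsed, so nothing is dropped silently.
    """
    records, bad_lines = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad_lines.append(line)
    return records, bad_lines


# Hypothetical sample, NOT real finstem output: the field names are made up.
sample = '{"word": "hyvää", "stem": "hyvä"}\n\n{"word": "huomenta", "stem":\n'
records, bad_lines = parse_json_lines(sample)
print(records)
print(bad_lines)
```

To use it on real output, you could read `sys.stdin.read()` in place of `sample` and pipe `python finstem.py ... --format JSON` into the script. Anything that lands in `bad_lines` is worth inspecting by hand.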

Advanced

Passing a list of words in a text file

echo 'sana' > words.txt
echo 'vaimonille' >> words.txt
echo 'kirjoja' >> words.txt

# Pass each line as an argument to finstem.py
cat words.txt | xargs -n 1 python finstem.py
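
If you would rather drive this from Python than from xargs, here is a rough equivalent. It is only a sketch: `read_words` and `stem_file` are hypothetical helper names, and it assumes `finstem.py` sits in the current directory and that the file holds one word per line.

```python
import subprocess
import sys


def read_words(lines):
    """Return the non-empty, whitespace-trimmed words, one per input line."""
    return [line.strip() for line in lines if line.strip()]


def stem_file(path, script="finstem.py"):
    """Run finstem once per word, mirroring `xargs -n 1`."""
    with open(path, encoding="utf-8") as handle:
        words = read_words(handle)
    for word in words:
        # sys.executable is the interpreter running this script; the shell
        # examples use whatever `python` resolves to on your PATH.
        subprocess.run([sys.executable, script, word], check=True)
```

Calling `stem_file("words.txt")` from inside the cloned repo should behave like the xargs pipeline above, with the difference that blank lines are skipped and a failing finstem call raises an error.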

Interactive mode

Requires fzf.

echo '' | fzf --print-query \
	--preview-window='bottom:50%' \
	--preview "echo {q} | tr ' ' '\n' | xargs -I _ python finstem.py _" \
	--bind "enter:execute(echo {q} | tr ' ' '\n' | xargs -I _ python finstem.py _)+abort"

If you don't feel like typing out all that, just run finstem-interactive.sh.

For use with finfreq10k when reading a book

finfreq10k is an Anki deck containing the 10,000 most common Finnish words in order, made by yours truly. Using it in combination with finstem is a powerful way to target your vocabulary practice at the words you have actually read that day.
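
One possible way to collect the day's words before handing them to finstem is sketched below. This is an illustration under assumptions, not part of finstem or finfreq10k: the helper `unique_words` is hypothetical, and lowercasing every word is a simplification that may not suit proper nouns.

```python
import re

# Runs of Unicode letters, optionally joined by hyphens (e.g. "linja-auton").
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def unique_words(text):
    """Return the distinct words of `text`, lowercased, in first-seen order."""
    seen = set()
    ordered = []
    for match in WORD_RE.finditer(text):
        word = match.group(0).lower()
        if word not in seen:
            seen.add(word)
            ordered.append(word)
    return ordered


passage = "Kissa istui. Kissa näki linja-auton!"
print(unique_words(passage))
```

The resulting words can then be passed to `python finstem.py` as arguments, as in the Quickstart, and the stems looked up in the deck.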
