
Digital-Pushkin-Lab/Russian_frequency_lists


Russian_frequency_lists_for_children

This repository contains word lists that have been created from several corpora.

Wordlist_Detcorpus_50000 is a list of 50,000 lemmas with their frequencies from DetCorpus, a corpus of Russian literature for children. It includes more than 2,097 prose works written in Russian between the 1920s and the 2010s and aimed at children and adolescents.

Wordlist_Detcorpus_nonfiction is a list of the 20,000 most frequent lemmas from the non-fiction subcorpus of DetCorpus.

Columns in the word lists

lemma is the normalized (dictionary) form of a word; lemmatization was done with the Mystem analyzer. abs_frequency is the raw, absolute frequency, i.e. how many times the lemma occurs in the corpus. ipm (items per million) is the normalized frequency: the number of occurrences of the lemma per million words of the corpus.
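The relation between the two frequency columns is ipm = abs_frequency × 1,000,000 / corpus size (in words). The README does not state the file format or the total corpus size, so the following is only a minimal Python sketch that assumes tab-separated files with a header row named lemma, abs_frequency and ipm. The helpers ipm and load_wordlist are hypothetical and not part of the repository.

```python
import csv
import io


def ipm(abs_frequency: int, corpus_size: int) -> float:
    """Items per million: occurrences per one million words of the corpus."""
    # Multiply first so that evenly divisible counts give exact results.
    return abs_frequency * 1_000_000 / corpus_size


def load_wordlist(text: str, delimiter: str = "\t") -> dict:
    """Parse a word list into {lemma: (abs_frequency, ipm)}.

    Assumption: a header row with the columns lemma, abs_frequency, ipm
    and tab-separated values; check the actual files before relying on this.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {
        row["lemma"]: (int(row["abs_frequency"]), float(row["ipm"]))
        for row in reader
    }


if __name__ == "__main__":
    # Toy, invented numbers for illustration only (not taken from DetCorpus).
    sample = "lemma\tabs_frequency\tipm\nкот\t500\t5000.0\n"
    words = load_wordlist(sample)
    print(words["кот"])
    # A lemma seen 500 times in a hypothetical 100,000-word corpus:
    print(ipm(500, 100_000))
```

Because ipm is normalized by corpus size, it is the column to compare across lists built from corpora of different sizes, such as the full corpus and its non-fiction subcorpus.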

About

Collection of word lists with frequencies
