From b801dbf8c01f5537805721fa27b2ec472fe1f58a Mon Sep 17 00:00:00 2001
From: Vincent
Date: Thu, 27 Jan 2022 12:20:31 +0100
Subject: [PATCH] fix flashtext docs

---
 docs/docs/extractors/flashtext.md | 36 +++++++++++++++----------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/docs/docs/extractors/flashtext.md b/docs/docs/extractors/flashtext.md
index e0e08ea..24f600e 100644
--- a/docs/docs/extractors/flashtext.md
+++ b/docs/docs/extractors/flashtext.md
@@ -9,12 +9,12 @@
 ```
 
 This entity extractor uses the [flashtext](https://flashtext.readthedocs.io/en/latest/) library
-to extract entities using [lookup tables](https://rasa.com/docs/rasa/nlu-training-data#lookup-tables).
+to extract entities.
 
 This is similar to [RegexEntityExtractor](https://rasa.com/docs/rasa/components#regexentityextractor), but different in a few ways:
 
-1. `FlashTextEntityExtractor` takes only `lookups`, **not** regex patterns
+1. `FlashTextEntityExtractor` uses token-matching to find entities, **not** regex patterns
 2. `FlashTextEntityExtractor` matches using whitespace word boundaries. You cannot set it to match words regardless of boundaries.
 3. `FlashTextEntityExtractor` is *much* faster than `RegexEntityExtractor`. This is especially true
@@ -25,6 +25,8 @@ use the parameter `non_word_boundaries`
 
 ## Configurable Variables
 
+- **path**: the path to the lookup text file
+- **entity_name**: the name of the entity to attach to the message
 - **case_sensitive**: whether to consider case when matching entities. `False` by default.
 - **non_word_boundaries**: characters which shouldn't be considered word boundaries.
 
@@ -42,26 +44,24 @@ pipeline:
   min_ngram: 1
   max_ngram: 4
 - name: rasa_nlu_examples.extractors.FlashTextEntityExtractor
-  case_sensitive: True
-  non_word_boundary:
-    - "_"
-    - ","
+  case_sensitive: False
+  path: path/to/file.txt
+  entity_name: country
 - name: DIETClassifier
   epochs: 100
 ```
 
-You must include [lookup tables](https://rasa.com/docs/rasa/nlu-training-data#lookup-tables) in your NLU data. This
-might look like:
+You must include a plain text file that contains the tokens to detect.
+Such a file might look like:
+
 ```yaml
-nlu:
-- lookup: country
-  examples: |
-    - Afghanistan
-    - Albania
-    - ...
-    - Zambia
-    - Zimbabwe
+Afghanistan
+Albania
+...
+Zambia
+Zimbabwe
 ```
-In this example, anytime a user's utterance contains an exact match for a country from the lookup table above,
+
+In this example, anytime a user's utterance contains an exact match for a country,
 `FlashTextEntityExtractor` will extract this as an entity with type `country`. You should include a few examples with
 this entity in your intent data, like so:
@@ -69,5 +69,5 @@ this entity in your intent data, like so:
 - intent: inform_home_country
   examples: |
     - I am from [Afghanistan](country)
-    - My family is from [Albania](country
+    - My family is from [Albania](country)
 ```
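
To illustrate the behaviour the updated docs describe (a plain text file of tokens, matched on word boundaries and tagged with `entity_name`), here is a minimal sketch. It is not the extractor's implementation and does not call flashtext. The helper names `load_tokens` and `extract_entities`, the keys of the returned dictionaries, and the default set of word characters (`[A-Za-z0-9_]`) are assumptions made for illustration only. The parameters `entity_name`, `case_sensitive`, and `non_word_boundaries` mirror the configurable variables listed in the patch.

```python
import string

# Assumption: these characters count as part of a word; anything else is a boundary.
DEFAULT_WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def load_tokens(lines):
    """Return the non-empty, stripped entries of a lookup file (one token per line)."""
    return [line.strip() for line in lines if line.strip()]


def extract_entities(text, tokens, entity_name, case_sensitive=False, non_word_boundaries=""):
    """Find whole-word occurrences of each token in `text` and label them."""
    # Characters listed in non_word_boundaries stop acting as boundaries.
    word_chars = DEFAULT_WORD_CHARS | set(non_word_boundaries)
    haystack = text if case_sensitive else text.lower()
    entities = []
    for token in tokens:
        needle = token if case_sensitive else token.lower()
        start = haystack.find(needle)
        while start != -1:
            end = start + len(needle)
            # Accept the match only if no word character touches it on either side.
            left_ok = start == 0 or haystack[start - 1] not in word_chars
            right_ok = end == len(haystack) or haystack[end] not in word_chars
            if left_ok and right_ok:
                entities.append(
                    {"entity": entity_name, "value": text[start:end], "start": start, "end": end}
                )
            start = haystack.find(needle, start + 1)
    return sorted(entities, key=lambda e: e["start"])


# Usage: the list stands in for the lines of the file given by `path`.
tokens = load_tokens(["Afghanistan\n", "Albania\n", "Zambia\n", "Zimbabwe\n"])
print(extract_entities("My family is from Albania", tokens, "country"))
```

Because `load_tokens` accepts any iterable of lines, an open file handle can be passed in place of the list. With the default `case_sensitive=False`, "albania" and "Albania" both match, while "Albanian" does not, since the trailing "n" is a word character rather than a boundary.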