Commit

fix flashtext docs
koaning committed Jan 27, 2022
1 parent 8a4fa88 commit b801dbf
Showing 1 changed file with 18 additions and 18 deletions.
36 changes: 18 additions & 18 deletions docs/docs/extractors/flashtext.md
@@ -9,12 +9,12 @@
 ```
 
 This entity extractor uses the [flashtext](https://flashtext.readthedocs.io/en/latest/) library
-to extract entities using [lookup tables](https://rasa.com/docs/rasa/nlu-training-data#lookup-tables).
+to extract entities.
 
 This is similar to [RegexEntityExtractor](https://rasa.com/docs/rasa/components#regexentityextractor), but
 different in a few ways:
 
-1. `FlashTextEntityExtractor` takes only `lookups`, **not** regex patterns
+1. `FlashTextEntityExtractor` uses token-matching to find entities, **not** regex patterns
 2. `FlashTextEntityExtractor` matches using whitespace word boundaries. You cannot set it
 to match words regardless of boundaries.
 3. `FlashTextEntityExtractor` is *much* faster than `RegexEntityExtractor`. This is especially true
@@ -25,6 +25,8 @@ use the parameter `non_word_boundaries`
 
 ## Configurable Variables
 
+- **path**: the path to the lookup text file
+- **entity_name**: the name of the entity to attach to the message
 - **case_sensitive**: whether to consider case when matching entities. `False` by default.
 - **non_word_boundaries**: characters which shouldn't be considered word boundaries.
 
@@ -42,32 +44,30 @@ pipeline:
   min_ngram: 1
   max_ngram: 4
 - name: rasa_nlu_examples.extractors.FlashTextEntityExtractor
-  case_sensitive: True
-  non_word_boundary:
-    - "_"
-    - ","
+  case_sensitive: False
+  path: path/to/file.txt
+  entity_name: country
 - name: DIETClassifier
   epochs: 100
 ```
-You must include [lookup tables](https://rasa.com/docs/rasa/nlu-training-data#lookup-tables) in your NLU data. This
-might look like:
+You must include a plain text file that contains the tokens to detect.
+Such a file might look like:
 ```yaml
-nlu:
-- lookup: country
-  examples: |
-    - Afghanistan
-    - Albania
-    - ...
-    - Zambia
-    - Zimbabwe
+Afghanistan
+Albania
+...
+Zambia
+Zimbabwe
 ```
-In this example, anytime a user's utterance contains an exact match for a country from the lookup table above,
+
+In this example, anytime a user's utterance contains an exact match for a country,
 `FlashTextEntityExtractor` will extract this as an entity with type `country`. You should include a few examples with
 this entity in your intent data, like so:
 
 ```yaml
 - intent: inform_home_country
   examples: |
     - I am from [Afghanistan](country)
-    - My family is from [Albania](country
+    - My family is from [Albania](country)
 ```
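
Editor's note: the sketch below is not part of the commit. It illustrates the behaviour the updated docs describe: keywords are read from a plain text file (one per line) and matched only when they sit on word boundaries. It is a minimal pure-Python approximation, not the flashtext implementation and not the `FlashTextEntityExtractor` source. The function names `load_keywords` and `extract_entities`, the default boundary set, and the output dictionary keys are all assumptions made for illustration.

```python
import string
import tempfile
from pathlib import Path

# Assumption of this sketch: these characters count as "part of a word",
# so a keyword touching one of them is not treated as a separate token.
DEFAULT_NON_WORD_BOUNDARIES = frozenset(string.ascii_letters + string.digits + "_")


def load_keywords(path):
    """Read one keyword per line from a plain text file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def extract_entities(text, keywords, entity_name, case_sensitive=False,
                     non_word_boundaries=DEFAULT_NON_WORD_BOUNDARIES):
    """Return keyword matches that are delimited by word boundaries on both sides."""
    # Assumes lowercasing keeps the string length unchanged (true for ASCII text).
    haystack = text if case_sensitive else text.lower()
    entities = []
    for keyword in keywords:
        needle = keyword if case_sensitive else keyword.lower()
        start = haystack.find(needle)
        while start != -1:
            end = start + len(needle)
            # A hit only counts if the neighbouring characters are boundaries.
            left_ok = start == 0 or text[start - 1] not in non_word_boundaries
            right_ok = end == len(text) or text[end] not in non_word_boundaries
            if left_ok and right_ok:
                entities.append({"entity": entity_name, "start": start,
                                 "end": end, "value": text[start:end]})
            start = haystack.find(needle, start + 1)
    return sorted(entities, key=lambda e: e["start"])


# Hypothetical lookup file, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    lookup = Path(tmp) / "countries.txt"
    lookup.write_text("Afghanistan\nAlbania\nZambia\n", encoding="utf-8")
    keywords = load_keywords(lookup)

# "albania" matches case-insensitively; "Zambia_North" does not match,
# because "_" is in the non-word-boundary set used here.
found = extract_entities("I moved from albania to Zambia_North", keywords, "country")
print(found)
```

Removing `"_"` from the boundary set passed as `non_word_boundaries` would make `Zambia` in `Zambia_North` match as well, which is the kind of trade-off that parameter controls.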
