Commit

Add parameter encoding to flashtext
BelenSantamaria committed Jan 28, 2022
1 parent ea92f35 commit e770f6c
Showing 2 changed files with 4 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/docs/extractors/flashtext.md
@@ -29,6 +29,7 @@ use the parameter `non_word_boundaries`
- **entity_name**: the name of the entity to attach to the message
- **case_sensitive**: whether to consider case when matching entities. `False` by default.
- **non_word_boundaries**: characters which shouldn't be considered word boundaries.
+ - **encoding**: the name of the encoding used to read the lookup text file.

## Base Usage

4 changes: 3 additions & 1 deletion rasa_nlu_examples/extractors/flashtext_entity_extractor.py
@@ -45,6 +45,7 @@ def get_default_config() -> Dict[Text, Any]:
"non_word_boundaries": "",
"path": None,
"entity_name": None,
+ "encoding": None,
}

def __init__(
@@ -61,9 +62,10 @@ def __init__(
self.keyword_processor = KeywordProcessor(
case_sensitive=config["case_sensitive"]
)
+ self.encoding = config.get("encoding")
for non_word_boundary in config["non_word_boundaries"]:
self.keyword_processor.add_non_word_boundary(non_word_boundary)
- words = pathlib.Path(self.path).read_text().split("\n")
+ words = pathlib.Path(self.path).read_text(encoding=self.encoding).split("\n")
if len(words) == 0:
rasa.shared.utils.io.raise_warning(
f"No words found in the {pathlib.Path(self.path)} file."
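The effect of the new `encoding` option can be sketched outside Rasa with a small standalone script. This is only an illustration under assumptions: the helper `read_lookup_words`, the file name `cities.txt`, and the sample keywords are hypothetical and not part of the commit. Only the `pathlib.Path(...).read_text(encoding=...).split("\n")` call mirrors the changed line.

```python
import pathlib
import tempfile
from typing import List, Optional


def read_lookup_words(path: str, encoding: Optional[str] = None) -> List[str]:
    """Hypothetical helper mirroring the changed line in the extractor.

    With encoding=None, read_text falls back to the platform default
    encoding, which is the behaviour from before the commit.
    """
    return pathlib.Path(path).read_text(encoding=encoding).split("\n")


# Write a small Latin-1 encoded lookup file, one keyword per line
# (file name and contents are made up for this example).
with tempfile.TemporaryDirectory() as tmp_dir:
    lookup_path = pathlib.Path(tmp_dir) / "cities.txt"
    lookup_path.write_bytes("München\nMálaga".encode("latin-1"))

    # Passing the matching encoding decodes the accented characters correctly.
    words = read_lookup_words(str(lookup_path), encoding="latin-1")
```

Because the default stays `None`, existing configurations that do not set `encoding` should keep reading the lookup file with the platform default, as before.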
