Merge pull request #164 from BelenSantamaria/encoding-flashtext
Encoding flashtext
koaning authored Jan 31, 2022
2 parents ea92f35 + 01a3562 commit fe12dbc
Showing 2 changed files with 5 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/docs/extractors/flashtext.md
@@ -29,6 +29,7 @@ use the parameter `non_word_boundaries`
- **entity_name**: the name of the entity to attach to the message
- **case_sensitive**: whether to consider case when matching entities. `False` by default.
- **non_word_boundaries**: characters which shouldn't be considered word boundaries.
- **encoding**: the name of the encoding used to read the lookup text file.

## Base Usage

5 changes: 4 additions & 1 deletion rasa_nlu_examples/extractors/flashtext_entity_extractor.py
@@ -45,6 +45,7 @@ def get_default_config() -> Dict[Text, Any]:
"non_word_boundaries": "",
"path": None,
"entity_name": None,
+ "encoding": None,
}

def __init__(
@@ -63,7 +64,9 @@ def __init__(
)
for non_word_boundary in config["non_word_boundaries"]:
self.keyword_processor.add_non_word_boundary(non_word_boundary)
- words = pathlib.Path(self.path).read_text().split("\n")
+ words = (
+     pathlib.Path(self.path).read_text(encoding=config["encoding"]).split("\n")
+ )
if len(words) == 0:
rasa.shared.utils.io.raise_warning(
f"No words found in the {pathlib.Path(self.path)} file."
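
To illustrate what the new `encoding` option changes, here is a minimal, self-contained sketch. It is not code from this commit: `read_lookup_words` is a hypothetical helper that only mirrors the changed `read_text(...)` line, and the file name and its contents are invented for the demonstration. It assumes a lookup file saved as Latin-1, which cannot be decoded as UTF-8.

```python
import pathlib
import tempfile
from typing import List, Optional


def read_lookup_words(path: str, encoding: Optional[str] = None) -> List[str]:
    """Hypothetical helper mirroring the changed line in the extractor.

    With encoding=None, Python falls back to the platform's default
    encoding, which matches the behaviour before this commit.
    """
    return pathlib.Path(path).read_text(encoding=encoding).split("\n")


with tempfile.TemporaryDirectory() as tmp:
    # Invented lookup file, deliberately saved as Latin-1.
    lookup = pathlib.Path(tmp) / "cities.txt"
    lookup.write_text("Zürich\nMálaga", encoding="latin-1")

    # Passing the matching encoding recovers the original words.
    words = read_lookup_words(str(lookup), encoding="latin-1")

    # Decoding the same bytes as UTF-8 fails, because 0xFC ("ü" in
    # Latin-1) is not a valid UTF-8 byte.
    try:
        read_lookup_words(str(lookup), encoding="utf-8")
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True
```

Because the default in `get_default_config()` is `None`, existing pipelines that do not set `encoding` keep reading the lookup file with the platform default, so the option only matters for files saved in a different encoding.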
