Off by 1 error in Tokenizer #5043

Hi, the tokenizer returns a Doc object rather than just a list of tokens. You can inspect the tokens like this and see that there are 6:

```python
doc = tokenizer(s)
print([t.text for t in doc])
# ['Hello', 'world', ',', 'I', 'am', 'Zaf']
```
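
For anyone who wants to run this end to end, here is a minimal sketch. It rests on two assumptions the thread does not confirm: that `tokenizer` comes from a blank English pipeline (`spacy.blank("en")`), and that the input string `s` is the one reconstructed from the token list above. It compares the length of the `Doc` with a plain whitespace split, which is one way a count can end up off by one.

```python
import spacy

# Assumption: a blank English pipeline. The thread does not show
# how `tokenizer` was actually created.
nlp = spacy.blank("en")
tokenizer = nlp.tokenizer

# Assumption: input string reconstructed from the tokens listed in the answer.
s = "Hello world, I am Zaf"

doc = tokenizer(s)
tokens = [t.text for t in doc]

print(type(doc).__name__)         # the tokenizer returns a Doc, not a list
print(len(doc), tokens)           # the Doc's length is its number of tokens
print(len(s.split()), s.split())  # whitespace split keeps "world," together
```

With this input the whitespace split gives 5 items because the comma stays attached to "world", while the tokenizer separates it into its own token, giving 6. Whether that matches the original report is a guess, since the question text is not shown here.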

Answer selected by ines
Labels
feat / tokenizer Feature: Tokenizer
This discussion was converted from issue #5043 on December 11, 2020 00:44.