
TexTa: a free text tagger that extracts contextual tags


biavarone/free_text_tagger


TexTa

TexTa is a tagger that extracts contextual information from free text.

Given free text, the script extracts information about four categories: activities, emotions, interactions and places. Each category has its own dictionary, which lists its sub-categories.
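
One way to picture that layout is as a mapping from each category to its sub-category names. The following is a minimal sketch under that assumption, not the project's actual dictionaries: apart from 'leisure' (the sub-category used in the Output example), every sub-category name is a placeholder, placing 'leisure' under activities is itself an assumption, and category_of is a hypothetical helper.

```python
# Hypothetical layout: category -> sub-category names.
# Only 'leisure' comes from this README; the rest are placeholders.
CATEGORIES = {
    "activities": ["leisure", "work"],
    "emotions": ["joy", "sadness"],
    "interactions": ["family", "friends"],
    "places": ["home", "outdoors"],
}


def category_of(sub_category):
    """Return the category that lists sub_category, or None if none does."""
    for category, sub_categories in CATEGORIES.items():
        if sub_category in sub_categories:
            return category
    return None


print(category_of("leisure"))  # activities
print(category_of("unknown"))  # None
```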

The input text is parsed and then matched to the sub-categories by handwritten rules that use syntactic information (lemmas, parts of speech, dependency structure, ...).
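
The kind of handwritten rule described above can be sketched in plain Python. This is not the project's code: the Token class, the LEISURE_VERBS and LEISURE_NOUNS lexicons and match_leisure are all hypothetical. The token annotations are typed by hand to resemble what a dependency parser might produce for "We're playing games", and the sketch returns label strings instead of the numeric ids the real script emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    lemma: str
    pos: str   # coarse part of speech, e.g. "VERB", "NOUN"
    dep: str   # dependency label, e.g. "dobj" for a direct object
    head: int  # index of the token's syntactic head


# Hypothetical lexicons for the 'leisure' sub-category.
LEISURE_VERBS = {"play", "relax"}
LEISURE_NOUNS = {"game", "movie"}


def match_leisure(tokens):
    """Return (label, start, end) tuples; end is exclusive."""
    matches = []
    for i, tok in enumerate(tokens):
        if tok.pos == "VERB" and tok.lemma in LEISURE_VERBS:
            # Rule 1: the verb alone, e.g. 'playing'.
            matches.append(("leisure", i, i + 1))
            # Rule 2: verb plus the direct object right after it, e.g. 'playing games'.
            nxt = i + 1
            if nxt < len(tokens) and tokens[nxt].dep == "dobj" and tokens[nxt].head == i:
                matches.append(("leisure", i, nxt + 1))
        if tok.pos == "NOUN" and tok.lemma in LEISURE_NOUNS:
            # Rule 3: a leisure noun alone, e.g. 'games'.
            matches.append(("leisure", i, i + 1))
    # Order by start, then end.
    return sorted(matches, key=lambda m: (m[1], m[2]))


# Hand-written parse of "We're playing games" (an assumption, not parser output).
sentence = [
    Token("We", "we", "PRON", "nsubj", 2),
    Token("'re", "be", "AUX", "aux", 2),
    Token("playing", "play", "VERB", "ROOT", 2),
    Token("games", "game", "NOUN", "dobj", 2),
]

print(match_leisure(sentence))
# [('leisure', 2, 3), ('leisure', 2, 4), ('leisure', 3, 4)]
```

Rule 2 only fires when the object immediately follows the verb; a fuller rule would walk the dependency tree through the head indices instead of relying on adjacency.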

Requirements

Installing spaCy and the required model

  • Install spaCy via pip or your preferred method (see the spaCy installation documentation for more details)

    pip install -U spacy

  • Download the English language model

    python -m spacy download en_core_web_sm

Input

  • text

[choose how to pass the text to the file and how to get the output]

Output

For each category, the script returns a list of matches, each containing:

  • a numeric id for the matched sub-category
  • the index of the token where the match starts
  • the index of the token where the match ends

For example, "We're playing games" returns this output:

  • [(5133706519360878345, 2, 3), (5133706519360878345, 2, 4), (5133706519360878345, 3, 4)]

  • 5133706519360878345 is the id for the sub-category 'leisure'

  • 2,3 is the span for 'playing'

  • 2,4 is the span for 'playing games'

  • 3,4 is the span for 'games'

Note that spans are half-open intervals: the start index is included, the end index is NOT.
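
Because the intervals are half-open, they line up with Python slicing: tokens[start:end] returns exactly the matched tokens. A minimal sketch, assuming the sentence is tokenized as "We", "'re", "playing", "games" (span_texts is a hypothetical helper, not part of the script):

```python
# Token list assumed for "We're playing games" ("We're" split into "We" + "'re").
tokens = ["We", "'re", "playing", "games"]

# The matches from the example above.
matches = [
    (5133706519360878345, 2, 3),
    (5133706519360878345, 2, 4),
    (5133706519360878345, 3, 4),
]


def span_texts(tokens, matches):
    """Join the tokens covered by each half-open [start, end) span."""
    return [" ".join(tokens[start:end]) for _, start, end in matches]


print(span_texts(tokens, matches))
# ['playing', 'playing games', 'games']
```

If the matches come from spaCy's Matcher, as the (id, start, end) format suggests, doc[start:end] yields the same span as a Span object and nlp.vocab.strings[match_id] maps the numeric id back to its label string.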
