
biobeats/free_text_tagger

 
 


Free Text Tagger

This repository contains the scripts for extracting contextual information from free text.

Given a free text, the script extracts information about 4 categories: activities, emotions, interactions and places. Each category has its own dictionary, which lists that category's sub-categories.

The input text is parsed and then matched to the sub-categories by handwritten rules, which take syntactic information into account (lemmas, Parts-Of-Speech, dependency structure, ...).
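The rule-matching step can be pictured with a minimal sketch. It is not the repository's code: the `Token` class, the `RULES` table, the hand-annotated `PARSED` sentence and the `find_matches` helper are all hypothetical names introduced for illustration, and only lemma and Part-Of-Speech constraints are modelled (no dependency structure). Sub-category names stand in for the numeric ids described under Output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One parsed token (hypothetical structure, not the repository's)."""
    text: str
    lemma: str
    pos: str


# Hypothetical rules: sub-category name -> list of token patterns.
# Each pattern is a sequence of per-token constraints on lemma and POS.
RULES = {
    "leisure": [
        [{"lemma": "play", "pos": "VERB"}],
        [{"lemma": "play", "pos": "VERB"}, {"lemma": "game", "pos": "NOUN"}],
        [{"lemma": "game", "pos": "NOUN"}],
    ],
}


def token_matches(token, constraints):
    """True if the token satisfies every attribute constraint."""
    return all(getattr(token, key) == value for key, value in constraints.items())


def find_matches(tokens, rules):
    """Return sorted (sub_category, start, end) triples; end is exclusive."""
    matches = []
    for label, patterns in rules.items():
        for pattern in patterns:
            width = len(pattern)
            # Slide a window of the pattern's width over the sentence.
            for start in range(len(tokens) - width + 1):
                window = tokens[start:start + width]
                if all(token_matches(t, c) for t, c in zip(window, pattern)):
                    matches.append((label, start, start + width))
    return sorted(matches)


# Hand-annotated parse of "We're playing games" (assumed tokenization and tags).
PARSED = [
    Token("We", "we", "PRON"),
    Token("'re", "be", "AUX"),
    Token("playing", "play", "VERB"),
    Token("games", "game", "NOUN"),
]

if __name__ == "__main__":
    print(find_matches(PARSED, RULES))
```

In this sketch the parse is written by hand so that the example runs without any NLP library; in practice the lemmas and tags would come from whichever parser the scripts use.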

Requirements

  • Requires Python 3.x
  • Requires the following Python libraries:

Input

  • Text (string)

TODO: decide how the string is passed to the main script.

Output

For each category, the script returns a list of matches. Each match contains:

  • a numeric id for the matched sub-category
  • the position in the sentence where the match starts
  • the position in the sentence where the match ends

For example, "We're playing games" returns this output:

  • [(5133706519360878345, 2, 3), (5133706519360878345, 2, 4), (5133706519360878345, 3, 4)]

  • 5133706519360878345 is the id for the sub-category 'leisure'

  • 2,3 is the span for 'playing'

  • 2,4 is the span for 'playing games'

  • 3,4 is the span for 'games'

Note: in each span, the first number is included and the second one is NOT included.
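The half-open span convention lines up with Python slicing, which can be sketched as follows. This assumes the sentence is tokenized as "We", "'re", "playing", "games" (the tokenization that makes the spans above consistent); the `TOKENS`, `LABELS` and `describe` names are hypothetical and not part of the repository.

```python
# Assumed tokenization of "We're playing games".
TOKENS = ["We", "'re", "playing", "games"]

# Output from the example above: (sub-category id, start, end).
MATCHES = [
    (5133706519360878345, 2, 3),
    (5133706519360878345, 2, 4),
    (5133706519360878345, 3, 4),
]

# Hypothetical lookup table from id to sub-category name.
LABELS = {5133706519360878345: "leisure"}


def describe(matches, tokens, labels):
    """Turn (id, start, end) triples into (sub_category, matched_text) pairs."""
    described = []
    for match_id, start, end in matches:
        # tokens[start:end] includes `start` and excludes `end`,
        # the same convention as the spans in the output.
        text = " ".join(tokens[start:end])
        described.append((labels.get(match_id, str(match_id)), text))
    return described


if __name__ == "__main__":
    for label, text in describe(MATCHES, TOKENS, LABELS):
        print(label, "->", text)
```

Because the end position is excluded, a span's length is simply `end - start`, and adjacent spans such as (2, 3) and (3, 4) do not overlap.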
