
Text Summarization Using Advanced NLP Methods

Part of this research began with my team's internship at Eli Lilly in Summer 2021 and has continued since. For more information, see the presentation, and contact me for further details.

Lilly RegQuest:

To provide a cognitive search capability against medical agency databases such as the FDA and the EMA (European Medicines Agency) via a natural-language question, returning relevant results in order to help accelerate regulatory submissions for Eli Lilly.

(Figure: RegQuest_Web)

Types of Text Summarization:

Extractive Summarization: Builds the summary from phrases and sentences that already exist in the text. The task is therefore to identify the key words and sentences of the original text and stitch them together into a summary.

Abstractive Summarization: Uses advanced NLP techniques to generate an entirely new summary that need not contain phrases or sentences from the original text. It is closer to what humans usually expect from text summarization: understand the original document and rephrase it as a shorter text while capturing the key points.

Example: this is the kind of summarization you do when briefly describing a book you have read or a movie you have seen to a friend.

(Figure: Types_of_summ)

Summarization Algorithms Implemented:

1. TextRank Algorithm: An extractive summarization algorithm. Each sentence is first represented as a vector; pairwise similarities between the sentence vectors are then calculated and stored in a matrix. The similarity matrix is converted into a graph, with sentences as vertices and similarity scores as edge weights, and the sentences are ranked over this graph. Finally, a certain number of top-ranked sentences form the final summary (a sketch follows this list).

2. BART Transformers: Bidirectional and Auto-Regressive Transformers, an abstractive summarization model. BART uses a seq2seq/machine-translation architecture that pairs a BERT-style bidirectional encoder (trained with masked language modelling) with a GPT-style autoregressive, left-to-right decoder (see the BART example after this list).

3. T5 Transformers: A pre-trained abstractive summarization model introduced by Google, which also uses Transformers in an encoder-decoder approach. Using a model loaded with T5ForConditionalGeneration, we generate the summary token IDs, which are then decoded to produce the summary of a particular text (see the T5 example after this list).

4. GPT-2 Algorithm: An abstractive approach developed by OpenAI; GPT-2 stands for Generative Pre-trained Transformer and is an autoregressive, decoder-only model. It uses masked self-attention and a larger context window and vocabulary than its predecessor (see the GPT-2 example after this list).
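
A minimal sketch of the TextRank pipeline described in item 1, assuming TF-IDF sentence vectors and the networkx and scikit-learn packages (the implementation in this repository may use different embeddings):

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank_summary(sentences, top_n=3):
    # Represent each sentence as a TF-IDF vector.
    vectors = TfidfVectorizer().fit_transform(sentences)
    # Pairwise cosine similarities form the similarity matrix.
    sim_matrix = cosine_similarity(vectors)
    # Build a graph: sentences are vertices, similarity scores are edge weights.
    graph = nx.from_numpy_array(sim_matrix)
    # PageRank over the graph ranks the sentences.
    scores = nx.pagerank(graph)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the top-ranked sentences, restored to their original order.
    return " ".join(sentences[i] for i in sorted(ranked[:top_n]))
```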
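A minimal sketch of abstractive summarization with BART via the Hugging Face transformers pipeline; the facebook/bart-large-cnn checkpoint and generation lengths are illustrative choices, not necessarily those used in this project:

```python
from transformers import pipeline

# BART model fine-tuned for summarization.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

document = "..."  # text to summarize, e.g. a retrieved FDA/EMA passage
result = summarizer(document, max_length=130, min_length=30, do_sample=False)
print(result[0]["summary_text"])
```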
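A minimal sketch using T5ForConditionalGeneration as described in item 3; the t5-base checkpoint and generation settings are illustrative:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

document = "..."  # text to summarize
# T5 expects a task prefix such as "summarize:" before the input text.
inputs = tokenizer("summarize: " + document, return_tensors="pt",
                   max_length=512, truncation=True)
# Generate the summary token IDs, then decode them back into text.
summary_ids = model.generate(inputs["input_ids"], num_beams=4,
                             min_length=40, max_length=150, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```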
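GPT-2 has no dedicated summarization head; a common zero-shot trick (used in the original GPT-2 paper) is to append a "TL;DR:" prompt and let the model continue. The sketch below follows that pattern; the checkpoint and generation settings are illustrative:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

document = "..."  # text to summarize
# Appending "TL;DR:" nudges GPT-2 to continue with a short summary.
prompt = document + "\nTL;DR:"
output = generator(prompt, max_new_tokens=60, do_sample=False)
# The generated text includes the prompt, so strip it off.
print(output[0]["generated_text"][len(prompt):].strip())
```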

Our New Approach:

Text summarization is a very recent area of cutting-edge NLP research; it has improved dramatically since the introduction of attention models and the Transformer, and much of the research is still ongoing.

All of the algorithms mentioned above pick keywords from the question and search through a huge amount of data to form a corpus, which is then summarized to answer the asked question.

One idea implemented by our team for Eli Lilly is to first use a BART question-answering model to retrieve a corpus of sentences related to the question, and then summarize this corpus instead of the whole dataset. We believe this gives a more accurate summary; a sketch of the idea follows below.

This approach makes much more sense, as it now tries to understand the question and search only the relevant data, just as a human would. At the end of the day, isn't that what we are trying to achieve with deep learning?
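
A minimal sketch of this question-focused pipeline, assuming Hugging Face question-answering and summarization pipelines. The write-up above mentions a BART-based Q&A component, so the QA checkpoint below is only an illustrative stand-in, and the threshold and model names are assumptions:

```python
from transformers import pipeline

# Extractive QA model scores how well each passage answers the question.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
# BART summarizer condenses the question-relevant passages.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def answer_and_summarize(question, passages, score_threshold=0.3, top_k=5):
    # Keep only the top passages whose QA answer score clears the threshold.
    scored = [(qa(question=question, context=p)["score"], p) for p in passages]
    relevant = [p for score, p in sorted(scored, reverse=True)[:top_k]
                if score >= score_threshold]
    if not relevant:
        return "No relevant passages found."
    # Summarize the question-relevant corpus instead of the whole dataset.
    result = summarizer(" ".join(relevant), max_length=130, min_length=30,
                        do_sample=False)
    return result[0]["summary_text"]
```

Summarizing only the retrieved passages keeps the summarizer's input short and on-topic, which is the main advantage this approach aims for over summarizing the full corpus.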

Limitations:

The model tries to find the words that form the best-scoring answer; if that best answer is wrong, it will still produce it. The summary also concentrates more on correctness than on human readability. We do not yet have concrete proof that this approach works, and I wish to pursue this research further, learning along the way while contributing to open source for the amazing natural language processing enthusiasts out there.

