
Can this model be used for Generative Question Answering? #197

AayushSameerShah opened this issue Mar 6, 2023 · 10 comments

@AayushSameerShah

I am looking to fine-tune this model on my own data (such as medical science) and, after training, have it answer questions. I am not looking for "extractive" answers, where the model returns a start and end span tied to a given context, but for a "generative" case: I train the model on my data, ask it a question, and from its own understanding of my data it should be able to give me the answer.

Please let me know if anybody knows how to achieve that with this model!
Thank you so much 🤗

@Coriana commented Mar 6, 2023

It sounds like you are looking for something like https://github.com/sanjeevanahilan/nanoChatGPT
I am currently trying to do something similar, but have yet to get data gathering working right, let alone training on it.

@arivero commented Mar 7, 2023

Prompt/completion tasks are usually trained with a 0.01 weight on the loss for the prompt tokens; at least, that is the default in the OpenAI API. I do not see such a parameter in the fine-tuning here.

@AayushSameerShah

@arivero so you mean this cannot be fine-tuned for question answering?

@arivero commented Mar 8, 2023

@AayushSameerShah my guess is that the loss function must be customised to decide how to evaluate the prediction of prompt tokens.
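For illustration, a minimal sketch of such a customised loss (a hypothetical helper, not part of this repo; it assumes you can build a boolean mask marking which target positions belong to the prompt):

```python
import torch
import torch.nn.functional as F

def prompt_weighted_lm_loss(logits, labels, prompt_mask, prompt_weight=0.01):
    """Next-token cross-entropy that down-weights prompt tokens (e.g. by 0.01)."""
    # Shift so tokens < n predict token n, as in a standard causal-LM loss.
    shift_logits = logits[:, :-1, :].contiguous()   # (batch, seq-1, vocab)
    shift_labels = labels[:, 1:].contiguous()       # (batch, seq-1)
    shift_mask = prompt_mask[:, 1:].contiguous()    # True where the target is a prompt token

    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        reduction="none",
    ).view(shift_labels.shape)

    # Completion tokens count fully; prompt tokens are scaled down.
    weights = torch.where(shift_mask,
                          torch.full_like(loss, prompt_weight),
                          torch.ones_like(loss))
    return (loss * weights).sum() / weights.sum()
```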

@AayushSameerShah commented Mar 9, 2023

@arivero Thanks for the response. I have followed some of the threads in this repository, but now I am thinking of shifting to Hugging Face. I realize this takes the discussion somewhat outside the context of this thread, so please pardon me.

I am less experienced with Hugging Face Transformers, but from what I have seen, it provides two types of pipelines that could be helpful in my case:

  • `text-generation`
  • `question-answering`

The `text-generation` pipeline simply continues the text from a prompt like "When I was 12 I went to", and the rest is filled in by the model. There is no question answering: even if I pose a question in the prompt, it continues the question instead of answering it, which makes sense.

Then there is the `question-answering` pipeline, which takes two inputs, a question and a context, and "extracts" the answer from the context.

This seems to work, but it fails to generalize because it "extracts" the answer rather than generating it. Additionally, we need to supply the context along with the question to get an answer, which is not intuitive.
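A minimal sketch of the difference (the model names here are just common defaults, not a recommendation):

```python
from transformers import pipeline

# Generative continuation: the model simply keeps writing from the prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("When I was 12 I went to", max_new_tokens=20)[0]["generated_text"])

# Extractive QA: the model picks a span out of the context you supply.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(qa(question="Where did I go when I was 12?",
         context="When I was 12 I went to a boarding school in Shimla."))
# -> {'score': ..., 'start': ..., 'end': ..., 'answer': 'a boarding school in Shimla'}
```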


What I am asking for is...

Are there models which can be fine-tuned on a specific dataset (say medical) and then answer questions by themselves? I am not sure whether, once fine-tuned, the model still needs the context to be provided, but either way it should return some response to the question by generating it, like "DaVinci" does in the example notebooks.

Do you have any idea how I can move forward with this? I have found these models on Hugging Face to work with because they seem promising:

  • GPT-Neo-125M
  • GPT-Neo-1.3B
  • GPT-J-6B

These can be my open-source companions and I can go forward with them, but I don't know how to solve my problem with these models or how to get the training done.
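For reference, a minimal causal-LM fine-tuning sketch with one of these checkpoints, assuming a toy prompt/completion-style dataset, could look roughly like this (the dataset, output directory, and hyperparameters are placeholders):

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder domain data: one "Question ... Answer ..." document per example.
dataset = Dataset.from_dict({"text": [
    "Question: What is hypertension?\nAnswer: Persistently elevated blood pressure.",
]})

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt-neo-medical-qa",
                           num_train_epochs=3, per_device_train_batch_size=2),
    train_dataset=tokenized,
    # mlm=False gives the standard causal-LM objective (labels = shifted inputs).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```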

It would really be a huge help, buddy.
Thanks a lot 🤗

@timothylimyl

Hi @AayushSameerShah, it seems like you want to replicate something like what BioGPT is doing. You can check out their paper:

https://arxiv.org/pdf/2210.10341

@AayushSameerShah

Hello @timothylimyl
Thank you so much! Your answer gave me a direction. I checked out BioGPT, but after further research I could see that instead of "training" the model on specific data, I need to "retrieve" the data based on the question and then generate the answer.

So this is the generative QA approach, using "Haystack". This framework has many QA pipelines, and I am interested in two of them:

  1. RAG Pipeline (RAG approach)
  2. Seq2seq pipeline (LFQA approach)

I have found that RAG generates short (one-liner) answers, while LFQA gives passage-length answers, which is really what I was looking for.

So, after storing a bunch of different documents in the document store, the pipeline retrieves the relevant documents from the question embeddings, and the generator reads those documents and then generates the answer.

Though the answer quality isn't on the level of GPT-3, it can work pretty well.
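As a rough sketch, the setup looks like this (Haystack v1-style API; the exact class names and arguments may differ between versions, and the models are just the ones commonly used in the LFQA examples):

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, Seq2SeqGenerator
from haystack.pipelines import GenerativeQAPipeline

# Store the unstructured documents (wiki pages, blogs, ...).
document_store = InMemoryDocumentStore(embedding_dim=384)
document_store.write_documents(
    [{"content": "Hypertension is persistently elevated blood pressure ..."}])

# Dense retriever that embeds the question and fetches matching documents.
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2")
document_store.update_embeddings(retriever)

# Seq2seq generator that writes a long-form answer from the retrieved documents.
generator = Seq2SeqGenerator(model_name_or_path="vblagoje/bart_lfqa")

pipe = GenerativeQAPipeline(generator=generator, retriever=retriever)
result = pipe.run(query="What is hypertension?", params={"Retriever": {"top_k": 3}})
print(result["answers"][0].answer)
```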

--

Now I have another query. Along with my unstructured data (wiki pages, blogs, ...), I also want to feed in structured data in tabular form.

But here the LFQA pipeline fails because it is unable to extract meaning from the tables!

For that, Haystack has a "TableReader" which can take tables as input, so my hopes rose! But when I tried it, it returned a single-word, "extractive" response: on asking "Which country won the highest number of medals in the Olympics 2022?" it returned "USA".

I am looking for a generative response here, with some explanation.

Is that possible? Please point me in the right direction.

To summarise my whole situation: I am looking for a "generative way of answering" where I can supply unstructured + structured data as the context and, on querying, the model generates an answer.
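One direction I'm considering (just a sketch, not an official Haystack feature): flatten the table into plain text and let an instruction-tuned generative model answer with an explanation, e.g.:

```python
from transformers import pipeline

# Toy table, flattened into plain text so a text model can read it.
table_as_text = (
    "Country | Gold | Silver | Bronze | Total\n"
    "Norway  | 16   | 8      | 13     | 37\n"
    "Germany | 12   | 10     | 5      | 27\n"
)

generator = pipeline("text2text-generation", model="google/flan-t5-base")
prompt = ("Answer the question using the table, and explain your reasoning.\n\n"
          f"Table:\n{table_as_text}\n"
          "Question: Which country won the highest number of medals?")
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```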

Thanks 🙏

@timothylimyl

Seems like a prompt engineering problem; you can try out different instruction prompts for starters.
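Purely illustrative variations you could try with the same generator (the wording is the only thing changing here):

```python
# Hypothetical prompt templates to experiment with; {question}/{context} are filled in at query time.
prompts = [
    "Answer in one or two sentences and justify your answer:\n{question}",
    "You are a medical assistant. Using the context below, write a detailed answer.\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:",
]
```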

@laurentm255

Hi @AayushSameerShah: I've just stumbled upon this page since I am trying to do exactly what you're describing :)

We're now 5 months on:

  • Did you stick with Haystack's Long-Form Question Answering (LFQA) pipeline?
  • Which LLM did you use with it, in order to have a "generative" QA that is talkative enough?
  • Did you fine-tune this LLM? Did you use LoRA?
  • I guess you used a document retriever / reader for the LFQA to work: interested in your feedback on that too! :-)

Thanks very much for your help ;)
Regards, Laurent (France).

@AayushSameerShah commented Jul 26, 2023

Hi @laurentm255 👋,
Actually, I've had the opportunity to explore a LOT in this field. I have faced many problems and failed trainings, and found ways out of this maze.

Luckily, I have provided a comprehensive response (via my response link) where I have tried to explore the options currently available, with examples.

Hopefully that gives a bit of direction.
Thank you, and please let me know if anything is unclear.
🤗
