Integrate InstructIR with MTEB #905
Comments
Hi @henilp105, this seems related to @orionw's work on followIR. In general, the code does not support datasets that are not available through Hugging Face. Hugging Face does, however, allow a dataset script that essentially just fetches from GitHub. Still, since the dataset is available on GitHub under an open license, I would just upload it to HF.
Thanks @KennethEnevoldsen, I will refer to the followIR PR. I will upload the dataset to HF and start on the implementation.
Amazing! 🙌 @hanseokOh, following up on kaistAI/InstructIR#3: have you already started on the integration? If so, maybe you can coordinate with @henilp105.
This is exciting, thanks @henilp105! I think it'd be a great addition -- I wanted to add it myself but never had the time. @hanseokOh if you already started doing it, let me know and we can adjust these details! For the technical details to put this in MTEB:
Let me know if this makes sense @henilp105 and we can discuss further!
Thanks @orionw, these are great insights. I have uploaded the dataset to Hugging Face at henilp105/InstructIR. I'll keep this thread updated and reach out if I encounter any roadblocks.
We would need to have a common naming for all the instructions across datasets as in followIR we have |
Great point @henilp105. I think something like |
Thanks, I think that three underscores would be good. Also, since multiple datasets may contain various subsets, not all of which are present in each dataset, how should we evaluate that? like
Good question,
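For readers following the naming discussion above, here is a minimal sketch of a three-underscore convention in Python. The field and subset names (`instruction`, `subset_a`, and so on) and all three helper functions are hypothetical, since the concrete names proposed in this thread are not shown; this is not MTEB's or followIR's actual API. It only illustrates joining and splitting names on `___`, and listing which subsets a given dataset actually provides.

```python
from typing import Iterable, List, Tuple

SEPARATOR = "___"  # three underscores, as suggested in the thread


def make_name(base: str, subset: str) -> str:
    """Join a base field name and a subset label with the separator."""
    for part in (base, subset):
        # Reject parts that would make the name ambiguous to split again.
        if SEPARATOR in part or part.startswith("_") or part.endswith("_"):
            raise ValueError(f"invalid name part: {part!r}")
    return f"{base}{SEPARATOR}{subset}"


def split_name(name: str) -> Tuple[str, str]:
    """Split a combined name back into (base, subset)."""
    base, sep, subset = name.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"no separator found in {name!r}")
    return base, subset


def available_subsets(field_names: Iterable[str], base: str) -> List[str]:
    """List the subset labels that exist for `base` among the given fields."""
    found = []
    for name in field_names:
        if SEPARATOR not in name:
            continue  # plain field without a subset suffix
        field_base, subset = split_name(name)
        if field_base == base:
            found.append(subset)
    return sorted(found)


# Hypothetical field names for one dataset; another dataset may lack some.
fields = ["query", "instruction___subset_a", "instruction___subset_b"]
print(available_subsets(fields, "instruction"))
```

Checking which subsets are present before evaluating is one way to handle datasets that carry only some of them; how scores should then be aggregated is a separate decision that this sketch does not make.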
Sorry for being late 😂 Yes, as @Muennighoff and @orionw mentioned, I am trying to merge the InstructIR dataset into the MTEB repository, and I also made an HF dataset repo for it! I also agree with what @orionw said: it would be good to integrate only the main part (not including details for ablation, such as
Thanks @hanseokOh, I was unable to find the dataset repo link on GitHub. I would be happy to assist with the integration, and I also believe that integrating only the main part is the right approach. Please feel free to chime in on the PR with any suggestions; I would be happy to address them.
I am interested in integrating InstructIR into MTEB. Currently, the dataset for InstructIR is only available on GitHub (https://github.com/kaistAI/InstructIR) and not on Hugging Face. Could you advise on the best approach to integrate it directly from GitHub? Should we use the dataset_transform to download and run it directly, or is there an alternative method that you recommend?
Thanks and Regards,
Henil
CC: @Muennighoff @KennethEnevoldsen