
Adding a BM25 baseline? #990

Open
malteos opened this issue Jun 26, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@malteos
Contributor

malteos commented Jun 26, 2024

Hi all,

I made a little script to evaluate the MTEB retrieval tasks with BM25 (especially to have a strong baseline for the multilingual tasks): https://gist.github.com/malteos/178a1b77ac362cd7857a054e2d9c07cb
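For reference, this is the textbook Okapi BM25 scoring function that such a baseline computes. The gist may rely on a library that uses a different IDF variant or different defaults for k_1 and b, so treat this as the standard form and not necessarily the script's exact implementation. Here f(t, d) is the frequency of term t in document d, |d| is the document length in tokens, avgdl is the average document length, N is the number of documents, and n(t) is the number of documents containing t (the IDF shown is the non-negative, Lucene-style variant).

```latex
\operatorname{score}(q, d) = \sum_{t \in q} \operatorname{IDF}(t)\,
  \frac{f(t, d)\,(k_1 + 1)}{f(t, d) + k_1 \left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)},
\qquad
\operatorname{IDF}(t) = \ln\!\left(1 + \frac{N - n(t) + 0.5}{n(t) + 0.5}\right)
```

Commonly used defaults are k_1 between 1.2 and 2.0 and b = 0.75.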

Is this something worth adding to this repo or to the mteb scripts repo? If so, where would you recommend adding it? I could then make a PR.

Best
Malte

imenelydiaker added the enhancement (New feature or request) label Jun 26, 2024
@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Jun 26, 2024

Wouldn't it be possible to write it as an Encoder module to avoid wrapping the MTEB module?

(e.g. see #888)
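Writing BM25 as an encoder seems feasible if the encoder emits sparse vectors instead of dense ones. Documents get BM25 term weights, queries get raw term counts, and a dot product then recovers the BM25 score. The sketch below is only an illustration under stated assumptions: `BM25SparseEncoder`, `encode_corpus`, `encode_queries` and `sparse_dot` are hypothetical names, not MTEB's actual encoder interface. It also assumes a naive whitespace tokenizer, the Lucene-style IDF, and defaults k1 = 1.5, b = 0.75.

```python
import math
from collections import Counter

# Hypothetical sparse-vector type: term -> weight.
SparseVec = dict[str, float]


def tokenize(text: str) -> list[str]:
    # Naive lowercase/whitespace tokenizer; a real multilingual baseline would
    # need language-aware tokenization and possibly stemming.
    return text.lower().split()


class BM25SparseEncoder:
    """Illustrative encoder-style BM25 (class and method names are hypothetical).

    Documents are encoded as BM25-weighted sparse vectors and queries as raw
    term counts, so their dot product equals the Okapi BM25 score
    (with the Lucene-style IDF used below).
    """

    def __init__(self, k1: float = 1.5, b: float = 0.75):
        self.k1 = k1
        self.b = b

    def encode_corpus(self, corpus: list[str]) -> list[SparseVec]:
        docs = [Counter(tokenize(doc)) for doc in corpus]
        lengths = [sum(tf.values()) for tf in docs]
        avgdl = sum(lengths) / len(docs)
        n = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for tf in docs for term in tf)
        idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
        vectors = []
        for tf, length in zip(docs, lengths):
            # Length normalisation: k1 * (1 - b + b * |d| / avgdl).
            norm = self.k1 * (1 - self.b + self.b * length / avgdl)
            vectors.append(
                {t: idf[t] * f * (self.k1 + 1) / (f + norm) for t, f in tf.items()}
            )
        return vectors

    def encode_queries(self, queries: list[str]) -> list[SparseVec]:
        # All BM25 weighting lives on the document side; queries are term counts.
        return [{t: float(c) for t, c in Counter(tokenize(q)).items()} for q in queries]


def sparse_dot(query: SparseVec, doc: SparseVec) -> float:
    # Only terms present in both vectors contribute to the score.
    return sum(w * doc[t] for t, w in query.items() if t in doc)


if __name__ == "__main__":
    encoder = BM25SparseEncoder()
    doc_vecs = encoder.encode_corpus(
        ["the cat sat on the mat", "dogs chase cats", "a quick brown fox"]
    )
    (query_vec,) = encoder.encode_queries(["cat mat"])
    # Only the first document shares terms with the query, so only it scores > 0.
    print([sparse_dot(query_vec, d) for d in doc_vecs])
```

One catch with this design is that the corpus statistics (IDF and average document length) are computed inside `encode_corpus`, so the encoder is stateful per corpus. A plain dense-embedding interface does not anticipate that, which may be why native BM25 support would need some refactoring.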

@KennethEnevoldsen
Contributor

From the comment in #888, I think it would be great to add this as well. In the future, I would love to support BM25 models more natively.

@malteos, will you make the PR? We might also want to discuss potential refactors that would allow a native implementation of BM25 (or similar models).

Note that the BM25MTEB wrapper might be unnecessary; instead, you can probably do:

tasks = mteb.get_tasks(..., task_types=["Retrieval"])

@malteos
Contributor Author

malteos commented Jul 3, 2024

I will make a PR in the coming weeks, though I cannot promise an exact date.

Projects
None yet
Development

No branches or pull requests

3 participants