End-to-end integration / testing with leaderboard #932
Completely agree! I would love to merge the two repositories!
+1 to this issue @Muennighoff! I hope to take a look at the latter 3 of these bullet points at the end of next week: making it easier to add results / mirror to Github and calculate the leaderboard automatically without refreshes.
I was wondering about this myself - I think adding tests is a great starting place. It is a little tricky, as the solution to the latter three involves setting the leaderboard up as a mirror on GitHub and doing automatic pushes, so it would draw from the One potential solution to this is to add another test to
That's amazing! 🚀 I think ideally there'd be two tests, sth like:
I think we only need to make sure the leaderboard code runs without erroring out, which could likely be done by just parametrizing it a bit, and then we can feed in the results folder & metadata as parameters for the tests. Anyways, I think the best solution here will become clearer as we advance on the other issues 🤔
I would add the results to To avoid influencing the existing leaderboard too much, it might be ideal to keep the existing one as-is for now and create a new leaderboard for development.
Agree as I will likely break it a few times before it's fixed haha. I've created mteb/leaderboard-in-progress which we can rename when it's sync'd correctly. |
I've created a mteb/leaderboard GitHub repo which calculates the leaderboard results daily via GitHub Actions (a full refresh) and syncs to the Hugging Face Space mteb/leaderboard-in-progress. The 1 hour refresh of all models can happen in the background at night while the Space runs virtually instantaneously using those cached files! It'd be nice to monitor it for a day or two before making it work on the main Space. For those couple of days, is it okay to pause any new commits to the leaderboard Space? I had to make a large number of refactors and it will be a pain to try to resolve new conflicts. Does Saturday/Sunday work for the switchover @Muennighoff? I'm trying to find a time that will cause the least impact if it goes down for a few hours during the transition, and I'm not sure when the most active usage of the Space is. NOTE: this doesn't use the new
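The daily-refresh-and-sync setup described above could look roughly like the following workflow. This is a hedged sketch: the cron time, script name (`refresh.py`), cache path, secret name, and push target are all assumptions for illustration, not the actual mteb/leaderboard configuration.

```yaml
# Hypothetical workflow sketch -- names and paths are assumptions.
name: refresh-leaderboard
on:
  schedule:
    - cron: "0 3 * * *"   # nightly full refresh (~1 h) while traffic is low
  workflow_dispatch: {}    # allow manual runs
jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: python refresh.py   # recompute all cached result files
      - name: Push cache to the Hugging Face Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git config user.name "github-actions"
          git config user.email "actions@users.noreply.github.com"
          git add cache/
          git commit -m "Nightly leaderboard refresh" || echo "No changes"
          git push "https://user:$HF_TOKEN@huggingface.co/spaces/mteb/leaderboard-in-progress" HEAD:main
```

The key design point is the split: the slow full refresh runs in CI, and the Space itself only reads the pre-computed cache, so it stays virtually instantaneous.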
That's amazing! Your suggestion sounds good to me & we can hold off on committing anything for a few days (also cc @tomaarsen). I'm not sure it would even go down, but any date is fine I think. For the
Yea I do think we should start using Should we also add any updates on how to add models etc.?
Seems good! I'll abstain from commits on the HF Leaderboard Space in the next few days.
There's a few things that could be improved re: the leaderboard & codebase integration 🤔
`paths.json` file in there + adding the model specs to the leaderboard. Ideally, we would allow users to just submit all of this via PR and have it automatically update, similar to how people add their models to alpaca, e.g. Add Together-MoA, Together-MoA-Lite to AlpacaEval tatsu-lab/alpaca_eval#342

cc @orionw who had some great ideas on this & is a GitHub actions wizard 🪄
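For the PR-based submission flow mentioned above, a CI check could validate each submitted model entry before it reaches the leaderboard. A minimal sketch, assuming a hypothetical schema -- the required field names below are illustrative, not the actual MTEB submission format:

```python
def validate_model_entry(entry: dict) -> list:
    """Return a list of problems with a PR-submitted model entry.

    The required fields checked here (model_name, link, embedding_dim)
    are illustrative assumptions, not the real submission schema.
    An empty list means the entry passes.
    """
    errors = []
    for field in ("model_name", "link", "embedding_dim"):
        if field not in entry:
            errors.append(f"missing field: {field}")
    if "link" in entry and not str(entry["link"]).startswith("https://"):
        errors.append("link must be an https URL")
    return errors
```

A GitHub Action could run this on every PR and post the error list as a review comment, so contributors get immediate feedback without maintainer intervention.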