List all potential test benchmarks #63

faneshion · 2024-02-28T14:26:01Z

List the most widely used datasets in RAG research here, and we will add them to the benchmarks.

THUDM/webglm-qa from huggingface: https://huggingface.co/datasets/THUDM/webglm-qa
NaturalQuestions from huggingface: https://huggingface.co/datasets/natural_questions
ASQA dataset from huggingface: https://huggingface.co/datasets/din0s/asqa #64
Trivia QA from huggingface: https://huggingface.co/datasets/trivia_qa
Hotpot QA from huggingface: https://huggingface.co/datasets/hotpot_qa
WikiEval from huggingface: https://huggingface.co/datasets/explodinggradients/WikiEval
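
For anyone wiring these into a loader, here is a minimal sketch of turning the links above into dataset IDs. The helper names `hf_dataset_id` and `load_benchmark` are hypothetical and not part of this repo; `load_benchmark` assumes the Hugging Face `datasets` package is installed and needs network access.

```python
from urllib.parse import urlparse


def hf_dataset_id(url: str) -> str:
    """Extract the dataset ID from a https://huggingface.co/datasets/<id> URL."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # Accept only Hugging Face dataset pages; anything else is rejected.
    if parsed.netloc != "huggingface.co" or len(parts) < 2 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:])


def load_benchmark(url: str, config=None, split=None):
    """Hypothetical convenience wrapper around datasets.load_dataset."""
    # Imported lazily so hf_dataset_id works without the library installed.
    from datasets import load_dataset

    return load_dataset(hf_dataset_id(url), name=config, split=split)
```

Some of these datasets expect a configuration name, so `config` is left as a parameter rather than guessed.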

FBzzh · 2024-02-29T06:55:03Z

MMLU from huggingface: https://huggingface.co/datasets/cais/mmlu
PopQA from huggingface: https://huggingface.co/datasets/akariasai/PopQA
WebQuestions from huggingface: https://huggingface.co/datasets/web_questions
FEVER from huggingface: https://huggingface.co/datasets/fever
FeTaQA from huggingface: https://huggingface.co/datasets/DongfuTingle/FeTaQA

FBzzh · 2024-02-29T07:06:02Z

MedMCQA from huggingface: https://huggingface.co/datasets/medmcqa
GSM8K from huggingface: https://huggingface.co/datasets/gsm8k
BBH from github: https://github.com/suzgunmirac/BIG-Bench-Hard
SQuAD from huggingface: https://huggingface.co/datasets/squad
SQuAD_v2 from huggingface: https://huggingface.co/datasets/squad_v2
Wizard-of-Wikipedia (WoW) from huggingface: https://huggingface.co/datasets/chujiezheng/wizard_of_wikipedia

Wenshansilvia · 2024-03-18T12:35:01Z

Select and implement typical benchmarks, collect the RAG papers that use them, and try to reproduce the evaluation results reported in those papers.

List each benchmark with its related papers and metrics.
Produce a test set using the baseline RAG system from the paper, pack it in dataset format, and upload it to HuggingFace.
Reproduce the paper's evaluation results in RAGEval.
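
As a rough illustration of the packing step, here is a sketch that writes test set records as JSON Lines, a format the HuggingFace `datasets` library can read. The schema (`question`, `gt_answers`, `contexts`, `answer`) and the names `BenchmarkRecord`, `dump_jsonl` and `load_jsonl` are assumptions made for this example; the actual fields depend on the benchmark and on what RAGEval expects.

```python
import io
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class BenchmarkRecord:
    # Hypothetical schema; field names are assumptions, not the RAGEval format.
    question: str
    gt_answers: List[str]
    contexts: List[str] = field(default_factory=list)  # retrieved passages
    answer: str = ""  # answer generated by the baseline RAG system


def dump_jsonl(records, fp) -> None:
    """Write one JSON object per line to an open text file."""
    for rec in records:
        fp.write(json.dumps(asdict(rec), ensure_ascii=False) + "\n")


def load_jsonl(fp) -> List[BenchmarkRecord]:
    """Read records back, skipping blank lines."""
    return [BenchmarkRecord(**json.loads(line)) for line in fp if line.strip()]


# Round trip through an in-memory buffer with a made-up record.
buf = io.StringIO()
dump_jsonl([BenchmarkRecord("example question", ["example answer"])], buf)
buf.seek(0)
records = load_jsonl(buf)
```

Keeping the generated answer and the retrieved contexts next to the ground truth means the evaluation can be rerun later without rerunning the baseline.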

ELI5 @QianHaosheng, ASQA @bugtig6351, FEVER @henan991201

faneshion assigned yanqiangmiffy, Wenshansilvia, RZFan525, bugtig6351, QianHaosheng, henan991201, FBzzh, LittleSunshineQi and youngbeauty250 Feb 28, 2024

faneshion pinned this issue Feb 28, 2024

faneshion added this to the Version 0.2 milestone Feb 28, 2024

This was referenced Mar 20, 2024

Feature/add asqa benchmark #86

Closed

add asqa benchmark #87

Merged
