add validate metrics #30

Open
Tracked by #23
Wenshansilvia opened this issue Feb 5, 2024 · 1 comment
@Wenshansilvia
Collaborator

Wenshansilvia commented Feb 5, 2024

@FBzzh @yuanpcr Please list all potential metrics for the validate task in this issue. For more details about the validate task, see issue #13.

@Wenshansilvia Wenshansilvia mentioned this issue Feb 5, 2024
7 tasks
@faneshion faneshion added this to the Version 0.1 milestone Feb 6, 2024
@faneshion faneshion added the enhancement New feature or request label Feb 23, 2024
@faneshion faneshion changed the title from "add validator metrics" to "add validate metrics" Feb 23, 2024
@bugtig6351
Collaborator

bugtig6351 commented Apr 26, 2024

Here are some metrics related to answer groundedness.

  • Knowledge F1 (K-F1). A lexical overlap metric used for knowledge-grounded dialogue; it computes the token-level F1 score between the gold passages and the model response.
  • Knowledge F1++ (K-F1++). A variant of K-F1 that discounts tokens in the model response that come from the user's question or the conversation history.
  • Faithfulness (RAGAS). Uses an LLM to extract the statements in the model response, and then determines whether each statement can be inferred from the given contexts.
  • FActScore. An LLM-based method that breaks the generated text down into a series of atomic facts, and then evaluates whether each fact is supported by the knowledge source.
  • QUIP-Score. An n-gram overlap measure that quantifies the degree to which a generated passage consists of exact spans found in a text corpus.
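
To make the two lexical metrics concrete, here is a minimal sketch of K-F1 and K-F1++ in plain Python. It is not taken from any existing implementation: the function names, the lowercased regex tokenizer, and the choice to drop every response token that also appears in the question are all assumptions made for illustration, and the original papers may tokenize or discount differently.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens; real implementations may also
    # strip stopwords or punctuation in a different way.
    return re.findall(r"\w+", text.lower())


def token_f1(pred_tokens, ref_tokens):
    # Token-level F1 over the multiset intersection of the two token lists.
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def knowledge_f1(response, passage):
    # K-F1: F1 between the model response and the gold passage.
    return token_f1(tokenize(response), tokenize(passage))


def knowledge_f1_pp(response, passage, question):
    # K-F1++ (one possible reading): remove response tokens that occur in
    # the question or history before scoring against the passage.
    question_vocab = set(tokenize(question))
    kept = [t for t in tokenize(response) if t not in question_vocab]
    return token_f1(kept, tokenize(passage))
```

For example, with the passage "Paris is the capital of France" and the response "The capital of France is Paris", `knowledge_f1` gives a perfect score, while `knowledge_f1_pp` with the question "What is the capital of France" scores only the remaining token "paris". That gap is the point of the discount: a response that mostly echoes the question is not rewarded for it.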
