@FBzzh @yuanpcr, you can list all potential metrics for the `validate` task in this issue. For more details about the `validate` task, refer to issue #13.
Here are some metrics related to answer groundedness.
Knowledge F1 (K-F1). A lexical overlap metric used for knowledge-grounded dialogue, which computes the token-level F1 score between the gold passages and the model response.
Knowledge F1 ++. A variant of K-F1 that discounts tokens in the model response that come from the user question or the conversation history.
Faithfulness (RAGAS). Uses an LLM to extract the statements in the model response, and then determines whether these statements can be inferred from the given contexts.
FActScore. An LLM-based method that breaks down the generated text into a series of atomic facts, and then evaluates whether these facts are supported by the knowledge source.
QUIP-Score. An n-gram overlap measure that quantifies the degree to which a generated passage consists of exact spans found in a text corpus.
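To make the two lexical metrics above concrete, here is a minimal sketch of K-F1 and K-F1 ++. It is only an illustration under some assumptions: the tokenizer (lowercased word tokens), the function names, and the way history tokens are discounted (dropping them from the response before scoring) are my own choices, not taken from a reference implementation, so scores may differ from published numbers.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens; the original metrics may
    # normalize text differently (e.g. stripping articles).
    return re.findall(r"\w+", text.lower())


def token_f1(pred_tokens, gold_tokens):
    # Token-level F1 based on multiset overlap between two token lists.
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def knowledge_f1(response, passage):
    # K-F1: overlap between the model response and the gold passage.
    return token_f1(tokenize(response), tokenize(passage))


def knowledge_f1_pp(response, passage, history):
    # K-F1 ++ (sketch): drop response tokens that already appear in the
    # user question / conversation history, then score what is left.
    history_tokens = set(tokenize(history))
    kept = [t for t in tokenize(response) if t not in history_tokens]
    return token_f1(kept, tokenize(passage))
```

For example, with the response "Paris is the capital of France", the passage "The capital of France is Paris", and the question "What is the capital of France?", `knowledge_f1` is high because the response matches the passage, while `knowledge_f1_pp` is much lower because most of the matching tokens were already in the question.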