Chapter 08.05: Evaluation Metrics
Here we answer the question of how to evaluate generated outputs in open-ended text generation. We first explain BLEU [1] and ROUGE [2], metrics for tasks with a gold reference. Then we introduce diversity, coherence [3], and MAUVE [4], metrics for tasks without a gold reference, such as open-ended text generation. You will also learn about human evaluation.
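To make the reference-based vs. reference-free distinction concrete, below is a minimal sketch of the clipped n-gram precision at the core of BLEU and of the distinct-n diversity statistic. This is an illustration only: it omits BLEU's brevity penalty, multi-reference handling, and corpus-level aggregation, and the function names are our own.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    # Clipped n-gram precision (the core of BLEU): each candidate n-gram
    # counts at most as often as it appears in the reference.
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

def distinct_n(tokens, n):
    # distinct-n: fraction of n-grams that are unique — a simple,
    # reference-free diversity measure for open-ended generation.
    grams = ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(modified_precision(candidate, reference, 1))  # 5/6: "sat" is unmatched
print(distinct_n("the the the the".split(), 1))     # 0.25: heavy repetition
```

Reference-based metrics like the precision above require a gold reference; distinct-n needs only the generated text itself, which is why diversity-style metrics suit open-ended generation.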
Lecture Slides
References
- [1] Papineni et al., 2002. BLEU: a Method for Automatic Evaluation of Machine Translation.
- [2] Lin, 2004. ROUGE: A Package for Automatic Evaluation of Summaries.
- [3] Su et al., 2022. A Contrastive Framework for Neural Text Generation.
- [4] Pillutla et al., 2021. MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers.