Chapter 08.05: Evaluation Metrics
Here we answer the question of how to evaluate generated outputs in open-ended text generation. We first explain BLEU [1] and ROUGE [2], metrics for tasks with a gold reference. Then we introduce diversity, coherence [3], and MAUVE [4], metrics for tasks without a gold reference, such as open-ended text generation. You will also learn about human evaluation.
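To make the reference-based vs. reference-free distinction concrete, below is a minimal sketch of the clipped n-gram precision at the core of BLEU and of the distinct-n diversity statistic. This is an illustration only: it omits BLEU's brevity penalty, multi-reference handling, and corpus-level aggregation, and the function names are our own.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    # Clipped n-gram precision (the core of BLEU): each candidate n-gram
    # counts at most as often as it appears in the reference.
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

def distinct_n(tokens, n):
    # distinct-n: fraction of n-grams that are unique — a simple,
    # reference-free diversity measure for open-ended generation.
    grams = ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(modified_precision(candidate, reference, 1))  # 5/6: "sat" is unmatched
print(distinct_n("the the the the".split(), 1))     # 0.25: heavy repetition
```

Reference-based metrics like the precision above require a gold reference; distinct-n needs only the generated text itself, which is why diversity-style metrics suit open-ended generation.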
Lecture Slides
References
- [1] Papineni et al., 2002. BLEU: a Method for Automatic Evaluation of Machine Translation.
- [2] Lin, 2004. ROUGE: A Package for Automatic Evaluation of Summaries.
- [3] Su et al., 2022. A Contrastive Framework for Neural Text Generation.
- [4] Pillutla et al., 2021. MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers.