Chapter 08.05: Evaluation Metrics

This chapter addresses how to evaluate generated outputs in open-ended text generation. We first explain BLEU [1] and ROUGE [2], which are metrics for tasks with a gold reference. We then introduce diversity, coherence [3], and MAUVE [4], which are metrics for tasks without a gold reference, such as open-ended text generation. You will also learn about human evaluation.
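
To make the distinction between reference-based and reference-free metrics concrete, here is a minimal Python sketch. It shows only the core ideas: clipped n-gram precision (the building block of BLEU), n-gram recall (the idea behind ROUGE-N), and a simple distinct n-gram ratio as a reference-free proxy for diversity. All function names are hypothetical and chosen for illustration. Full BLEU additionally combines several n-gram orders and applies a brevity penalty, ROUGE has several variants, and the diversity metric of [3] may be defined differently from the proxy used here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """BLEU-style modified n-gram precision against a single reference.

    Each candidate n-gram is credited at most as often as it occurs
    in the reference ("clipping").
    """
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def ngram_recall(candidate, reference, n):
    """ROUGE-N-style recall: share of reference n-grams found in the candidate."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def distinct_ratio(tokens, n):
    """Reference-free proxy for diversity: unique n-grams / total n-grams.

    Assumption: this is an illustrative stand-in, not the exact metric of [3].
    """
    grams = ngrams(tokens, n)
    total = sum(grams.values())
    return len(grams) / total if total else 0.0


# Toy example with whitespace tokenization (an assumption; real evaluations
# usually apply a proper tokenizer).
candidate = "the the the cat".split()
reference = "the cat sat on the mat".split()

print(clipped_precision(candidate, reference, 1))  # needs the reference
print(ngram_recall(candidate, reference, 1))       # needs the reference
print(distinct_ratio(candidate, 1))                # needs only the candidate
```

The first two functions cannot be computed without a gold reference, which is why they fit tasks such as translation or summarization. The last one looks only at the generated text, which is the situation in open-ended generation.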

Lecture Slides

References