Chapter 4: BERT

BERT (Bidirectional Encoder Representations from Transformers) [1] is a transformer-based model designed to produce deep contextualized word representations. Unlike unidirectional language models, it conditions on both left and right context at every layer, which allows it to capture complex linguistic patterns and context-dependent meanings. BERT is pretrained on large text corpora with two objectives: masked language modeling, in which randomly masked tokens are predicted from their surrounding context, and next sentence prediction, in which the model judges whether one sentence follows another. Together these objectives yield rich token representations that incorporate information from both directions.
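
To make the masked language modeling objective concrete, the sketch below uses the Hugging Face transformers library (an assumption, not part of this chapter) to load a pretrained BERT checkpoint and predict a masked token from its bidirectional context. The checkpoint name bert-base-uncased and the example sentence are illustrative choices.

```python
# A minimal sketch of masked language modeling with BERT,
# assuming the Hugging Face `transformers` library is installed
# and the illustrative `bert-base-uncased` checkpoint is used.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Mask one token; BERT predicts it from both left and right context.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Locate the masked position and take the highest-scoring vocabulary token.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected output: "paris"
```

Because BERT attends to tokens on both sides of the mask, the prediction draws on the whole sentence rather than only the words preceding the gap, which is the sense in which its representations are bidirectional.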

References

[1] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT, 2019.