Chapter 12: Multilinguality
Multilinguality in NLP refers to the ability of models to understand and generate text across multiple languages, enabling more inclusive and versatile applications. This is achieved by training models on diverse multilingual datasets, allowing them to learn language-agnostic representations and transfer knowledge between languages. In this chapter you will learn about cross-lingual embeddings and multilingual transformers, which enable models to perform tasks like translation or text classification across different languages.
-
Chapter 12.01: Why Multilinguality?
We need multilingual models to bridge language barriers, enhance global communication, and ensure equitable access to information and technology across diverse linguistic communities. These models enable seamless translation, cross-lingual information retrieval, and multi-language support in applications, allowing people to interact with technology in their native languages. By addressing the challenges of linguistic diversity, multilingual models promote inclusivity, facilitate international collaboration, and democratize access to digital resources and services globally.
-
Chapter 12.02: Cross-lingual Word Embeddings
Cross-lingual word embeddings place words from multiple languages in a shared vector space, allowing models to process and compare text across languages seamlessly. In this chapter we describe the two main training strategies and look at example models trained with each. Additionally, we look at unsupervised learning of multilingual word embeddings. A small sketch of one such strategy follows below.
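As a concrete illustration, the following minimal sketch shows one common training strategy, often called the mapping-based (offline) approach: an orthogonal matrix is learned that maps source-language embeddings onto the embeddings of their translations, solved in closed form via orthogonal Procrustes. The embedding matrices and dictionary here are toy random data; variable names are illustrative, not from a specific library.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 50, 1000

# X: source-language vectors, Y: target-language vectors of their translations.
# In practice these come from pretrained monolingual embeddings plus a seed lexicon.
X = rng.normal(size=(n_pairs, dim))
Y = rng.normal(size=(n_pairs, dim))

# Orthogonal Procrustes: W* = argmin_{W orthogonal} ||XW - Y||_F = U V^T,
# where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

def nearest_target_words(x_vec, target_matrix, k=5):
    """Map a source word vector into the shared space and return the
    indices of the k most similar target-language words (cosine similarity)."""
    mapped = x_vec @ W
    sims = target_matrix @ mapped / (
        np.linalg.norm(target_matrix, axis=1) * np.linalg.norm(mapped) + 1e-9
    )
    return np.argsort(-sims)[:k]

print(nearest_target_words(X[0], Y))
```

Because the mapping is constrained to be orthogonal, distances within each monolingual space are preserved; the same idea underlies several supervised and unsupervised alignment methods covered in this chapter.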
-
Chapter 12.03: (Massively) Multilingual Transformers
As we have previously seen, transformers are the workhorse of modern NLP. In this section we learn how transformers are adapted to multilingual settings and discuss typical issues that arise when training them. Furthermore, we look at zero-shot cross-lingual transfer capabilities, sketched in code below.
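To make zero-shot cross-lingual transfer concrete, the following sketch fine-tunes a massively multilingual encoder on English labels only and then applies it directly to another language. It assumes the Hugging Face `transformers` and `torch` libraries are available; the checkpoint name, example sentences, and abbreviated training step are illustrative rather than a full recipe.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # a massively multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# 1) Fine-tune on English examples (standard supervised loop, heavily abbreviated).
english_batch = tokenizer(["great movie", "terrible plot"],
                          return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
loss = model(**english_batch, labels=labels).loss
loss.backward()  # in practice: optimizer step, many batches, several epochs

# 2) Zero-shot transfer: apply the same model to German input
#    without ever seeing German labels during fine-tuning.
model.eval()
with torch.no_grad():
    german_batch = tokenizer(["ein großartiger Film"], return_tensors="pt")
    pred = model(**german_batch).logits.argmax(dim=-1)
print(pred)  # predicted class, relying purely on the shared multilingual representation
```

The transfer works because the pretrained multilingual encoder maps semantically similar sentences from different languages to nearby representations, one of the typical issues and capabilities discussed in this section.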