Chapter 7: Counterfactuals and Adversarial Examples
This chapter covers two further local analyses. First, counterfactual explanations are examined, which search for data points in the neighborhood of an observation that lead to a different prediction. Second, the robustness of an ML model is assessed by explicitly searching for malicious inputs, known as adversarial examples.
-
Chapter 7.1: Counterfactual Explanations (CE)
Counterfactual explanations (CE) analyze how individual data points need to be changed to produce a desired prediction outcome. The idea behind CE and their mathematical foundation are the topic of this section.
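To make the idea concrete, the sketch below searches for a counterfactual of a toy logistic model by greedily minimizing a Wachter-style objective (squared distance of the prediction to the target plus the L1 distance to the original point). The model, weights, and hyperparameters are invented for illustration and are not taken from this section.

```python
import numpy as np

def counterfactual_search(f, x, target, lam=10.0, steps=2000, step_size=0.05, seed=0):
    """Naive hill-climbing counterfactual search: minimize
    lam * (f(x') - target)**2 + ||x' - x||_1  (a Wachter-style objective)."""
    rng = np.random.default_rng(seed)
    loss = lambda z: lam * (f(z) - target) ** 2 + np.abs(z - x).sum()
    x_cf, best = x.copy(), loss(x)
    for _ in range(steps):
        candidate = x_cf + rng.normal(scale=step_size, size=x.shape)
        if loss(candidate) < best:
            x_cf, best = candidate, loss(candidate)
    return x_cf

# Hypothetical logistic model on two features (weights chosen arbitrarily).
w, b = np.array([1.5, -2.0]), 0.25
f = lambda z: 1.0 / (1.0 + np.exp(-(z @ w + b)))

x = np.array([0.2, 0.9])                      # original point, prediction ~0.22
x_cf = counterfactual_search(f, x, target=0.9)
print(f(x), f(x_cf), x_cf - x)                # prediction moves toward 0.9 with a small change to x
```

The L1 distance term encourages small, sparse edits, so the returned counterfactual differs from the original point in as few features as practical.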
-
Chapter 7.2: Methods & Discussion of CEs
This section discusses several CE methods and presents their advantages and limitations.
-
Chapter 7.3: Local Explanations: Adversarial Examples
Adversarial ML studies the robustness of ML models against malicious inputs. This section provides usage examples for different types of data and lists ways to construct adversarial examples (ADEs).
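As one hedged illustration of such a construction, the sketch below applies the Fast Gradient Sign Method (FGSM) to a hypothetical logistic-regression model, for which the gradient of the cross-entropy loss with respect to the input is available in closed form; the weights, input, and epsilon budget are assumptions made for the example.

```python
import numpy as np

def fgsm(x, y, w, b, eps=0.5):
    """Fast Gradient Sign Method: perturb x by eps in the direction of the
    sign of the input gradient of the cross-entropy loss."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))    # predicted probability of class 1
    grad_x = (p - y) * w                      # dL/dx for logistic regression
    return x + eps * np.sign(grad_x)

# Hypothetical trained weights and a correctly classified input.
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.6, 0.1]), 1.0

x_adv = fgsm(x, y, w, b, eps=0.5)
p = lambda z: 1.0 / (1.0 + np.exp(-(z @ w + b)))
print(p(x), p(x_adv))                         # ~0.75 vs. ~0.40: the perturbation flips the predicted class
```

For image or text data the same principle applies, but the input gradient is obtained via backpropagation through the model rather than a closed-form expression.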
-
Chapter 7.4: Increasing Trust in Explanations
There are different dimensions that increase or decrease trust in an ML model; these dimensions are examined in this section.