NeurIPS 2020: Review on Explainable AI (XAI)

--

UMI Lab @ NeurIPS 2020

Hot topics, applications and our favourite contributions

This year’s Conference on Neural Information Processing Systems (NeurIPS) was jam-packed with amazing contributions in the field of Explainable AI (XAI). The increasing number of contributions on the subject of XAI shows the importance of this research area, which aims to make black-box models understandable and thus paves the way for real-world applications.

At the same time, a large number of contributions makes it increasingly difficult to keep an eye on new and relevant topics. For this reason, we would like to give you a personal overview of the latest hot topics in the XAI area that were presented at NeurIPS 2020. Let’s start.

Table of Contents

1. Hot topics 🔥

  • Causality meets XAI
  • SHAP + Causality = Improved Explanations
  • Counterfactual explanations
  • Evaluating explanations

2. Papers we ❤️

  • Fourier-transform-based attribution priors improve the interpretability and stability of deep learning models for genomics
  • Towards Interpretable Natural Language Understanding with Explanations as Latent Variables
  • Learning outside the Black-Box: The pursuit of interpretable models
  • Model Agnostic Multilevel Explanations

3. XAI applications for social good 🌍

4. Virtual conferencing 💻

5. Final thoughts

1. Hot topics 🔥

Causality meets XAI

“I would rather discover one cause than be king of Persia.”
Democritus (c. 460 BC — c. 370 BC)

Causality in AI is attracting increasing attention from AI researchers around the world. The same holds for the field of Explainable AI. Some of the most interesting ideas we observed at this year’s NeurIPS conference were those merging causality principles with explainability techniques to open up black-box AI systems.

So, what is causality?

There is a well-known saying in statistics: “correlation does not imply causation”. Indeed, correlation between observed variables does not imply the existence of a causal relationship between them (check this out for fun examples of this misconception). Yet, while rules of causality are ubiquitous and we (humans) base our decisions on learned causal structures, current machine learning methods mainly focus on correlations without causation.

Image source: https://xkcd.com/

Deep learning models have already proven to surpass human performance in pattern recognition, yet they still struggle to recognise even the simplest causal relationships. Judea Pearl, Turing Award winner and pioneer of the field of causality, emphasises in his book The Book of Why: The New Science of Cause and Effect (a brilliant read about “the Causal Revolution”) that learning from data alone can be misleading when causality is neglected, and that ML researchers should strive to build models that incorporate causal structures.

In order to achieve true multi-task generalisation capabilities, future Machine Learning models should be able to understand not only correlations but incorporate the causal structure of the underlying data.

SHAP + Causality = Improved Explanations

SHapley Additive exPlanations (SHAP) is a model-agnostic explainability method that remains one of the most popular tools for understanding the black-box nature of complex AI systems. “Model-agnostic” means that the method does not require access to the computational graph of the ML algorithm, so these types of XAI methods can be applied to any black-box model.

SHAP provides local explanations by attributing a score to each feature (e.g. a pixel) in the input of an individual data point. Each feature is treated as a player in the game-theoretic sense, and the final score, namely the feature importance, corresponds to the player’s “average” reward. Shapley values rest on a principled mathematical foundation and satisfy several desirable axioms.
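To make this concrete, here is a minimal sketch of how local SHAP attributions are typically computed with the open-source shap library; the model and dataset below are our own stand-ins, not taken from any of the papers discussed.

```python
# Minimal SHAP sketch: attribute a model's prediction to its input features.
# Model and data are illustrative stand-ins (scikit-learn + the shap package).
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# KernelExplainer only needs the prediction function and a background sample,
# not the model internals; this is what "model-agnostic" means in practice.
background = shap.sample(X, 50)
explainer = shap.KernelExplainer(model.predict_proba, background)

# Local explanation: one attribution score per feature for each explained sample.
shap_values = explainer.shap_values(X[:3])
print(np.array(shap_values).shape)
```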

Differences between standard Shapley values (pink and gray) and Asymmetric Shapley Values that incorporate partial causal structure. The training dataset was synthetically generated, with sex and age being the prime influences in the causal graph. We can observe that the inclusion of partial causal knowledge significantly changes the explanations. Image source.

At this year’s NeurIPS, several papers were dedicated to incorporating causal structure into the Shapley value framework: “Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability” and “Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models”. In both papers, the user provides a partial causal structure over the features. The authors argue that including this additional causal information leads to more precise explanations that distribute importance among features in a way that respects the underlying causal structure of the data.
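As a rough illustration of the asymmetric idea, here is a toy sketch under our own simplifying assumptions (made-up coalition values, hypothetical feature names; not the authors’ code): standard Shapley values average a feature’s marginal contribution over all feature orderings, while the asymmetric variant only averages over orderings that respect a known causal precedence.

```python
# Toy contrast between standard and "asymmetric" Shapley values: the latter only
# averages marginal contributions over orderings consistent with a causal order.
from itertools import permutations
import numpy as np

def shapley_values(value_fn, n_features, allowed=None):
    """Average each feature's marginal contribution over the allowed orderings."""
    orders = [p for p in permutations(range(n_features))
              if allowed is None or allowed(p)]
    phi = np.zeros(n_features)
    for order in orders:
        coalition = []
        for f in order:
            before = value_fn(frozenset(coalition))
            coalition.append(f)
            phi[f] += value_fn(frozenset(coalition)) - before
    return phi / len(orders)

# A made-up coalition value function over three features (0: age, 1: income, 2: zip code).
v = {frozenset(): 0.0, frozenset({0}): 0.6, frozenset({1}): 0.5, frozenset({2}): 0.1,
     frozenset({0, 1}): 0.8, frozenset({0, 2}): 0.7, frozenset({1, 2}): 0.6,
     frozenset({0, 1, 2}): 1.0}

symmetric = shapley_values(v.__getitem__, 3)
# Only keep orderings where the causal ancestor (feature 0) precedes feature 1.
asymmetric = shapley_values(v.__getitem__, 3, allowed=lambda p: p.index(0) < p.index(1))
print("standard:  ", symmetric)
print("asymmetric:", asymmetric)
```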

Counterfactual explanations

Counterfactual explanations were another hot topic at NeurIPS 2020. Imagine you apply for a loan and, after an AI system has analysed your application, you are (unfortunately) rejected. Popular explainability methods like SHAP or LIME simply evaluate the importance of the features in your application. However, it could be more meaningful to explain how much you would need to change the input for the AI system to approve the loan. In other words, you are not just interested in an explanation stating that your income is important; you are interested in an explanation that says how much higher your income would have to be for the loan to be granted.

Image source: Microsoft Research Blog

These types of explanations are so-called counterfactual explanations and they are one of the basic building blocks in Causal Inference theory. In recent years, various methods for generating counterfactual explanations for deep neural networks have been proposed. In this year’s conference, Stratis Tsirtsis and Manuel Gomez-Rodriguez presented their work on the connection between interpretable and strategic machine learning. In their NeurIPS paper “Decisions, Counterfactual Explanations and Strategic Behavior”, the authors studied counterfactual explanations from the game-theoretic point of view and proposed several novel algorithms for the generation of counterfactual explanations that, when measured quantitatively, have resulted in greater user benefit.
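To make the loan example concrete, here is a hedged toy sketch of the basic recipe behind many counterfactual methods (our own construction with a synthetic model, not the algorithm from the paper above): search for the smallest change to the input that flips the model’s decision.

```python
# Toy counterfactual search for a hypothetical loan model: find a small change to
# the applicant's features that flips the decision from "rejected" to "approved".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))               # standardised features: [income, debt]
y = (X[:, 0] - X[:, 1] > 0).astype(int)     # toy "loan approved" label
clf = LogisticRegression().fit(X, y)

x = np.array([-0.5, 0.8])                   # a rejected applicant
w, b, lam = clf.coef_[0], clf.intercept_[0], 0.1

def grad(x_cf):
    # Gradient of (sigmoid(w.x + b) - 1)^2 + lam * ||x_cf - x||^2
    p = 1.0 / (1.0 + np.exp(-(w @ x_cf + b)))
    return 2 * (p - 1.0) * p * (1 - p) * w + 2 * lam * (x_cf - x)

x_cf = x.copy()
for _ in range(2000):                       # plain gradient descent
    x_cf -= 0.1 * grad(x_cf)

print("original decision:      ", clf.predict(x.reshape(1, -1))[0])
print("counterfactual decision:", clf.predict(x_cf.reshape(1, -1))[0])
print("required feature change:", x_cf - x)  # e.g. how much higher income must be
```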

Evaluating explanations

The evaluation of explainability methods is another hot topic in XAI research. This was brought up during the tutorial on “Explaining Machine Learning Predictions: State-of-the-art, Challenges, and Opportunities” by Himabindu Lakkaraju, Julius Adebayo, and Sameer Singh. In an introductory but comprehensive overview of XAI research, the tutorial covered local and global approaches, explanations for different modalities, as well as the evaluation of XAI techniques.

In particular, the tutorial highlighted existing issues with XAI evaluation, such as perturbations creating out-of-distribution samples, inconsistent user evaluation studies, and the difference between a correct and a useful explanation. They also revisited some well-known failure modes of explainability methods.

We recommend that anyone new to this space check out the tutorial.

2. Papers we ❤️

At NeurIPS 2020, like every year, there was a vast variety of papers to check out. Here we give you an overview of our favourite papers.

Fourier-transform-based attribution priors improve the interpretability and stability of deep learning models for genomics by Tseng et al.

Local explanation methods, as part of XAI research, aim to make the behaviour of a black-box model for single data points (e.g. an image, a sentence, or a time series) visible and comprehensible. Especially in safety-critical areas, such as disease recognition, it is crucial to understand the reason behind a decision in order to trust it.

At the intersection of XAI and genomics, one can train a deep learning model to map genomic DNA sequences to protein-DNA binding data and then use a model-agnostic local explanation method (one that needs no deeper knowledge of the network’s internals) to identify which features of the DNA string were important for the network’s behaviour (e.g., the classification). In this case, the explanations correspond to small pieces of the DNA sequence, so-called ‘motifs’, which are responsible for the properties of the DNA that the black-box output is meant to uncover.

However, a common issue with most explanation methods is that they can be visually noisy. Under the assumption that ‘what you see is what you get’ (meaning the network’s properties might be to blame), the genomics group from Stanford modified the network training as follows: for their Fourier-based prior, they made the simple assumption that random noise corresponds to high-frequency components of the signal, whereas the true signal lives in the low frequencies.

The plot gives a good impression of the smoothing effect the Fourier-based prior approach has on both the input signal (two upper graphs) and the associated important features/motifs (two lower graphs). Image source.

Thus, being interested only in the underlying true signal, they added an extra term to the network’s loss that penalises very short motifs (i.e. high frequencies). In the figure above, we can see that the method indeed produces smoother-looking explanations, which is quite an achievement for an approach that is not part of the explanation method at all.
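Below is a hedged sketch of what such an attribution prior can look like in practice. This is our own simplified PyTorch rendition of the general idea (input gradients as a stand-in for attributions, a fixed frequency cutoff), not the authors’ exact loss term.

```python
# Simplified Fourier-based attribution prior: penalise the high-frequency part of
# input-gradient attributions so that the learned explanations come out smoother.
import torch

def fourier_attribution_prior(model, x, cutoff=0.1):
    x = x.clone().requires_grad_(True)
    out = model(x).sum()
    # Input gradients act as a simple stand-in for the attribution scores.
    attr, = torch.autograd.grad(out, x, create_graph=True)
    spectrum = torch.fft.rfft(attr, dim=-1).abs()
    n_low = max(1, int(cutoff * spectrum.shape[-1]))
    return spectrum[..., n_low:].mean()     # energy above the cutoff frequency

# Tiny demonstration with a toy 1-D convolutional model on DNA-like input:
toy = torch.nn.Sequential(
    torch.nn.Conv1d(4, 8, kernel_size=5, padding=2),
    torch.nn.Flatten(),
    torch.nn.Linear(8 * 100, 1),
)
x = torch.randn(2, 4, 100)                  # batch of 2, one-hot-like channels, length 100
penalty = fourier_attribution_prior(toy, x)
# During training one would use: loss = task_loss + lam * penalty
print(penalty)
```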

This teaches us once again that sometimes a small step for the network can be a big step for the explanation.

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables by Zhou et al.

Self-explanations of black-box systems are important not only for understanding ML systems, but they can also be used as additional information in supervised learning tasks (e.g., as shown in works by Srivastava et al. 2017, Zhou et al. 2020, and Quin et al. 2020). Continuing the previous work of Hancock et al. 2018, Fatema et al. 2019 and Camburu et al. 2018, this NeurIPS 2020 paper aims to train a model that not only accurately predicts the correct label, but also returns a meaningful explanation for the prediction.

Briefly, the authors propose a probabilistic method called ELV which treats explanations as latent variables. ELV jointly optimizes an explanation generation model and an explanation-augmented classification model via the Expectation-Maximization (EM) algorithm.

During the E-step, they train an explanation generation model using only a small set of labeled data with explanations, plus unlabeled data. During the M-step, they use the trained explanation generation model to train a discriminative model, e.g. a text classifier.

For both the explanation generator and the sentiment classifier, the authors make use of pre-trained state-of-the-art language models such as BERT. Image source.

Although the method could potentially be biased towards human-provided explanations, it offers a good solution for interpreting decisions of a black-box model. Since the proposed method has so far only been applied to NLP tasks, it would be very interesting to see what it would look like for other machine learning tasks, e.g., image classification.

Learning outside the Black-Box: The pursuit of interpretable models by Crabbe et al.

A global, model-agnostic interpretability approach that we found wildly inspiring comes from researchers at the University of Cambridge. The Symbolic Pursuit approach starts from a highly complex deep neural network and mimics the behaviour of the network with simpler functions that are easier for humans to understand. The method builds a global surrogate function by iteratively adding simple (symbolic) terms until the input-output behaviour of the black box is matched.

Once the best function is found, the method provides not only an importance score for each of the model’s features (via the size of the corresponding coefficient, awesome!) but also for their interactions (via the weights of higher-order terms). Furthermore, it provides a quantitative measure of the quality of the explanation (via the residuals between the output of the black box and the surrogate function).

Explaining complex math by more math: a global model distillation

This is a beautiful approach, which might have a lot of potential as a starting point for future work.
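To give a feel for the “pursuit” part, here is a hedged toy sketch of the general recipe (our simplification: random-projection ridge terms with cubic polynomials instead of the symbolic expressions used in the paper): repeatedly fit a simple term to the residual between the black box and the current surrogate.

```python
# Toy pursuit-style distillation: build a surrogate as a sum of simple terms, each
# fitted to the residual between the black-box output and the current surrogate.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 3))
y = np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2]
black_box = GradientBoostingRegressor().fit(X, y)   # stand-in for a neural network

f = black_box.predict(X)                            # black-box outputs to mimic
residual, terms = f.copy(), []
for _ in range(5):                                  # add one simple term per pass
    best = None
    for _ in range(50):                             # try random projection directions
        w = rng.normal(size=3)
        w /= np.linalg.norm(w)
        z = X @ w
        coefs = np.polyfit(z, residual, deg=3)      # cubic "ridge" function g(w.x)
        err = np.mean((residual - np.polyval(coefs, z)) ** 2)
        if best is None or err < best[0]:
            best = (err, w, coefs)
    err, w, coefs = best
    terms.append((w, coefs))
    residual = residual - np.polyval(coefs, X @ w)
    print(f"{len(terms)} term(s), remaining MSE vs. black box: {np.mean(residual**2):.4f}")
```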

Model Agnostic Multilevel Explanations by Natesan et al.

In our daily discussions with other humans, we sometimes fall short and misinterpret an explanation given to us, so why should that be any different when it comes to machines? Thanks to researchers at IBM, we can now look at an explanation from many different perspectives while using only one model.

The Model Agnostic Multilevel Explanation (MAME) method explains the network’s decisions at different levels, beautifully visualised as a tree. The leaves of the tree represent explanations of a specific class for individual input instances (e.g. images or sentences). Moving up the tree, the algorithm identifies, in each subsequent layer, groups of explanations that are closely related (measured by the L2-norm between two explanations). Each group is represented by the “average” explanation of its members, which captures common characteristics of the class. These groups become bigger and bigger, which in turn means the explanations become more and more general in terms of the feature representation of the class, until we reach the top of the tree, where the global explanation of the class is represented!

Here you can see the tree using LIME values as the explanation method with the different explanations becoming more and more global towards the root. Image source.

This method has broad applicability, meaning essentially any explanation method can be adapted for visualisation.
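As a hedged sketch of the multilevel idea, here is our own minimal rendition using off-the-shelf hierarchical clustering (with synthetic explanation vectors, not the authors’ optimisation-based implementation): take one local explanation per instance, then repeatedly merge nearby explanations so that each level of the tree yields progressively more general group explanations.

```python
# Minimal multilevel-explanation sketch: hierarchically merge local explanations
# (e.g. LIME weight vectors) into ever more general group explanations.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
local_explanations = rng.normal(size=(20, 5))    # 20 instances, 5 feature weights each

Z = linkage(local_explanations, method="ward")   # merge by (squared) L2 closeness
for n_groups in (20, 10, 4, 1):                  # leaves -> intermediate levels -> root
    labels = fcluster(Z, t=n_groups, criterion="maxclust")
    group_explanations = np.array([
        local_explanations[labels == g].mean(axis=0)   # "average" explanation per group
        for g in np.unique(labels)
    ])
    print(f"level with {len(group_explanations)} group explanation(s)")
```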

So our takeaway for you is: “from peepholes to sledgehammers, there are many ways to find out what is inside a black box, and it is your decision which one to use”.

3. XAI applications for social good 🌍

We thoroughly enjoyed the NeurIPS workshop on Tackling Climate Change with Machine Learning. During this day, “Interpretability in Convolutional Neural Networks for Building Damage Classification in Satellite Imagery”, presented by Thomas Chen, was another great talk showing that established XAI methods can be utilised for social good. Gradient-weighted Class Activation Mapping (Grad-CAM) was used on different kinds of post-natural-disaster imagery to depict which parts of a building crop lead to a certain model classification.

In this way (although the technique itself is not new), they show that XAI can help debug climate-related models by revealing, qualitatively, what the neural components are looking at while they learn. We celebrate that XAI can in fact help debug networks for tackling climate change and that it can further improve prediction models, in this case by providing insight into the preferred ways to train models for better classification of building damage.

The first row shows images of buildings post-disaster, and the bottom row their corresponding explanations. The Grad-CAM explainability method is used to show which parts of the image make the base model predict a certain classification, e.g., “no damage”, “minor damage”, “major damage” or “destroyed”. Image source.
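For readers new to Grad-CAM, here is a hedged, self-contained sketch of the general recipe (with a tiny untrained CNN as a stand-in for the damage classifier, not the model from the talk): weight the last convolutional feature maps by the pooled gradients of the target class score and keep the positive part.

```python
# Generic Grad-CAM recipe on a toy CNN (untrained stand-in, not the talk's model).
import torch
import torch.nn.functional as F

class TinyCNN(torch.nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 32, 3, padding=1), torch.nn.ReLU(),
        )
        self.head = torch.nn.Linear(32, n_classes)

    def forward(self, x):
        fmaps = self.features(x)                     # (B, 32, H, W) feature maps
        logits = self.head(fmaps.mean(dim=(2, 3)))   # global average pooling + linear head
        return logits, fmaps

model = TinyCNN().eval()
x = torch.randn(1, 3, 64, 64)                        # e.g. a satellite building crop

logits, fmaps = model(x)
target = logits[0].argmax()                          # class to explain
grads, = torch.autograd.grad(logits[0, target], fmaps)

weights = grads.mean(dim=(2, 3), keepdim=True)       # pooled gradient per channel
cam = F.relu((weights * fmaps).sum(dim=1))           # weighted sum of feature maps
cam = F.interpolate(cam.unsqueeze(1), size=x.shape[-2:], mode="bilinear",
                    align_corners=False)
print(cam.shape)                                     # (1, 1, 64, 64) heatmap over the input
```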

During the WiML Affinity Workshop, we also stumbled upon another familiar, but very important, application of XAI methods: healthcare. Amongst others, Jessica Schrouff from Google Research spoke about “Responsible AI for Healthcare” in a Sponsor Expo talk. She highlighted that ML research does not necessarily (nor easily) translate into practice, and that trust is still a big issue in healthcare. She explained how interpretability techniques such as Testing with Concept Activation Vectors (TCAV) were used to help define a limited set of human-sensible clinical concepts to better understand what triggered predictions on Electronic Health Records (EHR).
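Since TCAV may be less familiar than saliency-style methods, here is a hedged toy sketch of its core mechanics (our own simplification with synthetic activations and a stand-in class score, not Google’s implementation): learn a concept direction in activation space, then measure how often the class score increases along that direction.

```python
# Toy TCAV mechanics: fit a Concept Activation Vector (CAV) and compute the share
# of examples whose class score increases when moved along the concept direction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 16                                             # activation dimensionality
concept_acts = rng.normal(loc=1.0, size=(100, d))  # activations of concept examples
random_acts = rng.normal(loc=0.0, size=(100, d))   # activations of random examples

# The CAV is the normal of a linear classifier separating concept from random activations.
clf = LogisticRegression().fit(np.vstack([concept_acts, random_acts]),
                               np.r_[np.ones(100), np.zeros(100)])
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

head = rng.normal(size=d)
def class_score(acts):
    return np.tanh(acts @ head) ** 2               # stand-in nonlinear class score

test_acts = rng.normal(size=(50, d))               # activations of test examples
eps = 1e-3                                         # finite-difference step along the CAV
sensitivity = (class_score(test_acts + eps * cav) - class_score(test_acts)) / eps
tcav_score = np.mean(sensitivity > 0)              # fraction with positive concept sensitivity
print(f"TCAV score: {tcav_score:.2f}")
```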

On the topic of responsible AI, we also enjoyed Charles Isbell’s invited keynote talk “You Can’t Escape Hyperparameters and Latent Variables: Machine Learning as a Software Engineering Enterprise”, which brought up a lot of important issues. He rhetorically asked NeurIPS participants why there are plans, mechanisms, and regulations to ensure safety in airplanes, cars, and other sectors, but still not in AI. Researchers were encouraged to start educating themselves on this topic: what are the ways in which we could incorporate accountability into our profession as AI researchers?

“It is important that we as researchers … take seriously the consequences of the systems we built, and if we don’t, we are doing our field a disservice and ML a disservice.”

That being said, how “taking it seriously” would play out in practice is a far more difficult question. For example, the idea of implementing an AI ethics board might seem like an attractive proposition. But then we may ask: who is eligible to govern such a board? And which ethics rules should apply to which problems in AI? Nevertheless, as perfectly put by Charles Isbell himself, if we, as researchers, don’t start engaging with these topics ourselves, others will:

“If we do not do it [start taking more seriously the consequences of AI systems], someone will, and they are not likely to do it in a way that we are going to be happy with as researchers.”

4. Virtual conferencing 💻

Due to the pandemic, NeurIPS, like other conferences this year, adopted a completely new, virtual format. With everything happening online, the NeurIPS organisers made sure that participants were still offered plenty of possibilities: to exchange ideas, engage in discussions around presentations, meet fellow researchers at socials and, last but not least, have inspiring conversations during the poster sessions.

GatherTown, the virtual environment for the poster sessions, felt like a mix of an old-school computer game and Google Hangouts, and still, we loved the interaction, all the new ideas, and the take-home messages. We checked out plenty of brilliant posters, but covering them all would go beyond our scope here, which is why we presented a selection of honourable mentions above.

Some researchers of UMI Lab enjoying NeurIPS poster sessions as Gathertown avatars.

5. Final thoughts

XAI is important to enable a trustworthy, ethically correct, and legally comprehensible use of AI. The importance of research in this area, and the push for rapid progress on XAI, is also reflected in the large number of contributions at NeurIPS. A lot of research is currently being carried out in various directions: local and global explanations, self-explanatory networks, and the very important work on evaluating XAI methods.

We believe that the problem of evaluating XAI methods is two-fold. First, how can we show that an explanation is in itself correct? Second, perhaps more importantly, how can we measure that an explanation is helpful? This could be done by investigating different XAI use cases — for example in the context of human-in-the-loop systems.

Furthermore, we believe that in the future, XAI research will increasingly include causal structures of the data in order to encode cause-effect relationships of events and then use these to evaluate new data in the context of known relationships.

As we saw in examples at the conference, there are many applications of XAI for the common good that will surely become even more important in the future. Climate change and healthcare are just two examples. As a research laboratory, we strive to contribute to the progress made on these important topics and are interested in research collaborations in this area.

Follow us on Twitter: @TUBerlin_UMI

Authors: Anna Hedström, Kirill Bykov, Philine Lou Bommer, Dennis Grinwald, Marina Marie-Claire Höhne
