The Computational Complexity of Understanding Network Decisions
For a Boolean function Φ: {0,1}^d → {0,1} and an assignment to its variables x = (x_1, x_2, ..., x_d), we consider the problem of finding subsets of the variables that are sufficient to determine the function value with a given probability δ. This is motivated by the task of interpreting predictions of binary classifiers described as Boolean circuits (which can be seen as special cases of neural networks). We show that the problem of deciding whether such a subset of relevant variables of limited size k ≤ d exists is complete for the complexity class NP^PP and thus generally infeasible to solve. We introduce a variant where it suffices to check whether a subset determines the function value with probability at least δ or at most δ-γ for 0 < γ < δ. This reduces the complexity to the class NP^BPP. Finally, we show that finding the minimal set of relevant variables cannot be reasonably approximated, i.e. with an approximation factor d^{1-α} for α > 0, by a polynomial-time algorithm unless P = NP (this holds even with the probability gap).
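To make the problem statement concrete, the following Python sketch computes, by brute force, the probability that Φ keeps its value Φ(x) when a chosen subset of variables is fixed to its values in x, and then searches for a smallest subset reaching the threshold δ. It assumes the non-fixed variables are independent uniform random bits, which the abstract does not state explicitly; the function names and the example function are hypothetical and chosen only for illustration. Both routines take exponential time in d, so this is a toy for very small inputs, not a method from the paper.

```python
from itertools import combinations, product


def agreement_probability(phi, x, subset):
    """Exact probability that phi(y) == phi(x) when y agrees with x on `subset`
    and every other coordinate of y is an independent uniform bit (assumption)."""
    d = len(x)
    free = [i for i in range(d) if i not in subset]
    target = phi(x)
    hits = 0
    # Enumerate all 2^|free| completions of the partial assignment.
    for bits in product((0, 1), repeat=len(free)):
        y = list(x)
        for i, b in zip(free, bits):
            y[i] = b
        hits += phi(tuple(y)) == target
    return hits / 2 ** len(free)


def smallest_relevant_subset(phi, x, delta):
    """Exhaustive search, by increasing size, for a subset whose agreement
    probability is at least delta. Assumes delta <= 1, so the full variable
    set always qualifies and the search terminates with a result."""
    d = len(x)
    for k in range(d + 1):
        for subset in combinations(range(d), k):
            if agreement_probability(phi, x, set(subset)) >= delta:
                return subset


# Hypothetical example: phi(y) = (y_1 AND y_2) OR y_3, with 0-based indices in code.
def phi_example(y):
    return int((y[0] and y[1]) or y[2])


x_example = (1, 1, 0)

p_none = agreement_probability(phi_example, x_example, set())
p_first = agreement_probability(phi_example, x_example, {0})
best = smallest_relevant_subset(phi_example, x_example, 0.9)
```

In this example, fixing no variables gives probability 5/8, fixing only the first variable gives 3/4, and the smallest subset reaching δ = 0.9 consists of the first two variables. The relaxed variant with gap γ would only require distinguishing probability at least δ from at most δ-γ, which is the kind of question a sampling-based estimate could answer instead of the full enumeration used here.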