Fully-Dynamic Decision Trees
We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given ϵ > 0 our algorithm guarantees that, at every point in time, every node of the decision tree uses a split with Gini gain within an additive ϵ of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of O(d log^3 n/ϵ^2), which improves to O(d log^2 n/ϵ) for binary or categorical features, while it uses space O(n d), where n is the maximum number of examples at any point in time and d is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees uses amortized running time Ω(d) and space Ω̃ (n d). We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.
READ FULL TEXT