Comparing Probability Distributions with Conditional Transport
To measure the difference between two probability distributions, we propose conditional transport (CT) as a new divergence and approximate it with the amortized CT (ACT) cost, which makes it amenable to implicit distributions and to optimization based on stochastic gradient descent. ACT amortizes the computation of its conditional transport plans and comes with unbiased sample gradients that are straightforward to compute. When applied to train a generative model, ACT is shown to strike a good balance between mode-covering and mode-seeking behaviors and to strongly resist mode collapse. On a wide variety of benchmark datasets for generative modeling, substituting ACT for the default statistical distance of an existing generative adversarial network is shown to consistently improve performance.
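The abstract does not spell out the form of the cost, so the following is only an illustrative sketch of what a sample-based, bidirectional conditional-transport-style cost might look like. It assumes that each conditional transport plan is a softmax over negative distances to the points of the other minibatch, and that the point-to-point cost is the squared Euclidean distance; both choices, and every function name below, are assumptions made for illustration and are not taken from the text.

```python
import math

def sq_dist(a, b):
    # Squared Euclidean distance between two points (assumed cost function).
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def softmax_neg(dists):
    # Weights proportional to exp(-d); shifting by the minimum keeps exp() stable.
    m = min(dists)
    exps = [math.exp(-(d - m)) for d in dists]
    total = sum(exps)
    return [e / total for e in exps]

def directed_cost(sources, targets):
    # For each source point, build an (assumed) conditional plan over the
    # targets and take the expected cost under it, then average over sources.
    total = 0.0
    for s in sources:
        d = [sq_dist(s, t) for t in targets]
        w = softmax_neg(d)
        total += sum(wi * di for wi, di in zip(w, d))
    return total / len(sources)

def ct_style_cost(xs, ys):
    # Equal-weight average of the two transport directions (hypothetical helper).
    return 0.5 * directed_cost(xs, ys) + 0.5 * directed_cost(ys, xs)

if __name__ == "__main__":
    real = [(0.0, 0.0), (1.0, 0.0)]
    fake = [(0.1, 0.0), (5.0, 5.0)]
    print(ct_style_cost(real, fake))
```

In a generative-model setting the distance inside the softmax and the cost would typically be learned rather than fixed; here a single fixed distance plays both roles purely to keep the sketch self-contained.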