Dual Stochastic Natural Gradient Descent
Although theoretically appealing, Stochastic Natural Gradient Descent (SNGD) is computationally expensive, has been shown to be highly sensitive to the learning rate, and is not guaranteed to converge. Convergent Stochastic Natural Gradient Descent (CSNGD) aims to solve the latter two problems, but its computational cost remains prohibitive when the number of parameters is large. In this paper we introduce Dual Stochastic Natural Gradient Descent (DSNGD), which takes advantage of dually flat manifolds to obtain a robust alternative to SNGD that is also computationally feasible.
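As a point of reference for the cost the abstract alludes to, the following is a minimal, generic sketch of a single stochastic natural gradient update, not the paper's DSNGD method; all names and the dense-Fisher formulation are illustrative assumptions.

```python
# Generic sketch of one SNGD step (illustrative, not the paper's DSNGD):
# the stochastic gradient is preconditioned by the inverse Fisher
# information matrix, which is the main computational burden of plain SNGD.
import numpy as np

def sngd_step(theta, grad_loss, fisher, learning_rate=0.1):
    """One stochastic natural gradient update: theta <- theta - eta * F^{-1} g.

    theta          -- current parameter vector, shape (d,)
    grad_loss      -- stochastic gradient of the loss at theta, shape (d,)
    fisher         -- estimate of the Fisher information matrix, shape (d, d)
    learning_rate  -- step size eta; SNGD is known to be sensitive to it
    """
    # Solving the linear system costs O(d^3) per step, which is what makes
    # plain SNGD expensive when the number of parameters d is large.
    natural_grad = np.linalg.solve(fisher, grad_loss)
    return theta - learning_rate * natural_grad
```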