Unbiased and Efficient Sampling of Dependency Trees

05/25/2022
by Miloš Stanojević, et al.

Distributions over spanning trees are the most common way of modeling dependency syntax computationally. However, most treebanks require every valid dependency tree to have exactly one edge leaving the ROOT node, a constraint that is not part of the definition of spanning trees. For this reason, all standard inference algorithms for spanning trees are sub-optimal for modeling dependency trees. Zmigrod et al. (2021b) recently proposed algorithms for sampling with and without replacement from the single-root dependency tree distribution. In this paper we show that their fastest algorithm for sampling with replacement, Wilson-RC, in fact produces biased samples, and we provide two alternatives that are unbiased. Additionally, we propose two algorithms (one incremental, one parallel) that reduce the asymptotic runtime of their algorithm for sampling k trees without replacement to 𝒪(kn^3). These algorithms are both asymptotically and practically more efficient.
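To make the single-root constraint concrete, the following minimal sketch enumerates, by brute force, every spanning tree rooted at ROOT for a short sentence and then keeps only those with exactly one edge leaving ROOT. It is an illustration only and is not any of the sampling algorithms discussed in the paper. The head-vector encoding (node 0 is ROOT, `heads[i-1]` is the head of word `i`) and the function names `is_tree`, `enumerate_trees`, and `has_single_root` are assumptions introduced here.

```python
from itertools import product


def is_tree(heads):
    """Return True if every word reaches ROOT (node 0) without a cycle.

    Assumed encoding: heads[i-1] is the head of word i, and 0 denotes ROOT.
    """
    n = len(heads)
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:  # revisiting a node means there is a cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def enumerate_trees(n):
    """Brute-force list of all spanning trees rooted at ROOT over n words."""
    # Each word may take any node except itself as its head.
    options = [[h for h in range(n + 1) if h != i] for i in range(1, n + 1)]
    return [heads for heads in product(*options) if is_tree(heads)]


def has_single_root(heads):
    """Return True if exactly one word is attached directly to ROOT."""
    return sum(1 for h in heads if h == 0) == 1


trees = enumerate_trees(3)
single_root_trees = [t for t in trees if has_single_root(t)]
print(len(trees), len(single_root_trees))
```

For a three-word sentence this enumerates 16 spanning trees, of which 9 satisfy the single-root constraint, so a sampler that targets the unconstrained spanning tree distribution would place probability mass on trees that most treebanks consider invalid. Brute-force enumeration grows exponentially in sentence length and is only useful as a sanity check on tiny inputs.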
