Solving Tensor Low Cycle Rank Approximation

04/13/2023
by   Yichuan Deng, et al.

Large language models have become ubiquitous in modern life, finding applications in domains such as natural language processing, language translation, and speech recognition. Recently, a breakthrough work [Zhao, Panigrahi, Ge, and Arora, arXiv 2023] explains the attention model via probabilistic context-free grammars (PCFGs). One of the central computational tasks in computing probabilities in a PCFG can be formulated as a particular tensor low rank approximation problem, which we call tensor cycle rank. Given an n × n × n third-order tensor A, we say that A has cycle rank k if there exist three n × k^2 matrices U, V, and W such that A_{a,b,c} = ∑_{i=1}^k ∑_{j=1}^k ∑_{l=1}^k U_{a, i+k(j-1)} ⊗ V_{b, j+k(l-1)} ⊗ W_{c, l+k(i-1)} for all a ∈ [n], b ∈ [n], c ∈ [n]. The analogous problems for the classical tensor rank, Tucker rank, and tensor train rank have been well studied in [Song, Woodruff, Zhong SODA 2019]. In this paper, we generalize the “rotation and sketch” technique on page 186 of [Song, Woodruff, Zhong SODA 2019] and show an input sparsity time algorithm for cycle rank.
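To make the indexing in the cycle-rank definition concrete, here is a minimal NumPy sketch (not from the paper) that assembles a cycle rank-k tensor from factor matrices U, V, W of shape n × k^2, treating the ⊗ in the abstract as an ordinary product of the corresponding scalar entries; the function name and the random factors are illustrative assumptions.

```python
import numpy as np

def cycle_rank_tensor(U, V, W, k):
    """Build the n x n x n tensor A with cycle rank k from n x k^2 factors,
    following A[a,b,c] = sum_{i,j,l} U[a, i+k(j-1)] * V[b, j+k(l-1)] * W[c, l+k(i-1)]
    (1-indexed in the abstract; 0-indexed below)."""
    n = U.shape[0]
    A = np.zeros((n, n, n))
    for i in range(k):
        for j in range(k):
            for l in range(k):
                u = U[:, i + k * j]   # column indexed by the pair (i, j)
                v = V[:, j + k * l]   # column indexed by the pair (j, l)
                w = W[:, l + k * i]   # column indexed by the pair (l, i)
                # rank-one update: outer product of the three columns
                A += np.einsum('a,b,c->abc', u, v, w)
    return A

# tiny usage example with random factors (illustrative only)
n, k = 5, 2
rng = np.random.default_rng(0)
U, V, W = (rng.standard_normal((n, k * k)) for _ in range(3))
A = cycle_rank_tensor(U, V, W, k)
print(A.shape)  # (5, 5, 5)
```

Note how each factor matrix cycles through the index pairs (i, j), (j, l), (l, i), which is what distinguishes cycle rank from the ordinary CP (classical) rank decomposition.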
