The Languini Kitchen serves as both a research collective and codebase d...
In recent years, large pre-trained language models (LLMs) have demonstra...
Language models have achieved remarkable performance on a wide range of ...
We introduce the Block-Recurrent Transformer, which applies a transforme...
The weight matrix (WM) of a neural network (NN) is its program. The prog...
We share our experience with the recently released WILDS benchmark, a co...
Transformers with linearised attention ("linear Transformers") have demo...
We show the formal equivalence of linearised self-attention mechanisms a...
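The equivalence mentioned above can be made concrete: causal linear attention with a positive feature map phi can be computed as a fast weight programmer that additively writes outer products v_t phi(k_t)^T into a weight matrix and reads it out with phi(q_t). The sketch below is illustrative only; the function name and the specific feature map (a shifted ReLU, chosen here just to keep denominators positive) are my assumptions, not details from the paper.

```python
import numpy as np

def linear_attention_as_fast_weights(qs, ks, vs,
                                     phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Causal linearised attention, computed as a fast weight programmer.

    At each step t the "program" W is updated with the outer product
    v_t phi(k_t)^T (write), then queried with phi(q_t) (read). With the
    running normaliser z, the output equals kernelised linear attention.
    """
    d_k = phi(ks[0]).shape[0]
    d_v = vs[0].shape[0]
    W = np.zeros((d_v, d_k))   # fast weight matrix
    z = np.zeros(d_k)          # accumulator for the attention normaliser
    outs = []
    for q, k, v in zip(qs, ks, vs):
        W += np.outer(v, phi(k))                 # write step
        z += phi(k)
        outs.append(W @ phi(q) / (z @ phi(q)))   # normalised read step
    return np.array(outs)
```

Reading out with `W @ phi(q)` yields `sum_i (phi(q)·phi(k_i)) v_i`, i.e. exactly the numerator of linear attention over the prefix, which is why the two views coincide.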
Humans can quickly associate stimuli to solve problems in novel contexts...
We incorporate Tensor-Product Representations within the Transformer in ...
We combine Recurrent Neural Networks with Tensor Product Representations...
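The Tensor Product Representations used in the two entries above follow a standard bind/unbind scheme: each filler vector is bound to a role vector via an outer product, the bindings are superposed by addition, and a filler is recovered by multiplying with its role. A minimal sketch, assuming orthonormal role vectors (which make unbinding exact); the function names are illustrative, not taken from either paper.

```python
import numpy as np

def tpr_bind(fillers, roles):
    """Superpose filler (x) role outer products into one TPR matrix."""
    return sum(np.outer(f, r) for f, r in zip(fillers, roles))

def tpr_unbind(T, role):
    """Recover the filler bound to `role`; exact for orthonormal roles."""
    return T @ role

# Example: bind two fillers to orthonormal roles, then read one back.
roles = np.eye(3)[:2]                      # two orthonormal role vectors
fillers = np.array([[1.0, 2.0], [3.0, 4.0]])
T = tpr_bind(fillers, roles)
recovered = tpr_unbind(T, roles[0])        # -> array([1., 2.])
```

With non-orthogonal roles the read-out is only approximate (cross-talk between bindings), which is why learned role vectors are typically encouraged to stay near-orthogonal.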