With easier access to powerful compute resources, there is a growing tre...
In high-performance computing (HPC), the demand for efficient parallel p...
There is an ever-present need for shared memory parallelization schemes ...
Automatic source-to-source parallelization of serial code for shared and...
Most current popular subword tokenizers are trained based on word freque...
We look at a decision taken early in training a subword tokenizer, namel...
The Universal Morphology (UniMorph) project is a collaborative effort pr...
In recent years, the world has shifted to many-core and multi-core shared...
The problem of representing the atomic elements of language in modern ne...
Commonly used transformer language models depend on a tokenization schem...
We demonstrate that it is feasible to diacritize Hebrew script without a...
Natural language processing systems often struggle with out-of-vocabular...
In many settings, it is important to be able to understand why a ...
We present the New York Times Word Innovation Types dataset, or NYTWIT, ...
We propose a new contextual-compositional neural network layer that hand...
Attention mechanisms play a central role in NLP systems, especially with...
Character-level models have been used extensively in recent years in NLP...
Semantic graphs, such as WordNet, are resources which curate natural lan...
Political identity is often manifested in language variation, but the re...
Word embeddings improve generalization over lexical features by placing ...
Description and annotation guidelines for the Yahoo Webscope release o...