Iris: Automatic Generation of Efficient Data Layouts for High Bandwidth Utilization

11/08/2022
by   Stephanie Soldavini, et al.

Optimizing data movement is becoming one of the biggest challenges in heterogeneous computing, which must cope with the data deluge and, consequently, big data applications. When creating specialized accelerators, modern high-level synthesis (HLS) tools are increasingly efficient in optimizing the computational aspects, but data transfers have not been adequately improved. To combat this, novel architectures such as High-Bandwidth Memory (HBM) with wider data buses have been developed so that more data can be transferred in parallel. Designers must tailor their hardware/software interfaces to fully exploit the available bandwidth. HLS tools can automate this process, but the designer must follow strict coding-style rules. If the bus width is not evenly divisible by the data width (e.g., when using custom-precision data types) or if the array lengths are not powers of two, the HLS-generated accelerator will likely not fully utilize the available bandwidth, demanding even more manual effort from the designer. We propose a methodology to automatically find and implement a data layout that, when streamed between memory and an accelerator, uses a higher percentage of the available bandwidth than a naive or HLS-optimized design. We borrow concepts from multiprocessor scheduling to achieve such high efficiency.
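To make the divisibility problem concrete, the following minimal sketch compares two ways of placing fixed-width elements on a bus. It is only an illustration of why bandwidth is lost, not the layout algorithm proposed in the paper. The 256-bit bus, the 24-bit element width, the array length of 1000, and all function names are assumptions chosen for the example.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def naive_words(bus_bits: int, elem_bits: int, n_elems: int) -> int:
    """Bus words needed when elements never straddle a word boundary.

    Each word holds floor(bus_bits / elem_bits) whole elements and the
    remaining bits of the word are padding.
    """
    per_word = bus_bits // elem_bits
    if per_word == 0:
        raise ValueError("element is wider than the bus")
    return ceil_div(n_elems, per_word)


def packed_words(bus_bits: int, elem_bits: int, n_elems: int) -> int:
    """Bus words needed when elements are packed back to back,
    so an element may be split across two consecutive words."""
    return ceil_div(n_elems * elem_bits, bus_bits)


def utilization(bus_bits: int, elem_bits: int, n_elems: int, words: int) -> float:
    """Fraction of transferred bits that carry payload."""
    return (n_elems * elem_bits) / (words * bus_bits)


if __name__ == "__main__":
    BUS, ELEM, N = 256, 24, 1000  # assumed values, not from the paper
    for name, fn in (("naive", naive_words), ("packed", packed_words)):
        w = fn(BUS, ELEM, N)
        print(f"{name}: {w} words, utilization {utilization(BUS, ELEM, N, w):.4f}")
```

With these assumed values, the unpacked layout fits 10 elements per word and wastes 16 of every 256 bits, while dense packing needs fewer words at the cost of elements that straddle word boundaries, which the accelerator-side interface would then have to reassemble.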

research
11/30/2020

Dataflow-Architecture Co-Design for 2.5D DNN Accelerators using Wireless Network-on-Package

Deep neural network (DNN) models continue to grow in size and complexity...
research
04/25/2018

Giving Text Analytics a Boost

The amount of textual data has reached a new scale and continues to grow...
research
05/25/2021

ScalaBFS: A Scalable BFS Accelerator on HBM-Enhanced FPGAs

High Bandwidth Memory (HBM) provides massive aggregated memory bandwidth...
research
12/10/2017

A Flexible High-Bandwidth Low-Latency Multi-Port Memory Controller

Multi-port memory controllers (MPMCs) have become increasingly important...
research
05/16/2022

TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-based FPGAs

The emergence of high-bandwidth memory (HBM) brings new opportunities to...
research
02/23/2022

Shisha: Online scheduling of CNN pipelines on heterogeneous architectures

Chiplets have become a common methodology in modern chip design. Chiplet...
research
11/02/2020

On the Impact of Partial Sums on Interconnect Bandwidth and Memory Accesses in a DNN Accelerator

Dedicated accelerators are being designed to address the huge resource r...
