NPS: A Framework for Accurate Program Sampling Using Graph Neural Network

by Yuanwei Fang, et al.

With the end of Moore's Law, there is a growing demand for rapid architectural innovations in modern processors, such as RISC-V custom extensions, to continue performance scaling. Program sampling is a crucial step in microprocessor design, as it selects representative simulation points for workload simulation. While SimPoint has been the de facto approach for decades, the limited expressiveness of its Basic Block Vector (BBV) representation requires time-consuming human tuning, often taking months, which impedes fast innovation and agile hardware development. This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network. NPS deploys AssemblyNet for embedding generation, leveraging an application's code structures and runtime states. AssemblyNet serves as NPS's graph model and neural architecture, capturing a program's behavior in aspects such as data computation, code path, and data flow. AssemblyNet is trained with a data prefetch task that predicts consecutive memory addresses. In the experiments, NPS outperforms SimPoint by up to 63% and reduces the average error by 38%, with increased accuracy that reduces the expensive accuracy tuning overhead. Furthermore, NPS shows higher accuracy and generality than the state-of-the-art GNN approach in code behavior learning, enabling the generation of high-quality execution embeddings.
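To make the notion of program sampling concrete, the following is a minimal sketch of the general clustering idea behind BBV-based sampling: each execution interval is summarized as a vector of basic-block execution counts, the vectors are clustered, and one representative interval per cluster is kept together with a weight. It is an illustration under stated assumptions, not the implementation used by SimPoint or NPS. The function names (`normalize`, `kmeans`, `pick_simulation_points`), the plain k-means routine, and the toy interval data are all hypothetical. NPS would replace the hand-built count vectors with learned execution embeddings.

```python
import random


def normalize(vec):
    """Scale a basic-block count vector so its entries sum to 1."""
    total = sum(vec)
    return [v / total for v in vec] if total else list(vec)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (assumed here for illustration; seeded for repeatability)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the final centroids.
    return centroids, [nearest(p, centroids) for p in points]


def pick_simulation_points(interval_vectors, k, seed=0):
    """Return (interval_index, weight) pairs, one per non-empty cluster.

    The representative is the interval closest to its cluster centroid;
    the weight is the fraction of all intervals that fall in that cluster.
    """
    points = [normalize(v) for v in interval_vectors]
    centroids, assign = kmeans(points, k, seed=seed)
    picks = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        rep = min(members, key=lambda i: sq_dist(points[i], centroids[c]))
        picks.append((rep, len(members) / len(points)))
    return picks


# Toy data (hypothetical): six intervals over three basic blocks,
# forming two obvious execution phases.
toy_intervals = [
    [10, 0, 0], [9, 1, 0], [8, 2, 0],
    [0, 0, 10], [0, 1, 9], [0, 2, 8],
]
picks = pick_simulation_points(toy_intervals, k=2)
for index, weight in sorted(picks):
    print(f"simulate interval {index} with weight {weight:.2f}")
```

Only the selected intervals would then be simulated in detail, and their measured metrics combined using the weights to estimate whole-program behavior. The quality of the estimate depends on how well the per-interval vectors capture program behavior, which is the limitation of BBVs that the abstract points to.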




Learning Blended, Precise Semantic Program Embeddings

Learning neural program embeddings is key to utilizing deep neural netwo...

A Hybrid Approach for Learning Program Representations

Learning neural program embedding is the key to utilizing deep neural ne...

GRAPHSPY: Fused Program Semantic-Level Embedding via Graph Neural Networks for Dead Store Detection

Production software oftentimes suffers from the issue of performance ine...

A Loop-Based Methodology for Reducing Computational Redundancy in Workload Sets

The design of general purpose processors relies heavily on a workload ga...

Learning Execution through Neural Code Fusion

As the performance of computer systems stagnates due to the end of Moore...

MicroGrad: A Centralized Framework for Workload Cloning and Stress Testing

We present MicroGrad, a centralized automated framework that is able to ...

GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation

Analytical hardware performance models yield swift estimation of desired...
