A parallel branch-and-bound algorithm for register-pressure aware instruction scheduling

Jeffrey Byrnes

Back

A parallel branch-and-bound algorithm for register-pressure aware instruction scheduling

Thesis

Open access

A parallel branch-and-bound algorithm for register-pressure aware instruction scheduling

Jeffrey Byrnes

California State University, Sacramento

Master of Science (MS), California State University, Sacramento

08/28/2024

Handle:

https://hdl.handle.net/20.500.12741/rep:12320

Abstract

Code optimization

Compilers (Computer programs)

Instruction scheduling

Computer Engineering

Computer Science

Parallel Algorithms

Producing a good instruction schedule is vital for the run-time performance of compiler generated executables. Unfortunately, instruction scheduling is an NP hard problem. Since the overall compilation time is a primary design consideration, production compilers typically use greedy algorithms for instruction scheduling. While such algorithms typically produce good schedules, they are not provably optimal, and prior research has found that such algorithms may produce sub optimal schedules [1] [2]. Exact algorithms for this problem have super-polynomial complexity, and may not terminate within a reasonable compile time. Many techniques have been proposed to improve the performance of exact algorithms [3][4][5], but even the best-known exact algorithms often fail to find optimal schedules within a reasonable compile time. In this work, we will propose a novel algorithm which takes advantage of current multi-core processors and parallelizes an existing state-of-the-art exact scheduling algorithm that uses branch-and-bound. The value of the parallel branch-and-bound algorithm lies in its ability to find optimal schedules of instructions more quickly than the sequential algorithm. Our experimental evaluation using AMD’s rocPRIM benchmark on a 16 core AMD ThreadRipper CPU shows an average speedup of 6.1 across all regions of interest, which amounts to a speedup of 1.2 in overall scheduling time. These speedups are achieved while maintaining the same execution-time improvement of the sequential algorithm, which gives an average improvement of 5% in the execution times of the resultant executables when run on an AMD Vega20 GPU.

Files and links (1)

pdf

ByrnesJeffrey_Spring2024_Thesis701.22 kBDownload View

TextThesis Open Access

Metrics

1 Record Views

Details

Title: A parallel branch-and-bound algorithm for register-pressure aware instruction scheduling
Creators: Jeffrey Byrnes
Contributors: Ghassan Shobaki (Advisor)
Ted Krovetz (Committee Member)
Jinsong Ouyang (Committee Member)
Academic Unit: Computer Science Department
Theses and Dissertations: Master of Science (MS); Computer Science; California State University, Sacramento; 05/01/2024; 2024
Publisher: California State University, Sacramento
Publication Details: 08/28/2024
Identifiers: 99258162061801671; https://hdl.handle.net/20.500.12741/rep:12320
Resource Type: Masters Thesis
Language: English
Number of pages: 95
Comment: The accessibility of this document has been verified by Sacramento State University Library. For questions, please contact lib-508Accessibility@csus.edu.