ESE PhD Thesis Defense: “Software/Hardware Co-optimization for Computer Systems with 3D-stacking Memories”
August 3, 2023 at 3:00 PM - 5:00 PM
Emerging 3D memory technologies, such as the Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM), provide high bandwidth and massive memory-level parallelism. With the growing heterogeneity and complexity of computer systems (CPU cores, accelerators, etc.), efficiently integrating emerging memories into existing systems poses new challenges to algorithms, hardware, and systems alike. This dissertation explores application-aware system optimization techniques for 3D-stacking memories in both domain-specific accelerators (DSAs) and general-purpose computer systems.

The first part of the dissertation presents a standalone 3D-stacking-memory-based graph accelerator that achieves 45.8 billion traversed edges per second (TEPS) by co-optimizing the algorithm and the hardware architecture. We first present algorithmic modifications and a platform-aware graph processing architecture to perform level-synchronized breadth-first search (BFS) on an FPGA-HMC platform. To gain better insight into the potential bottlenecks of the proposed implementation, we develop an analytical performance model that quantitatively evaluates HMC access latency and the corresponding BFS performance. Based on this analysis, we propose a two-level bitmap scheme to reduce memory accesses, and we optimize key design parameters (e.g., memory access granularity). We then leverage an inherent graph property, vertex degree, to co-optimize the algorithm and the hardware architecture. In particular, we develop two algorithm optimization techniques, degree-aware adjacency list reordering and degree-aware vertex index sorting, and two platform-dependent hardware optimization techniques, degree-aware data placement and degree-aware adjacency list compression. Together, these techniques substantially reduce the amount of access to external memory. Finally, we conduct extensive experiments on an FPGA-HMC platform to verify the effectiveness of the proposed techniques.
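As a rough illustration of the traversal pattern the accelerator targets (a software sketch only, not the dissertation's FPGA implementation), level-synchronized BFS expands one complete frontier per step and uses a bitmap-style visited set so that each vertex triggers at most one external-memory update:

```python
def level_synchronized_bfs(adj, root, num_vertices):
    """Level-synchronized BFS sketch: the entire frontier of one level is
    processed before advancing to the next, and a byte-per-vertex bitmap
    tracks visited vertices (a stand-in for the hardware bitmap scheme)."""
    visited = bytearray(num_vertices)  # visited-vertex bitmap
    visited[root] = 1
    level = {root: 0}                  # vertex -> BFS depth
    frontier = [root]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:             # one synchronized level
            for v in adj.get(u, []):
                if not visited[v]:     # bitmap check avoids redundant visits
                    visited[v] = 1
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

For example, on the diamond graph `{0: [1, 2], 1: [3], 2: [3]}`, vertex 3 is discovered exactly once, at depth 2, even though two frontier vertices reach it.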
In the second part of this dissertation, we develop machine learning methods that automatically identify the access patterns of major variables in a program. These methods then cluster variables with similar access patterns to reduce the overhead of SDAM. Our evaluation on standard CPU benchmarks and data-intensive benchmarks (for both CPUs and accelerators) demonstrates 1.41x and 1.84x speedups on the CPU and a 2.58x speedup on near-memory accelerators in our system with SDAM, compared to a baseline system.
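To make the clustering idea concrete, here is a toy sketch (not the dissertation's actual method) that groups variables by nearest-centroid iteration over hypothetical per-variable features such as access stride and footprint; the feature names and k-means choice are illustrative assumptions:

```python
import random

def cluster_access_patterns(features, k, iters=50, seed=0):
    """Toy k-means over per-variable access-pattern feature vectors
    (hypothetical features, e.g. [stride regularity, footprint]),
    grouping variables whose access behavior looks similar."""
    rng = random.Random(seed)
    names = list(features)
    points = [features[n] for n in names]
    centroids = rng.sample(points, k)      # initial centroids from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # assign each variable to its nearest centroid (squared distance)
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # recompute each centroid as the mean of its members
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    clusters = {}
    for name, c in zip(names, assign):
        clusters.setdefault(c, []).append(name)
    return list(clusters.values())
```

Variables landing in the same cluster could then share one mapping decision, which is the overhead reduction the text describes at a high level.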
Jialiang Zhang
ESE Ph.D. Candidate
Jialiang Zhang is a Ph.D. candidate in the Department of Electrical and Systems Engineering at the University of Pennsylvania. His research focuses on hardware acceleration of big data and machine learning applications using FPGAs and emerging memory technologies. He received his B.E. degree from the University of Electronic Science and Technology of China. He was admitted as a Ph.D. student at the University of Pennsylvania in 2020 to pursue his interest in FPGA research.