At the 30th Asia and South Pacific Design Automation Conference (ASPDAC ’25), we presented a cache-free, NUMA-based heterogeneous architecture tailored to the cyclical and modular nature of wireless baseband processing (WBP). Using a novel “pack-and-ship” data dispatching strategy and a multi-level dataflow scheduling model, our system improves data locality and reduces memory latency. Experimental results show up to a 2.3× speedup in single-tile performance and a link-level throughput of 288 Mbps, demonstrating strong scalability and superior performance compared with graphics processing unit (GPU) and digital signal processor (DSP) baselines.

Check out our paper at: https://dl.acm.org/doi/abs/10.1145/3658617.3697558

Overview of proposed design.
Tile-level scheduling scheme.