截屏2022-12-11 21.56.24.png

Brief

<aside> ⭐ A software-hardware cooptimization for efficient triangle counting.

</aside>

To reduce the off-chip memory access and the number of set intersections, we propose the force-based node index reorder method. The proposed reorder method simultaneously optimizes both data locality and the number of set intersections. The reorder method reduces the off-chip memory access and the set intersection by 61% and 64%, respectively, while providing 2.19× end-to-end speedup.
To improve the parallelism of set intersections, we propose the first CAM-based TC architecture. The CAM-based processing elements (PEs) reduce the time complexity of set intersections from O(m + n) or O(n log m) to O(n).
Experimental results show that, compared with the state-of-the-art CPU, GPU, and processing-in-memory (PIM) based TC designs, CLAP can achieve 39×, 27×, and 78× speedup, respectively.

Tianyu Fu, Chiyue Wei, Zhenhua Zhu, Shang Yang, Zhongming Yu, Guohao Dai, Huazhong Yang, and Yu Wang

Main Content