Brief
<aside>
⭐ A software-hardware cooptimization for efficient triangle counting.
</aside>
Main Contribution
- To reduce the off-chip memory access and the number of set intersections, we propose the force-based node index reorder method. The proposed reorder method simultaneously optimizes both data locality and the number of set intersections. The reorder method reduces the off-chip memory access and the set intersection by 61% and 64%, respectively, while providing 2.19× end-to-end speedup.
- To improve the parallelism of set intersections, we propose the first CAM-based TC architecture. The CAM-based processing elements (PEs) reduce the time complexity of
set intersections from O(m + n) or O(n log m) to O(n).
- Experimental results show that, compared with the state-of-the-art CPU, GPU, and processing-in-memory (PIM) based TC designs, CLAP can achieve 39×, 27×, and 78×
speedup, respectively.
Authors
Tianyu Fu, Chiyue Wei, Zhenhua Zhu, Shang Yang, Zhongming Yu, Guohao Dai, Huazhong Yang, and Yu Wang
Files
DATE_submission.pdf
Video
760_Video.mp4
Code
https://github.com/thu-nics/CLAP-triangle-counting
Slides
CLAP_slides.pdf
Poster
CLAP_poster_final.pdf
Main Content