GPU

Created 2026-03-31 | Updated 2026-04-01 | AI Infra engineering & architecture

Programming Massively Parallel Processors (Chinese edition: 《大规模并行处理器编程实战》)

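Be able to write kernel functions.
Know how to use Shared Memory to reduce accesses to device memory (global memory).
Understand how warp divergence (threads in the same warp taking different branches) slows execution down.
Basic Linear Algebra Subprograms (BLAS): simple matrix multiplication (GEMM). First write it on the CPU as a triple nested loop, then optimize it with AVX, and finally port it to CUDA and optimize it on the GPU.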
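A minimal sketch of the first goal, writing a kernel: the classic vector addition with one GPU thread per element, plus the host code that allocates device memory, copies data over, and launches the kernel. It assumes the CUDA runtime API and nvcc; the names vector_add, vector_add_gpu and CUDA_CHECK are illustrative, not taken from the book.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Kernel: runs on the GPU, one thread per element, c[i] = a[i] + b[i].
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // the last block may have spare threads
        c[i] = a[i] + b[i];
    }
}

// Host wrapper: allocate device memory, copy inputs over, launch, copy back.
std::vector<float> vector_add_gpu(const std::vector<float>& a,
                                  const std::vector<float>& b) {
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;                           // a launch with 0 blocks is invalid
    size_t bytes = n * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMalloc(&d_c, bytes));
    CUDA_CHECK(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;                              // threads per block
    int blocks = (n + threads - 1) / threads;       // ceil(n / threads)
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CUDA_CHECK(cudaGetLastError());                 // catches bad launch configurations

    // cudaMemcpy on the default stream waits for the kernel to finish first.
    CUDA_CHECK(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_c));
    return c;
}
```

A caller would simply write auto c = vector_add_gpu(a, b);. The if (i < n) guard is what lets n be any size rather than a multiple of the 256-thread block.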
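For the Shared Memory goal, and for the final CUDA step of the GEMM exercise, a sketch of the standard tiled matrix multiplication. Each block copies one tile of A and one tile of B into shared memory, synchronizes, and then reuses those values from on-chip memory instead of re-reading device memory. Assumptions: row-major float matrices, TILE = 16, and the illustrative names matmul_tiled and matmul_gpu.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

constexpr int TILE = 16;  // tile edge (an assumed, common choice); block = TILE x TILE threads

// C = A * B with A (M x K), B (K x N), C (M x N), all row-major.
// Each block computes one TILE x TILE tile of C. Every value it loads from global
// memory into shared memory is read TILE times from there, so each thread issues
// about 2*K/TILE global loads instead of 2*K.
__global__ void matmul_tiled(const float* A, const float* B, float* C,
                             int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE + ty;
    int col = blockIdx.x * TILE + tx;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + tx;
        int b_row = t * TILE + ty;
        // Cooperative load: each thread fetches one element of each tile,
        // padding with 0 outside the matrix so edge tiles stay correct.
        As[ty][tx] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[ty][tx] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();                    // tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k) {
            acc += As[ty][k] * Bs[k][tx];
        }
        __syncthreads();                    // everyone done reading before the next overwrite
    }
    if (row < M && col < N) {
        C[row * N + col] = acc;
    }
}

// Host wrapper: copy A and B to the device, launch a 2-D grid of tiles, copy C back.
std::vector<float> matmul_gpu(const std::vector<float>& A, const std::vector<float>& B,
                              int M, int N, int K) {
    assert(M > 0 && N > 0 && K > 0);
    assert(A.size() == static_cast<size_t>(M) * K);
    assert(B.size() == static_cast<size_t>(K) * N);
    std::vector<float> C(static_cast<size_t>(M) * N);
    size_t bytes_a = A.size() * sizeof(float);
    size_t bytes_b = B.size() * sizeof(float);
    size_t bytes_c = C.size() * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CUDA_CHECK(cudaMalloc(&dA, bytes_a));
    CUDA_CHECK(cudaMalloc(&dB, bytes_b));
    CUDA_CHECK(cudaMalloc(&dC, bytes_c));
    CUDA_CHECK(cudaMemcpy(dA, A.data(), bytes_a, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, B.data(), bytes_b, cudaMemcpyHostToDevice));

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);  // x covers columns, y rows
    matmul_tiled<<<grid, block>>>(dA, dB, dC, M, N, K);
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaMemcpy(C.data(), dC, bytes_c, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dB));
    CUDA_CHECK(cudaFree(dC));
    return C;
}
```

With TILE = 16 each block has 256 threads and uses 2 × 16 × 16 × 4 B = 2 KiB of shared memory. Both __syncthreads() calls are required: the first keeps threads from reading a half-loaded tile, the second keeps fast threads from overwriting a tile that others are still reading.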
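To see warp divergence, a sketch with two kernels that do the same amount of work per thread but pick their branch differently. In the first, neighboring lanes of one warp disagree, so the hardware runs both branches one after the other with part of the warp masked off; in the second, the condition is uniform within each warp. The path_a/path_b loops are hypothetical busy-work, and all kernel and function names are illustrative.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical per-branch work: float recurrences that keep the ALUs busy.
// For v >= 0 and enough iterations, path_a settles at 2.0f and path_b at -2.0f.
__device__ float path_a(float v, int iters) {
    for (int k = 0; k < iters; ++k) v = v * 0.5f + 1.0f;
    return v;
}

__device__ float path_b(float v, int iters) {
    for (int k = 0; k < iters; ++k) v = v * 0.5f - 1.0f;
    return v;
}

// Divergent: even and odd lanes of the SAME warp take different branches, so the
// warp runs path_a with odd lanes masked off, then path_b with even lanes masked off.
__global__ void branch_divergent(float* out, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = static_cast<float>(i);
    if (i % 2 == 0) out[i] = path_a(v, iters);
    else            out[i] = path_b(v, iters);
}

// Warp-uniform: the condition depends only on i / warpSize, so all lanes of a warp
// agree and each warp runs one loop. Assumes blockDim.x is a multiple of warpSize.
__global__ void branch_warp_uniform(float* out, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = static_cast<float>(i);
    if ((i / warpSize) % 2 == 0) out[i] = path_a(v, iters);
    else                         out[i] = path_b(v, iters);
}

// Launch one variant, time it with CUDA events, and return its output.
std::vector<float> run_branch_kernel(bool divergent, int n, int iters, float* elapsed_ms) {
    assert(n > 0);
    float* d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(float)));
    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    int threads = 256;                                // a multiple of the 32-thread warp
    int blocks = (n + threads - 1) / threads;
    CUDA_CHECK(cudaEventRecord(start));
    if (divergent) branch_divergent<<<blocks, threads>>>(d_out, n, iters);
    else           branch_warp_uniform<<<blocks, threads>>>(d_out, n, iters);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));
    CUDA_CHECK(cudaEventElapsedTime(elapsed_ms, start, stop));

    std::vector<float> out(n);
    CUDA_CHECK(cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_out));
    return out;
}
```

Comparing the elapsed times of the two variants on the same n and iters (after a warm-up launch, since the first launch can include one-time setup cost) should typically show the divergent kernel taking noticeably longer, up to roughly twice as long, because each of its warps executes both loops. The exact ratio depends on the GPU.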
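For the GEMM exercise, a sketch of the two CPU steps: the naive triple loop, then an AVX version. The AVX version also switches to i-k-j loop order so the innermost loop reads contiguous memory; that reordering is a common companion to vectorization, not something the note itself specifies. This is host-only C++ that can sit in a .cu file next to the kernels. Assumptions: row-major float matrices, plain AVX intrinsics from <immintrin.h> (no AVX2/FMA), and AVX enabled for the host compiler, for example nvcc -Xcompiler -mavx (or g++ -mavx if the code is compiled as a .cpp file). Without __AVX__ defined, gemm_avx falls back to its scalar loop. Function names are illustrative.

```cuda
#include <cassert>
#ifdef __AVX__
#include <immintrin.h>   // AVX intrinsics: __m256, _mm256_*_ps
#endif

// Step 1: textbook triple loop, C (M x N) = A (M x K) * B (K x N), row-major.
// The inner k-loop walks B down a column (stride N), which is cache-unfriendly.
void gemm_naive(const float* A, const float* B, float* C, int M, int N, int K) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}

// Step 2: i-k-j order, so the innermost loop streams through contiguous rows of
// B and C, then update 8 floats of C per AVX instruction.
void gemm_avx(const float* A, const float* B, float* C, int M, int N, int K) {
    assert(M >= 0 && N >= 0 && K >= 0);
    for (int i = 0; i < M; ++i) {
        float* c_row = C + i * N;
        for (int j = 0; j < N; ++j) c_row[j] = 0.0f;

        for (int k = 0; k < K; ++k) {
            const float a = A[i * K + k];
            const float* b_row = B + k * N;
            int j = 0;
#ifdef __AVX__
            const __m256 va = _mm256_set1_ps(a);              // broadcast A[i][k] to 8 lanes
            for (; j + 8 <= N; j += 8) {
                __m256 vb = _mm256_loadu_ps(b_row + j);       // B[k][j..j+7]
                __m256 vc = _mm256_loadu_ps(c_row + j);       // C[i][j..j+7]
                vc = _mm256_add_ps(vc, _mm256_mul_ps(va, vb));
                _mm256_storeu_ps(c_row + j, vc);
            }
#endif
            for (; j < N; ++j) c_row[j] += a * b_row[j];      // scalar tail / no-AVX fallback
        }
    }
}
```

The usual next CPU steps are cache blocking and fused multiply-add (_mm256_fmadd_ps, which needs -mfma); the CUDA step then replaces these loops with a kernel, typically a shared-memory tiled one.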

Author: gyx

Link: https://gyx47.github.io/gpu/

Copyright Notice: All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.

Tags: AI, Architecture, Performance, CUDA, Core
