Untitled

Created 2026-03-31 | Updated 2026-04-01 | AI Infra (engineering and architecture track)


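Quantization: take a model that runs in FP32 (32-bit floating point) and run it in FP16 or even INT8 (8-bit integer) with only a small loss of accuracy. Study Post-Training Quantization (PTQ), which quantizes an already-trained model without retraining it, and understand the difference between symmetric and asymmetric quantization.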
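To make the symmetric/asymmetric distinction concrete, here is a minimal pure-Python sketch of per-tensor INT8 quantization. It is an illustration under assumptions, not any framework's implementation: the function names and sample weights are hypothetical, the symmetric variant uses the signed range [-127, 127] with no zero point, and the asymmetric variant uses the unsigned range [0, 255] with an integer zero point.

```python
def quantize_symmetric(values, num_bits=8):
    """Symmetric: the range is centered on 0, so only a scale is needed."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize_symmetric(q, scale):
    return [x * scale for x in q]


def quantize_asymmetric(values, num_bits=8):
    """Asymmetric: maps [min, max] onto [0, 2^bits - 1] via scale + zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1           # 0..255 for 8 bits
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_asymmetric(q, scale, zero_point):
    return [(x - zero_point) * scale for x in q]


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical, all-positive (skewed) weights: the symmetric scheme leaves the
# negative half of its integer range unused, so its step size is larger.
weights = [0.1, 0.5, 1.2, 2.0, 3.9]

q_sym, s_sym = quantize_symmetric(weights)
q_asym, s_asym, zp = quantize_asymmetric(weights)

err_sym = max_abs_error(weights, dequantize_symmetric(q_sym, s_sym))
err_asym = max_abs_error(weights, dequantize_asymmetric(q_asym, s_asym, zp))
print(f"symmetric:  scale={s_sym:.5f} max_err={err_sym:.5f}")
print(f"asymmetric: scale={s_asym:.5f} zero_point={zp} max_err={err_asym:.5f}")
```

For skewed data like this, the asymmetric scheme gets a finer step size because all 256 levels cover the actual value range. The symmetric scheme is simpler (no zero point to carry through the arithmetic), which is one reason it is commonly chosen for weights.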
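Hands-on exercise: pick an open-source ResNet or a simple Transformer model, export it to ONNX format, then accelerate inference with TensorRT. Record the latency and throughput before and after acceleration.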
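For the "record latency and throughput" step, a small timing harness keeps the before/after numbers comparable. The sketch below is standard-library Python only and rests on assumptions: `benchmark` and `run_batch` are hypothetical names, and `run_batch` stands in for whatever callable runs one inference batch (the original PyTorch model for the baseline, the TensorRT engine afterwards).

```python
import statistics
import time


def benchmark(run_batch, batch_size, warmup=10, iters=100):
    """Time a zero-argument callable that runs one inference batch.

    Note: on a GPU, kernel launches are asynchronous, so `run_batch` must
    block until the result is ready (for example by synchronizing the
    device); otherwise only the launch overhead is measured.
    """
    # Warm-up iterations are excluded from the statistics so that one-time
    # costs (lazy initialization, caches, autotuning) do not skew the numbers.
    for _ in range(warmup):
        run_batch()

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_batch()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    total_s = sum(latencies_ms) / 1000.0
    latencies_ms.sort()
    p99_index = min(len(latencies_ms) - 1, int(0.99 * len(latencies_ms)))
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[p99_index],
        # Samples processed per second across all timed iterations.
        "throughput_per_s": (batch_size * iters / total_s) if total_s > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real inference call in practice.
    stats = benchmark(lambda: sum(range(10_000)), batch_size=8, warmup=5, iters=50)
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

Running the same harness with the same batch size and input shape on both the baseline and the accelerated model gives a like-for-like comparison. Reporting a tail percentile alongside the mean helps, since the two can diverge noticeably.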
Migration write-up: when porting CUDA code to Ascend, which pitfalls came up, and how were they eventually resolved?

Author: gyx

Link: https://gyx47.github.io/%E6%9C%AA%E5%91%BD%E5%90%8D/

Copyright Notice: All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.

