2025-02-27 11:58:26
DeepSeek Open Source Week: Summary and Reflections [Updated Through Day 3] - Zhihu
TAGs: Processors, Heterogeneous Computing, Hardware-Software Co-design, Large Models
Summary: This is a summary of a blog post about DeepSeek's Open Source Week releases, which optimize AI performance on specific hardware. The author reflects on how difficult it would be to complete such work at foreign AI companies, given the hardware restrictions involved. The post highlights three projects, FlashMLA, DeepEP, and DeepGEMM, and their contributions to improving AI performance on limited hardware. The author stresses that achieving peak performance requires understanding both the AI models and the underlying hardware, and notes DeepSeek's potential impact on the industry.
2025-02-28 18:20:02
deepseek-ai/DeepGEMM: DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
TAGs: Large Models
Summary: DeepGEMM is a library for clean and efficient FP8 General Matrix Multiplications (GEMMs) with fine-grained scaling, targeting Hopper-architecture GPUs and requiring CUDA 12.3 or above. It supports both normal and Mixture-of-Experts (MoE) grouped GEMMs, and applies optimizations such as warp specialization, Hopper TMA features, and a fully JIT design in which kernels are compiled at runtime. The library also provides utility functions and environment variables for tuning.
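To make "fine-grained scaling" concrete, here is a minimal PyTorch sketch of the numerics the README describes: the left operand is quantized to FP8 with one scale per 1x128 tile and the right operand with one scale per 128x128 tile, and the scales are undone at GEMM time. All function names here are illustrative assumptions, not DeepGEMM's API; this is a pure-Python numerics reference, not the CUDA implementation.

```python
# Sketch of fine-grained FP8 scaling: one float32 scale per tile, assuming
# the 1x128 (activation) and 128x128 (weight) granularities from the README.
import torch

TILE = 128  # granularity of the per-tile scales

def quantize_fp8_per_tile(x: torch.Tensor, tile_m: int, tile_n: int):
    """Quantize a 2-D float tensor to float8_e4m3fn with one scale per tile."""
    m, n = x.shape
    assert m % tile_m == 0 and n % tile_n == 0
    # View as (m/tile_m, tile_m, n/tile_n, tile_n) to reduce per tile.
    tiles = x.reshape(m // tile_m, tile_m, n // tile_n, tile_n)
    amax = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = amax / 448.0  # 448 is the largest normal value of e4m3
    q = (tiles / scale).to(torch.float8_e4m3fn).reshape(m, n)
    return q, scale.reshape(m // tile_m, n // tile_n)

def dequantize(q: torch.Tensor, scale: torch.Tensor, tile_m: int, tile_n: int):
    """Expand per-tile scales to elementwise and undo the quantization."""
    s = scale.repeat_interleave(tile_m, dim=0).repeat_interleave(tile_n, dim=1)
    return q.to(torch.float32) * s

# NT layout as in DeepGEMM: lhs is (M, K), rhs is stored transposed as (N, K).
M, K, N = 256, 512, 384
x = torch.randn(M, K)
w = torch.randn(N, K)
xq, xs = quantize_fp8_per_tile(x, 1, TILE)     # per-token, per-128-channel
wq, ws = quantize_fp8_per_tile(w, TILE, TILE)  # per 128x128 weight block
out = dequantize(xq, xs, 1, TILE) @ dequantize(wq, ws, TILE, TILE).T
ref = x @ w.T
print(f"relative error: {(out - ref).abs().mean() / ref.abs().mean():.4f}")
```

The real kernels never materialize float32 operands as this reference does: DeepGEMM feeds FP8 tiles to the tensor cores and applies the per-tile scales during accumulation, promoting partial sums to higher precision (the two-level accumulation the README mentions) to keep FP8 error from compounding over long K dimensions.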