🛠 Hot: xlite-dev | lite.ai.toolkit | Awesome-LLM-Inference | LeetCUDA | ffpa-attn 🎧
🤖 Contact: [email protected] | GitHub: DefTruth | Zhihu(知乎): DefTruth 📞
❤ I love open source, bro, and I think you do too. ❤
📖A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉
📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.
🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, FaceFusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TensorRT.
📒Statistical Learning Methods (统计学习方法) by Li Hang: 200-page PDF notes, from theory to implementation, with detailed explanations of the math formulas, implemented in R.🎉
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
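The core idea behind quantized attention can be sketched as follows: quantize Q and K to INT8 so the score matmul runs on low-precision integers, then dequantize before the softmax. This is a minimal NumPy illustration, not the repo's actual kernel; the function names and per-row symmetric quantization scheme are assumptions for exposition.

```python
import numpy as np

def quantize_int8(x):
    # Per-row symmetric INT8 quantization: scale each row so that
    # its max absolute value maps to 127.
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def quantized_attention(Q, K, V):
    # Quantize Q and K; the Q @ K^T matmul runs on INT8 inputs
    # (accumulated in INT32), then is dequantized before softmax.
    q_q, q_s = quantize_int8(Q)
    k_q, k_s = quantize_int8(K)
    scores = q_q.astype(np.int32) @ k_q.astype(np.int32).T
    scores = scores.astype(np.float32) * q_s * k_s.T / np.sqrt(Q.shape[-1])
    # Standard numerically stable softmax over the dequantized scores.
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ V
```

In a real kernel (e.g. SageAttention-style implementations), the INT8 matmul maps onto fast tensor-core instructions, which is where the speedup over FP16 FlashAttention2/xformers comes from; the dequantization scales keep the end-to-end outputs close to full precision.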