Tagged "llm-performance"
- AMD's vLLM-ATOM Plugin Supercharges DeepSeek-R1 and Kimi-K2 Inference on MI350/MI400
- DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max
- Achieving 2000 Tokens Per Second with QWEN 3.5 27B on RTX-5090
- Local LLMs on Apple Silicon Mac 2026: M1 M2 M3 Guide
- Mojo: Creating a Programming Language for an AI World with Chris Lattner
- Every agent framework has the same bug – prompt decay. Here's a fix
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues