Tagged "llm-performance"

AMD's vLLM-ATOM Plugin Supercharges DeepSeek-R1 and Kimi-K2 Inference on MI350/MI400 12 May 2026
DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max 15 April 2026
Achieving 2000 Tokens Per Second with QWEN 3.5 27B on RTX-5090 14 March 2026
Local LLMs on Apple Silicon Mac 2026: M1 M2 M3 Guide 14 March 2026
Mojo: Creating a Programming Language for an AI World with Chris Lattner 7 March 2026
Every agent framework has the same bug – prompt decay. Here's a fix 26 February 2026
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues 14 February 2026