Tagged "vllm"
- Developer Builds Fully Local Multi-Agent System Using vLLM and Parallel Inference
- Build a $1,500 AI Server with DeepSeek-R1 on RTX 4090
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- LMCache Dramatically Accelerates LLM Inference on Oracle Data Science Platform
- Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
- OpenClaw vs Eigent vs Claude Cowork: Comparing Open-Source AI Collaboration Platforms
- AMD Launches Agent System Optimized for Local AI Inference with Ryzen and Radeon
- P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM
- Runpod Report: Qwen Has Overtaken Meta's Llama as the Most-Deployed Self-Hosted LLM
- Intel Updates LLM-Scaler-vLLM with Support for More Qwen3/3.5 Models
- How to Install OpenClaw with Ollama (Step-by-Step Tutorial)
- Nvidia Pushes Jetson as Edge Hub for Open AI Models
- Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia
- Show HN: Aver – a Language Designed for AI to Write and Humans to Review
- Nemotron 9B Powers Large-Scale Local Inference: Patent Classification and Real-Time Applications
- HP Refreshes Lineup with AI-Focused Workstations
- Intel Arc Pro B70 Workstation GPU Confirmed via vLLM AI Release Notes
- Framework Choice Critical: llama.cpp and vLLM Outperform Ollama in Qwen 3.5 Testing
- AMD Expands Ryzen AI 400 Series Portfolio for Consumer and Enterprise AI PC Options
- Huawei's SuperPoD Portfolio Creates New Option for Global Computing at MWC Barcelona 2026
- DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
- DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
- Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
- Self-Hosted AI: A Complete Roadmap for Beginners
- Open-Source Models Now Make Up 4 of the Top 5 Most-Used Endpoints on OpenRouter
- High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
- Critical vLLM Vulnerability Allows Remote Code Execution via Video Links
- OpenClaw with vLLM Running for Free on AMD Developer Cloud
- Heaps Do Lie: Debugging a Memory Leak in vLLM
- Mistral AI Debugs Critical Memory Leak in vLLM Inference Engine