Tagged "vllm"
- Developer Builds Fully Local Multi-Agent System Using vLLM and Parallel Inference
- Build a $1,500 AI Server with DeepSeek-R1 on RTX 4090
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- LMCache Dramatically Accelerates LLM Inference on Oracle Data Science Platform
- Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
- OpenClaw vs Eigent vs Claude Cowork: Comparing Open-Source AI Collaboration Platforms
- AMD Launches Agent System Optimized for Local AI Inference with Ryzen and Radeon
- P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM
- Runpod Report: Qwen Has Overtaken Meta's Llama as the Most-Deployed Self-Hosted LLM
- Intel Updates LLM-Scaler-vLLM with Support for More Qwen3/3.5 Models
- How to Install OpenClaw with Ollama (Step-by-Step Tutorial)
- Nvidia Pushes Jetson as Edge Hub for Open AI Models
- Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia
- Show HN: Aver – a Language Designed for AI to Write and Humans to Review
- Nemotron 9B Powers Large-Scale Local Inference: Patent Classification and Real-Time Applications
- HP Refreshes Lineup with AI-Focused Workstations
- Intel Arc Pro B70 Workstation GPU Confirmed via vLLM AI Release Notes
- Framework Choice Critical: llama.cpp and vLLM Outperform Ollama in Qwen 3.5 Testing
- AMD Expands Ryzen AI 400 Series Portfolio for Consumer and Enterprise AI PC Options
- Huawei's SuperPoD Portfolio Creates New Option for Global Computing at MWC Barcelona 2026
- DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
- DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
- Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
- Self-Hosted AI: A Complete Roadmap for Beginners
- Open-Source Models Now Make Up 4 of the Top 5 Most-Used Endpoints on OpenRouter
- High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
- Critical vLLM Vulnerability Allows Remote Code Execution via Video Links
- OpenClaw with vLLM Running for Free on AMD Developer Cloud
- Heaps Do Lie: Debugging a Memory Leak in vLLM
- Mistral AI Debugs Critical Memory Leak in vLLM Inference Engine