Tagged "datacenter-gpu"
- Llama.cpp ROCm 7 vs Vulkan Performance Benchmarks on AMD MI50
- Rust Project Perspectives on AI
- ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
- Custom GPU Multiplexer Achieves 0.3ms Model Switching on Legacy Hardware
- Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel
- Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment
- Sarvam Open-Sources 30B and 105B Reasoning Models
- Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
- Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia
- Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
- Intel Arc Pro B70 Workstation GPU Confirmed via vLLM AI Release Notes
- Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia
- NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
- AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
- High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
- OpenClaw with vLLM Running for Free on AMD Developer Cloud