Tagged "benchmark-report"
- vLLM vs Ollama 2026: Performance Benchmark Reveals 9x Throughput Gap
- M5 Max MacBook Runs Local Large Language Models Efficiently
- 110 Tokens/Second on RTX 4070 Super with Qwen 3.6 35B
- A/B Tested Gemini 3.1 Pro vs. Claude Opus 4.6 – Usage Quota and Quality Comparison
- Benchmarking a Portable AI Workstation: Lenovo ThinkPad P16 Gen 3, Part 2
- Hardware LLM Taalas Reaches >14,000 TPS on Llama 3.1 8B
- Bito's AI Architect Improves Claude Opus Task Success Rate by 35%
- ROCm 7.2.3 Delivers Performance Improvements Over 7.0.0 on AMD Radeon AI PRO
- $200 NVIDIA V100 Server GPU Mod Beats RTX 3060 in Local LLM Test
- Small On-Device AI Model Beats Claude Sonnet 4.5 and GPT-5
- NIST's CAISI Evaluation of DeepSeek V4 Pro Finds It On Par with GPT-5
- Xmemory: Benchmarking Structured AI Memory Against RAG and Hybrid RAG
- Linux Crushes Windows on llama.cpp Inference by Double Digits
- LLMs Consume 5.4x Less Mobile Energy Than Ad-Supported Web Search
- Speculative Decoding Achieves 29% Speed Boost for Gemma 4 31B
- MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware
- Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B
- Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark
- Warp Decode vs. vLLM's Triton Kernel: Performance Crossover Analysis
- Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs
- Mano-P: Open-Source On-Device GUI Agent, #1 on OSWorld Benchmark
- Show HN: Willitrun – Check if Any ML Model Runs on Any Device (Benchmark-Backed)
- Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool
- Gemma 4 Achieves Top Multilingual Performance Across European Languages
- Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops
- Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models
- YC-Bench: GLM-5 Matches Claude Opus 4.6 at 11x Lower Cost
- Gemma 4 31B Outperforms GLM 5.1 in Real-World Testing
- April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini
- Qwen 3.5-27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3
- M5 Max Delivers 1.7x Faster Inference Than M3 Max on Qwen 3.5 Models
- Forensic Beats Mem0 with 90.1% on LOCOMO Benchmark
- TurboQuant Benchmarked in llama.cpp: Google's Extreme Compression Research Tested in Practice
- Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config
- Comparison of Two Frameworks: 40% Token Efficiency Improvement
- Real-World Benchmark: DeepSeek-V3 Matches Claude Sonnet on Routine Coding Tasks
- Llama.cpp Benchmark: RTX 5090 vs Enterprise Systems Compared