Tagged "benchmark-report"

vLLM vs Ollama 2026: Performance Benchmark Reveals 9x Throughput Gap · 25 May 2026
M5 Max MacBook Runs Local Large Language Models Efficiently · 23 May 2026
110 Tokens/Second on RTX 4070 Super with Qwen 3.6 35B · 22 May 2026
A/B Tested Gemini 3.1 Pro vs. Claude Opus 4.6 – Usage Quota and Quality Comparison · 22 May 2026
Benchmarking a Portable AI Workstation: Lenovo ThinkPad P16 Gen 3, Part 2 · 21 May 2026
Hardware LLM Taalas Reaches >14,000 TPS on Llama 3.1 8B · 21 May 2026
Bito's AI Architect Improves Claude Opus Task Success Rate by 35% · 19 May 2026
ROCm 7.2.3 Delivers Performance Improvements Over 7.0.0 on AMD Radeon AI PRO · 15 May 2026
$200 NVIDIA V100 Server GPU Mod Beats RTX 3060 in Local LLM Test · 11 May 2026
Small On-Device AI Model Beats Claude Sonnet 4.5 and GPT-5 · 10 May 2026
NIST's CAISI Evaluation of DeepSeek V4 Pro Finds It On Par with GPT-5 · 3 May 2026
Xmemory: Benchmarking Structured AI Memory Against RAG and Hybrid RAG · 1 May 2026
Linux Crushes Windows on llama.cpp Inference by Double Digits · 27 April 2026
LLMs Consume 5.4x Less Mobile Energy Than Ad-Supported Web Search · 25 April 2026
Speculative Decoding Achieves 29% Speed Boost for Gemma-4 31B · 13 April 2026
MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware · 13 April 2026
Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B · 11 April 2026
Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark · 11 April 2026
Warp Decode vs. vLLM's Triton Kernel: Performance Crossover Analysis · 10 April 2026
Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs · 10 April 2026
Mano-P: Open-Source On-Device GUI Agent, #1 on OSWorld Benchmark · 9 April 2026
Show HN: Willitrun – Check if Any ML Model Runs on Any Device (Benchmark-Backed) · 7 April 2026
Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool · 7 April 2026
Gemma 4 Achieves Top Multilingual Performance Across European Languages · 7 April 2026
Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops · 6 April 2026
Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models · 5 April 2026
YC-Bench: GLM-5 Matches Claude Opus 4.6 at 11× Lower Cost · 4 April 2026
Gemma 4 31B Outperforms GLM 5.1 in Real-World Testing · 4 April 2026
April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini · 3 April 2026
Qwen 3.5-27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3 · 1 April 2026
M5 Max Delivers 1.7x Faster Inference Than M3 Max on Qwen 3.5 Models · 28 March 2026
Forensic Beats Mem0 with 90.1% on LOCOMO Benchmark · 28 March 2026
TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice · 27 March 2026
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config · 27 March 2026
Comparison of Two Frameworks: 40% Token Efficiency Improvement · 27 March 2026
Real-World Benchmark: DeepSeek-V3 Matches Claude Sonnet on Routine Coding Tasks · 26 March 2026
Llama.cpp Benchmark: RTX 5090 vs Enterprise Systems Compared · 25 March 2026