Tagged "benchmarks"
-
South Korea Science Ministry Seeks Five On-Device AI Pilot Projects for Public Services
-
KV Cache Quantization Levels Benchmarked on SWE-bench: Practical Trade-offs for Local Inference
-
FlashAttention-4 Delivers 2.7x Faster Inference at 1,613 TFLOPS on Blackwell GPUs
-
MiniMax M2.7 Model to Be Released as Open Weights
-
Llama.cpp ROCm 7 vs Vulkan Performance Benchmarks on AMD MI50
-
Llama 8B Matches 70B Performance on Multi-Hop QA Using Structured Prompting
-
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
-
Qwen 3.5 397B Emerges as Top-Performing Local Coding Model
-
Multi-Token Prediction Support Coming to MLX-LM for Qwen 3.5
-
Apple M5 Max 128GB Real-World Performance Benchmarks for Local Inference
-
DeepSeek R1 RTX 4090 vs Apple M3 Max: Benchmark & Performance Guide
-
Build a $1,500 AI Server with DeepSeek-R1 on RTX 4090
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
My Dinner with AI
-
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
-
I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Two Local Models Prove Competitive Enough to Replace ChatGPT, Gemini, and Copilot
-
Achieving 2,000 Tokens Per Second with Qwen 3.5 27B on RTX 5090
-
P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM
-
Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM
-
Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
-
Apple M5 Max 128GB Benchmark Results for Local LLM Inference
-
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
-
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks
-
M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
-
Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
FretBench – Testing 14 LLMs on Reading Guitar Tabs Reveals Performance Gaps
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications
-
AI Agent Reliability Tracker
-
Qwen3-Coder-Next Achieves Top Ranking on SWE-bench at Pass@5
-
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Qwen 3.5 vs Qwen 3 Benchmark Analysis: Generational Performance Improvements Visualized
-
Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
-
Local LLM Performance Improvements: A Year of Progress Since DeepSeek R1 Moment
-
Google Research Finds Longer Chain-of-Thought Correlates Negatively With Accuracy
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
The ML.energy Leaderboard
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
-
Qwen 3.5 Underperforms on Hard Coding Tasks: APEX Benchmark Analysis
-
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
Show HN: 100% LLM Accuracy, No Fine-Tuning, JSON Only
-
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
-
The Real AI Competition Is Closed-Source vs Open-Source, Not America vs China
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Which Web Frameworks Are Most Token-Efficient for AI Agents?
-
How Do You Know Which SKILL.md Is Good?
-
Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
-
GLM-5 Becomes Top Open-Weights Model on Extended NYT Connections Benchmark
-
How Slow Local LLMs Are on My Framework 13 AMD Strix Point
-
Asus ExpertBook B3 G2 with 50 TOPS AI Sets New Enterprise Standard
-
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
-
I Run Local LLMs in One of the World's Priciest Energy Markets, and I Can Barely Tell
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Free ASIC-Accelerated Llama 3.1 8B Inference at 16,000 Tokens/Second
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
GPT4All Replaces Ollama On Mac After Quick Trial
-
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
-
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
-
Real-World Coding Benchmark Tests LLMs on 65 Production Codebase Tasks
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
-
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
-
Running Your Own AI Assistant for €19/Month: Complete Self-Hosting Guide
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
OpenClaw with vLLM Running for Free on AMD Developer Cloud
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
-
Use Recursive Language Models to Address Huge Contexts for Local LLMs