Tagged "benchmarks"
-
South Korea Science Ministry Seeks Five On-Device AI Pilot Projects for Public Services
-
KV Cache Quantization Levels Benchmarked on SWE-bench: Practical Trade-offs for Local Inference
-
FlashAttention-4 Delivers 2.7x Faster Inference at 1,613 TFLOPS on Blackwell GPUs
-
MiniMax M2.7 Model to Be Released as Open Weights
-
Llama.cpp ROCm 7 vs Vulkan Performance Benchmarks on AMD MI50
-
Llama 8B Matches 70B Performance on Multi-Hop QA Using Structured Prompting
-
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
-
Qwen 3.5 397B Emerges as Top-Performing Local Coding Model
-
Multi-Token Prediction Support Coming to MLX-LM for Qwen 3.5
-
Apple M5 Max 128GB Real-World Performance Benchmarks for Local Inference
-
DeepSeek R1 RTX 4090 vs Apple M3 Max: Benchmark & Performance Guide
-
Build a $1,500 AI Server with DeepSeek-R1 on RTX 4090
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
My Dinner with AI
-
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
-
I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Two Local Models Prove Competitive Enough to Replace ChatGPT, Gemini, and Copilot
-
Achieving 2,000 Tokens Per Second with Qwen 3.5 27B on RTX 5090
-
P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM
-
Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM
-
Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
-
Apple M5 Max 128GB Benchmark Results for Local LLM Inference
-
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
-
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks
-
M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
-
Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
FretBench – Testing 14 LLMs on Reading Guitar Tabs Reveals Performance Gaps
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications
-
AI Agent Reliability Tracker
-
Qwen3-Coder-Next Achieves Top Ranking on SWE-bench at Pass@5
-
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Qwen 3.5 vs Qwen 3 Benchmark Analysis: Generational Performance Improvements Visualized
-
Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
-
Local LLM Performance Improvements: A Year of Progress Since DeepSeek R1 Moment
-
Google Research Finds Longer Chain-of-Thought Correlates Negatively With Accuracy
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
The ML.energy Leaderboard
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
-
Qwen 3.5 Underperforms on Hard Coding Tasks: APEX Benchmark Analysis
-
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
Show HN: 100% LLM Accuracy, No Fine-Tuning, JSON Only
-
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
-
The Real AI Competition Is Closed-Source vs Open-Source, Not America vs China
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Which Web Frameworks Are Most Token-Efficient for AI Agents?
-
How Do You Know Which SKILL.md Is Good?
-
Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
-
GLM-5 Becomes Top Open-Weights Model on Extended NYT Connections Benchmark
-
How Slow Local LLMs Are on My Framework 13 AMD Strix Point
-
Asus ExpertBook B3 G2 with 50 TOPS AI Sets New Enterprise Standard
-
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
-
I Run Local LLMs in One of the World's Priciest Energy Markets, and I Can Barely Tell
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Free ASIC-Accelerated Llama 3.1 8B Inference at 16,000 Tokens/Second
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
GPT4All Replaces Ollama On Mac After Quick Trial
-
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
-
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
-
Real-World Coding Benchmark Tests LLMs on 65 Production Codebase Tasks
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
-
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
-
Running Your Own AI Assistant for €19/Month: Complete Self-Hosting Guide
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
OpenClaw with vLLM Running for Free on AMD Developer Cloud
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
-
Use Recursive Language Models to Address Huge Contexts for Local LLMs