Tagged "model-performance"
- I Replaced My Local LLM With a Model Half Its Size and Got Better Results
- Gemma 4 Just Replaced My Whole Local LLM Stack
- Fine-Tuned Qwen3.5-0.8B for OCR Outperforms Previous 2B Release
- Abliterated Local LLM Models Show Distinct Behavioral Characteristics Compared to Standard Variants
- MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware
- GLM 5.1 Dominates Agentic Benchmarks, Outperforming Most Models at 1/3 Opus Cost
- Show HN: Willitrun – Check if Any ML Model Runs on Any Device (Benchmark-Backed)
- Gemma 4 Achieves Top Multilingual Performance Across European Languages
- Gemma 4 31B Achieves Exceptional Performance on Local Hardware
- Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models
- Gemma 4 31B Outperforms GLM 5.1 in Real-World Testing
- Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon
- Qwen 3.5-27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3
- Ollama Adopts Apple's MLX Framework for Faster Local AI on Mac
- PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
- Real-World Benchmark: DeepSeek-V3 Matches Claude Sonnet on Routine Coding Tasks
- Velr: Embedded Property-Graph Database for Local LLM Applications
- Alibaba Commits to Continuous Open-Sourcing of Qwen and Wan Models
- Setting Up a Private AI Brain on Windows: Complete Guide to Local LLM Deployment
- Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models
- Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
- Open-Source LLMs Rapidly Displacing Proprietary SOTA Models
- Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment
- Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
- FretBench – Testing 14 LLMs on Reading Guitar Tabs Reveals Performance Gaps
- Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
- Change Intent Records: The Missing Artifact in AI-Assisted Development
- Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
- Qwen 3.5-27B Demonstrates Exceptional Performance with Thoughtful Prompt Engineering
- Qwen 3.5 Underperforms on Hard Coding Tasks—APEX Benchmark Analysis
- Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup
- Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
- GLM-5 Becomes Top Open-Weights Model on Extended NYT Connections Benchmark
- Strix Halo Performance Benchmarks: MiniMax M2.5, Step 3.5 Flash, Qwen3 Coder
- MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
- GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks