Tagged "quantization"
-
Qt 6.11 Released with Enhanced Cross-Platform Deployment Capabilities
-
Korea to Deploy Domestic AI Chips in Smart Cities as NPU Trials Scale Up
-
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide
-
Ditching Paid AI Services: Building Self-Hosted LLM Solutions as ChatGPT, Claude, and Gemini Alternatives
-
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
-
Setting Up a Private AI Brain on Windows: Complete Guide to Local LLM Deployment
-
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
-
BrowserOS 0.44.0 Release: Advances in Local AI Integration for Web-Based Applications
-
AI Playground for Developers Built in Vite and Python
-
Running an AI Agent on a 448KB RAM Microcontroller
-
Qwen 3.5 397B Emerges as Top-Performing Local Coding Model
-
Ultra-Compact 28M Parameter Models Show Promise for Specialized Domain Tasks
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
-
Repurpose Old GPUs as Dedicated AI Inference Accelerators
-
Multiverse Computing Targets On-Device AI With Compressed Models and New API Portal
-
Tether's QVAC Introduces Cross-Platform BitNet LoRA Framework for On-Device AI Training
-
You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM
-
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
-
Browser-Based Transcription Tools
-
Run LLMs Locally with Llama.cpp
-
I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Mistral Small 4 119B Released with NVFP4 Quantisation Support
-
Mistral Releases Small 4 Open-Source Model Under Apache 2.0
-
Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
-
Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
-
OpenClaw Isn't the Only Raspberry Pi AI Tool—Here Are 4 Others You Can Try This Week
-
OmniCoder-9B: Efficient Coding Model for 8GB GPUs
-
Nota Added to Three Consecutive Technology and Growth ETFs – Market Recognition of Its AI Efficiency
-
AMD Declares 'AI on the PC Has Crossed an Important Line' – Agent Computers as Next Breakthrough
-
Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel
-
Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
-
Two Local Models Prove Competitive Enough to Replace ChatGPT, Gemini, and Copilot
-
Cicikus v3 Prometheus 4.4B – An Experimental Franken-Merge for Edge Reasoning
-
Best Local LLM Models 2026: Developer Comparison
-
Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models
-
Sarvam Open-Sources 30B and 105B Reasoning Models
-
Qwodel – An Open-Source Unified Pipeline for LLM Quantization
-
Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
-
Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
-
Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
-
Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results
-
SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
NVIDIA Jetson Brings Open Models to Life at the Edge
-
Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
-
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
-
FreeBSD 14.4 Released: Implications for Local LLM Deployment
-
Community Survey: AI Content Automation Stacks in 2026
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
Qwen 3.5 Derestricted Model Available for Local Deployment
-
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026
-
How to Run Your Own Local LLM — 2026 Edition
-
Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
Student Researcher Achieves 42x Model Compression Through Novel Architecture
-
Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
-
Mojo: Creating a Programming Language for an AI World with Chris Lattner
-
IBM Granite 4.0 1B Speech Model Released for Multilingual Speech Recognition
-
Show HN: TLDR – Free Chrome Extension for AI-Powered Article Summarization
-
Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
-
OPPO and MediaTek Highlight On-Device AI Innovations at MWC 2026
-
Unity Showcases Manufacturing AI Workflow at Smart Factory Expo
-
MediaTek Advances Omni Model for Efficient Smartphone Inference
-
Kakao Launches Kanana AI for On-Device Schedule and Recommendation Management
-
Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches
-
OpenWrt 25.12.0 – Stable Release
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
Quantifying Cost Savings with Local LLMs for Development
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
-
Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context
-
Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
-
Qualcomm Launches Snapdragon Wear Elite for On-Device AI on Wearables
-
Local LLM Performance Improvements: A Year of Progress Since DeepSeek R1 Moment
-
HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
-
How to Run High-Performance LLMs Locally on the Arduino UNO Q
-
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
-
Apple Intelligence, Galaxy AI, Gemini: Why Your AI-Powered Phone Is Worth Repairing
-
Unsloth Dynamic 2.0 GGUFs
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks
-
Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Meta Reveals AI-Packed Smartwatch in 2026 – Why Wearables Are Shifting Now
-
Galaxy S26 Debuts AI-Powered Scam Detection in Bold Security Push
-
Arduino, Qualcomm Bring On-Device AI and Robotics Learning to Indian School Systems
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Snapdragon 8 Elite Gen 5 for Galaxy Official: 5 Key Improvements that Push the Boundaries
-
On-Device AI in Mobile Apps: What Should Run on the Phone vs the Cloud (A 2026 Decision Guide)
-
Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
-
Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
-
Android Phones Are Getting Smarter Without Internet — On-Device AI as the Next Shift
-
Running LLMs on Raspberry Pi and Edge Devices: A Practical Guide
-
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
-
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
-
Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
-
Show HN: 100% LLM Accuracy – No Fine-Tuning, JSON Only
-
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
-
Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
-
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
How Do You Know Which SKILL.md Is Good?
-
Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
-
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
-
Future of Mobile AI: What On-Device Intelligence Means for App Developers
-
The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
-
Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
-
GGML Joins Hugging Face: What This Means for Local Model Optimization
-
DietPi v10.1 Released
-
CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
-
Taalas Etches AI Models onto Transistors to Rocket-Boost Inference
-
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
I Thought I Needed a GPU to Run AI Until I Learned About These Models
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Running Local LLMs and VLMs on Arduino UNO Q with yzma
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
-
Kitten TTS V0.8 Released: State-of-the-Art Super-Tiny Text-to-Speech Model Under 25MB
-
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
-
Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
-
Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Ask HN: What is the best bang for buck budget AI coding?
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
-
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
-
Ring-1T-2.5 Released with SOTA Deep Thinking Performance
-
GitHub Announces Support for Open Source AI Project Maintainers
-
Samsung's REAM: Alternative Model Compression Technique
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
-
GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
-
Community Member Builds 144GB VRAM Local LLM Powerhouse