Tagged "quantization"
-
Qt 6.11 Released with Enhanced Cross-Platform Deployment Capabilities
-
Korea to Deploy Domestic AI Chips in Smart Cities as NPU Trials Scale Up
-
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide
-
Ditching Paid AI Services: Building Self-Hosted LLM Solutions as ChatGPT, Claude, and Gemini Alternatives
-
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
-
Setting Up a Private AI Brain on Windows: Complete Guide to Local LLM Deployment
-
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
-
BrowserOS 0.44.0 Release: Advances in Local AI Integration for Web-Based Applications
-
AI Playground for Developers Built in Vite and Python
-
Running an AI Agent on a 448KB RAM Microcontroller
-
Qwen 3.5 397B Emerges as Top-Performing Local Coding Model
-
Ultra-Compact 28M Parameter Models Show Promise for Specialized Domain Tasks
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
-
Repurpose Old GPUs as Dedicated AI Inference Accelerators
-
Multiverse Computing Targets On-Device AI With Compressed Models and New API Portal
-
Tether's QVAC Introduces Cross-Platform BitNet LoRA Framework for On-Device AI Training
-
You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM
-
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
-
Browser-Based Transcription Tools
-
Run LLMs Locally with Llama.cpp
-
I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Mistral Small 4 119B Released with NVFP4 Quantisation Support
-
Mistral Releases Small 4 Open-Source Model Under Apache 2.0
-
Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
-
Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
-
OpenClaw Isn't the Only Raspberry Pi AI Tool—Here Are 4 Others You Can Try This Week
-
OmniCoder-9B: Efficient Coding Model for 8GB GPUs
-
Nota Added to Three Consecutive Technology and Growth ETFs – Market Recognition of Its AI Efficiency
-
AMD Declares 'AI on the PC Has Crossed an Important Line' – Agent Computers as Next Breakthrough
-
Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel
-
Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
-
Two Local Models Prove Competitive Enough to Replace ChatGPT, Gemini, and Copilot
-
Cicikus v3 Prometheus 4.4B – An Experimental Franken-Merge for Edge Reasoning
-
Best Local LLM Models 2026: Developer Comparison
-
Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models
-
Sarvam Open-Sources 30B and 105B Reasoning Models
-
Qwodel – An Open-Source Unified Pipeline for LLM Quantization
-
Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
-
Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
-
Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
-
Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results
-
SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
NVIDIA Jetson Brings Open Models to Life at the Edge
-
Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
-
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
-
FreeBSD 14.4 Released: Implications for Local LLM Deployment
-
Community Survey: AI Content Automation Stacks in 2026
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
Qwen 3.5 Derestricted Model Available for Local Deployment
-
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026
-
How to Run Your Own Local LLM — 2026 Edition
-
Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
Student Researcher Achieves 42x Model Compression Through Novel Architecture
-
Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
-
Mojo: Creating a Programming Language for an AI World with Chris Lattner
-
IBM Granite 4.0 1B Speech Model Released for Multilingual Speech Recognition
-
Show HN: TLDR – Free Chrome Extension for AI-Powered Article Summarization
-
Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
-
OPPO and MediaTek Highlight On-Device AI Innovations at MWC 2026
-
Unity Showcases Manufacturing AI Workflow at Smart Factory Expo
-
MediaTek Advances Omni Model for Efficient Smartphone Inference
-
Kakao Launches Kanana AI for On-Device Schedule and Recommendation Management
-
Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches
-
OpenWrt 25.12.0 – Stable Release
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
Quantifying Cost Savings with Local LLMs for Development
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
-
Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context
-
Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
-
Qualcomm Launches Snapdragon Wear Elite for On-Device AI on Wearables
-
Local LLM Performance Improvements: A Year of Progress Since DeepSeek R1 Moment
-
HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
-
How to Run High-Performance LLMs Locally on the Arduino UNO Q
-
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
-
Apple Intelligence, Galaxy AI, Gemini: Why Your AI-Powered Phone Is Worth Repairing
-
Unsloth Dynamic 2.0 GGUFs
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks
-
Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Meta Reveals AI-Packed Smartwatch in 2026 – Why Wearables Are Shifting Now
-
Galaxy S26 Debuts AI-Powered Scam Detection in Bold Security Push
-
Arduino, Qualcomm Bring On-Device AI and Robotics Learning to Indian School Systems
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Snapdragon 8 Elite Gen 5 for Galaxy Official: 5 Key Improvements that Push the Boundaries
-
On-Device AI in Mobile Apps: What Should Run on the Phone vs the Cloud (A 2026 Decision Guide)
-
Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
-
Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
-
Android Phones Are Getting Smarter Without Internet — On-Device AI as the Next Shift
-
Running LLMs on Raspberry Pi and Edge Devices: A Practical Guide
-
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
-
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
-
Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
-
Show HN: 100% LLM Accuracy – No Fine-Tuning, JSON Only
-
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
-
Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
-
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
How Do You Know Which SKILL.md Is Good?
-
Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
-
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
-
Future of Mobile AI: What On-Device Intelligence Means for App Developers
-
The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
-
Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
-
GGML Joins Hugging Face: What This Means for Local Model Optimization
-
DietPi v10.1 Released
-
CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
-
Taalas Etches AI Models onto Transistors to Rocket-Boost Inference
-
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
I Thought I Needed a GPU to Run AI Until I Learned About These Models
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Running Local LLMs and VLMs on Arduino UNO Q with yzma
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
-
Kitten TTS V0.8 Released: State-of-the-Art Super-Tiny Text-to-Speech Model Under 25MB
-
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
-
Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
-
Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Ask HN: What is the best bang for buck budget AI coding?
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
-
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
-
Ring-1T-2.5 Released with SOTA Deep Thinking Performance
-
GitHub Announces Support for Open Source AI Project Maintainers
-
Samsung's REAM: Alternative Model Compression Technique
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
-
GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
-
Community Member Builds 144GB VRAM Local LLM Powerhouse