Tagged "model-compression"
- Google's Gemma 4: Powerful AI Models Optimized for Your Phone and Laptop
- Building Real-World On-Device AI with LiteRT and NPU
- Anker Unveils 'Thus' Chip to Bring On-Device AI Across Product Line
- 10GB VRAM Local LLM: The Complete Setup Guide (2026)
- Unweight: Lossless MLP Weight Compression for LLM Inference
- Bonsai 1.7B in the Browser: A 290MB 1-bit LLM on WebGPU
- SigMap – Shrink AI Coding Context 97% with Auto-Scaling Token Budget
- Researchers Achieve 1-Bit Quantization of OLMo-3 7B Using Distillation
- On-Device AI: Achieving Powerful AI Capabilities Without Internet Connectivity
- CarryAI's Serverless Vision-Language Models Enable On-Device Multimodal AI
- Quansloth Using Google's TurboQuant Breaks the VRAM Wall for Local LLMs
- CricketBrain: Neuromorphic Signal Processor in Rust (0.175us/step, 944 bytes)
- Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops
- Google AI Edge Gallery Tops App Store Charts with On-Device Gemma 4
- Qwen 3.5 397B Reduced to 35% Parameters With Usable Quality on 96GB GPU
- Mixed Precision Quantization on MLX with TurboQuant Implementation
- TurboQuant Enables Qwen 3.5-27B on 16GB Consumer GPUs
- Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance
- Claw64 – Full Agentic Loop in <4KB on Commodore 64
- TurboQuant: Understanding the Quantization Breakthrough
- Google's TurboQuant Shows Memory Constraints Remain Critical for Local LLM Inference
- CERN Embeds Tiny AI Models in Silicon Chips for Real-Time LHC Data Filtering
- TurboQuant Benchmarked in llama.cpp: Google's Extreme Compression Research Tested in Practice
- RotorQuant: 10-19x Faster Quantisation Alternative Using Clifford Algebra
- Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization
- Quantization Reveals Outliers Impacting LLM Accuracy
- Apple Gets Full Gemini Access and Uses Distillation to Build Lightweight On-Device AI
- Samsung Galaxy A37 and A57 5G Launch with On-Device AI Capabilities in India
- NVIDIA Releases GPT-OSS-Puzzle-88B, a Deployment-Optimized Model
- Nota AI and SiMa.ai Partner on Physical AI Technology for Local Deployment
- Google's TurboQuant: The Unsexy AI Breakthrough Worth Watching
- Apple Plans Slimmed-Down Gemini Models for Local iPhone AI Features
- Google TurboQuant: Extreme Compression for Local LLM Deployment
- Running an Open-Weight LLM Locally on an Apple Watch
- .APKs Are Just .ZIPs: Semi-Legally Hacking Software for Orphaned Hardware
- Ultra-Large 400B-Class LLM Runs on iPhone in Test
- LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language
- Running an AI Agent on a 448KB RAM Microcontroller
- Multiverse Computing Targets On-Device AI With Compressed Models and New API Portal
- Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
- Nota Added to Three Technology and Growth ETFs in a Row – Market Recognition for AI Efficiency
- Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026
- Student Researcher Achieves 42x Model Compression Through Novel Architecture
- ETH Zurich Research Challenges Context-Length Assumptions in LLM Agents
- OPPO and MediaTek Highlight On-Device AI Innovations at MWC 2026
- Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches
- On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
- Meta Reveals AI-Packed Smartwatch In 2026 – Why Wearables Shift Now
- Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
- Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
- Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications: Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
- At India AI Impact Summit, Intel Showcases AI PCs and Cost-Efficient Frugal AI
- Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
- NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
- Samsung's REAM: Alternative Model Compression Technique