Tagged "model-compression"

A Cinematic Landing-Page Hero for 80 Cents (GPT Image 2 and Veo 3.1) 2 June 2026
Tether AI Upgrades QVAC SDK With TurboQuant for Data Center-Sized Memory on Everyday Devices 2 June 2026
Netflix Wiz Creates App to Slash AI Bills by Pruning Agent Instructions, Then Open-Sources It 31 May 2026
Mistral AI Launches Mistral Vibe 28 May 2026
DeepSeek's Flagship V4 Pro Model Drops to 75% Lower Pricing, Increasing Competitive Pressure on Local Inference Economics 26 May 2026
Maker Demonstrates Portable AI with Suitcase-Integrated Jetson Orin Setup 25 May 2026
Apple's 2026 AI Strategy Prioritizes On-Device Model Deployment 25 May 2026
The Brain vs. Deep Learning Part I: Computational Complexity Analysis 22 May 2026
Meta Plans Agentic AI on Smartphones and Wearables by 2026 20 May 2026
Google Tensor SDK Beta with LiteRT Enables Efficient On-Device AI 20 May 2026
On-Device AI to Be in 80% of Wearables by 2032 19 May 2026
Local LLMs Enable Intelligent Smart Camera Control Without Cloud Dependency 18 May 2026
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU 17 May 2026
Google Limits Gemini Intelligence to New Flagships—Hardware Requirements for Local Deployment 17 May 2026
Chrome Automatically Downloads 4GB AI Model for Local Processing 14 May 2026
Running a Local LLM on a 12-Year-Old Raspberry Pi 13 May 2026
DistillFast: AI Cost Optimization Tool for Model Efficiency 10 May 2026
Chrome's On-Device AI Features Consuming 4GB of Storage for Gemini Nano 9 May 2026
Perplexity Brings On-Device AI Workflow to Macs with 'Personal Computer' Feature 8 May 2026
Anker's Thus Chip Puts AI On-Device, Promising Faster Responses And Better Privacy 4 May 2026
Building a Raspberry Pi-Based Local LLM Server for Remote Access 1 May 2026
How Much "Brain Damage" Can an LLM Tolerate? 30 April 2026
Google's Gemma 4: Powerful AI Models Optimized for Your Phone and Laptop 28 April 2026
Building Real-World On-Device AI with LiteRT and NPU 24 April 2026
Anker Unveils 'Thus' Chip to Bring On-Device AI Across Product Line 23 April 2026
10GB VRAM Local LLM: The Complete Setup Guide (2026) 23 April 2026
Unweight: Lossless MLP Weight Compression for LLM Inference 18 April 2026
Bonsai 1.7B in the Browser: A 290MB 1-bit LLM on WebGPU 16 April 2026
SigMap – Shrink AI Coding Context 97% with Auto-Scaling Token Budget 15 April 2026
Researchers Achieve 1-Bit Quantization of OLMo-3 7B Using Distillation 13 April 2026
On-Device AI: Achieving Powerful AI Capabilities Without Internet Connectivity 12 April 2026
CarryAI's Serverless Vision-Language Models Enable On-Device Multimodal AI 10 April 2026
Quansloth Using Google's TurboQuant Breaks the VRAM Wall for Local LLMs 7 April 2026
CricketBrain: Neuromorphic Signal Processor in Rust (0.175us/step, 944 bytes) 7 April 2026
Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops 6 April 2026
Google AI Edge Gallery Tops App Store Charts with On-Device Gemma 4 6 April 2026
Qwen 3.5 397B Reduced to 35% Parameters With Usable Quality on 96GB GPU 5 April 2026
Mixed Precision Quantization on MLX with TurboQuant Implementation 4 April 2026
TurboQuant Enables Qwen 3.5-27B on 16GB Consumer GPUs 2 April 2026
Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance 2 April 2026
Claw64 – Full Agentic Loop in <4KB on Commodore 64 1 April 2026
TurboQuant: Understanding the Quantization Breakthrough 29 March 2026
Google's TurboQuant Shows Memory Constraints Remain Critical for Local LLM Inference 29 March 2026
CERN Embeds Tiny AI Models in Silicon Chips for Real-Time LHC Data Filtering 28 March 2026
TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice 27 March 2026
RotorQuant: 10-19x Faster Quantisation Alternative Using Clifford Algebra 27 March 2026
Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization 27 March 2026
Quantization Reveals Outliers Impacting LLM Accuracy 27 March 2026
Apple Gets Full Gemini Access and Uses Distillation to Build Lightweight On-Device AI 27 March 2026
Samsung Galaxy A37 and A57 5G Launch with On-Device AI Capabilities in India 26 March 2026
NVIDIA Releases GPT-OSS-Puzzle-88B, a Deployment-Optimized Model 26 March 2026
Nota AI and SiMa.ai Partner on Physical AI Technology for Local Deployment 26 March 2026
Google's TurboQuant: The Unsexy AI Breakthrough Worth Watching 26 March 2026
Apple Plans Slimmed-Down Gemini Models for Local iPhone AI Features 26 March 2026
Google TurboQuant: Extreme Compression for Local LLM Deployment 25 March 2026
Running an Open-Weight LLM Locally on an Apple Watch 25 March 2026
.APKs Are Just .ZIPs: Semi-Legally Hacking Software for Orphaned Hardware 25 March 2026
Ultra-Large 400B-Class LLM Runs on iPhone in Test 25 March 2026
LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language 24 March 2026
Running an AI Agent on a 448KB RAM Microcontroller 21 March 2026
Multiverse Computing Targets On-Device AI With Compressed Models and New API Portal 19 March 2026
Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth 17 March 2026
Nota Added to Three Technology and Growth ETFs in a Row – Market Recognition for AI Efficiency 16 March 2026
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026 9 March 2026
Student Researcher Achieves 42x Model Compression Through Novel Architecture 8 March 2026
ETH Zurich Research Challenges Context-Length Assumptions in LLM Agents 8 March 2026
OPPO and MediaTek Highlight On-Device AI Innovations at MWC 2026 6 March 2026
Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches 4 March 2026
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers 4 March 2026
Meta Reveals AI-Packed Smartwatch In 2026 – Why Wearables Shift Now 28 February 2026
Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools 27 February 2026
Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices 25 February 2026
Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications 24 February 2026
Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones 24 February 2026
At India AI Impact Summit, Intel Showcases AI PCs and Cost-Efficient Frugal AI 22 February 2026
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses 19 February 2026
NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x 14 February 2026
Samsung's REAM: Alternative Model Compression Technique 12 February 2026