Tagged "hardware-optimization"
- Qualcomm and Samsung's 30-Year AI Alliance Enters a New Phase as On-Device AI Chip Race Heats Up
- Multi-Token Prediction support coming to MLX-LM for Qwen 3.5
- Multiverse Computing Targets On-Device AI With Compressed Models and New API Portal
- Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
- I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
- OmniCoder-9B: Efficient Coding Model for 8GB GPUs
- Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment
- Startup Transforms Mac Mini Into Full-Powered AI Inference System With External GPU
- Open-Source GreenBoost Driver Augments NVIDIA GPU VRAM With System RAM and NVMe Storage
- I made Karpathy's Autoresearch work on CPU
- Lemonade v10 Brings Linux NPU Support and Multi-Modal Capabilities
- Sarvam Open-Sources 30B and 105B Reasoning Models
- Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
- Nvidia Pushes Jetson as Edge Hub for Open AI Models
- Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia
- Qwen 3.5-35B Uncensored GGUF Models Now Available
- Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide
- HP Refreshes Lineup with AI-Focused Workstations
- Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
- MediaTek Advances Omni Model for Efficient Smartphone Inference
- Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
- On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
- Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
- Apple Neural Engine Reverse-Engineered for Local Model Training on Mac Mini M4
- Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
- Bare-Metal LLM Inference: UEFI Application Boots Directly Into LLM Chat
- Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
- On-Device AI in Mobile Apps: What Should Run on the Phone vs the Cloud (A 2026 Decision Guide)
- Running LLMs on Raspberry Pi and Edge Devices: A Practical Guide
- DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
- Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
- South Korea to Launch $687 Million Project to Develop On-Device AI Semiconductors
- Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- O-TITANS: Orthogonal LoRA Framework for Gemma 3 with Google TITANS Memory Architecture
- Taalas Etches AI Models onto Transistors to Rocket Boost Inference
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Mihup and Qualcomm Collaborate to Advance Secure On-Device Voice AI for BFSI
- GPT4All Replaces Ollama On Mac After Quick Trial
- Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
- Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
- Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
- Sourdine: Open-Source macOS App for 100% Local AI Transcription
- Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
- Simile AI Raises $100M Series A for Local AI Infrastructure
- Samsung's REAM: Alternative Model Compression Technique
- NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
- Carmack Proposes Using Long Fiber Lines as L2 Cache for Streaming AI Data
- Arm SME2 Technology Expands CPU Capabilities for On-Device AI