Tagged "hardware-optimization"
- Qualcomm and Samsung's 30-Year AI Alliance Enters a New Phase as On-Device AI Chip Race Heats Up
- Multi-Token Prediction support coming to MLX-LM for Qwen 3.5
- Multiverse Computing Targets On-Device AI With Compressed Models and New API Portal
- Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
- I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
- OmniCoder-9B: Efficient Coding Model for 8GB GPUs
- Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment
- Startup Transforms Mac Mini Into Full-Powered AI Inference System With External GPU
- Open-Source GreenBoost Driver Augments NVIDIA GPU VRAM With System RAM and NVMe Storage
- I made Karpathy's Autoresearch work on CPU
- Lemonade v10 Brings Linux NPU Support and Multi-Modal Capabilities
- Sarvam Open-Sources 30B and 105B Reasoning Models
- Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
- Nvidia Pushes Jetson as Edge Hub for Open AI Models
- Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia
- Qwen 3.5-35B Uncensored GGUF Models Now Available
- Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide
- HP Refreshes Lineup with AI-Focused Workstations
- Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
- MediaTek Advances Omni Model for Efficient Smartphone Inference
- Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
- On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
- Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
- Apple Neural Engine Reverse-Engineered for Local Model Training on Mac Mini M4
- Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
- Bare-Metal LLM Inference: UEFI Application Boots Directly Into LLM Chat
- Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
- On-Device AI in Mobile Apps: What Should Run on the Phone vs the Cloud (A 2026 Decision Guide)
- Running LLMs on Raspberry Pi and Edge Devices: A Practical Guide
- DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
- Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
- South Korea to Launch $687 Million Project to Develop On-Device AI Semiconductors
- Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- O-TITANS: Orthogonal LoRA Framework for Gemma 3 with Google TITANS Memory Architecture
- Taalas Etches AI Models onto Transistors to Rocket Boost Inference
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Mihup and Qualcomm Collaborate to Advance Secure On-Device Voice AI for BFSI
- GPT4All Replaces Ollama On Mac After Quick Trial
- Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
- Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
- Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
- Sourdine: Open-Source macOS App for 100% Local AI Transcription
- Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
- Simile AI Raises $100M Series A for Local AI Infrastructure
- Samsung's REAM: Alternative Model Compression Technique
- NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
- Carmack Proposes Using Long Fiber Lines as L2 Cache for Streaming AI Data
- Arm SME2 Technology Expands CPU Capabilities for On-Device AI