Tagged "cpu-only"
-
Phison and Intel Roll Out aiDAPTIV to Boost Local AI on Intel AI PC Platforms
-
Microsoft and Nvidia to Unveil First Windows PCs with Nvidia CPUs and AI Capabilities
-
Snapdragon C Debuts with 6nm Process and Dedicated On-Device AI Engine
-
Tweaking Local Language Model Settings with Ollama
-
Lenovo Bets on On-Device AI to Lift Business PC Upgrades
-
The Anatomy of an LLM
-
Developer Switches from LM Studio to llama.cpp, Reports No Performance Downgrade
-
Dell Launches 14 Plus Laptop with Intel Core Ultra 9 and 32GB RAM at $1,499.99, Enabling Local Model Inference
-
Redditor Successfully Runs 1 Trillion Parameter LLM Using Cheap Intel Optane DIMMs
-
Intel llm-scaler-vllm 1.4 Released With Updated Components and Arc Pro B70 Support
-
AMD's New Ryzen AI Max Pro 400 with 192GB LPDDR5X Memory
-
llama.cpp Adds Multi-Token Prediction, Doubles Qwen 3.6B Throughput for Local Inference
-
Linux 7.1-rc4 Released: Kernel Updates Relevant to Local LLM Inference
-
AI/ML Benchmark Tool for Local LLM Inference and XGBoost Training
-
Show HN: Find the best local LLM for your hardware, ranked by benchmarks
-
Running Local AI LLMs on Mini PCs Without NVIDIA GPUs
-
Running a Local LLM on a 12-Year-Old Raspberry Pi
-
Mainline Linux 6.12 on Annapurna Labs Alpine V2 (Ubiquiti UNVR, UDM-Pro)
-
Lucebox Brings Faster Local AI Inference to AMD Strix Halo
-
How I Used a Local LLM to Organize the Store on My NAS
-
Running a Local LLM on a 12-Year-Old Raspberry Pi: Practical Edge Inference
-
Microsoft VibeVoice C++ Port Enables Local Voice AI on CPU and GPU Without Python
-
Sarvam Edge: Indian-Built AI Models Run Offline on Phones and Laptops Without Internet
-
llama.cpp Now Supports Multi-Token Prediction in Beta
-
Building a Raspberry Pi-Based Local LLM Server for Remote Access
-
New Open-Source Tool Automatically Matches Local LLMs to Your PC Hardware
-
Running Capable Local LLMs Without Expensive GPU Hardware
-
Why the Same LLM Gives Different Answers in Different Environments
-
The New Linux Kernel AI Bot Uncovering Bugs Is A Local LLM On Framework Desktop + AMD Ryzen AI Max
-
Intel OpenVINO 2026.1 Integrates llama.cpp with Wildcat Lake and Arc Pro B70
-
The Open-Source AI Ecosystem Keeps Treating llama.cpp Like a Second-Class Citizen
-
Sorting 1M u64 KV-Pairs in 20ms on i9-13980HX Using Branchless Rust Implementation
-
Dynamic Expert Cache in llama.cpp Achieves 27% Faster Inference on Large MoE Models
-
Qwen 3.5 Small – On-Device Multimodal Models Released
-
A Deep Dive into Tinygrad AI Compiler
-
The Best Local AI Model for Home Assistant Isn't Always the Biggest One
-
Building Offline AI Companions on Severely Constrained Hardware (8GB RAM)
-
5 Open-Source Projects Running Transformers on CPUs to GPUs in Pure Java
-
Speculative Decoding Made My Local LLM Actually Usable
-
Run Qwen3.5 on an Old Laptop: A Lightweight Local Agentic AI Setup Guide
-
Intel Releases OpenVINO 2026.1 With Backend For Llama.cpp, New Hardware Support
-
Your Next Assistant is Your PC: How On-Device AI is Transforming Work, One Workflow at a Time
-
Octopoda: Open Source Memory Layer for Fully Offline AI Agents
-
AMD Announces Day 0 Support for Google Gemma 4 Across Processors and GPUs
-
TurboQuant in Llama.cpp Achieves 6X Smaller KV Cache
-
Kokoro TTS Achieves 20× Realtime Speed on CPU-Only On-Device Inference
-
Gemma 4 KV Cache Memory Issues Fixed in llama.cpp
-
AMD Rolls Out Gemma 4 Model Support Across Full Range of GPUs & CPUs
-
OpenUMA – Apple-Style Unified Memory for x86 AI Inference
-
Show HN: Extra-Platforms, Python Library to Detect OS, Arch, Shell, CI, AI
-
Local AI Ecosystem Extends Far Beyond Ollama
-
Claw64 – Full Agentic Loop in <4KB on Commodore 64
-
PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
-
Select the Right Hardware for Your Local LLM Deployment with This Online Guide
-
DeepSeek V3 Complete Guide: Deploy and Optimize Local AI in 2026
-
TurboQuant KV Cache Compression Achieves 22.8% Faster Decoding at 32K Context
-
Samsung Galaxy Book6 Series Brings Intel Core Ultra Chips for On-Device LLM Inference
-
TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice
-
Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization
-
HP Launches IQ On-Device AI Assistant, Advancing Enterprise AI Adoption on PCs
-
.APKs Are Just .ZIPs: Semi-Legally Hacking Software for Orphaned Hardware