Tagged "cpu-inference"
- MacinAI Local brings functional LLM inference to classic Macintosh hardware
- Repurpose Old GPUs as Dedicated AI Inference Accelerators
- Llamafile 0.10 Released with GPU Support and Rebuilt Core
- Browser-Based Transcription Tools
- Run LLMs Locally with Llama.cpp
- Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
- Show HN: Voice-tracked teleprompter using on-device ASR in the browser
- Hybrid AI Desktop Layer Combining DOM Automation and API Integrations
- Open-Source GreenBoost Driver Augments NVIDIA GPU VRAM With System RAM and NVMe Storage
- I made Karpathy's Autoresearch work on CPU
- AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon
- P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM
- Intel OpenVINO Backend Support Now Available in llama.cpp
- Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models
- Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
- HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
- Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
- When Running Ollama on Your PC for Local AI, One Thing Matters More Than Most
- Turning Your Linux Terminal into a Local AI Assistant
- OpenWrt 25.12.0 – Stable Release
- AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
- AMD Ryzen AI 400 Series Desktop Processors Launch with Integrated 60 TOPS NPU
- GitDelivr: A Free CDN for Git Clones Built on Cloudflare Workers and R2
- Browser Use vs. Claude Computer Use: Comparing Agent Automation Frameworks
- AMD Expands Ryzen AI 400 Series Portfolio for Consumer and Enterprise AI PC Options
- Bare-Metal LLM Inference: UEFI Application Boots Directly Into LLM Chat
- The ML.energy Leaderboard
- LLmFit: Terminal Tool for Right-Sizing LLMs to Your Hardware
- LLmFit: One-Command Hardware-Aware Model Selection Across 497 Models and 133 Providers
- Krasis: Hybrid CPU/GPU MoE Runtime Achieves 3,324 tok/s Prefill on a Single RTX 5080
- What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
- A Tool to Tell You What LLMs Can Run on Your Machine
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- AI Is Stress Testing Processor Architectures and RISC-V Fits the Moment
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
- At India AI Impact Summit, Intel Showcases Its AI PCs and Cost-Efficient Frugal AI
- I Thought I Needed a GPU to Run AI Until I Learned About These Models
- Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia
- GGML.AI Acquired by Hugging Face
- PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
- Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
- Matmul-Free Language Model Trained on CPU in 1.2 Hours
- ASUS Zenbook 14 Launches in India with AI-Capable Hardware, Starting at Rs 1,15,990
- Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
- GPU-Accelerated DataFrame Library for Local Inference Workloads
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
- NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
- Arm SME2 Technology Expands CPU Capabilities for On-Device AI
- Community Member Builds 144GB VRAM Local LLM Powerhouse