Tagged "cpu-inference"
- MacinAI Local brings functional LLM inference to classic Macintosh hardware
- Repurpose Old GPUs as Dedicated AI Inference Accelerators
- Llamafile 0.10 Released with GPU Support and Rebuilt Core
- Browser-Based Transcription Tools
- Run LLMs Locally with Llama.cpp
- Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
- Show HN: Voice-tracked teleprompter using on-device ASR in the browser
- Hybrid AI Desktop Layer Combining DOM Automation and API Integrations
- Open-Source GreenBoost Driver Augments NVIDIA GPU VRAM With System RAM and NVMe Storage
- I made Karpathy's Autoresearch work on CPU
- AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon
- P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM
- Intel OpenVINO Backend Support Now Available in llama.cpp
- Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models
- Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
- HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
- Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
- When Running Ollama on Your PC for Local AI, One Thing Matters More Than Most
- Turning Your Linux Terminal into a Local AI Assistant
- OpenWrt 25.12.0 – Stable Release
- AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
- AMD Ryzen AI 400 Series Desktop Processors Launch with Integrated 60 TOPS NPU
- GitDelivr: A Free CDN for Git Clones Built on Cloudflare Workers and R2
- Browser Use vs. Claude Computer Use: Comparing Agent Automation Frameworks
- AMD Expands Ryzen AI 400 Series Portfolio for Consumer and Enterprise AI PC Options
- Bare-Metal LLM Inference: UEFI Application Boots Directly Into LLM Chat
- The ML.energy Leaderboard
- LLmFit: Terminal Tool for Right-Sizing LLMs to Your Hardware
- LLmFit: One-Command Hardware-Aware Model Selection Across 497 Models and 133 Providers
- Krasis: Hybrid CPU/GPU MoE Runtime Achieves 3,324 tok/s Prefill on a Single RTX 5080
- What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
- A Tool to Tell You What LLMs Can Run on Your Machine
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- AI Is Stress Testing Processor Architectures and RISC-V Fits the Moment
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
- At India AI Impact Summit, Intel Showcases Its AI PCs and Cost-Efficient Frugal AI
- I Thought I Needed a GPU to Run AI Until I Learned About These Models
- Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia
- GGML.AI Acquired by Hugging Face
- PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
- Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
- Matmul-Free Language Model Trained on CPU in 1.2 Hours
- ASUS Zenbook 14 Launches in India with AI-Capable Hardware, Starting at Rs 1,15,990
- Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
- GPU-Accelerated DataFrame Library for Local Inference Workloads
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
- NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
- Arm SME2 Technology Expands CPU Capabilities for On-Device AI
- Community Member Builds 144GB VRAM Local LLM Powerhouse