Tagged "local-inference"
-
Grokfeed: Terminal Feed Reader for HN, Reddit, and Lobste.rs Using Claude Code
-
Why the Same LLM Gives Different Answers in Different Environments
-
Linux Crushes Windows on llama.cpp Inference by Double Digits
-
Plugable's TBT5-AI: First Thunderbolt Dock Explicitly Targeting Local LLM Workstations
-
Google's Gemma 4 Could Put Powerful AI on Your Phone and Laptop
-
Rust Open-Source Headless Browser for AI Agents and Web Scraping
-
Fixing Hallucination in LLM Prediction With Only One 48GB GPU
-
GPU Passthrough to LXCs in Proxmox Outperforms VMs and Simplifies Local AI Infrastructure
-
Build Your Own Local AI Stack with 5 Docker Containers and Eliminate ChatGPT Subscriptions
-
Llama 4 Scout on MLX: The Complete Apple Silicon Guide (2026)
-
Intel OpenVINO 2026.1 Integrates llama.cpp with Wildcat Lake and Arc Pro B70
-
Developer Replaced GPT-4 with a Local SLM and CI/CD Pipeline Stability Improved
-
My AI Workflow: Practical Guide to Using AI Without Skill Atrophy
-
The Open-Source AI Ecosystem Keeps Treating llama.cpp Like a Second-Class Citizen
-
Gemma 4 Just Replaced My Whole Local LLM Stack
-
DeepX and Hyundai Motor Group Robotics LAB Partner to Develop Next-Generation Physical AI Compute Platform
-
ZeusHammer: Built an AI Agent That Thinks Locally
-
llama.cpp Merges Speculative Checkpointing for Major Inference Speed Boost
-
Running DeepSeek R1 Locally: Your Complete Setup Guide
-
Web Agent Bridge: Open-Source OS for AI Agents
-
Waterloo's Live AI-Goose Tracker: Real-Time Edge Vision
-
PCMind: Local AI Analysis of Docs, Audio, Video and Images
-
Show HN: I Can't Write Python. It Works Anyway – Local LLM Automation
-
115 TOPS in 0.67L: CHUWI AuBox X Packs On-Device AI Power Into a Palm-Sized Mini PC
-
When Should AI Step Aside?: Teaching Agents When Humans Want to Intervene
-
Kilo Is the VS Code Extension That Actually Works With Every Local LLM I Throw at It
-
The Case for Out-of-Process Enforcement for AI Agents
-
Show HN: An MCP server that lets AI compose music on a hardware synth
-
Community Computer: Collaborative Autoresearch on a Peer-to-Peer Network
-
ChatMCP – Connect your AI browser chats to your coding agents
-
N8n, Dify, and Ollama Emerge as Leading Self-Hosted AI Automation Stack
-
Book Translator: Two-Pass Local Translation with Self-Reflection via Ollama
-
Google's Gemma 4 Brings Game-Changing Performance to Local Laptop Inference
-
GBrain – System to Make Your AI Agent Better Reflect You
-
DotLLM – Building an LLM Inference Engine in C#
-
Ubiquiti UniFi G6 Turret 4K Camera Features On-Device AI Processing at $199 Price Point
-
Developer Shares Golden Stack for Local Coding Assistant Integration Directly Inside Code Editors
-
Speculative Decoding Achieves 29% Speed Boost for Gemma-4 31B
-
Qwen3 Audio and Vision Support Now Available in llama.cpp
-
Defender – Local Prompt Injection Detection for AI Agents
-
Audio Processing Support Lands in llama.cpp with Gemma-4
-
A Deep Dive into Tinygrad AI Compiler
-
Google Gemma 4 Delivers Exceptional Speed and Accuracy for Local Inference
-
DFlash Speculative Decoding Achieves 3.3x Speedup on Apple Silicon
-
I Gave My AI Shell Access and Felt Uneasy – So I Sandboxed It
-
Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B
-
Google's Gemini Nano 4 Offers Faster, Smarter Local Inference Capabilities
-
DMax: New Parallel Decoding Paradigm for Diffusion Language Models
-
ASUS ExpertBook P1 Integrates On-Device AI for Enterprise Collaboration
-
AI PC Market Projected to Reach $235B by 2032, Driven by On-Device Computing Adoption
-
Warp Decode vs. vLLM's Triton Kernel: Performance Crossover Analysis
-
Community Reverse Engineers Gemma 4 Multi-Token Prediction Capability
-
AI Scans 400k Reddit Posts to Flag Overlooked GLP-1 Side Effects
-
Energy Consumption: The Final Frontier for AI and Local Inference
-
Speculative Decoding Made My Local LLM Actually Usable
-
Gemma 4 Support Stabilized in llama.cpp
-
LiteLLM Integrates with Ollama to Simplify Running 100+ Models Locally
-
GitHub Copilot CLI Adds Support for BYOK and Local Model Deployment
-
Show HN: Willitrun – Check if Any ML Model Runs on Any Device (Benchmark-Backed)
-
Running AI Natively on Windows 11 Using an eGPU
-
Gemma 4 26B Achieves Impressive Local Performance With Proper Configuration
-
METATRON: Open-Source AI Penetration Testing with Local LLMs
-
Apple Brings Enhanced On-Device AI Features to iPhone
-
Vektor – Local-First Associative Memory for AI Agents
-
Unpaved: Audit Toolkit for AI Developer Tool Bias in Global South Contexts
-
Satsgate: Monetize AI Agents and APIs with Lightning L402 Protocol
-
DGX Spark Hardware Limitations: Missing NVFP4 Support Undermines Local AI Value Proposition
-
GMKtec NucBox K17 Launches with 97 TOPS AI Performance for Local Inference
-
Run AutoGen with Ollama and LiteLLM in Simple Steps
-
Samsung Launches Galaxy Book6 Series with NVIDIA RTX 5070 and On-Device AI
-
NVIDIA and Google Optimize Gemma 4 AI Models for Local RTX Deployment
-
Nex Life Logger: Local Activity Tracker with AI Agent Integration
-
Free AI Video Clipper Using Scene and Speech-Based Segmentation
-
AMD Rolls Out Gemma 4 Model Support Across Full Range of GPUs & CPUs
-
SkillCompass – Diagnose and Improve AI Agent Skills Across 6 Dimensions
-
April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini
-
Gemma 4 Makes Local AI Agents Practical
-
Apfel – The Free AI Already on Your Mac
-
TurboQuant Enables Qwen 3.5-27B on 16GB Consumer GPUs
-
Lotte Innovate and DeepX Collaborate on Mass Production of Domestic AI Semiconductors
-
Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance
-
Local AI Ecosystem Extends Far Beyond Ollama
-
ByteShape Releases Qwen 3.5 9B Quantisations with Hardware-Matched Tuning Guide
-
Ask HN: What do you use for local embeddings?
-
Samsung Launches Galaxy Book6 Series in India with NVIDIA RTX 5070 Graphics and On-Device AI
-
DeepSeek V3 Complete Guide: Deploy and Optimize Local AI in 2026
-
DaVinci-MagiHuman: Open-Source AI Model for Realistic Video Generation
-
M5 Max Delivers 1.7x Faster Inference Than M3 Max on Qwen 3.5 Models
-
HP Launches Copilot+ PCs in India with On-Device AI Capabilities for Local Inference
-
GLM-5.1 Model Weights Launching Early April for Local Deployment
-
Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization
-
Comparison of Two Frameworks: 40% Token Efficiency Improvement
-
Mistral AI Releases Voxtral: Open-Source TTS Model Beating ElevenLabs on Local Hardware
-
NVIDIA Releases GPT-OSS-Puzzle-88B, a Deployment-Optimized Model
-
Intel Launches Arc Pro B70/B65 with 32GB VRAM for Local AI Inference
-
Real-World Benchmark: DeepSeek-V3 Matches Claude Sonnet on Routine Coding Tasks
-
Google TurboQuant: Extreme Compression for Local LLM Deployment
-
New Open-Weight Models Released: GigaChat-3.1-Ultra and Lightning Variants
-
Private Brain LLM Setup on Windows PC Eliminates Need for Paid Cloud Services
-
Researcher Successfully Runs Local LLMs on Legacy "Dead" GPU With Surprising Results
-
Llama.cpp Benchmark: RTX 5090 vs Enterprise Systems Compared
-
HP Launches IQ On-Device AI Assistant, Advancing Enterprise AI Adoption on PCs
-
Open-Source AI Text-to-Speech Models You Can Run Locally for Natural Voice
-
Powerful AI Search Engine Built on Single GeForce RTX 5090
-
Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models
-
AI Playground for Developers Built in Vite and Python
-
Self-Hosted AI Code Review with Local LLMs: Secure Automation Guide
-
Running an AI Agent on a 448KB RAM Microcontroller
-
Pydantic-Deep: Production Deep Agents for Pydantic AI
-
Multi-Token Prediction support coming to MLX-LM for Qwen 3.5
-
Apple M5 Max 128GB real-world performance benchmarks for local inference
-
Cursor's Composer 2 model attribution dispute highlights open-source licensing concerns
-
Your Site Content Is Powering AI. Your Bank Account Has No Idea
-
Build a $1,500 AI Server with DeepSeek-R1 on RTX 4090
-
Atuin v18.13 – Better Search, a PTY Proxy, and AI for Your Shell
-
SwarmHawk – Open-Source CLI for Vulnerability Scanning with AI Synthesis
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Repurpose Old GPUs as Dedicated AI Inference Accelerators
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
Llamafile 0.10 Released with GPU Support and Rebuilt Core
-
Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
-
MiniMax-M2.7: New Compact Model Announced for Local Deployment
-
Mamba 3: State Space Model Architecture Optimized for Inference
-
LucidShark – Local-first, open-source quality and security gate
-
Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based)
-
Browser-Based Transcription Tools
-
Run LLMs Locally with Llama.cpp
-
Mistral Releases Small 4 Open-Source Model Under Apache 2.0
-
KAIST Develops World's First Hyper-Personalized On-Device AI Chip
-
Open-Source LLMs Rapidly Displacing Proprietary SOTA Models
-
OmniCoder-9B: Efficient Coding Model for 8GB GPUs
-
NVIDIA Updates Nemotron 3 122B License, Removes Deployment Restrictions
-
Show HN: Merrilin.ai – Code Blocks in Your Books, Finally
-
LoKI – Local AI Assistant for Linux and WSL
-
This External GPU Enclosure Tries to Break Cloud Dependence for Local AI Inference
-
Dictare – Open-source Voice Layer for AI Coding Agents (100% Local)
-
Custom AI Smart Speaker
-
Startup Transforms Mac Mini Into Full-Powered AI Inference System With External GPU
-
India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets
-
AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon
-
Show HN: Bots of WallStreet – Multi-Agent Debate and Prediction Framework
-
Linux 7.0 AMDGPU Fixing Idle Power Issue For RDNA4 GPUs After Compute Workloads
-
Nvidia Pushes Jetson as Edge Hub for Open AI Models
-
Apple M5 Max 128GB Benchmark Results for Local LLM Inference
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
NVIDIA Jetson Brings Open Models to Life at the Edge
-
Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
-
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
-
Google Delivers On-Device AI Features in New Chromebook Plus Model
-
M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
-
Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
-
When Running Ollama on Your PC for Local AI, One Thing Matters More Than Most
-
Nemotron 9B Powers Large-Scale Local Inference: Patent Classification and Real-Time Applications
-
commitgen-cc – Generate Conventional Commit Messages Locally with Ollama
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications
-
Show HN: Ivy – the first proactive, offline AI tutor
-
Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
-
Self-Hosted Paperless-ngx With Optional Local AI Integration
-
Show HN: RedDragon – LLM-Assisted IR Analysis of Code Across Languages
-
Mojo: Creating a Programming Language for an AI World with Chris Lattner
-
Llama.cpp Merges Automatic Parser Generator to Mainline
-
Show HN: Asterode – Multi-Model AI App with Memory and Power Features
-
Building PyTorch-Native Support for IBM Spyre Accelerator
-
HyperExcel Seeks 150 Billion Won Series B to Scale LPU and Verda in Korea
-
Show HN: BoardMint – A PCB Review Tool That Avoids AI Hallucinations
-
Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
-
SynthesisOS – A Local-First, Agentic Desktop Layer Built in Rust
-
OpenWrt 25.12.0 – Stable Release
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
-
Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
-
Intel Arc Pro B70 Workstation GPU Confirmed via vLLM AI Release Notes
-
Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing
-
Building a Dependency-Free GPT on a Custom OS
-
Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
-
Qualcomm Launches Snapdragon Wear Elite for On-Device AI on Wearables
-
HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
-
Change Intent Records: The Missing Artifact in AI-Assisted Development
-
C7: Pipe Up-to-Date Library Docs Into Any LLM From the Terminal
-
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
-
4 Free Tools to Run Powerful AI on Your PC Without a Subscription
-
Unsloth Dynamic 2.0 GGUFs
-
5 Useful Docker Containers for Agentic Developers
-
On-Device Function Calling in Google AI Edge Gallery
-
Show HN: Caret – Tab to Complete at Any App on Your Mac
-
Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
-
Every agent framework has the same bug – prompt decay. Here's a fix
-
Building a Privacy-Preserving RAG System in the Browser
-
Ollama for JavaScript Developers: Building AI Apps Without API Keys
-
LM Studio vs Ollama: Complete Comparison
-
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
-
Apple: Python bindings for access to the on-device Apple Intelligence model
-
Red Hat Launches AI Enterprise for Hybrid AI Deployments
-
Show HN: Pluckr – LLM-Powered HTML Scraper That Caches Selectors and Auto-Heals
-
How AI is Redefining Price and Performance in Modern Laptops
-
Mirai Tech Raises $10 Million for On-Device AI Innovation
-
Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Qwen3 Demonstrates Advanced Voice Cloning via Embeddings
-
Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
-
Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
-
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
-
Yet Another Fix Coming for Older AMD GPUs on Linux – Thanks to Valve Developer
-
Show HN: Horizon – My AI-Powered Personal News Aggregator and Summarizer
-
GGML Joins Hugging Face: What This Means for Local Model Optimization
-
AI PCs Explained: 7 Critical Truths About NPUs and Privacy
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
GGML.AI Acquired by Hugging Face
-
Apple Researchers Develop On-Device AI Agent That Interacts With Apps for You
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
I Stopped Paying for ChatGPT and Built a Private AI Setup That Anyone Can Run
-
Ollama Production Deployment: Docker-Compose Setup Guide
-
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
-
Kitten TTS V0.8 Released: State-of-the-Art Super-Tiny Text-to-Speech Model Under 25MB
-
Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
-
Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach
-
GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
-
Can We Leverage AI/LLMs for Self-Learning?
-
Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Show HN: PgCortex – AI enrichment per Postgres row, zero transaction blocking
-
High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
-
Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
-
GPU-Accelerated DataFrame Library for Local Inference Workloads
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
LLaDA2.1 Introduces Token Editing for Massive Speed Gains in Local Inference
-
GPT-OSS 20B Now Runs 100% Locally in Browser via WebGPU
-
GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
-
First Vibecoded AI Operating System for Local Deployment
-
Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
-
Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
-
The Future of AI Slop Is Constraints - Implications for Local Models
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
Developer Creates Custom Local AI Headshot Generator After Commercial Solutions Fail
-
Community Member Builds 144GB VRAM Local LLM Powerhouse