Tagged "offline-deployment"
-
LM Studio Releases Reworked Plugins with Fully Local Web Research
-
Developer Builds Fully Local Multi-Agent System Using vLLM and Parallel Inference
-
Why You Should Use Both ChatGPT and Local LLMs: A Practical Hybrid Approach
-
MacinAI Local brings functional LLM inference to classic Macintosh hardware
-
Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
-
Custom AI Smart Speaker
-
Cicikus v3 Prometheus 4.4B – An Experimental Franken-Merge for Edge Reasoning
-
Local AI Coding Assistant: Complete VS Code + Ollama + Continue Setup
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
commitgen-cc – Generate Conventional Commit Messages Locally with Ollama
-
Show HN: Ivy – the first proactive, offline AI tutor
-
Unity Showcases Manufacturing AI Workflow at Smart Factory Expo
-
SynthesisOS – A Local-First, Agentic Desktop Layer Built in Rust
-
RunAnywhere Launches Production-Grade On-Device AI Platform for Enterprise Scale
-
OpenWrt 25.12.0 – Stable Release
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
ÆTHERYA Core – Deterministic Policy Engine for Governing LLM Actions
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Apple Intelligence, Galaxy AI, Gemini: Why Your AI-Powered Phone Is Worth Repairing
-
On-Device Function Calling in Google AI Edge Gallery
-
Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
-
Researchers Develop Persistent Memory System for Local LLMs—No RAG Required
-
Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
-
Search and Analyze Documents from the DOJ Epstein Files Release with Local LLM
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
TemplateFlow – Build AI Workflows, Not Prompts
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
I Stopped Paying for ChatGPT and Built a Private AI Setup That Anyone Can Run
-
The Path to Ubiquitous AI (17k tokens/sec)
-
PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
-
Ollama Production Deployment: Docker-Compose Setup Guide
-
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB
-
Why AI Models Fail at Iterative Reasoning and What Could Fix It
-
Free ASIC-Accelerated Llama 3.1 8B Inference at 16,000 Tokens/Second
-
Show HN: Forked – A Local Time-Travel Debugger for OpenClaw Agents
-
AI Integration in Sublime Text: Practical Local LLM Editor Enhancement
-
Self-Hosted Local LLMs for Document Management with Paperless-ngx
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Running Local LLMs and VLMs on Arduino UNO Q with yzma
-
Mihup and Qualcomm Collaborate to Advance Secure On-Device Voice AI for BFSI
-
Complete Offline AI System: Voice Control and Smart Home via Local LLM and Radio Without Internet
-
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
-
Local-First RAG: Vector Search in SQLite with Hamming Distance
-
LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
-
GPT4All Replaces Ollama On Mac After Quick Trial
-
Clipthesis: Free Local App for Video Tagging and Search Across Drives
-
Why My Country's AI Scene Is Built on Sand
-
Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
-
Show HN: Shiro.computer Static Page, Unix/NPM Shimmed to Host Claude Code
-
Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach
-
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
-
Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
-
OpenClaw Refactored in Go, Runs on $10 Hardware
-
Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
-
GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
-
Matmul-Free Language Model Trained on CPU in 1.2 Hours
-
Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
-
Can We Leverage AI/LLMs for Self-Learning?
-
AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
-
Self-Hosted AI: A Complete Roadmap for Beginners
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
-
I attacked my own LangGraph agent system. All 6 attacks worked
-
High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
-
Cohere Releases Tiny Aya: Efficient 3.3B Multilingual Model for 70+ Languages
-
Chinese AI Chipmaker Axera Semiconductor Plans $379 Million Hong Kong IPO for Edge Inference Hardware
-
ASUS Zenbook 14 Launches in India with AI-Capable Hardware, Starting at Rs 1,15,990
-
Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
-
Ask HN: What is the best bang for buck budget AI coding?
-
Sourdine: Open-Source macOS App for 100% Local AI Transcription
-
Security Alert: OpenClaw Designed for Self-Hosting, Stop Sharing Credentials
-
InitRunner: YAML-Based AI Agent Framework with RAG and Memory
-
GPU-Accelerated DataFrame Library for Local Inference Workloads
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
Switching From Ollama And LM Studio To llama.cpp: A Performance Comparison
-
SnowBall Technique Addresses Context Window Limitations in Local LLMs
-
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
-
NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
-
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
LLM APIs Reconceptualized as State Synchronization Challenge
-
LLaDA2.1 Introduces Token Editing for Massive Speed Gains in Local Inference
-
GPT-OSS 20B Now Runs 100% Locally in Browser via WebGPU
-
GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
-
GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
-
175,000 Publicly Exposed Ollama AI Servers Discovered Across 130 Countries
-
Context Management Identified as Real Bottleneck in AI-Assisted Coding
-
ByteDance Releases Seed2.0 LLM with Complex Real-World Task Improvements
-
WinClaw: Windows-Native AI Assistant with Office Automation
-
First Vibecoded AI Operating System for Local Deployment
-
Simile AI Raises $100M Series A for Local AI Infrastructure
-
Ring-1T-2.5 Released with SOTA Deep Thinking Performance
-
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
-
Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
-
The Future of AI Slop Is Constraints - Implications for Local Models
-
Running Your Own AI Assistant for €19/Month: Complete Self-Hosting Guide
-
Samsung's REAM: Alternative Model Compression Technique
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
OpenClaw with vLLM Running for Free on AMD Developer Cloud
-
Memio Launches AI-Powered Knowledge Hub for Android with Local Processing
-
GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
-
I Tried a Claude Code Rival That's Local, Open Source, and Completely Free
-
NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
-
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
-
5 Practical Ways to Use Local LLMs with MCP Tools
-
Godot MCP Gives AI Assistants Full Access to Game Engine Editor
-
Building a RAG Pipeline on 2M+ Pages: EpsteinFiles-RAG Project
-
Energy-Based Models Compared Against Frontier AI for Sudoku Solving
-
DeepSeek Launches Model Update with 1M Context Window
-
Developer Creates Custom Local AI Headshot Generator After Commercial Solutions Fail
-
Carmack Proposes Using Long Fiber Lines as L2 Cache for Streaming AI Data
-
Arm SME2 Technology Expands CPU Capabilities for On-Device AI
-
Anthropic Releases Claude Opus 4.6 Sabotage Risk Assessment
-
Community Member Builds 144GB VRAM Local LLM Powerhouse