Tagged "offline-deployment"
-
LM Studio Releases Reworked Plugins with Fully Local Web Research
-
Developer Builds Fully Local Multi-Agent System Using vLLM and Parallel Inference
-
Why You Should Use Both ChatGPT and Local LLMs: A Practical Hybrid Approach
-
MacinAI Local brings functional LLM inference to classic Macintosh hardware
-
Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
-
Custom AI Smart Speaker
-
Cicikus v3 Prometheus 4.4B – An Experimental Franken-Merge for Edge Reasoning
-
Local AI Coding Assistant: Complete VS Code + Ollama + Continue Setup
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
commitgen-cc – Generate Conventional Commit Messages Locally with Ollama
-
Show HN: Ivy – the first proactive, offline AI tutor
-
Unity Showcases Manufacturing AI Workflow at Smart Factory Expo
-
SynthesisOS – A Local-First, Agentic Desktop Layer Built in Rust
-
RunAnywhere Launches Production-Grade On-Device AI Platform for Enterprise Scale
-
OpenWrt 25.12.0 – Stable Release
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
ÆTHERYA Core – Deterministic Policy Engine for Governing LLM Actions
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Apple Intelligence, Galaxy AI, Gemini: Why Your AI-Powered Phone Is Worth Repairing
-
On-Device Function Calling in Google AI Edge Gallery
-
Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
-
Researchers Develop Persistent Memory System for Local LLMs—No RAG Required
-
Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
-
Search and Analyze Documents from the DOJ Epstein Files Release with Local LLM
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
TemplateFlow – Build AI Workflows, Not Prompts
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
I Stopped Paying for ChatGPT and Built a Private AI Setup That Anyone Can Run
-
The Path to Ubiquitous AI (17k tokens/sec)
-
PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
-
Ollama Production Deployment: Docker-Compose Setup Guide
-
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB
-
Why AI Models Fail at Iterative Reasoning and What Could Fix It
-
Free ASIC-Accelerated Llama 3.1 8B Inference at 16,000 Tokens/Second
-
Show HN: Forked – A Local Time-Travel Debugger for OpenClaw Agents
-
AI Integration in Sublime Text: Practical Local LLM Editor Enhancement
-
Self-Hosted Local LLMs for Document Management with Paperless-ngx
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Running Local LLMs and VLMs on Arduino UNO Q with yzma
-
Mihup and Qualcomm Collaborate to Advance Secure On-Device Voice AI for BFSI
-
Complete Offline AI System: Voice Control and Smart Home via Local LLM and Radio Without Internet
-
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
-
Local-First RAG: Vector Search in SQLite with Hamming Distance
-
LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
-
GPT4All Replaces Ollama On Mac After Quick Trial
-
Clipthesis: Free Local App for Video Tagging and Search Across Drives
-
Why My Country's AI Scene Is Built on Sand
-
Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
-
Show HN: Shiro.computer Static Page, Unix/NPM Shimmed to Host Claude Code
-
Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach
-
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
-
Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
-
OpenClaw Refactored in Go, Runs on $10 Hardware
-
Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
-
GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
-
Matmul-Free Language Model Trained on CPU in 1.2 Hours
-
Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
-
Can We Leverage AI/LLMs for Self-Learning?
-
AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
-
Self-Hosted AI: A Complete Roadmap for Beginners
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
-
I attacked my own LangGraph agent system. All 6 attacks worked
-
High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
-
Cohere Releases Tiny Aya: Efficient 3.3B Multilingual Model for 70+ Languages
-
Chinese AI Chipmaker Axera Semiconductor Plans $379 Million Hong Kong IPO for Edge Inference Hardware
-
ASUS Zenbook 14 Launches in India with AI-Capable Hardware, Starting at Rs 1,15,990
-
Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
-
Ask HN: What is the best bang for buck budget AI coding?
-
Sourdine: Open-Source macOS App for 100% Local AI Transcription
-
Security Alert: OpenClaw Designed for Self-Hosting, Stop Sharing Credentials
-
InitRunner: YAML-Based AI Agent Framework with RAG and Memory
-
GPU-Accelerated DataFrame Library for Local Inference Workloads
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
Switching From Ollama And LM Studio To llama.cpp: A Performance Comparison
-
SnowBall Technique Addresses Context Window Limitations in Local LLMs
-
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
-
NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
-
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
LLM APIs Reconceptualized as State Synchronization Challenge
-
LLaDA2.1 Introduces Token Editing for Massive Speed Gains in Local Inference
-
GPT-OSS 20B Now Runs 100% Locally in Browser via WebGPU
-
GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
-
GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
-
175,000 Publicly Exposed Ollama AI Servers Discovered Across 130 Countries
-
Context Management Identified as Real Bottleneck in AI-Assisted Coding
-
ByteDance Releases Seed2.0 LLM with Complex Real-World Task Improvements
-
WinClaw: Windows-Native AI Assistant with Office Automation
-
First Vibecoded AI Operating System for Local Deployment
-
Simile AI Raises $100M Series A for Local AI Infrastructure
-
Ring-1T-2.5 Released with SOTA Deep Thinking Performance
-
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
-
Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
-
The Future of AI Slop Is Constraints - Implications for Local Models
-
Running Your Own AI Assistant for €19/Month: Complete Self-Hosting Guide
-
Samsung's REAM: Alternative Model Compression Technique
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
OpenClaw with vLLM Running for Free on AMD Developer Cloud
-
Memio Launches AI-Powered Knowledge Hub for Android with Local Processing
-
GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
-
I Tried a Claude Code Rival That's Local, Open Source, and Completely Free
-
NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
-
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
-
5 Practical Ways to Use Local LLMs with MCP Tools
-
Godot MCP Gives AI Assistants Full Access to Game Engine Editor
-
Building a RAG Pipeline on 2M+ Pages: EpsteinFiles-RAG Project
-
Energy-Based Models Compared Against Frontier AI for Sudoku Solving
-
DeepSeek Launches Model Update with 1M Context Window
-
Developer Creates Custom Local AI Headshot Generator After Commercial Solutions Fail
-
Carmack Proposes Using Long Fiber Lines as L2 Cache for Streaming AI Data
-
Arm SME2 Technology Expands CPU Capabilities for On-Device AI
-
Anthropic Releases Claude Opus 4.6 Sabotage Risk Assessment
-
Community Member Builds 144GB VRAM Local LLM Powerhouse