Tagged "llm-deployment"

Netflix Wiz Creates App to Slash AI Bills, Then Open Sources It 1 June 2026
Why Chinese AI Labs Went Open and Will Remain Open 31 May 2026
Apple Doubles Down on On-Device AI at WWDC 2026, Setting Privacy-First Strategy 30 May 2026
AI Token Streaming Isn't About SSE vs. WebSockets 21 May 2026
OpenNebula 7.2 "Dark Horse" Released with Enhanced Infrastructure Support 14 April 2026
Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp 12 April 2026
Gemini-CLI, Llama.cpp, and Qwen3.5 Running on NVIDIA Jetson TK1 9 April 2026
GMKtec NucBox K17 Launches with 97 TOPS AI Performance for Local Inference 5 April 2026
Intel's $949 GPU Has 32GB of VRAM for Local AI, but Software is Why Nvidia Keeps Winning 2 April 2026
Self-Hostable AI Agents and Internal Software Framework Released 23 March 2026
Qt 6.11 Released with Enhanced Cross-Platform Deployment Capabilities 23 March 2026
Claude Usage Monitor: Track API Usage with macOS Menu Bar App 23 March 2026
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide 23 March 2026
Building a Production AI Receptionist: Practical Local LLM Deployment Case Study 23 March 2026
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations 22 March 2026
Developer Builds Fully Local Multi-Agent System Using vLLM and Parallel Inference 22 March 2026
Multi-Token Prediction support coming to MLX-LM for Qwen 3.5 21 March 2026
Apple M5 Max 128GB real-world performance benchmarks for local inference 21 March 2026
What AI Augmentation Means for Technical Leaders 21 March 2026
Llamafile 0.10 Released with GPU Support and Rebuilt Core 20 March 2026
My Dinner with AI 18 March 2026
Mamba 3: State Space Model Architecture Optimized for Inference 18 March 2026
I Switched to a Local LLM for These 5 Tasks and the Cloud Version Hasn't Been Worth It Since 18 March 2026
Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based) 18 March 2026
How I Used Lima for an AI Coding Agent Sandbox 17 March 2026
How AI Agents Should Pay for API Calls: X402 and USDC Verification on Base 17 March 2026
LoKI – Local AI Assistant for Linux and WSL 16 March 2026
Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel 15 March 2026
Open-Source GreenBoost Driver Augments NVIDIA GPU VRAM With System RAM and NVMe Storage 15 March 2026
AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon 15 March 2026
P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM 14 March 2026
Local LLMs on Apple Silicon Mac 2026: M1 M2 M3 Guide 14 March 2026
Show HN: Intake API – An Inbox for AI Coding Agents 14 March 2026
How to Run Local LLMs in 2026: The Complete Developer's Guide 14 March 2026
Show HN: Bots of WallStreet – Multi-Agent Debate and Prediction Framework 14 March 2026
Best Local LLM Models 2026: Developer Comparison 14 March 2026
AgentArmor: Open-Source 8-Layer Security Framework for AI Agents 14 March 2026
Show HN: VmExit – An Experiment in AI-Native Computing 12 March 2026
MeepaChat – Slack for AI Agents (iOS, macOS, Web / Cloud, Self-Hosted) 12 March 2026
The $1,500 Local AI Setup: DeepSeek-R1 on Consumer Hardware 12 March 2026
A Kubernetes Operator That Orchestrates AI Coding Agents 11 March 2026
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming 10 March 2026
M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference 10 March 2026
Qwen 3.5 Derestricted Model Available for Local Deployment 9 March 2026
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026 9 March 2026
How to Run Your Own Local LLM — 2026 Edition 9 March 2026
Qwen 3.5 27B Achieves Strong Local Inference Performance 8 March 2026
Show HN: Proxly – Self-hosted tunneling on your own domain in 60 seconds 8 March 2026
Show HN: SimplAI – Build and Deploy AI Agents and Workflows Without Boilerplate 7 March 2026
Llama.cpp Merges Automatic Parser Generator to Mainline 7 March 2026
IBM Granite 4.0 1B Speech Model Released for Multilingual Speech Recognition 7 March 2026
Qwen 3.5-4B Generates Fully Functional OS in Single Prompt 4 March 2026
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard 4 March 2026
Qwen 3.5-27B Q4 Quantization Comparison and Analysis 4 March 2026
Apple M5 Pro and M5 Max: 4× Faster LLM Processing 4 March 2026
VibeWhisper – macOS Voice-to-Text with 100% Local Processing Option 3 March 2026
Qwen 3.5 0.8B Successfully Deployed on 7-Year-Old Samsung S10E Using llama.cpp 3 March 2026
GitDelivr: A Free CDN for Git Clones Built on Cloudflare Workers and R2 2 March 2026
Browser Use vs. Claude Computer Use: Comparing Agent Automation Frameworks 2 March 2026
Huawei's SuperPoD Portfolio Creates New Option for Global Computing at MWC Barcelona 2026 1 March 2026
Serve Markdown to LLMs from your Next.js app 28 February 2026
Krasis Hybrid MoE Runtime Achieves 3,324 tok/s Prefill on Single RTX 5080 28 February 2026
5 Useful Docker Containers for Agentic Developers 28 February 2026
Extracting 100K Concepts from an 8B LLM 27 February 2026
5 Useful Docker Containers for Agentic Developers 27 February 2026
Show HN: AgentGate – Stake-Gated Action Microservice for AI Agents 27 February 2026
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup 26 February 2026
LM Studio vs Ollama: Complete Comparison 26 February 2026
The Complete Developer's Guide to Running LLMs Locally: From Ollama to Production 26 February 2026
Red Hat Launches AI Enterprise for Hybrid AI Deployments 25 February 2026
Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization 25 February 2026
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers 25 February 2026
The Real AI Competition Is Closed-Source vs Open-Source, Not America vs China 24 February 2026
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers 24 February 2026
How Do You Know Which SKILL.md Is Good? 23 February 2026
Massu: Governance Layer for AI Coding Assistants with 51 MCP Tools 23 February 2026
Gix: Go CLI for AI-Generated Commit Messages 23 February 2026
FORTHought: Self-Hosted AI Stack for Physics Labs Built on OpenWebUI 23 February 2026
The Complete Stack for Local Autonomous Agents: From GGML to Orchestration 23 February 2026
Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia 21 February 2026
The Path to Ubiquitous AI (17k tokens/sec) 20 February 2026
Ollama Production Deployment: Docker-Compose Setup Guide 20 February 2026
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge 20 February 2026
Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB 20 February 2026
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings 18 February 2026
Cohere Releases Tiny Aya: Efficient 3.3B Multilingual Model for 70+ Languages 17 February 2026
Switching From Ollama And LM Studio To llama.cpp: A Performance Comparison 14 February 2026
Running Your Own AI Assistant for €19/Month: Complete Self-Hosting Guide 12 February 2026
ByteDance Releases Seedance 2.0 AI Development Platform 12 February 2026
OpenClaw with vLLM Running for Free on AMD Developer Cloud 12 February 2026
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts 11 February 2026