Tagged "production-ops"
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Mihup and Qualcomm Collaborate to Advance Secure On-Device Voice AI for BFSI
- LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
- Aegis.rs: Open Source Rust-Based LLM Security Proxy Released
- Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
- Show HN: Shiro.computer Static Page, Unix/NPM Shimmed to Host Claude Code
- Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
- Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
- GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
- Matmul-Free Language Model Trained on CPU in 1.2 Hours
- Real-World Coding Benchmark Tests LLMs on 65 Production Codebase Tasks
- Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
- Ask HN: How Do You Debug Multi-Step AI Workflows When the Output Is Wrong?
- AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
- Self-Hosted AI: A Complete Roadmap for Beginners
- Show HN: PgCortex – AI enrichment per Postgres row, zero transaction blocking
- Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
- I attacked my own LangGraph agent system. All 6 attacks worked
- Show HN: Inkog – Pre-flight check for AI agents (governance, loops, injection)
- High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
- Chinese AI Chipmaker Axera Semiconductor Plans $379 Million Hong Kong IPO for Edge Inference Hardware
- Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
- I broke into my own AI system in 10 minutes. I built it
- GPU-Accelerated DataFrame Library for Local Inference Workloads
- First Vibecoded AI Operating System for Local Deployment
- Switching From Ollama and LM Studio to llama.cpp: Performance Benefits
- Simile AI Raises $100M Series A for Local AI Infrastructure
- 175,000 Publicly Exposed Ollama AI Servers Discovered Across 130 Countries
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- ByteDance Releases Seedance 2.0 AI Development Platform
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- Heaps Do Lie: Debugging a Memory Leak in vLLM
- New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
- Analysis Reveals AI's Real Impact on Software Launches and Development
- NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
- Community Member Builds 144GB VRAM Local LLM Powerhouse