Tagged "production-ops"
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Mihup and Qualcomm Collaborate to Advance Secure On-Device Voice AI for BFSI
- LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
- Aegis.rs: Open Source Rust-Based LLM Security Proxy Released
- Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
- Show HN: Shiro.computer Static Page, Unix/NPM Shimmed to Host Claude Code
- Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
- Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
- GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
- Matmul-Free Language Model Trained on CPU in 1.2 Hours
- Real-World Coding Benchmark Tests LLMs on 65 Production Codebase Tasks
- Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
- Ask HN: How Do You Debug Multi-Step AI Workflows When the Output Is Wrong?
- AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
- Self-Hosted AI: A Complete Roadmap for Beginners
- Show HN: PgCortex – AI enrichment per Postgres row, zero transaction blocking
- Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
- I attacked my own LangGraph agent system. All 6 attacks worked
- Show HN: Inkog – Pre-flight check for AI agents (governance, loops, injection)
- High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
- Chinese AI Chipmaker Axera Semiconductor Plans $379 Million Hong Kong IPO for Edge Inference Hardware
- Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
- I broke into my own AI system in 10 minutes. I built it
- GPU-Accelerated DataFrame Library for Local Inference Workloads
- First Vibecoded AI Operating System for Local Deployment
- Switching From Ollama and LM Studio to llama.cpp: Performance Benefits
- Simile AI Raises $100M Series A for Local AI Infrastructure
- 175,000 Publicly Exposed Ollama AI Servers Discovered Across 130 Countries
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- ByteDance Releases Seedance 2.0 AI Development Platform
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- Heaps Do Lie: Debugging a Memory Leak in vLLM
- New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
- Analysis Reveals AI's Real Impact on Software Launches and Development
- NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
- Community Member Builds 144GB VRAM Local LLM Powerhouse