Tagged "llm-deployment"
-
OpenNebula 7.2 "Dark Horse" Released with Enhanced Infrastructure Support
-
Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp
-
Gemini-CLI, Llama.cpp, and Qwen3.5 Running on NVIDIA Jetson TK1
-
GMKtec NucBox K17 Launches with 97 TOPS AI Performance for Local Inference
-
Intel's $949 GPU Has 32GB of VRAM for Local AI, but Software is Why Nvidia Keeps Winning
-
Self-Hostable AI Agents and Internal Software Framework Released
-
Qt 6.11 Released with Enhanced Cross-Platform Deployment Capabilities
-
Claude Usage Monitor: Track API Usage with macOS Menu Bar App
-
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide
-
Building a Production AI Receptionist: Practical Local LLM Deployment Case Study
-
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
-
Developer Builds Fully Local Multi-Agent System Using vLLM and Parallel Inference
-
Multi-Token Prediction support coming to MLX-LM for Qwen 3.5
-
Apple M5 Max 128GB real-world performance benchmarks for local inference
-
What AI Augmentation Means for Technical Leaders
-
Llamafile 0.10 Released with GPU Support and Rebuilt Core
-
My Dinner with AI
-
Mamba 3: State Space Model Architecture Optimized for Inference
-
I Switched to a Local LLM for These 5 Tasks and the Cloud Version Hasn't Been Worth It Since
-
Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based)
-
How I Used Lima for an AI Coding Agent Sandbox
-
How AI Agents Should Pay for API Calls: X402 and USDC Verification on Base
-
LoKI – Local AI Assistant for Linux and WSL
-
Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel
-
Open-Source GreenBoost Driver Augments NVIDIA GPU VRAM With System RAM and NVMe Storage
-
AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon
-
P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM
-
Local LLMs on Apple Silicon Mac 2026: M1 M2 M3 Guide
-
Show HN: Intake API – An Inbox for AI Coding Agents
-
How to Run Local LLMs in 2026: The Complete Developer's Guide
-
Show HN: Bots of WallStreet – Multi-Agent Debate and Prediction Framework
-
Best Local LLM Models 2026: Developer Comparison
-
AgentArmor: Open-Source 8-Layer Security Framework for AI Agents
-
Show HN: VmExit – An Experiment in AI-Native Computing
-
MeepaChat – Slack for AI Agents (iOS, macOS, Web / Cloud, Self-Hosted)
-
The $1,500 Local AI Setup: DeepSeek-R1 on Consumer Hardware
-
A Kubernetes Operator That Orchestrates AI Coding Agents
-
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming
-
M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
-
Qwen 3.5 Derestricted Model Available for Local Deployment
-
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026
-
How to Run Your Own Local LLM — 2026 Edition
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
Show HN: Proxly – Self-hosted tunneling on your own domain in 60 seconds
-
Show HN: SimplAI – Build and Deploy AI Agents and Workflows Without Boilerplate
-
Llama.cpp Merges Automatic Parser Generator to Mainline
-
IBM Granite 4.0 1B Speech Model Released for Multilingual Speech Recognition
-
Qwen 3.5-4B Generates Fully Functional OS in Single Prompt
-
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Apple M5 Pro and M5 Max: 4× Faster LLM Processing
-
VibeWhisper – macOS Voice-to-Text with 100% Local Processing Option
-
Qwen 3.5 0.8B Successfully Deployed on 7-Year-Old Samsung S10E Using llama.cpp
-
GitDelivr: A Free CDN for Git Clones Built on Cloudflare Workers and R2
-
Browser Use vs. Claude Computer Use: Comparing Agent Automation Frameworks
-
Huawei's SuperPoD Portfolio Creates New Option for Global Computing at MWC Barcelona 2026
-
Serve Markdown to LLMs from your Next.js app
-
Krasis Hybrid MoE Runtime Achieves 3,324 tok/s Prefill on Single RTX 5080
-
5 Useful Docker Containers for Agentic Developers
-
Extracting 100K Concepts from an 8B LLM
-
Show HN: AgentGate – Stake-Gated Action Microservice for AI Agents
-
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup
-
LM Studio vs Ollama: Complete Comparison
-
The Complete Developer's Guide to Running LLMs Locally: From Ollama to Production
-
Red Hat Launches AI Enterprise for Hybrid AI Deployments
-
Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
The Real AI Competition Is Closed-Source vs Open-Source, Not America vs China
-
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
-
How Do You Know Which SKILL.md Is Good?
-
Massu: Governance Layer for AI Coding Assistants with 51 MCP Tools
-
Gix: Go CLI for AI-Generated Commit Messages
-
FORTHought: Self-Hosted AI Stack for Physics Labs Built on OpenWebUI
-
The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
-
Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Ollama Production Deployment: Docker-Compose Setup Guide
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB
-
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
-
Cohere Releases Tiny Aya: Efficient 3.3B Multilingual Model for 70+ Languages
-
Switching From Ollama And LM Studio To llama.cpp: A Performance Comparison
-
Running Your Own AI Assistant for €19/Month: Complete Self-Hosting Guide
-
ByteDance Releases Seedance 2.0 AI Development Platform
-
OpenClaw with vLLM Running for Free on AMD Developer Cloud
-
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts