Tagged "cost-saving"
-
Show HN: A Human-Curated, CLI-Driven Context Layer for AI Agents
-
No, Local LLMs Can't Replace ChatGPT or Gemini — I Tried
-
Apple Accelerates U.S. Manufacturing with Mac Mini Production
-
Comparing Manual vs. AI Requirements Gathering: 2 Sentences vs. 127-Point Spec
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
-
Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
-
Gix: Go CLI for AI-Generated Commit Messages
-
Yet Another Fix Coming for Older AMD GPUs on Linux – Thanks to Valve Developer
-
Show HN: Tickr – AI Project Manager That Lives Inside Slack (Replaces Jira)
-
At India AI Impact Summit, Intel Showcases AI PCs and Cost-Efficient Frugal AI
-
CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
-
Asus ExpertBook B3 G2 with 50 TOPS AI Sets New Enterprise Standard
-
Taalas Etches AI Models onto Transistors to Rocket Boost Inference
-
I Run Local LLMs in One of the World's Priciest Energy Markets, and I Can Barely Tell
-
GGML.AI Acquired by Hugging Face
-
24 Simultaneous Claude Code Agents on Local Hardware
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
-
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
-
Show HN: Shiro.computer Static Page, Unix/NPM Shimmed to Host Claude Code
-
Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach
-
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
-
Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
-
OpenClaw Refactored in Go, Runs on $10 Hardware
-
GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
-
Matmul-Free Language Model Trained on CPU in 1.2 Hours
-
Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
-
Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Show HN: PgCortex – AI enrichment per Postgres row, zero transaction blocking
-
Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
-
High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
-
Cohere Releases Tiny Aya: Efficient 3.3B Multilingual Model for 70+ Languages
-
Chinese AI Chipmaker Axera Semiconductor Plans $379 Million Hong Kong IPO for Edge Inference Hardware
-
ASUS Zenbook 14 Launches in India with AI-Capable Hardware, Starting at Rs 1,15,990
-
Ask HN: What is the best bang for buck budget AI coding?
-
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
-
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
-
Running Your Own AI Assistant for €19/Month: Complete Self-Hosting Guide
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
OpenClaw with vLLM Running for Free on AMD Developer Cloud
-
I Tried a Claude Code Rival That's Local, Open Source, and Completely Free
-
Using Recursive Language Models to Handle Huge Contexts in Local LLMs
-
NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
-
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
-
5 Practical Ways to Use Local LLMs with MCP Tools
-
Energy-Based Models Compared Against Frontier AI for Sudoku Solving
-
Carmack Proposes Using Long Fiber Lines as L2 Cache for Streaming AI Data
-
Arm SME2 Technology Expands CPU Capabilities for On-Device AI