Tagged "model-optimization"
- What Breaks When AI Agent Frameworks Are Forced Into <1 MB RAM and Sub-ms Startup
- Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
- Show HN: Dypai – Build Backends from Your IDE Using AI and MCP
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
- How Slow Local LLMs Are on My Framework 13 AMD Strix Point
- AI PCs Explained: 7 Critical Truths About NPUs and Privacy
- [Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
- I Thought I Needed a GPU to Run AI Until I Learned About These Models
- Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
- Running Local LLMs and VLMs on Arduino UNO Q with yzma
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
- Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
- Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
- NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics