Tagged "performance-optimization"
- Show HN: A Human-Curated, CLI-Driven Context Layer for AI Agents
- What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
- Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications: Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
- Open-Source Framework Achieves Gemini 3 Deep Think-Level Performance Through Local Model Scaffolding
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- Open-Source + AI: ggml Joins Hugging Face, llama.cpp Stays Open, Local AI's Long-Term Home
- GPT4All Replaces Ollama On Mac After Quick Trial
- Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
- AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- Memio Launches AI-Powered Knowledge Hub for Android with Local Processing
- NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
- Carmack Proposes Using Long Fiber Lines as L2 Cache for Streaming AI Data