Tagged "resource-optimization"
- Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration
- LMCache Dramatically Accelerates LLM Inference on Oracle Data Science Platform
- Custom GPU Multiplexer Achieves 0.3ms Model Switching on Legacy Hardware
- Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
- FreeBSD 14.4 Released: Implications for Local LLM Deployment
- Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks
- Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference
- SynthesisOS – A Local-First, Agentic Desktop Layer Built in Rust
- RunAnywhere Launches Production-Grade On-Device AI Platform for Enterprise Scale
- Qwen 3.5-27B Q4 Quantization Comparison and Analysis
- The ML.energy Leaderboard
- DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
- Show HN: A Ground-Up TLS 1.3 Client Written in C
- O-TITANS: Orthogonal LoRA Framework for Gemma 3 with Google TITANS Memory Architecture
- At India AI Impact Summit, Intel Showcases Its AI PCs and Cost-Efficient Frugal AI
- 24 Simultaneous Claude Code Agents on Local Hardware
- TemplateFlow – Build AI Workflows, Not Prompts
- Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
- Local-First RAG: Vector Search in SQLite with Hamming Distance
- Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach
- OpenClaw Refactored in Go, Runs on $10 Hardware
- Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
- Switching From Ollama and LM Studio to llama.cpp: A Performance Comparison
- MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
- Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
- Energy-Based Models Compared Against Frontier AI for Sudoku Solving