Tagged "context-window"
- Chinese LLM Ecosystem Landscape: ByteDance Doubao, Alibaba, and Open-Source Competition
- ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
- The Small Gap That Will Secure the Future of Autonomous AI Agents
- AI Playground for Developers Built with Vite and Python
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- Mamba 3: State Space Model Architecture Optimized for Inference
- Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
- Memory Should Decay: Implementing Temporal Memory Decay in Local LLM Systems
- 3-Path Agent Memory: 8 KB Recurrent State vs. 156 MB KV Cache at 10K Tokens
- 8 Local LLM Settings Most People Never Touch That Fixed My Worst AI Problems
- Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
- ETH Zurich Research Challenges Context-Length Assumptions in LLM Agents
- Analysis Reveals Claude Code Sends 62,600 Characters of Tool Definitions Per Turn
- Incrmd: Incremental AI Coding by Editing PROJECT.md
- Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context
- Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16, for Accurate Inference
- Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
- Researchers Develop Persistent Memory System for Local LLMs—No RAG Required
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia
- Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System
- The Path to Ubiquitous AI (17k tokens/sec)
- Why AI Models Fail at Iterative Reasoning and What Could Fix It
- GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
- SnowBall Technique Addresses Context Window Limitations in Local LLMs
- NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
- GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
- Context Management Identified as Real Bottleneck in AI-Assisted Coding
- Using Recursive Language Models to Handle Huge Contexts in Local LLMs
- DeepSeek Launches Model Update with 1M Context Window