Tagged "context-window"
- Chinese LLM Ecosystem Landscape: ByteDance Doubao, Alibaba, and Open-Source Competition
- ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
- The Small Gap That Will Secure the Future of Autonomous AI Agents
- AI Playground for Developers Built with Vite and Python
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- Mamba 3: State Space Model Architecture Optimized for Inference
- Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
- Memory Should Decay: Implementing Temporal Memory Decay in Local LLM Systems
- 3-Path Agent Memory: 8 KB Recurrent State vs. 156 MB KV Cache at 10K Tokens
- 8 Local LLM Settings Most People Never Touch That Fixed My Worst AI Problems
- Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
- ETH Zurich Research Challenges Context-Length Assumptions in LLM Agents
- Analysis Reveals Claude Code Sends 62,600 Characters of Tool Definitions Per Turn
- Incrmd: Incremental AI Coding by Editing PROJECT.md
- Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context
- Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16, for Accurate Inference
- Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
- Researchers Develop Persistent Memory System for Local LLMs—No RAG Required
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia
- Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System
- The Path to Ubiquitous AI (17k tokens/sec)
- Why AI Models Fail at Iterative Reasoning and What Could Fix It
- GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
- SnowBall Technique Addresses Context Window Limitations in Local LLMs
- NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
- GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
- Context Management Identified as Real Bottleneck in AI-Assisted Coding
- Using Recursive Language Models to Handle Huge Contexts in Local LLMs
- DeepSeek Launches Model Update with 1M Context Window