Tagged "model-optimization"
-
LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language
-
Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration
-
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide
-
Alibaba Commits to Continuous Open-Sourcing of Qwen and Wan Models
-
Powerful AI Search Engine Built on Single GeForce RTX 5090
-
Llama 8B Matches 70B Performance on Multi-Hop QA Using Structured Prompting
-
Ultra-Compact 28M Parameter Models Show Promise for Specialized Domain Tasks
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
Cursor's Composer 2 Model Analysis – Fine-Tuned Variant of Kimi K2.5
-
AI's Impact on Mathematics Analogous to Car's Impact on Cities
-
MiniMax-M2.7: New Compact Model Announced for Local Deployment
-
Run LLMs Locally with Llama.cpp
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Local Qwen Models Master Browser Automation Through Iterative Replanning
-
Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
-
OpenClaw Isn't the Only Raspberry Pi AI Tool—Here Are 4 Others You Can Try This Week
-
Practical Fix for Qwen 3.5 Overthinking in llama.cpp
-
Nota Added to Three Technology and Growth ETFs in a Row – Market Recognition for AI Efficiency
-
StepFun Releases SFT Dataset Used to Train Step 3.5 Flash for Community Fine-Tuning
-
India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets
-
Local LLMs on Apple Silicon Mac 2026: M1 M2 M3 Guide
-
Qwodel – An Open-Source Unified Pipeline for LLM Quantization
-
Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
-
Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
-
The $1,500 Local AI Setup: DeepSeek-R1 on Consumer Hardware
-
Show HN: Detect When an LLM Silently Changes Behavior for the Same Prompt
-
Sarvam Open-Sources 30B and 105B Reasoning Models
-
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance
-
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026
-
Nemotron 9B Powers Large-Scale Local Inference: Patent Classification and Real-Time Applications
-
How to Run Your Own Local LLM — 2026 Edition
-
Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference
-
Samsung Opens Registration for Vision AI QLED and OLED Television Integration
-
Student Researcher Achieves 42x Model Compression Through Novel Architecture
-
Windows 11 Notepad Gets On-Device AI Text Generation Without Subscription
-
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support
-
The Emerging Role of SRAM-Centric Chips in AI Inference
-
Building PyTorch-Native Support for IBM Spyre Accelerator
-
Unity Showcases Manufacturing AI Workflow at Smart Factory Expo
-
Kakao Launches Kanana AI for On-Device Schedule and Recommendation Management
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
Qualcomm Snapdragon Wear Elite: 2B Parameter NPU for Personal AI Wearables
-
Apple M4 iPad Air Targets AI Users with Double M1 Speed Performance
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
-
Jan Releases Code-Tuned 4B Model for Efficient Local Code Generation and Development Tasks
-
Change Intent Records: The Missing Artifact in AI-Assisted Development
-
Switch Qwen 3.5 Thinking Mode On/Off Without Model Reload Using setParamsByID
-
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
-
AI-Native Store Research
-
Unsloth Dynamic 2.0 GGUFs
-
Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second
-
Qwen 3.5-27B Demonstrates Exceptional Performance with Thoughtful Prompt Engineering
-
Meta Reveals AI-Packed Smartwatch In 2026 – Why Wearables Shift Now
-
LLmFit: Terminal Tool for Right-Sizing LLM Models to Your Hardware
-
Galaxy S26 Debuts AI-Powered Scam Detection in Bold Security Push
-
Arduino, Qualcomm Bring On-Device AI and Robotics Learning to Indian School Systems
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Show HN: Caret – Tab to Complete at Any App on Your Mac
-
Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
-
Running LLMs on Raspberry Pi and Edge Devices: A Practical Guide
-
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup
-
DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
-
Apple: Python bindings for access to the on-device Apple Intelligence model
-
Agent System – 7 specialized AI agents that plan, build, verify, and ship code
-
Show HN: 100% LLM Accuracy–No Fine-Tuning, JSON Only
-
What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
-
Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
-
Show HN: Dypai – Build Backends from Your IDE Using AI and MCP
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
-
Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
-
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
-
How Slow Local LLMs Are on My Framework 13 AMD Strix Point
-
AI PCs Explained: 7 Critical Truths About NPUs and Privacy
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
I Thought I Needed a GPU to Run AI Until I Learned About These Models
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
I Stopped Paying for ChatGPT and Built a Private AI Setup That Anyone Can Run
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
AI Integration in Sublime Text: Practical Local LLM Editor Enhancement
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Running Local LLMs and VLMs on Arduino UNO Q with yzma
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
-
GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
-
Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
-
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
-
Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
-
The Future of AI Slop Is Constraints - Implications for Local Models
-
NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
-
Energy-Based Models Compared Against Frontier AI for Sudoku Solving