Tagged "model-optimization"
-
LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language
-
Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration
-
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide
-
Alibaba Commits to Continuous Open-Sourcing of Qwen and Wan Models
-
Powerful AI Search Engine Built on Single GeForce RTX 5090
-
Llama 8B Matches 70B Performance on Multi-Hop QA Using Structured Prompting
-
Ultra-Compact 28M Parameter Models Show Promise for Specialized Domain Tasks
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
Cursor's Composer 2 Model Analysis – Fine-Tuned Variant of Kimi K2.5
-
AI's Impact on Mathematics Analogous to Car's Impact on Cities
-
MiniMax-M2.7: New Compact Model Announced for Local Deployment
-
Run LLMs Locally with Llama.cpp
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Local Qwen Models Master Browser Automation Through Iterative Replanning
-
Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
-
OpenClaw Isn't the Only Raspberry Pi AI Tool—Here Are 4 Others You Can Try This Week
-
Practical Fix for Qwen 3.5 Overthinking in llama.cpp
-
Nota Added to Three Technology and Growth ETFs in a Row – Market Recognition for AI Efficiency
-
StepFun Releases SFT Dataset Used to Train Step 3.5 Flash for Community Fine-Tuning
-
India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets
-
Local LLMs on Apple Silicon Mac 2026: M1 M2 M3 Guide
-
Qwodel – An Open-Source Unified Pipeline for LLM Quantization
-
Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
-
Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
-
The $1,500 Local AI Setup: DeepSeek-R1 on Consumer Hardware
-
Show HN: Detect When an LLM Silently Changes Behavior for the Same Prompt
-
Sarvam Open-Sources 30B and 105B Reasoning Models
-
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance
-
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026
-
Nemotron 9B Powers Large-Scale Local Inference: Patent Classification and Real-Time Applications
-
How to Run Your Own Local LLM — 2026 Edition
-
Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference
-
Samsung Opens Registration for Vision AI QLED and OLED Television Integration
-
Student Researcher Achieves 42x Model Compression Through Novel Architecture
-
Windows 11 Notepad Gets On-Device AI Text Generation Without Subscription
-
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support
-
The Emerging Role of SRAM-Centric Chips in AI Inference
-
Building PyTorch-Native Support for IBM Spyre Accelerator
-
Unity Showcases Manufacturing AI Workflow at Smart Factory Expo
-
Kakao Launches Kanana AI for On-Device Schedule and Recommendation Management
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
Qualcomm Snapdragon Wear Elite: 2B Parameter NPU for Personal AI Wearables
-
Apple M4 iPad Air Targets AI Users with Double M1 Speed Performance
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
-
Jan Releases Code-Tuned 4B Model for Efficient Local Code Generation and Development Tasks
-
Change Intent Records: The Missing Artifact in AI-Assisted Development
-
Switch Qwen 3.5 Thinking Mode On/Off Without Model Reload Using setParamsByID
-
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
-
AI-Native Store Research
-
Unsloth Dynamic 2.0 GGUFs
-
Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second
-
Qwen 3.5-27B Demonstrates Exceptional Performance with Thoughtful Prompt Engineering
-
Meta Reveals AI-Packed Smartwatch In 2026 – Why Wearables Shift Now
-
LLmFit: Terminal Tool for Right-Sizing LLM Models to Your Hardware
-
Galaxy S26 Debuts AI-Powered Scam Detection in Bold Security Push
-
Arduino, Qualcomm Bring On-Device AI and Robotics Learning to Indian School Systems
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Show HN: Caret – Tab to Complete at Any App on Your Mac
-
Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
-
Running LLMs on Raspberry Pi and Edge Devices: A Practical Guide
-
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup
-
DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
-
Apple: Python bindings for access to the on-device Apple Intelligence model
-
Agent System – 7 specialized AI agents that plan, build, verify, and ship code
-
Show HN: 100% LLM Accuracy–No Fine-Tuning, JSON Only
-
What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
-
Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
-
Show HN: Dypai – Build Backends from Your IDE Using AI and MCP
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
-
Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
-
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
-
How Slow Local LLMs Are on My Framework 13 AMD Strix Point
-
AI PCs Explained: 7 Critical Truths About NPUs and Privacy
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
I Thought I Needed a GPU to Run AI Until I Learned About These Models
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
I Stopped Paying for ChatGPT and Built a Private AI Setup That Anyone Can Run
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
AI Integration in Sublime Text: Practical Local LLM Editor Enhancement
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Running Local LLMs and VLMs on Arduino UNO Q with yzma
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
-
GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
-
Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
-
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
-
Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
-
The Future of AI Slop Is Constraints - Implications for Local Models
-
NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
-
Energy-Based Models Compared Against Frontier AI for Sudoku Solving