Tagged "consumer-gpu"
-
Qwen3.5-35B-A3B Emerges as Game-Changer for Agentic Coding Tasks
-
Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
-
PyTorch Foundation Announces New Members as Agentic AI Demand Grows
-
Show HN: Pluckr – LLM-Powered HTML Scraper That Caches Selectors and Auto-Heals
-
Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
-
Show HN: 100% LLM Accuracy – No Fine-Tuning, JSON Only
-
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
-
How AI is Redefining Price and Performance in Modern Laptops
-
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
South Korea to Launch $687 Million Project to Develop On-Device AI Semiconductors
-
Qwen3's Voice Embeddings Enable Local Voice Cloning and Mathematical Voice Manipulation
-
Custom Portable Workstation Optimized for Local AI Inference Builds
-
Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
-
Nvidia Could Launch Its First Laptops With Its Own Processors
-
Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
-
A Tool to Tell You What LLMs Can Run on Your Machine
-
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
-
GLM-5 Becomes Top Open-Weights Model on Extended NYT Connections Benchmark
-
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
-
Yet Another Fix Coming for Older AMD GPUs on Linux – Thanks to Valve Developer
-
Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
-
O-TITANS: Orthogonal LoRA Framework for Gemma 3 with Google TITANS Memory Architecture
-
At India AI Impact Summit, Intel Showcases AI PCs and Cost-Efficient Frugal AI
-
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
GGML.AI Acquired by Hugging Face
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
-
Kitten TTS V0.8 Released: State-of-the-Art Super-Tiny Text-to-Speech Model Under 25MB
-
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
-
Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantization
-
High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
-
Cohere Releases Tiny Aya: Efficient 3.3B Multilingual Model for 70+ Languages
-
ASUS Zenbook 14 Launches in India with AI-Capable Hardware, Starting at Rs 1,15,990
-
Ask HN: What is the best bang for buck budget AI coding?
-
GPU-Accelerated DataFrame Library for Local Inference Workloads
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
Samsung's REAM: Alternative Model Compression Technique
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
-
I Tried a Claude Code Rival That's Local, Open Source, and Completely Free
-
NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
-
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
-
Carmack Proposes Using Long Fiber Lines as L2 Cache for Streaming AI Data
-
Community Member Builds 144GB VRAM Local LLM Powerhouse