Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B

After extensive debugging to enable vLLM support, practitioners have benchmarked the Intel Arc Pro B70 32GB GPU running Qwen 3.5-27B at Q4 quantization, reporting roughly 12 tokens per second of generation throughput under both llama.cpp and llm-scaler-vllm. That figure positions the card as a viable alternative to NVIDIA hardware for local inference workloads.
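For readers who want to sanity-check a throughput number like this themselves, a minimal sketch along these lines can time a completion against the OpenAI-compatible endpoint that vLLM (and by extension llm-scaler-vllm) serves. The URL, port, and model identifier below are assumptions and depend entirely on how the server was launched.

```python
import time
import requests

# Assumed endpoint: vLLM's OpenAI-compatible server defaults to port 8000.
# The model name is a placeholder; use whatever identifier was passed at launch.
URL = "http://localhost:8000/v1/completions"
MODEL = "qwen3.5-27b-q4"  # hypothetical identifier

payload = {
    "model": MODEL,
    "prompt": "Explain the difference between a process and a thread.",
    "max_tokens": 256,
    "temperature": 0.0,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=300)
elapsed = time.time() - start
resp.raise_for_status()
body = resp.json()

# vLLM reports token counts in the standard OpenAI "usage" field.
completion_tokens = body["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"-> {completion_tokens / elapsed:.1f} tok/s")
```

Note that this measures end-to-end latency, prompt processing included; a generation-only figure would subtract time-to-first-token, which requires a streaming request.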

The Arc Pro B70 results matter for practitioners looking to diversify beyond the incumbent GPU vendors. Intel's expanding GPGPU software stack, now including vLLM integration, signals a maturing ecosystem in which Intel silicon can be seriously evaluated for production deployments. With 32GB of VRAM and competitive generation speeds, the Arc Pro B70 is an economically interesting option for mid-scale local inference.
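As a rough sanity check on why a 27B-parameter model at Q4 fits comfortably in 32GB, the back-of-the-envelope arithmetic below assumes about 4.5 effective bits per weight (Q4_K-style quants carry scale metadata beyond the nominal 4 bits) plus a few gigabytes of allowance for KV cache and activations; the exact numbers vary by quant format and context length.

```python
# Back-of-the-envelope VRAM estimate for a 27B-parameter model at Q4.
# Assumption: ~4.5 effective bits/weight (Q4_K-style quants store scales
# alongside the 4-bit values), plus rough headroom for KV cache.
params = 27e9
bits_per_weight = 4.5          # assumed effective rate for Q4_K-class quants
weights_gb = params * bits_per_weight / 8 / 1e9
kv_cache_gb = 4.0              # rough allowance; grows with context length
print(f"weights ~{weights_gb:.1f} GB + KV/overhead ~{kv_cache_gb:.0f} GB "
      f"= ~{weights_gb + kv_cache_gb:.0f} GB of 32 GB")
```

By this estimate the quantized weights land around 15GB, leaving ample headroom within 32GB for longer contexts or batched requests.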

These benchmarks encourage hardware diversity in the local LLM space, potentially reducing reliance on single-vendor solutions and creating competitive pressure that benefits the entire ecosystem.


Source: r/LocalLLaMA · Relevance: 8/10