Replacing Claude Code with a local LLM for 20 devs — has anyone actually pulled this off?

For our 20 developers, we’re looking at hardware with 2x RTX PRO 6000 and MiniMax M2.5: can this actually serve a real software engineering team?

We’re scoping a self-hosted setup at a manufacturing tech company to replace ~€440k/year in Claude Code spend across 20 software engineers. Plan is one Threadripper workstation (24 cores) with 2x RTX PRO 6000 Blackwell Max-Q (192GB VRAM per box), running MiniMax M2.5 INT4 AWQ via vLLM, with LiteLLM routing the hard requests to the Claude Opus 4.7 API.
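
For concreteness, the routing layer we have in mind looks roughly like the sketch below, using LiteLLM’s Python Router in front of vLLM’s OpenAI-compatible endpoint. This is a minimal sketch, not a tested config: the endpoint URL, the served/cloud model names, and the `hard` flag are all placeholders.

```python
# Rough sketch of the planned routing layer (placeholders, not production config).
# Assumes vLLM is serving the local model behind its OpenAI-compatible API
# at http://gpu-box:8000/v1; model IDs below are illustrative.
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "local-coder",
            "litellm_params": {
                "model": "openai/minimax-m2.5-int4-awq",  # whatever name vLLM serves under
                "api_base": "http://gpu-box:8000/v1",
                "api_key": "not-needed-locally",
            },
        },
        {
            "model_name": "cloud-fallback",
            "litellm_params": {
                "model": "anthropic/claude-opus-4-7",  # placeholder Opus model ID
                "api_key": "sk-ant-...",
            },
        },
    ]
)

def ask(prompt: str, hard: bool = False):
    """Send routine work to the local box, escalate hard requests to the cloud."""
    target = "cloud-fallback" if hard else "local-coder"
    return router.completion(
        model=target,
        messages=[{"role": "user", "content": prompt}],
    )
```

The open question is what actually decides `hard=True` in practice, which is exactly the routing-logic bullet in the list below.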

Target: match Opus 4.6 / GPT-5.3-Codex quality (~80% SWE-bench) on the routine work, fine-tune on our codebase for the Viscon-specific stuff, keep cloud fallback for the genuinely hard problems.

Before we commit ~€40k all-in on hardware: has anyone here actually run a local coding stack for 15-20+ concurrent developers in production? Specifically interested in:

  • Real concurrency numbers on PRO 6000 Blackwell with MiniMax M2.5 (not single-stream benchmarks; the probe we’d run ourselves is sketched at the end of the post)

  • Whether developers actually adopted it or quietly went back to cloud

  • KV cache / context length tradeoffs at peak load (rough sizing math sketched after this list)

  • Routing logic that worked vs fell apart in practice

  • What broke that you didn’t see coming
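
On the KV cache bullet, this is the back-of-envelope sizing we’ve been doing. The layer count, KV-head count, and head dimension are guesses rather than confirmed MiniMax M2.5 architecture numbers, so treat the output as illustrative only:

```python
# Back-of-envelope KV-cache sizing (illustrative; architecture numbers are guesses,
# not confirmed MiniMax M2.5 specs).
def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """GB for one sequence's cache: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1024**3

# Hypothetical architecture: 60 layers, 8 KV heads (GQA), head_dim 128, FP16 cache.
per_dev_context = 64_000   # tokens of live context per developer session
concurrent_devs = 20

per_seq = kv_cache_gb(per_dev_context, layers=60, kv_heads=8, head_dim=128)
total = per_seq * concurrent_devs
print(f"{per_seq:.1f} GB per 64k-token session, {total:.0f} GB for 20 concurrent sessions")
# Whatever VRAM is left after the INT4 weights is what decides how many
# long-context sessions fit before vLLM starts preempting or swapping.
```

Even if the real numbers are half of that, 20 long-context sessions clearly won’t all fit alongside the weights, which is why we want real-world reports on how vLLM behaves once it starts preempting.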

War stories welcome — including the ones where it failed. Would rather hear that now than after buying.
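
To frame the concurrency question: this is roughly the probe we’d run against the box ourselves rather than trusting single-stream numbers. Endpoint URL and served model name are placeholders:

```python
# Quick-and-dirty concurrency probe against the local vLLM endpoint
# (sketch only; base_url and model name are placeholders).
import asyncio
import time
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://gpu-box:8000/v1", api_key="unused")

async def one_request(prompt: str) -> int:
    resp = await client.chat.completions.create(
        model="minimax-m2.5-int4-awq",   # whatever name vLLM was launched with
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return resp.usage.completion_tokens

async def main(n_concurrent: int = 20):
    start = time.perf_counter()
    # Fire n_concurrent developer-sized requests at once, not one after another.
    toks = await asyncio.gather(*[
        one_request("Refactor this function...") for _ in range(n_concurrent)
    ])
    elapsed = time.perf_counter() - start
    print(f"{n_concurrent} streams: {sum(toks)/elapsed:.0f} tok/s aggregate, {elapsed:.1f}s wall")

asyncio.run(main())
```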

Replacing Claude Code with a fully local setup sounds great in theory but becomes complex in real workflows. From what I’ve seen, most practical setups still rely on a hybrid approach where local models handle routine coding tasks while stronger models are used for planning and edge cases. The biggest challenges usually come from context management, tool-calling reliability, and maintaining consistent performance across different tasks. Local coding agents are improving fast, but there’s still a noticeable gap when it comes to full autonomy in real development environments. A similar discussion on this topic can be seen here: https://discuss.huggingface.co/t/top-local-ai-models-gguf-for-lawdefiner complete-web-app-development-no-coding-for-2026/174336

Hi,

It sounds like you’re spending quite a lot of money on hardware every month,

I have built a KV-cache engine that cuts VRAM usage by 80%. Nesion reduces your LLM’s GPU costs by up to 80%, so it runs faster, smoother, and much more cheaply than without it.

Try it for free at: https://nesion.net