Replacing Claude Code with a local LLM for 20 devs — has anyone actually pulled this off?

For our 20 developers, we’re looking at hardware with 2x RTX PRO 6000 and MiniMax M2.5: can this actually serve a real software engineering team?

We’re scoping a self-hosted setup at a manufacturing tech company to replace ~€440k/year in Claude Code spend across 20 software engineers. Plan is one Threadripper workstation (24 cores) with 2x RTX PRO 6000 Blackwell Max-Q (192GB VRAM per box), running MiniMax M2.5 INT4 AWQ via vLLM, with LiteLLM routing the hard requests to the Claude Opus 4.7 API.
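
For concreteness, the routing layer we have in mind looks roughly like the sketch below, using LiteLLM’s Python Router in front of vLLM’s OpenAI-compatible endpoint. This is a minimal sketch, not a tested config: the endpoint URL, the served/cloud model names, and the `hard` flag are all placeholders.

```python
# Rough sketch of the planned routing layer (placeholders, not production config).
# Assumes vLLM is serving the local model behind its OpenAI-compatible API
# at http://gpu-box:8000/v1; model IDs below are illustrative.
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "local-coder",
            "litellm_params": {
                "model": "openai/minimax-m2.5-int4-awq",  # whatever name vLLM serves under
                "api_base": "http://gpu-box:8000/v1",
                "api_key": "not-needed-locally",
            },
        },
        {
            "model_name": "cloud-fallback",
            "litellm_params": {
                "model": "anthropic/claude-opus-4-7",  # placeholder Opus model ID
                "api_key": "sk-ant-...",
            },
        },
    ]
)

def ask(prompt: str, hard: bool = False):
    """Send routine work to the local box, escalate hard requests to the cloud."""
    target = "cloud-fallback" if hard else "local-coder"
    return router.completion(
        model=target,
        messages=[{"role": "user", "content": prompt}],
    )
```

The open question is what actually decides `hard=True` in practice, which is exactly the routing-logic bullet in the list below.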

Target: match Opus 4.6 / GPT-5.3-Codex quality (~80% SWE-bench) on the routine work, fine-tune on our codebase for the Viscon-specific stuff, keep cloud fallback for the genuinely hard problems.

Before we commit ~€40k all-in on hardware: has anyone here actually run a local coding stack for 15-20+ concurrent developers in production? Specifically interested in:

  • Real concurrency numbers on PRO 6000 Blackwell with MiniMax M2.5 (not single-stream benchmarks; the probe we’d run ourselves is sketched at the end of the post)

  • Whether developers actually adopted it or quietly went back to cloud

  • KV cache / context length tradeoffs at peak load (rough sizing math sketched after this list)

  • Routing logic that worked vs fell apart in practice

  • What broke that you didn’t see coming
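
On the KV cache bullet, this is the back-of-envelope sizing we’ve been doing. The layer count, KV-head count, and head dimension are guesses rather than confirmed MiniMax M2.5 architecture numbers, so treat the output as illustrative only:

```python
# Back-of-envelope KV-cache sizing (illustrative; architecture numbers are guesses,
# not confirmed MiniMax M2.5 specs).
def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """GB for one sequence's cache: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1024**3

# Hypothetical architecture: 60 layers, 8 KV heads (GQA), head_dim 128, FP16 cache.
per_dev_context = 64_000   # tokens of live context per developer session
concurrent_devs = 20

per_seq = kv_cache_gb(per_dev_context, layers=60, kv_heads=8, head_dim=128)
total = per_seq * concurrent_devs
print(f"{per_seq:.1f} GB per 64k-token session, {total:.0f} GB for 20 concurrent sessions")
# Whatever VRAM is left after the INT4 weights is what decides how many
# long-context sessions fit before vLLM starts preempting or swapping.
```

Even if the real numbers are half of that, 20 long-context sessions clearly won’t all fit alongside the weights, which is why we want real-world reports on how vLLM behaves once it starts preempting.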

War stories welcome — including the ones where it failed. Would rather hear that now than after buying.
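
To frame the concurrency question: this is roughly the probe we’d run against the box ourselves rather than trusting single-stream numbers. Endpoint URL and served model name are placeholders:

```python
# Quick-and-dirty concurrency probe against the local vLLM endpoint
# (sketch only; base_url and model name are placeholders).
import asyncio
import time
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://gpu-box:8000/v1", api_key="unused")

async def one_request(prompt: str) -> int:
    resp = await client.chat.completions.create(
        model="minimax-m2.5-int4-awq",   # whatever name vLLM was launched with
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return resp.usage.completion_tokens

async def main(n_concurrent: int = 20):
    start = time.perf_counter()
    # Fire n_concurrent developer-sized requests at once, not one after another.
    toks = await asyncio.gather(*[
        one_request("Refactor this function...") for _ in range(n_concurrent)
    ])
    elapsed = time.perf_counter() - start
    print(f"{n_concurrent} streams: {sum(toks)/elapsed:.0f} tok/s aggregate, {elapsed:.1f}s wall")

asyncio.run(main())
```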

Replacing Claude Code with a fully local setup sounds great in theory but becomes complex in real workflows. From what I’ve seen, most practical setups still rely on a hybrid approach where local models handle routine coding tasks while stronger models are used for planning and edge cases. The biggest challenges usually come from context management, tool-calling reliability, and maintaining consistent performance across different tasks. Local coding agents are improving fast, but there’s still a noticeable gap when it comes to full autonomy in real development environments. A similar discussion on this topic can be seen here: https://discuss.huggingface.co/t/top-local-ai-models-gguf-for-lawdefiner complete-web-app-development-no-coding-for-2026/174336

Hi,

It sounds like you’re spending quite a lot of money on hardware every month,

I have built a KV-cache engine that cuts VRAM usage by 80%. Nesion reduces your LLM’s GPU costs by up to 80%, so it runs faster, smoother, and much more cheaply than without it.

Try it for free at: https://nesion.net