For our 20 developers, we're looking into hardware with 2x RTX PRO 6000 plus the MiniMax M2.5 LLM. Can this actually serve a real software engineering team?
We're scoping a self-hosted setup at a manufacturing tech company to replace ~€440k/year in Claude Code spend across 20 software engineers. Plan is one Threadripper (24-core) workstation with 2x RTX PRO 6000 Blackwell Max-Q (192GB VRAM per box), running MiniMax M2.5 INT4 AWQ via vLLM, with LiteLLM routing the hard requests to the Claude Opus 4.7 API.
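For what it's worth, the routing piece can be sketched as a LiteLLM proxy config. This is a rough sketch, not a tested setup: the model names, hostname, and model IDs below are placeholders you'd swap for your actual deployment.

```yaml
model_list:
  # Local vLLM box exposing an OpenAI-compatible endpoint (names are placeholders)
  - model_name: local-coder
    litellm_params:
      model: openai/minimax-m2.5-int4-awq
      api_base: http://vllm-box:8000/v1
      api_key: "none"
  # Cloud model for requests the local box can't handle
  - model_name: cloud-opus
    litellm_params:
      model: anthropic/claude-opus-4-7   # hypothetical model id
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  # Fail over to the cloud model when the local endpoint errors or times out
  fallbacks:
    - local-coder: ["cloud-opus"]
```

Note this only covers error/timeout failover; "this request is too hard, send it to Opus" routing needs its own classifier or client-side logic, which is exactly the part I'd expect to be fragile.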
Target: match Opus 4.6 / GPT-5.3-Codex quality (~80% SWE-bench) on the routine work, fine-tune on our codebase for the Viscon-specific stuff, keep cloud fallback for the genuinely hard problems.
Before we commit ~€40k all-in on hardware: has anyone here actually run a local coding stack for 15-20+ concurrent developers in production? Specifically interested in:
- Real concurrency numbers on PRO 6000 Blackwell with MiniMax M2.5 (not single-stream benchmarks)
- Whether developers actually adopted it or quietly went back to cloud
- KV cache / context length tradeoffs at peak load
- Routing logic that worked vs. fell apart in practice
- What broke that you didn't see coming
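On the KV cache point, here's the back-of-envelope math I've been doing for concurrency vs. context length. The architecture numbers (layers, KV heads, head dim) and the weight-memory estimate are placeholders, not MiniMax M2.5's real config, so treat the result as illustrative only:

```python
# Rough KV-cache budget: how many concurrent sequences fit in leftover VRAM
# at a given context length. All model parameters are hypothetical.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, tokens, bytes_per_elem=2):
    """Per-sequence KV cache: 2 (K and V) * layers * kv_heads * head_dim * tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * tokens * bytes_per_elem / 1024**3

# Hypothetical GQA architecture (NOT the real MiniMax M2.5 config)
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128

per_seq = kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, 32_768)  # FP16 cache, 32k ctx
free_vram_gb = 192 - 130  # guess: 192GB total minus INT4 weights + overhead
concurrent = int(free_vram_gb // per_seq)
print(f"{per_seq:.1f} GB per 32k-token sequence -> ~{concurrent} concurrent")
```

With those made-up numbers it comes out to single-digit concurrent 32k-context sequences, which is why I don't trust single-stream benchmarks and want real peak-load numbers from someone who has run this.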
War stories welcome, including the ones where it failed. Would rather hear that now than after buying.