tinker-rl-frontier_gsm8k_nemotron-120b-nemotron-120b
These are LoRA adapters trained with GRPO on top of nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 using the
Tinker cloud training service.
They are part of the TinkerRL-Bench release for our NeurIPS submission
"A Unified Benchmark for RL Post-Training of Language Models"
(repo).
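GRPO scores each sampled completion against the other completions drawn for the same prompt (group size 4 in this run), turning raw rewards into group-relative advantages. The snippet below is a minimal sketch of that standard advantage computation, not the Tinker implementation; the function name and the sample rewards are illustrative only.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style advantage: normalize each reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: a group of 4 completions for one GSM8K prompt,
# scored 1.0 if the final answer is correct and 0.0 otherwise.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```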
Training configuration
| Setting | Value |
|---|---|
| Base model | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 |
| Experiment tag | frontier_gsm8k_nemotron-120b |
| Campaign | None |
| Task | gsm8k |
| Seed | 42 |
| LoRA rank | 16 |
| Learning rate | 1e-05 |
| Group size | 4 |
| Training steps | 20 |
| Platform | Tinker (tinker) |
| Training run ID | 657a920a-9e74-55d2-9354-71a6ec2f1f61 |
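For orientation, the hyperparameters above can be bundled into a plain config object like the sketch below. This is a hypothetical illustration of how the run was parameterized, not the actual Tinker configuration schema.

```python
from dataclasses import dataclass

@dataclass
class GRPORunConfig:
    # Values taken from the training configuration table above.
    base_model: str = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
    task: str = "gsm8k"
    seed: int = 42
    lora_rank: int = 16
    learning_rate: float = 1e-5
    group_size: int = 4        # completions sampled per prompt for GRPO
    training_steps: int = 20

config = GRPORunConfig()
```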
Metrics
| Metric | Value |
|---|---|
| First-5 reward avg | 0.175 |
| Last-10 reward avg | 0.1625 |
| Peak reward | 0.875 |
| Peak accuracy | 0.875 |
| Last-10 accuracy | 0.1625 |
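These aggregates summarize the per-step reward and accuracy traces over the 20 training steps. A minimal sketch of how such summaries can be computed from a step-wise trace (variable names and the dummy trace are illustrative, not taken from the training code):

```python
def summarize(trace: list[float]) -> dict[str, float]:
    """Summary statistics over a per-step reward (or accuracy) trace."""
    return {
        "first_5_avg": sum(trace[:5]) / len(trace[:5]),
        "last_10_avg": sum(trace[-10:]) / len(trace[-10:]),
        "peak": max(trace),
    }

# Example with a dummy 20-step trace.
trace = [0.1, 0.2, 0.15, 0.2, 0.225] + [0.3] * 5 + [0.15] * 10
print(summarize(trace))
```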
Checkpoints in this repo
How to load
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
adapter = "arvindcr4/tinker-rl-frontier_gsm8k_nemotron-120b-nemotron-120b"

# Load the base model and tokenizer, then attach the LoRA adapter from the "final" checkpoint.
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter, subfolder="final")
```
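Once loaded, the adapter can be exercised on a GSM8K-style question. The sketch below continues from the snippet above; the prompt and decoding parameters are illustrative, and the base model's chat template (if any) is not applied here.

```python
prompt = (
    "Natalia sold clips to 48 of her friends in April, and then half as many in May. "
    "How many clips did she sell altogether?"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Print only the newly generated tokens.
print(tok.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```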
Companion releases
Citation
```bibtex
@misc{tinkerrlbench2026,
  title  = {A Unified Benchmark for RL Post-Training of Language Models},
  author = {Arvind, C. R. and Jeyaraj, Sandhya},
  year   = {2026},
  note   = {NeurIPS submission, https://github.com/pes-llm-research/tinker-rl-lab}
}
```
License
Apache 2.0. The underlying base model retains its original license —
please check nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 for any usage restrictions.