tinker-rl-frontier_gsm8k_nemotron-120b-nemotron-120b

LoRA adapters trained with GRPO on top of nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 using the Tinker cloud training service. Part of the TinkerRL-Bench release for our NeurIPS submission "A Unified Benchmark for RL Post-Training of Language Models" (repo).

Training configuration

| Setting | Value |
|---|---|
| Base model | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 |
| Experiment tag | frontier_gsm8k_nemotron-120b |
| Campaign | None |
| Task | gsm8k |
| Seed | 42 |
| LoRA rank | 16 |
| Learning rate | 1e-05 |
| Group size | 4 |
| Training steps | 20 |
| Platform | Tinker (tinker) |
| Training run ID | 657a920a-9e74-55d2-9354-71a6ec2f1f61 |
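
The group size of 4 is the number of completions GRPO samples per prompt: each completion's reward is baselined against the rest of its group instead of a learned value function. Below is a minimal sketch of that group-relative normalization, an illustration only and not the Tinker implementation; `group_advantages` is a hypothetical helper.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Center and scale each completion's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One GSM8K prompt sampled group_size=4 times; only the first completion was correct.
print(group_advantages([1.0, 0.0, 0.0, 0.0]))
```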

Metrics

| Metric | Value |
|---|---|
| First-5 reward avg | 0.175 |
| Last-10 reward avg | 0.1625 |
| Peak reward | 0.875 |
| Peak accuracy | 0.875 |
| Last-10 accuracy | 0.1625 |
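
These values summarize the per-step training reward curve. A sketch of how such aggregates are commonly derived (an assumption; the benchmark's exact aggregation code is not included in this repo):

```python
def summarize(rewards_per_step):
    """Aggregate a list of per-step mean rewards into the quantities reported above."""
    return {
        "first_5_reward_avg": sum(rewards_per_step[:5]) / 5,
        "last_10_reward_avg": sum(rewards_per_step[-10:]) / 10,
        "peak_reward": max(rewards_per_step),
    }
```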

Checkpoints in this repo

| Step | Original Tinker URI | Local path |
|---|---|---|
| sampler_weights/final | tinker://657a920a-9e74-55d2-9354-71a6ec2f1f61:train:0/sampler_weights/final | final |
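
To fetch only the final adapter checkpoint rather than the whole repository, you can filter the download by the local path above. A sketch using huggingface_hub, assuming the adapter files sit under `final/`:

```python
from huggingface_hub import snapshot_download

# Download only the "final" adapter directory from this repo.
local_dir = snapshot_download(
    repo_id="arvindcr4/tinker-rl-frontier_gsm8k_nemotron-120b-nemotron-120b",
    allow_patterns=["final/*"],
)
print(local_dir)
```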

How to load

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
adapter = "arvindcr4/tinker-rl-frontier_gsm8k_nemotron-120b-nemotron-120b"

# Load the tokenizer and base model, then attach the LoRA adapter on top.
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter, subfolder="final")  # or "<step>" for an intermediate checkpoint
```
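
Once the adapter is attached, a quick smoke test on a GSM8K-style question can look like the following. This assumes the Nemotron tokenizer ships a chat template; adjust the prompt formatting if your setup differs.

```python
# First question from the GSM8K training set, used here only as a smoke test.
question = ("Natalia sold clips to 48 of her friends in April, and then she sold "
            "half as many clips in May. How many clips did Natalia sell altogether "
            "in April and May?")
inputs = tok.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```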

Companion releases

Citation

@misc{tinkerrlbench2026,
  title   = {A Unified Benchmark for RL Post-Training of Language Models},
  author  = {Arvind, C. R. and Jeyaraj, Sandhya},
  year    = {2026},
  note    = {NeurIPS submission, https://github.com/pes-llm-research/tinker-rl-lab}
}

License

Apache 2.0. The underlying base model retains its original license — please check nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 for any usage restrictions.
