Models (531)

Active filters: rlhf

SUSTech-NLP/UniRRM-8B

Text Generation • 8B • Updated about 23 hours ago • 26 • 2

WisdomShell/RewardAnything-8B-v1

Text Generation • 8B • Updated Jun 5, 2025 • 54 • 23

Schrieffer/Llama-SARM-4B

Reinforcement Learning • 5B • Updated Dec 11, 2025 • 24 • 2

mradermacher/ATLAS-8B-Thinking-GGUF

Reinforcement Learning • 8B • Updated Sep 13, 2025 • 262 • 2

Schrieffer/Llama-SARM-4B-PostSAEPretrain

Feature Extraction • 5B • Updated Dec 11, 2025 • 12 • 2

sileod/deberta-v3-base-tasksource-nli

Zero-Shot Classification • 0.2B • Updated Aug 13, 2024 • 8.16k • 133

stanfordnlp/SteamSHP-flan-t5-xl

Updated Oct 10, 2023 • 12 • 43

stanfordnlp/SteamSHP-flan-t5-large

Updated Oct 10, 2023 • 29 • 33

trl-lib/llama-7b-se-peft

Updated Apr 6, 2023 • 4

sileod/deberta-v3-large-tasksource-nli

Zero-Shot Classification • 0.4B • Updated Feb 17, 2024 • 981 • 40

sileod/deberta-v3-large-tasksource-rlhf-reward-model

Text Classification • Updated Mar 28, 2023 • 537 • 11

trl-lib/llama-7b-se-rl-peft

Updated Apr 14, 2023 • 103

trl-lib/llama-7b-se-rm-peft

Updated Apr 6, 2023 • 8

toloka/gpt2-large-rl-prompt-writing

Text Generation • 0.8B • Updated Apr 21, 2023 • 17 • 3

AdamG012/chat-opt-1.3b-rlhf-actor-deepspeed

Text Generation • Updated Apr 25, 2023 • 20 • 5

AdamG012/chat-opt-1.3b-rlhf-critic-deepspeed

Text Generation • Updated Apr 25, 2023 • 12 • 3

AdamG012/chat-opt-1.3b-rlhf-actor-ema-deepspeed

Text Generation • Updated Apr 25, 2023 • 11 • 8

sileod/mdeberta-v3-base-tasksource-nli

Zero-Shot Classification • 0.3B • Updated Oct 19, 2023 • 57 • 18

agi-css/socially-good-lm

Text Generation • Updated May 29, 2023 • 16 • 5

agi-css/hh-rlhf-sft

Text Generation • Updated Jun 1, 2023 • 16 • 3

agi-css/better-base

Text Generation • Updated Jun 1, 2023 • 15 • 6

argilla/roberta-base-reward-model-falcon-dolly

Text Classification • Updated Jun 16, 2023 • 36 • 4

merve/peft-copy-test

Text Generation • Updated Jun 14, 2023 • 6

PKU-Alignment/beaver-7b-v1.0

Reinforcement Learning • 7B • Updated May 9, 2024 • 31 • 13

lyogavin/Anima33B-DPO-Belle-1k

Text Generation • Updated Jul 2, 2023 • 1

lyogavin/Anima33B-DPO-Belle-1k-merged

Text Generation • Updated Jul 2, 2023 • 18 • 12

PKU-Alignment/beaver-7b-v1.0-reward

Reinforcement Learning • 7B • Updated Apr 20, 2024 • 1.07k • 17

PKU-Alignment/beaver-dam-7b

Updated Jul 10, 2023 • 7.76k • 17

PKU-Alignment/beaver-7b-v1.0-cost

Reinforcement Learning • 7B • Updated Apr 20, 2024 • 1.24k • 10

Ablustrund/moss-rlhf-reward-model-7B-zh

Updated Jul 13, 2023 • 1 • 23