Gemma 4 31B IT Assistant — MTP GGUF

This repository contains GGUF quantizations of google/gemma-4-31B-it-assistant, for use as an MTP (multi-token prediction) draft model.

Do not use this model with mainline llama.cpp, which does not support it. It is only compatible with ik_llama.cpp.

Use it only with PR 1744, which enables Gemma 4 MTP support; without that PR you will encounter errors such as the model failing to load.

Usage

The assistant is a draft model and must be loaded alongside the main Gemma 4 31B target GGUF.

./build/bin/llama-server \
  --model google_gemma-4-31B-it-Q8_0.gguf \
  --ctx-size 32768 -ctk q8_0 -ctv q8_0 --n-gpu-layers 99 \
  -b 1024 -ub 1024 --jinja \
  --spec-type mtp -md gemma-4-31B-it-assistant-Q8_0.gguf -ngld 99 \
  --draft-max 3 --draft-p-min 0.0

Note: --draft-max 3 is a good starting point.
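To see why a small --draft-max is a reasonable default, the sketch below estimates how many tokens each target-model verification pass yields under speculative decoding. It assumes an independent per-token acceptance probability, which is a simplification; real acceptance rates depend on the prompt, sampler, and how closely the draft tracks the target.

```python
def expected_tokens_per_pass(p: float, draft_max: int) -> float:
    """Expected tokens per verification pass with speculative decoding.

    Assumes each draft token is accepted independently with probability p.
    Each pass yields the run of accepted draft tokens plus one token from
    the target model itself:
        E = sum_{i=0}^{draft_max} p^i = (1 - p^(draft_max + 1)) / (1 - p)
    """
    if p == 1.0:
        return float(draft_max + 1)
    return (1 - p ** (draft_max + 1)) / (1 - p)

# At an 80% acceptance rate, draft-max 3 already captures most of the gain;
# pushing draft-max higher adds draft-model cost for diminishing returns.
for k in (1, 3, 8):
    print(f"draft-max {k}: ~{expected_tokens_per_pass(0.8, k):.2f} tokens/pass")
```

Under these assumptions, draft-max 3 yields roughly 2.95 tokens per pass at 80% acceptance versus about 4.33 for draft-max 8, so larger values mostly pay extra drafting cost. Tune the flag empirically against your workload.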

Model details

Format: GGUF
Parameters: 0.5B
Architecture: gemma4_mtp
