Gemma 4 31B IT Assistant — MTP GGUF
This repository contains GGUF quantizations of google/gemma-4-31B-it-assistant for use as an MTP (multi-token prediction) draft model.
Do not use these files with llama.cpp, which does not support this model; they are only compatible with ik_llama.cpp.
Use them only with PR 1744, which enables Gemma 4 MTP; without that PR the model will fail to load.
Usage
The assistant is a draft model: it must be loaded alongside the main Gemma 4 31B target GGUF, which does the verification.
```shell
./build/bin/llama-server \
  --model google_gemma-4-31B-it-Q8_0.gguf \
  --ctx-size 32768 -ctk q8_0 -ctv q8_0 --n-gpu-layers 99 \
  -b 1024 -ub 1024 --jinja \
  --spec-type mtp -md gemma-4-31B-it-assistant-Q8_0.gguf -ngld 99 \
  --draft-max 3 --draft-p-min 0.0
```
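Once the server is up, it serves an HTTP API. A minimal client sketch in Python, assuming the default port 8080 and the OpenAI-compatible `/v1/chat/completions` route (both defaults of llama.cpp's `llama-server`, which ik_llama.cpp is forked from):

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request body.
# Field names follow the OpenAI chat API that llama-server mimics.
def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

payload = json.dumps(build_chat_request("Summarize speculative decoding.")).encode()
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",  # assumed default host/port
    data=payload,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment once the server is running
print(payload.decode())
```

The draft model is transparent to the client: requests look identical with or without `--spec-type mtp`; only latency changes.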
Note: `--draft-max 3` is a good starting point.
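To see why a small draft length is usually enough, a back-of-the-envelope sketch: under the simplifying assumption that each drafted token is accepted independently with probability p, the expected tokens produced per target-model verification step is a geometric sum, and it plateaus quickly as the draft length grows. The p value below is illustrative, not measured for this model:

```python
# Expected tokens accepted per verification step, assuming each draft token
# is accepted independently with probability p (a simplification).
# The target model always contributes one token, hence the i = 0 term.
def expected_tokens(p: float, draft_max: int) -> float:
    return sum(p**i for i in range(draft_max + 1))

for k in (1, 3, 5, 8):
    print(k, round(expected_tokens(0.7, k), 2))
```

With p = 0.7, going from `--draft-max 3` to 8 adds well under one expected token per step, while every extra draft token costs a forward pass of the assistant, so 3 is a reasonable default to tune from.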