## How to use with vLLM

Install vLLM from pip, start the server, and query the OpenAI-compatible endpoint:

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "kshitijthakkar/qwen3.5-moe-4.7B-d4B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kshitijthakkar/qwen3.5-moe-4.7B-d4B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```
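
The same request via the official OpenAI Python client (`pip install openai`) might look like the sketch below. The `base_url` assumes the local vLLM server started above; `api_key="EMPTY"` is a placeholder, since vLLM does not require a key by default.

```python
# Query the local vLLM server through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="kshitijthakkar/qwen3.5-moe-4.7B-d4B",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```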
## Use Docker

Alternatively, run the model with Docker Model Runner:

```bash
docker model run hf.co/kshitijthakkar/qwen3.5-moe-4.7B-d4B
```
# Qwen3.5 MoE 4.54B (from Qwen3.5-4B)

A Qwen3.5 Mixture-of-Experts model created via dual-source weight transfer: the backbone is copied from the dense Qwen/Qwen3.5-4B, and the MoE experts are transplanted from Qwen3.5-35B-A3B.

## Model Details

| Property | Value |
|---|---|
| Total parameters | 4,540,002,816 (4.54B) |
| Active parameters | 3,030,053,376 (3.03B) |
| Architecture | Qwen3.5 hybrid MoE |
| Experts | 8 routed + 1 shared, top-2 routing |
| Hidden size | 2560 |
| Layers | 32 (hybrid: DeltaNet + full attention) |
| Attention | GQA, 16 query / 4 KV heads, head_dim = 256 |
| Context length | 262,144 tokens |
| Vocabulary size | 248,320 |
| Dtype | bfloat16 |
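
For loading the checkpoint directly, a minimal sketch with Hugging Face Transformers is shown below. This assumes a `transformers` release that supports the Qwen3.5 hybrid MoE architecture, and that `AutoModelForCausalLM` is a valid entry point for text-only use of this multimodal checkpoint; neither is confirmed by this card.

```python
# Hedged sketch: load the checkpoint and sanity-check the parameter count.
# Assumes transformers support for this architecture; trust_remote_code=True
# covers the case where the modeling code ships with the repo instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kshitijthakkar/qwen3.5-moe-4.7B-d4B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists bfloat16 weights
    device_map="auto",
    trust_remote_code=True,
)

# Should print roughly 4,540,002,816, matching the table above.
total = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total:,}")
```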

## Design

Total MoE FFN parameters are approximately equal to the dense model's FFN parameters. The speed benefit comes from sparsity: only the top-2 routed experts plus the shared expert are active per token (~1/3 of the total FFN; see the sketch at the end of this section).
Most weights are pre-trained (backbone from the dense model, experts from the 35B-A3B). Only the MoE dimension resize introduces noise, so the model should be suitable for fine-tuning at modest cost.
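
To make the sparsity arithmetic concrete, here is a back-of-envelope sketch. The expert counts come from the table above; the ~1/3 figure assumes the shared expert is the same size as a routed expert, which the card does not state explicitly.

```python
# Active-FFN fraction for a top-2-of-8 MoE with one always-on shared expert.
n_routed = 8      # routed experts per MoE layer
n_shared = 1      # shared expert, active for every token
top_k = 2         # routed experts selected per token

active_experts = top_k + n_shared    # 3 experts compute per token
total_experts = n_routed + n_shared  # 9 experts hold parameters

print(f"active FFN fraction: {active_experts / total_experts:.1%}")  # 33.3%
```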

## Weight Transfer Sources

| Component | Source | Strategy |
|---|---|---|
| Embeddings, LM head | Qwen/Qwen3.5-4B | Exact copy |
| Attention (Q/K/V/O, norms) | Qwen/Qwen3.5-4B | Exact copy |
| DeltaNet (linear attention) | Qwen/Qwen3.5-4B | Exact copy |
| Vision encoder | Qwen/Qwen3.5-4B | Exact copy |
| Layer norms | Qwen/Qwen3.5-4B | Exact copy |
| Routed experts | Qwen3.5-35B-A3B | Slice 256 → 8, bilinear resize |
| Shared expert | Qwen3.5-35B-A3B | Bilinear resize |
| Router | Qwen3.5-35B-A3B | Slice + resize |
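
The card does not include the conversion script, so the sketch below is only one plausible reading of "slice + bilinear resize": keep the first 8 routed experts, then resample each 2-D weight matrix with `torch.nn.functional.interpolate`. The function name, the target dimensions, and the interpolation call are all assumptions, not the author's actual code.

```python
# Hypothetical sketch of the "slice 256 -> 8, bilinear resize" strategy.
import torch
import torch.nn.functional as F

def resize_weight(w: torch.Tensor, out_shape: tuple) -> torch.Tensor:
    """Bilinearly resample a 2-D weight matrix to a new (rows, cols) shape."""
    # interpolate expects (N, C, H, W), so add batch and channel dims.
    w4d = w.float().unsqueeze(0).unsqueeze(0)
    out = F.interpolate(w4d, size=out_shape, mode="bilinear", align_corners=False)
    return out.squeeze(0).squeeze(0).to(w.dtype)

# Stand-in source experts (random here; real ones come from Qwen3.5-35B-A3B).
source_experts = [torch.randn(768, 2048, dtype=torch.bfloat16) for _ in range(256)]

kept = source_experts[:8]  # "slice 256 -> 8": keep the first 8 experts
# Resize to illustrative target dims; the real dims follow this model's config.
resized = [resize_weight(w, (2560, 1024)) for w in kept]
print(resized[0].shape)  # torch.Size([2560, 1024])
```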

## License

Apache 2.0 (inherited from the source models).
