Canada Quant Labs

Canada's open-weight model lab.

We train, quantize, and deploy sovereign AI models on Canadian Blackwell silicon — for the regulated industries that can't run on someone else's API.

What we do

Where we work

Upstream

Partnerships · partnerships@cql.ca
Press · press@cql.ca
Web · cql.ca


Open releases — DeepSeek-V4 quantization family

Four artifacts share one lineage: one base model in two sizes (V4-Flash, V4-Pro), two routed-expert formats (W4A16, NVFP4), and the Multi-Token Prediction (MTP) draft head retained on three of the four. Attention is quantized to FP8 with 128×128 blocks in all four.
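For readers new to the "g=128" and "g=16" notation, the sketch below illustrates group-wise 4-bit quantization: weights are split into groups of g values and each group stores one shared scale. It is a minimal pure-Python illustration, not the calibration code used for these artifacts. The symmetric rounding scheme, the function names, and the scale widths (16-bit for W4A16, 8-bit for NVFP4) are assumptions made for the example, and real NVFP4 stores FP4 (E2M1) values, not integers.

```python
from typing import List, Tuple

GROUP_SIZE = 128  # "g=128" as listed for the W4A16 artifacts


def quantize_int4_groupwise(
    weights: List[float], group_size: int = GROUP_SIZE
) -> Tuple[List[int], List[float]]:
    """Symmetric INT4 quantization with one scale per group (assumed scheme)."""
    if len(weights) % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to code 7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Clamp to the signed 4-bit range [-8, 7].
            codes.append(max(-8, min(7, round(w / scale))))
    return codes, scales


def dequantize_int4_groupwise(
    codes: List[int], scales: List[float], group_size: int = GROUP_SIZE
) -> List[float]:
    """Reconstruct approximate weights from 4-bit codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def effective_bits(value_bits: int, scale_bits: int, group_size: int) -> float:
    """Storage cost per weight once the shared scale is amortized over the group."""
    return value_bits + scale_bits / group_size


if __name__ == "__main__":
    weights = [((i % 17) - 8) / 10 for i in range(256)]
    codes, scales = quantize_int4_groupwise(weights)
    restored = dequantize_int4_groupwise(codes, scales)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"groups={len(scales)} worst_abs_error={worst:.4f}")
    print("W4A16 g=128, 16-bit scales (assumed):", effective_bits(4, 16, 128))
    print("NVFP4 g=16, 8-bit scales (assumed):", effective_bits(4, 8, 16))
```

Under these assumptions a smaller group size trades extra scale storage for finer-grained scaling, which is one plausible contributor to the size difference between the W4A16 and NVFP4 Flash artifacts; other factors may apply.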

| Model | Base | Routed experts | MTP | On-disk | Min hardware (TP=2) | When to pick |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-V4-Flash-W4A16-FP8 | V4-Flash | W4A16 INT4 g=128 | no | ~143 GB | H200 / DGX Spark / RTX PRO 6000 | maximum compatibility, no MTP needed |
| DeepSeek-V4-Flash-W4A16-FP8-MTP | V4-Flash | W4A16 INT4 g=128 | yes (BF16) | 159 GB | H200 / RTX PRO 6000 | best $/token interactive on V4-Flash |
| DeepSeek-V4-Flash-NVFP4-FP8-MTP | V4-Flash | NVFP4 g=16 | yes (BF16) | 172 GB | RTX PRO 6000 / B300 | best Blackwell-native interactive on V4-Flash |
| DeepSeek-V4-Pro-NVFP4-FP8-MTP | V4-Pro | NVFP4 g=16 | yes (byte-identical) | 913 GiB | 8× B300 (TP=8 + EP) | only choice for V4-Pro deployment; +25–37% throughput vs upstream MXFP4 |
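The release table can also be captured as data, which makes filtering by base model, expert format, or MTP support scriptable. The sketch below is one possible encoding. The `Artifact` class and `pick` helper are hypothetical names introduced for illustration, and the sizes are copied from the table as written, including the mix of GB and GiB, with a helper that converts both to bytes so they can be compared.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Artifact:
    name: str
    base: str            # "V4-Flash" or "V4-Pro"
    expert_format: str   # "W4A16" or "NVFP4"
    has_mtp: bool
    size_value: float    # number as listed in the table
    size_unit: str       # "GB" (10**9 bytes) or "GiB" (2**30 bytes)

    @property
    def size_bytes(self) -> float:
        factor = 10**9 if self.size_unit == "GB" else 2**30
        return self.size_value * factor


# Values transcribed from the release table; 143 GB is listed as approximate.
ARTIFACTS: List[Artifact] = [
    Artifact("DeepSeek-V4-Flash-W4A16-FP8", "V4-Flash", "W4A16", False, 143, "GB"),
    Artifact("DeepSeek-V4-Flash-W4A16-FP8-MTP", "V4-Flash", "W4A16", True, 159, "GB"),
    Artifact("DeepSeek-V4-Flash-NVFP4-FP8-MTP", "V4-Flash", "NVFP4", True, 172, "GB"),
    Artifact("DeepSeek-V4-Pro-NVFP4-FP8-MTP", "V4-Pro", "NVFP4", True, 913, "GiB"),
]


def pick(
    base: str,
    expert_format: Optional[str] = None,
    need_mtp: Optional[bool] = None,
) -> List[Artifact]:
    """Return artifacts matching the filters; None means no constraint."""
    return [
        a for a in ARTIFACTS
        if a.base == base
        and (expert_format is None or a.expert_format == expert_format)
        and (need_mtp is None or a.has_mtp == need_mtp)
    ]


if __name__ == "__main__":
    for artifact in pick("V4-Flash", need_mtp=True):
        print(f"{artifact.name}: {artifact.size_bytes / 10**9:.1f} GB")
    pro = pick("V4-Pro")[0]
    print(f"{pro.name}: {pro.size_bytes / 10**9:.1f} GB")
```

Normalizing units matters when comparing rows: 913 GiB corresponds to roughly 980 GB, so the Pro artifact is somewhat larger relative to the Flash artifacts than the raw numbers suggest.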

Upstream reference recipes: RedHatAI/DeepSeek-V4-Flash-NVFP4-FP8 (Flash NVFP4 topology) and nvidia/DeepSeek-V3.2-NVFP4 (Pro NVFP4, MTP-exclusion topology).

Hardware shorthand

Reproduction repos

Every artifact has a public reproduction repo with calibration scripts, vLLM patches, bench harnesses, and findings docs:

Upstream contributions filed during this work