zippothrowaway | 6 months ago

You're probably going to have to wait a couple of days for 4-bit quantized versions to pop up. It's 20B parameters.
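A rough back-of-envelope for the 20B-parameter figure above: at 4-bit precision the weights alone take about a quarter of the bfloat16 footprint. This is a sketch only; it counts weight memory and ignores activations, the KV/attention workspace, and quantization overhead:

```python
# Approximate weight memory for a 20B-parameter model at different
# precisions. Illustrative arithmetic, not measured VRAM usage.

def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

bf16 = weight_gb(20e9, 16)  # full bfloat16 weights
nf4 = weight_gb(20e9, 4)    # 4-bit (NF4) quantized weights

print(f"bf16 weights: ~{bf16:.0f} GB")  # ~40 GB
print(f"NF4 weights:  ~{nf4:.0f} GB")   # ~10 GB
```

The gap between the ~10 GB weight estimate and the ~17 GB observed below is plausible once activations, the non-quantized components, and runtime overhead are counted.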
pollinations | 6 months ago

    # Imports and the model_name/device definitions were missing from the
    # original snippet; "Qwen/Qwen-Image" is assumed from the thread context.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.quantizers import PipelineQuantizationConfig

    model_name = "Qwen/Qwen-Image"
    device = "cuda"

    # Configure NF4 quantization
    quant_config = PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
        components_to_quantize=["transformer", "text_encoder"],
    )

    # Load the pipeline with NF4 quantization
    pipe = DiffusionPipeline.from_pretrained(
        model_name,
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
        low_cpu_mem_usage=True,
    ).to(device)

Seems to use 17 GB of VRAM like this.

Update: doesn't work well. This approach seems to be recommended instead: https://github.com/QwenLM/Qwen-Image/pull/6/files