zippothrowaway | 6 months ago

You're probably going to have to wait a couple of days for 4-bit quantized versions to pop up. It's 20B parameters.
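A rough back-of-envelope for the 20B-parameter figure above: at 4-bit precision the weights alone take about a quarter of the bfloat16 footprint. This is a sketch only; it counts weight memory and ignores activations, the KV/attention workspace, and quantization overhead:

```python
# Approximate weight memory for a 20B-parameter model at different
# precisions. Illustrative arithmetic, not measured VRAM usage.

def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

bf16 = weight_gb(20e9, 16)  # full bfloat16 weights
nf4 = weight_gb(20e9, 4)    # 4-bit (NF4) quantized weights

print(f"bf16 weights: ~{bf16:.0f} GB")  # ~40 GB
print(f"NF4 weights:  ~{nf4:.0f} GB")   # ~10 GB
```

The gap between the ~10 GB weight estimate and the ~17 GB observed below is plausible once activations, the non-quantized components, and runtime overhead are counted.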
pollinations | 6 months ago

    # Imports and the model_name/device definitions were missing from the
    # original snippet; "Qwen/Qwen-Image" is assumed from the thread context.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.quantizers import PipelineQuantizationConfig

    model_name = "Qwen/Qwen-Image"
    device = "cuda"

    # Configure NF4 quantization
    quant_config = PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
        components_to_quantize=["transformer", "text_encoder"],
    )

    # Load the pipeline with NF4 quantization
    pipe = DiffusionPipeline.from_pretrained(
        model_name,
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
        low_cpu_mem_usage=True,
    ).to(device)

Seems to use 17 GB of VRAM like this.

Update: doesn't work well. This approach seems to be recommended instead: https://github.com/QwenLM/Qwen-Image/pull/6/files