sangwulee's comments

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

The highest-quality finetuning data was hand-curated internally. I would say our post-training pipeline is quite similar to the SeedDream 2.0 ~ 3.0 series from ByteDance. Like them, we use extensive quality filters and internal models to get the highest quality possible. Even then, we still hand-curate a smaller subset.

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

I actually tried a few experiments in the early exploration stages! I trained a small classifier to judge AI vs. non-AI images and used it as a reward model for small RL / post-training experiments. Sadly, it was not too successful. We found that directly finetuning the model on high-quality photorealistic images was most reliable.
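The comment doesn't spell out how the classifier score was plugged into the RL loop; as a minimal sketch (the function name and the log-odds shaping are my assumptions, not the actual pipeline), a real-vs-AI classifier probability can be turned into a scalar reward like this:

```python
import math

def reward_from_classifier(p_real: float, eps: float = 1e-6) -> float:
    """Hypothetical reward: turn a classifier's P(image is non-AI) into a scalar.

    Log-odds shaping: 0 when the classifier is unsure (p = 0.5),
    positive when the image looks real, negative when it looks AI-made.
    Clamping avoids infinities at p = 0 or p = 1.
    """
    p = min(max(p_real, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

# Generated images would be scored by the classifier and the resulting
# rewards fed to the RL / preference-optimisation step.
print(reward_from_classifier(0.9))  # positive: looks real
print(reward_from_classifier(0.1))  # negative: looks AI-generated
```

One reason such a setup can fail, as the comment hints, is that the policy learns to exploit quirks of the classifier rather than genuinely looking more real.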

Another note about preference optimisation and RL is that it has a really high quality ceiling but needs to be very carefully tuned. It's easy to get perfect anatomy and structure if you decide to completely "collapse" the model. For instance, ChatGPT images are collapsed to a slight yellow color palette, and FLUX images always have this glossy, plastic texture with overly blurry backgrounds. It's similar to the reward-hacking behavior you see in LLMs, where they sound overly nice and chatty.

I had to make a few compromises to balance between a "stable, collapsed, boring" model and an "unstable, diverse, explorative" one.

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

The architecture is the same, so we found that some LoRAs work out of the box, but some don't. In those cases, I would expect people to re-run their LoRA finetuning with the trainer they've used.

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

We used two types of datasets for post-training: supervised finetuning data and preference data for the RLHF stage. You can actually use fewer than 1M samples to significantly boost the aesthetics. Quality matters A LOT. Quantity helps with generalisation and stability of the checkpoints, though.

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

We have not added a separate RTX-accelerated version for FLUX.1 Krea, but the model is fully compatible with the existing FLUX.1 [dev] codebase. I don't think we made a separate ONNX export for it, though. A 4~8-bit quantized version with SVDQuant would be a nice follow-up so that the checkpoint is friendlier for consumer-grade hardware.
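For a rough sense of why quantization helps on consumer hardware, here is back-of-envelope arithmetic only (1 GB taken as 1e9 bytes; real quantized checkpoints carry some extra overhead for scales and non-quantized layers):

```python
N_PARAMS = 12e9  # parameter count of the FLUX.1 [dev]-sized model

# Approximate weight footprint at each precision: bits -> bytes -> GB.
sizes_gb = {bits: N_PARAMS * bits / 8 / 1e9 for bits in (16, 8, 4)}

for bits, gb in sizes_gb.items():
    print(f"{bits:>2}-bit weights: ~{gb:.0f} GB")
```

At 4 bits the weights alone drop from ~24 GB to ~6 GB, which is what makes a 12B model plausible on a single consumer GPU.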

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

FLUX.1 is one of the most popular open-weights text-to-image models. We distilled Krea-1 into the FLUX.1 [dev] model so that the community can adopt it seamlessly into the existing ecosystem. Any finetuning code, workflows, etc. built on top of FLUX.1 [dev] can be reused with our model :)

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

Quick napkin math assuming bfloat16 format: 1B params * 16 bits = 16B bits = 2GB. Since it's a 12B-parameter model, you get around 24GB. Downcasting from float32 to bfloat16 comes with pretty minimal performance degradation, so we uploaded the weights in bfloat16 format.
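The napkin math above can be checked in a couple of lines (the helper name is mine, and 1 GB is taken as 1e9 bytes to match the round numbers in the comment):

```python
def model_size_gb(n_params: float, bits_per_param: int) -> float:
    # params * bits -> bytes (/8) -> GB (/1e9, decimal GB as in the napkin math)
    return n_params * bits_per_param / 8 / 1e9

print(model_size_gb(1e9, 16))   # 2.0  -- 1B params in bfloat16
print(model_size_gb(12e9, 16))  # 24.0 -- the 12B-parameter model
```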

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

I love owls. Photorealism was one of the focus areas for training because the "AI look" (e.g. plastic skin) was the biggest complaint about the FLUX.1 model series. Photorealism was achieved through careful curation of both the finetuning and preference datasets.

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

Thank you! Glad you find it helpful. The model is focused on photorealism, so it should be able to generate most realistic scenes. Although, I think using 3D engines would be more suitable for typical robotics-training use cases, since they give you ground-truth data on objects, locations, etc.

One interesting use case would be a robotics task that requires perception of realistic scenes.

sangwulee | 8 months ago | on: FLUX.1 Krea: post-trained text-to-image model from Black Forest Labs and Krea

Hello HackerNews. My name is Sangwu Lee. I work for Krea, and I led the research efforts around post-training for this model. I'll try to answer any questions you may have, but I recommend reading the technical report I wrote on our site (https://www.krea.ai/blog/flux-krea-open-source-release).

I also see that my colleagues have already commented here, but I'll chime in on questions as well.

sangwulee | 8 months ago | on: Releasing weights for FLUX.1 Krea

Hi! I'm the lead researcher on Krea-1. FLUX.1 Krea is a 12B rectified flow model distilled from Krea-1, designed to be compatible with the FLUX architecture. Happy to answer any technical questions :)