saiojd | 2 years ago
EDIT: If I understand correctly these libraries target deployment performance, while torch.compile is also/mostly for training performance?
brucethemoose2 | 2 years ago
- Torch 2.0 only supports static inputs. In actual usage scenarios, this means frequent lengthy recompiles.
- Eventually, these recompiles will overload the compilation cache and torch.compile will stop functioning.
- Some common augmentations (like TomeSD) break compilation, force recompiles, make compilation take forever, or kill the performance gains.
- There are other miscellaneous bugs, like compilation freezing the Python thread and causing networking timeouts in web UIs, or errors with embeddings.
- Dynamic input in Torch 2.1 nightly fixes many of these issues, but was still only partially working as of a week ago. See https://github.com/pytorch/pytorch/issues/101228#issuecommen...
- TVM and AITemplate have massive performance gains. ~2x or more for AIT, not sure about an exact number for TVM.
- AIT supported dynamic input before torch.compile did, and requires no recompilation after the initial compile. Also, weights (models and LORAs) can be swapped out without a recompile.
- TVM supports very performant Vulkan inference, which would massively expand hardware compatibility.
Note that the popular SD Web UIs don't support any of this, with two exceptions I know of: VoltaML (with WIP AIT support) and the Windows DirectML fork of A1111 (which uses optimized ONNX models, I think). There is about 0% chance of ML compilation support in A1111, and the HF diffusers UIs are less bleeding edge and performance/compatibility focused.
And yes, the Triton-based torch.compile default (Inductor) is aimed at training. There is an alternative backend (Hidet) that explicitly targets inference, but it does not work with Stable Diffusion yet.
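For reference, swapping backends is just a keyword argument; this sketch assumes PyTorch >= 2.0, and the "hidet" backend name only becomes available once the hidet package is installed, so that line is left commented out:

```python
import torch

model = torch.nn.Linear(16, 4)

# Default backend: Inductor (Triton kernels), tuned mostly for training.
compiled_default = torch.compile(model) if hasattr(torch, "compile") else model

# Inference-oriented alternative (requires `pip install hidet`):
# compiled_hidet = torch.compile(model, backend="hidet")

x = torch.randn(2, 16)
y = compiled_default(x)
```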