top | item 36682593

(no title)

dewitt | 2 years ago

From the announcement:

"We're excited to share with you a new library called Keras Core, a preview version of the future of Keras. In Fall 2023, this library will become Keras 3.0. Keras Core is a full rewrite of the Keras codebase that rebases it on top of a modular backend architecture. It makes it possible to run Keras workflows on top of arbitrary frameworks — starting with TensorFlow, JAX, and PyTorch."

Excited about this one. Please let us know if you have any questions.

discuss

albertzeyer|2 years ago

That looks very interesting.

I actually have developed (and am developing) sth very similar, what we call the RETURNN frontend, a new frontend + new backends for our RETURNN framework. The new frontend is supporting very similar Python code to define models as you see in PyTorch or Keras, i.e. a core Tensor class, a base Module class you can derive, a Parameter class, and then a core functional API to perform all the computations. That supports multiple backends, currently mostly TensorFlow (graph-based) and PyTorch, but JAX was something I also planned. Some details here: https://github.com/rwth-i6/returnn/issues/1120

(Note that we went a bit further ahead and made named dimensions a core principle of the framework.)

(Example beam search implementation: https://github.com/rwth-i6/i6_experiments/blob/14b66c4dc74c0...)

One difficulty I found was how design the API in a way that works well both for eager-mode frameworks (PyTorch, TF eager-mode) and graph-based frameworks (TF graph-mode, JAX). That mostly involves everything where there is some state, or sth code which should not just execute in the inner training loop but e.g. for initialization only, or after each epoch, or whatever. So for example:

- Parameter initialization.

- Anything involving buffers, e.g. batch normalization.

- Other custom training loops? Or e.g. an outer loop and an inner loop (e.g. like GAN training)?

- How to implement sth like weight normalization? In PyTorch, the module.param is renamed, and then there is a pre-forward hook, which on-the-fly calculates module.param for each call for forward. So, just following the same logic for both eager-mode and graph-mode?

- How to deal with control flow context, accessing values outside the loop which came from inside, etc. Those things are naturally possible eager-mode, where you would get the most recent value, and where there is no real control flow context.

- Device logic: Have device defined explicitly for each tensor (like PyTorch), or automatically eagerly move tensors to the GPU (like TensorFlow)? Moving from one device to another (or CPU) is automatic or must be explicit?

- How to you allow easy interop, e.g. mixing torch.nn.Module and Keras layers?

I see that you have keras_core.callbacks.LambdaCallback which is maybe similar, but can you effectively update the logic of the module in there?