gallabytes's comments

gallabytes | 1 year ago | on: Maxtext: A simple, performant and scalable Jax LLM

> Some of this complexity may be necessary for achieving optimal performance in Jax. E.g. extra indirection to avoid the compiler making some bad fusion decision, or multiple calls so something can be marked as static for the jit in the outer call

certainly some of it is but not the lion's share - I have a much simpler (private) codebase which scales pretty similarly afaict.

the complexity of Maxtext feels more Serious Engineering ™ flavored, following Best Practices.

gallabytes | 2 years ago | on: Google “We have no moat, and neither does OpenAI”

I literally just don't feel like running them tbh, and see no reason to publish them either way. Mostly prefer to let the outputs speak for themselves.

For a while I was using an FID variant for evaluation during training, but didn't find it very helpful vs just looking at output images.

gallabytes | 4 years ago | on: How “latency numbers everybody should know” decreased from 1990–2020

SIMD - compressing has gotten faster, but (assuming OP is correct rather than just missing info) the reference algorithm didn't have room to take advantage of SIMD. The relevant improvements since 2010 or so mostly look like bandwidth improvements not latency, and coincide with increasing ubiquity of SIMD instructions and SIMD-friendly algorithms.

gallabytes | 6 years ago | on: Stages of denial in encountering K

there almost certainly is an encoding such that every program any human will ever write fits in 128 bytes, though I doubt we'll ever design one. to convince yourself of this, notice that you don't expect to ever produce two programs with the same blake2 hash.
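A rough sketch of the hashing argument above, using Python's standard `hashlib`. The `codebook`, `encode`, and `decode` names are hypothetical, added only for illustration. BLAKE2b's largest digest is 64 bytes, which already fits inside the 128-byte budget; the catch is that decoding needs a lookup table from digest back to source, which is why this is an existence argument rather than a practical encoding.

```python
import hashlib

# Hypothetical codebook: maps each digest back to the program it came from.
# The hash alone is not invertible, so decoding relies on this table.
codebook: dict[bytes, str] = {}


def encode(source: str) -> bytes:
    """Return a 64-byte BLAKE2b digest of the program text and record it."""
    digest = hashlib.blake2b(source.encode("utf-8"), digest_size=64).digest()
    codebook[digest] = source
    return digest


def decode(digest: bytes) -> str:
    """Look the program up by digest; only works for programs already encoded."""
    return codebook[digest]


programs = ['print("hello")', "x = 1 + 1", "def f(n):\n    return n * 2"]
codes = [encode(p) for p in programs]

# Every code has the same fixed length, regardless of program size.
assert all(len(c) == 64 for c in codes)
# Distinct programs got distinct codes, so the round trip recovers each one.
assert [decode(c) for c in codes] == programs
```

The encoding is "short" only because all the information lives in the codebook, which is the sense in which nobody is likely to design one on purpose.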

there's a lot of room for improvement in conciseness of code. I would still be surprised if it was meaningfully possible to write a full-featured modern OS with one page of APL

gallabytes | 9 years ago | on: Logical Induction

This makes me think the only thing we disagree on is the meaning of the words "red team" and "blue team" :)

When I say it feels like we spend a lot of time red teaming, that means I think we spend somewhere between 30 and 60% of research time trying to break things and see how they fail. This is fully compatible with not immediately implementing things - it's much less expensive to break something /before/ you build it.

gallabytes | 9 years ago | on: Logical Induction

I find that perception fairly surprising, as for a very long time it felt like we did more red team than blue team. I do acknowledge that this has been changing recently, but only significantly in the context of building on the results in this paper.

gallabytes | 10 years ago | on: Relation Between Type Theory, Category Theory and Logic

We have something between the two in HoTT: universes of types are stratified by homotopy levels, corresponding to how many dimensions of structure a type has. A space with only points is thus a 0-type, a space with at most one point is a -1-type, and a space with exactly one point is a -2-type.

The catch is that univalence is inconsistent with LEM at h-levels greater than -1, but assuming LEM only for -1-types is perfectly consistent. Those -1-types can be thought of as the "at most true" propositions of classical logic.
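For reference, the low h-levels mentioned above can be written out as follows. This is a sketch in the usual HoTT notation, not something stated in the comment itself; the subscript on LEM is just a label chosen here for the version restricted to -1-types.

```latex
\begin{align*}
\mathsf{isContr}(A) &:\equiv \sum_{a : A} \prod_{x : A} (a = x)
  && \text{($-2$-type: exactly one point)} \\
\mathsf{isProp}(A) &:\equiv \prod_{x, y : A} (x = y)
  && \text{($-1$-type: at most one point)} \\
\mathsf{isSet}(A) &:\equiv \prod_{x, y : A} \mathsf{isProp}(x = y)
  && \text{($0$-type: only points)} \\
\mathsf{LEM}_{-1} &:\equiv \prod_{A : \mathcal{U}} \bigl(\mathsf{isProp}(A) \to A + \neg A\bigr)
  && \text{(LEM restricted to $-1$-types)}
\end{align*}
```

Each level is obtained by asking that the identity types of the previous level's shape hold one level down, which is the "dimensions of structure" reading.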
