prngl | 5 months ago
There's an interesting parallel with ML compilation libraries (TensorFlow 1, JAX jit, PyTorch compile) where a tracing approach is taken to build up a graph of operations that are then essentially compiled (or otherwise lowered and executed by a specialized VM). We're often nowadays working in dynamic languages, so they become essentially the frontend to new DSLs, and instead of defining new syntax, we embed the AST construction into the scripting language.
For ML, we're delaying the execution of GPU/linalg kernels so that we can fuse them. For RPC, we're delaying the execution of network requests so that we can fuse them.
Of course, compiled languages themselves delay the execution of ops (add/mul/load/store/etc) so that we can fuse them, i.e. skip over the round-trip of the interpreter/VM loop.
The power of code as data in various guises.
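To make the pattern concrete, here's a minimal sketch (toy names, not any real library's API): calls build up an AST instead of executing, and evaluation only happens when the graph is explicitly run, which is where fusion/lowering could be slotted in.

```typescript
// Each "op" records a node in a graph instead of computing anything.
type Node =
  | { kind: "const"; value: number }
  | { kind: "add"; left: Node; right: Node }
  | { kind: "mul"; left: Node; right: Node };

const lit = (value: number): Node => ({ kind: "const", value });
const add = (left: Node, right: Node): Node => ({ kind: "add", left, right });
const mul = (left: Node, right: Node): Node => ({ kind: "mul", left, right });

// The "compiler"/interpreter: walks the recorded graph once, at the end.
// A real system would fuse/lower here before doing any work.
function run(n: Node): number {
  switch (n.kind) {
    case "const": return n.value;
    case "add": return run(n.left) + run(n.right);
    case "mul": return run(n.left) * run(n.right);
  }
}

// Building the graph executes nothing:
const graph = mul(add(lit(2), lit(3)), lit(4));
const result = run(graph); // 20
```

The host language's syntax is reused for construction, but the semantics live entirely in `run` -- code as data.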
Another angle on this is the importance of separating control plane (i.e. instructions) from data plane in distributed systems, which is any system where you can observe a "delay". When you zoom into a single CPU, it acknowledges its nature as a distributed system with memory far away by separating out the instruction pipeline and instruction cache from the data. In Cap'n Web, we've got the instructions as the RPC graph being built up.
I just thought these were some interesting patterns. I'm not sure I yet see all the way down to the bottom though. Feels like we go in circles, or rather, the stack is replicated (compiler built on interpreter built on compiler built on interpreter ...). In some respect this is the typical Lispy code is data, data is code, but I dunno, feels like there's something here to cut through...
ryanrasti | 5 months ago
> We're often nowadays working in dynamic languages, so they become essentially the frontend to new DSLs, and instead of defining new syntax, we embed the AST construction into the scripting language.
And I'd say that TypeScript is the real game-changer here. You get the flexibility of the JavaScript runtime (e.g., how Cap'n Web cleverly uses `Proxy`s) while still being able to provide static types for the embedded DSL you're creating. It’s the best of both worlds.
I've been spending all of my time in the ORM analog here. Most ORMs are severely lacking in composability because they're fundamentally imperative and eager. A call like `db.orders.findAll()` executes immediately, and you're stuck with no way to add operations before it hits the database.
A truly composable ORM should act like the compilers you mentioned: use TypeScript to define a fully typed DSL over the entirety of SQL, build an AST from the query, and then only at the end compile the graph into the final SQL query. That's the core idea I'm working on with my project, Typegres.
If you find the pattern interesting: https://typegres.com/play/
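As a rough illustration of the lazy/composable idea (a hypothetical sketch, not Typegres's actual API): every method returns a new immutable query description, and SQL is only generated at the end.

```typescript
// Immutable description of a query -- the "AST".
interface QueryAst {
  table: string;
  wheres: string[];
  limit?: number;
}

class QueryBuilder {
  constructor(private readonly q: QueryAst) {}

  // Each operation returns a NEW builder; nothing executes.
  where(cond: string): QueryBuilder {
    return new QueryBuilder({ ...this.q, wheres: [...this.q.wheres, cond] });
  }

  take(n: number): QueryBuilder {
    return new QueryBuilder({ ...this.q, limit: n });
  }

  // Only here is the graph compiled into SQL; nothing has touched the DB yet.
  toSQL(): string {
    const where = this.q.wheres.length ? ` WHERE ${this.q.wheres.join(" AND ")}` : "";
    const limit = this.q.limit !== undefined ? ` LIMIT ${this.q.limit}` : "";
    return `SELECT * FROM ${this.q.table}${where}${limit}`;
  }
}

const from = (table: string) => new QueryBuilder({ table, wheres: [] });

// Composable: callers can keep adding operations before anything runs.
const base = from("orders").where("status = 'open'");
const sql = base.take(10).toSQL();
// "SELECT * FROM orders WHERE status = 'open' LIMIT 10"
```

Because `base` is just data, it can be passed around, extended, or compiled differently per backend, which is exactly what the eager `findAll()` style forecloses.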
prngl | 5 months ago
But at the same time, something feels off about it (just conceptually; not trying to knock your money-making endeavor, godspeed). Some of the issues that all of these approaches hit are:
- No printf debugging. Sometimes you want things to be eager so you can immediately see what's happening. If you print and all you see is <RPCResultTracingObject>, that's not very helpful. But that's what you'll get in a "tracing" context: you're treating the code as data at that point, so you just see the code as data. One way around this is to make the tracing completely lazy, with no tracing context at all; instead you just chain as you go, and something like `print(thing)` or `thing.execute()` actually ships everything off. This seems to be how much of Cap'n Web works, except for the part where they embed the DSL, at which point you're in a fundamentally different context.
- No "natural" control flow in the DSL/tracing context. You have to use special if/while/for/etc so that the object/context "sees" them. Though that's only the case if the control flow is data-dependent; if it's based on config values that's fine, as long as the context builder is aware.
- No side effects in the DSL/tracing context, because it's not a real "running" context: the code is run once to build the AST and then never run again.
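The "completely lazy" variant from the first bullet can be sketched with a `Proxy` (a toy, hypothetical API, not how Cap'n Web is actually implemented): every call is recorded as data, and only a forcing point like `execute()` interprets the whole chain.

```typescript
// One recorded step of a method chain.
type Step = { prop: string; args: unknown[] };

// Returns a Proxy that records calls instead of performing them.
function lazyChain(
  executeAll: (steps: Step[]) => unknown,
  steps: Step[] = []
): any {
  return new Proxy(() => {}, {
    get(_target, prop: string) {
      if (prop === "execute") {
        // Forcing point: the recorded chain is shipped off here, all at once.
        return () => executeAll(steps);
      }
      // Any other access yields a callable that records the step and chains on.
      return (...args: unknown[]) =>
        lazyChain(executeAll, [...steps, { prop, args }]);
    },
  });
}

// A stand-in "server" that interprets the steps in one round trip.
const api = lazyChain((steps) =>
  steps.map((s) => `${s.prop}(${s.args.join(",")})`)
);

const trace = api.getUser(42).friends().take(3).execute();
// ["getUser(42)", "friends()", "take(3)"]
```

Note the printf-debugging problem is still there: printing `api.getUser(42)` shows a Proxy, not a user; only `execute()` makes anything observable.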
Of the various flavors of this I've seen, it's the ML usage I think that's pushed it the furthest out of necessity (for example, jax.jit https://docs.jax.dev/en/latest/_autosummary/jax.jit.html, note the "static*" arguments).
Is this all just necessary complexity? Or is it because we're missing something, not quite seeing it right?
ignoramous | 5 months ago
Reading this from TFA ...
... sounds very similar to how Binder IPC (and soon RPC) works on Android.