top | item 39782008

(no title)

bear9628 | 1 year ago

One of the reasons why we are using Zig is that we can consume the C API mostly as is.

When importing C headers, Zig translates them into Zig declarations. Unfortunately this is not always possible for C macros, which is why we have to maintain those structs ourselves and step in with custom Zig code. But most of the time we actually consume the C APIs as is.

> If we were to provide zig consumable ways of creating the struct from C headers, would that help?

Yes, that would be awesome. I'm curious what they will look like in the future.

We have had the most difficulties with the module magic, function info, and varatt/Datum macros. Fortunately you have to solve the module magic/function info "only once". The Datum conversion and VARATT macros are more troublesome. We have some conversion support for a number of common zig types. But ideally we would like users to be able to use the C APIs as is, while we provide some type directed default conversions for convenience.

The main problems we've been facing with C macro translations are type conversions/casts in macros, especially if the underlying struct heavily uses unions (for example the VARATT macros). In some cases translating inline functions instead of macros might work better, because the translator has more type information available. We fixed some of the translations manually. You can find them in varatt.zig and datum.zig (where we opted to implement the text to cstring translation ourselves).

Data structures like lists, slist, dlist, and hash tables are quite consumable as is. We have some typed wrappers for those and provide iterators. Macros with control flow cannot be reused, but I think this is fair, especially as the foreach macros are a very common C pattern. All in all we have had no trouble with them.


anarazel|1 year ago

Hi,

> > If we were to provide zig consumable ways of creating the struct from C headers, would that help?

> Yes, that would be awesome.

I don't know much about zig and won't have a whole lot of time to learn - but if you can outline what is required to make C macros [un]usable, it might be possible to improve something. Either on its own, or as part of future work.

> I'm curious what they will look like in the future.

There's quite a few things.

For one, I'd like to introduce a faster function calling infrastructure for the simple cases (mainly small number of arguments, without SRF support). That'll need to be declared in the function info struct.

For another, eventually I want to support a different encoding for variable length types. Including making it reasonably efficient to have variable-length integers.

> We have had the most difficulties with the module magic, function info, and varatt/Datum macros. Fortunately you have to solve the module magic/function info "only once". The Datum conversion and VARATT macros are more troublesome. We have some conversion support for a number of common zig types. But ideally we would like users to be able to use the C APIs as is, while we provide some type directed default conversions for convenience.

Ugh, the varatt stuff doesn't look easily maintainable long-term. It looks like you just need it for getDatumTextSliceZ()?

At the same time, I don't really know why you need it? Most of this should be doable via C functions, and the parts that are not, you could easily wrap yourself - you already seem to have some C code as part of pgzx.

bear9628|1 year ago

Edit: I wasn't finished before submitting the response by accident :)

> > > If we were to provide zig consumable ways of creating the struct from C headers, would that help?

> > Yes, that would be awesome.

> I don't know much about zig and won't have a whole lot of time to learn - but if you can outline what is required to make C macros [un]usable, it might be possible to improve something. Either on its own, or as part of future work.

Hm. Sometimes it is difficult to tell until you try to use a macro. This is because the compiler ignores code (no type checking) that is not used. Difficult to explain, but assume that you write a program that emits typed code that is eventually compiled. This is what enables comptime, and "best effort" C header imports.

The toolchain tries to convert macros into inline functions. That means any macro that contains some form of control flow or opens/closes a code block can't be used. Most obvious ones are the foreach loops, PG_TRY and friends or the PG_RETURN_X macros (luckily we can just use the XGetDatum functions).
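To illustrate the difference, here is a minimal C sketch that contrasts an expression-style macro, which can be rewritten as an inline function, with a loop macro that opens a statement and therefore cannot. All names (`demo_cell`, `DEMO_CELL_VALUE`, `DEMO_FOREACH`, `demo_sum`) are hypothetical and only loosely modelled on the Postgres list macros:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list cell, not a Postgres type. */
typedef struct demo_cell {
    int value;
    struct demo_cell *next;
} demo_cell;

/* Expression-style macro: expands to a single expression, so it has a
 * direct equivalent as an inline function (see demo_cell_value). */
#define DEMO_CELL_VALUE(c) ((c)->value)

static inline int demo_cell_value(const demo_cell *c)
{
    assert(c != NULL);
    return c->value;
}

/* Loop macro: expands to the head of a `for` statement and relies on the
 * caller to supply the body. It is not an expression, so no function can
 * stand in for it. */
#define DEMO_FOREACH(cell, head) \
    for ((cell) = (head); (cell) != NULL; (cell) = (cell)->next)

/* Sums all values in the list; shows how the loop macro is used from C. */
static int demo_sum(demo_cell *head)
{
    int total = 0;
    demo_cell *c;

    DEMO_FOREACH(c, head)
    {
        total += DEMO_CELL_VALUE(c);
    }
    return total;
}
```

On the Zig side the second kind has to be replaced by hand-written iteration, which is what the typed wrappers and iterators mentioned earlier are for.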

Union types are difficult as I said. But maybe this is rather a Zig problem.

Sometimes the problem is constants. For example, the macros for the varattrib variants use bitwise operations and shifts:

```
#define VARSIZE_4B(PTR) \
	((((varattrib_4b *) (PTR))->va_4byte.va_header >> 2) & 0x3FFFFFFF)
```

Now the 2 and the bit pattern might be translated into different types (e.g. int), which might not be compatible with va_header (which is a uint32). Sometimes the types for the constants look OK, sometimes not. Maybe this is something that could still be improved in Zig, not sure. I haven't tried this, but I wonder what would happen if I annotated the types of the constants in the macro (which might not make them more readable :) ).
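One way to try the idea of annotating the constant types is to write the macro as a static inline function with explicitly typed constants. The sketch below is self-contained and uses a simplified, hypothetical stand-in for `varattrib_4b`; `demo_varattrib_4b` and `demo_varsize_4b` are not Postgres or pgzx identifiers. Whether the Zig translator handles this form better is an assumption, not something verified here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for Postgres' varattrib_4b.
 * Only the header field read by the size macro is modelled. */
typedef union {
    struct {
        uint32_t va_header;
    } va_4byte;
} demo_varattrib_4b;

static_assert(sizeof(uint32_t) == 4, "va_header is expected to be 32 bits wide");

/* Same arithmetic as the VARSIZE_4B macro quoted above, but every constant
 * carries an explicit unsigned 32-bit type, so no plain int takes part in
 * the expression. */
static inline uint32_t demo_varsize_4b(const void *ptr)
{
    const demo_varattrib_4b *v = (const demo_varattrib_4b *) ptr;
    return (v->va_4byte.va_header >> UINT32_C(2)) & UINT32_C(0x3FFFFFFF);
}
```

Because the function has a declared parameter and return type, a translator sees the types directly instead of having to infer them from untyped literals.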

We later decided to allow mixing C with Zig code in case we need some kind of "complex" wrapping in C. This might not be fully ideal, but fortunately Zig is also a C compiler, which allows us to fall back to C if we find something too complicated.

> For one, I'd like to introduce a faster function calling infrastructure for the simple cases (mainly small number of arguments, without SRF support). That'll need to be declared in the function info struct.

This sounds great. In pgzx we actually allow developers to capture the function call info as an argument in their function implementation (not shown in our examples). For example, someone might want to use the collation, do some checks on nargs, or implement a function with a variable number of arguments.

But out of the box we try to derive input and return types and conversions at compile time. I would have to see what the new API looks like, but I think we would still be able to automatically derive the conversions that extract the arguments into Zig values.

> Ugh, the varatt stuff doesn't look easily maintainable long-term. It looks like you just need it for getDatumTextSliceZ()?

> At the same time, I don't really know why you need it? Most of this should be doable via C functions, and the parts that are not, you could easily wrap yourself - you already seem to have some C code as part of pgzx.

True. We introduced C into the code base later in our development. The project is still very new and we might revisit some choices on the Datum encoding.

The `getDatumTextSliceZ` function actually resembles the `text_to_cstring` function, which we might want to use in the future instead. In `Zig` a string is a slice, which is a fat pointer (pointer + length field). The type `[:0]const u8` represents a slice with a 0 terminator (fun fact: Zig gives you a stack trace if you forget to write the terminator into your buffer). Initially we implemented this function ourselves so that we could initialize the fat pointer directly, without having to get the string length after doing the conversion.

We added C to our code base later in time to allow us to wrap simple cases more easily without having to reproduce Postgres code in Zig. I guess we should revisit `getDatumTextSliceZ` :). Either have a small C wrapper over `text_to_cstring` that also returns the length or just bite the bullet and do a `strlen` after.
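A rough sketch of the first option, a small C wrapper that hands back the length together with the terminated string, could look like the following. It is deliberately self-contained: `demo_text` is a hypothetical stand-in for a Postgres `text` value, and `malloc` stands in for the Postgres allocator, so none of these names exist in Postgres or pgzx. A real wrapper would call `text_to_cstring` and obtain the length from the varlena header instead:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Postgres `text` value: a byte count plus
 * bytes that are NOT NUL-terminated. */
typedef struct {
    size_t len;
    const char *data;
} demo_text;

/* Copies the payload into a freshly allocated, NUL-terminated buffer and
 * reports the payload length through len_out, so the caller does not need
 * a strlen() afterwards. Returns NULL if the allocation fails. */
static char *demo_text_to_cstring_with_len(const demo_text *t, size_t *len_out)
{
    assert(t != NULL && len_out != NULL);

    char *result = malloc(t->len + 1);
    if (result == NULL)
        return NULL;

    memcpy(result, t->data, t->len);
    result[t->len] = '\0';
    *len_out = t->len;
    return result;
}
```

With both the pointer and the length available, the Zig side could build the `[:0]const u8` slice directly instead of scanning for the terminator.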

Another motivator to try to fix the VARATT/VARDATA macros was to allow developers to use those in their own extensions. Looking at some extensions in contrib we find e.g. `VARDATA_ANY` or `VARSIZE_ANY_EXHDR` being used quite a bit.