For your level 2 code, `uint64_t data[];` is wrong for types whose alignment is greater than `uint64_t`, and also wasteful for types whose alignment is smaller (for example, under an ilp32 ABI on 64-bit architectures).
For your level 3 code, it should be `int main() { List(Foo) foo_list = {NULL};`
Note that working around a lack of `typeof` means you can't return anything. Also, your particular workaround allows `const`ness errors since `==` is symmetrical.
You can't safely omit `payload` since you need it to know the correct size. Consider a `List(int64_t)` and you try to add an `int32_t` to it - this should be fine, but you can't `sizeof` the `int32_t`. Your code is actually lacking quite a bit to make this work.
=====
There are 2 major limitations to generics in C right now:
* Delegating to a vtable (internal or external) is limited in functionality, since structs cannot contain macros, only functions.
* Delegating to an external vtable (mandatory to avoid overhead) means that you have to forward-declare all of the types you'll ever use a vtable with. So far the best approach I've found is to declare (but not define) static functions in the same forwarding header I declare the typedefs in; note that GCC and Clang differ in what phase the "undefined static" warning appears in for the case where you don't actually include that particular type's header in a given TU.
(think about writing a function that accepts either `struct SizedBuffer {void *p; size_t len;};` or `struct BoundedBuffer {void *begin; void *end;};`, and also const versions thereof - all from different headers).
> Delegating to an external vtable (mandatory to avoid overhead) means that you have to forward-declare all of the types you'll ever use a vtable with.
We went down the rabbit hole of writing a compiler for this as part of a project I used to work on (Apache Clownfish[1], a subproject of the retired Apache Lucy project). We started off parsing .h files, but eventually it made sense to create our own small header language (.cfh "Clownfish Header" files).
Here's some generated code for invoking the CharBuf version of the "Clone" method defined in parent class "Obj":
We had our reasons for going to these extremes: the point of Clownfish was to provide a least-common-denominator object model for bindings to multiple dynamic languages (similar problem domain to SWIG), and the .cfh files also were used to derive types for the binding languages. But there was truly an absurd amount of boilerplate being generated to get around the issue you identify.
This is why almost everybody just uses casts to void* for the invocant, skipping type safety.
> it should be `int main() { List(Foo) foo_list = {NULL};`
In C `int main()` means the function takes an unknown number of arguments. You need `int main(void)` to mean it doesn't take any arguments. This is a fact frequently forgotten by those who write C++.
I would love for `union`s to be federated, that is, a type could declare itself as thought it was part of a union with another type, without having to pre-declare all possible types in one place.
The trick#0 you mention is how I made an entire C dialect. Here is a generic binary heap, for example https://github.com/gritzko/librdx/blob/master/abc/HEAPx.h The syntax is a bit heavyweight, but a huge huge advantage is: you get regular C structs in the end, very plain, very predictable, very optimizable. Compiler would eat them like donuts.
In the other cases, it is void* and runtime memory sizing and you have to define macros anyway.
Author here. Binary heaps and linked lists are different use cases. A binary heap must read the data you put in it to store it correctly, but a linked list doesn't. If I were writing a generic binary heap, maybe I would weigh my options differently. I mentioned this in the footnotes.
I agree, there are actually several reasons to prefer the header impl. Debugging is better, both because you can step through the header code where you can’t with a macro function, and because the type information available to the debugger is better. There are more opportunities for compiler optimizations because each instantiation is monomorphized and you don’t pay a runtime cost with variable sizing, generic structures can also be placed on the stack because of the fixed sizing.
There are workarounds for at least two of the problems the author mentions. Naming can be changed from Bar_func(args…) to func(Bar)(args…) with a function name macro that just does name mangling. You can avoid some of the binary bloat by using weak symbols, letting the linker deduplicate functions shared between translation units at link time.
There are other problems for generic containers of pointer types however, you can work around them by using a typedef or a type alias.
Intrusive data structures are more convenient in C still, but working with them in a debugger is a pain.
The casting of the function type assumes that the item pointer type (e.g. Foo*) has the same representation as void*, which the C standard doesn’t guarantee (in standardese: the two types aren’t “compatible”). Calling the function with the converted type therefore constitutes undefined behavior. It also impacts aliasing analysis by compilers (see [0], incidentally), even if the pointer representation happens to be the same.
This casting of the functions to different argument types constitutes the core of the type safety of the generic invocations; I’m not sure it can be fixed.
because i work on a legacy project that is coupled to safety regulations and other quality guarantees, and we cannot just simply roll out a solution ported to c++ on the next release, or even tenth, so perhaps we make it work until we can.
however we can set a standard and expectation for new projects to use c++, and we do and set an expectation to target a specific std.
i see this sentiment quite a lot on hackernews -- feels like a lot of people saying "git gud" -- i would expect a lot more nuance applied here.
> Structurally identical types will be considered the same type in GCC 15 and Clang later in 2025 thanks to a rule change
Beware that only tagged unions are considered the same type under the new rule, provided they have the same structrure and the same tag.
The List(T) macro should be changed to generate a different tag for each different T. Which is trivial (with ##) for simple one-word types, but impossible for even mildly complex ones like pointers to char (strings).
Of course you can force yourself to typedef any type before using it in a List, but it looses much of its versatility. Example:
I believe "type witness" is the generic (hah) term for "member that doesn't do anything but hold the type". Lot less literature out there about type witnesses than I had thought though...
There's also the method used in the Linux kernel to embed the list information (struct list_head) within the type specific struct.
https://kernelnewbies.org/FAQ/LinkedLists
struct ListNode(T) {
ListNode* next;
T data;
}
T!int node;
Why suffer the C preprocessor? Using preprocessor macros is like using a hammer for finish carpentry, rather than a nail gun. A nail gun is 10x faster, drives the nail perfectly every time, and no half moon dents in your work.
Interesting idea with the union and using typeof(). We (I) went with large macros defining wrappers instead, which, I believe, is a necessity with intrusive data structures, or at least I don't immediately see how to do that with unions & typeof. Maybe it's possible...?
I'm curious what a hashmap looks like with this approach. It's one thing to pass through or hold onto a generic value, but another to perform operations on it. Think computing the hash value or comparing equality of generic keys in a generic hashmap.
I first would question what a user wants to do with a hashmap that uses polymorphic key-values of unknowable type at compile-time.
As a thought experiment, you could certainly have users define their own hash and equality functions and attach them to the table-entries themselves. On first thought, that sounds like it would be rife with memory safety issues.
At the end of the day, it is all just bytes. You could simply say that you will only key based on raw memory sequences.
For my own hashmap implementation I followed a wasteful aproach since I’m. It targeting embedded.
I created a structure called a hashmap object. It has two elements: a void pointer and a char pointer. The first one is the data and the second one is the metadata. The metadata is basically a string were the user can put anything, the type of the data, more data, whatever.
Then I preallocate 10s of thousands of hashmap objects. That way users of my hashmap don’t have to think about aollocating and de allocating hashmap nodes, they just insert, delete and search freely. They still have to care about allocating and de allocating they’re own data though.
The key idea here seems to be to use function pointer's type to enforce type safety rather than using the data "handle" type (that is often found in implementations inspired by Sean Barrett's strechy_buffers).
> One annoying thing about C is that it does not consider these two variables to have the same type
Good share, thanks. My approach so far has been to define macros that generate the container type for a given underlying type, as well as all of the relevant type-safe functions, which end up calling into functions that take void*. But the union trick plus the structural identity does seem to simplify things a bit.
In that example they also had replace the union with a struct - presumably to work around this issue. But that seems wasteful to me too. Doing it within an if(0) seems strictly better.
When I saw the title I assumed it was originally "Why I" or "How I" and was trimmed automatically by HN, but this is the original. Could it be that the author was influenced by HN's title guidelines and titled it thus?
This is all well and good but, for my mileage, nothing can ever beat C++ generics and the RAII pattern when there is a choice between plain old C and C++ . One second while I get my fire-retardant suit on...
[+] [-] o11c|8 months ago|reply
For your level 3 code, it should be `int main() { List(Foo) foo_list = {NULL};`
Note that working around a lack of `typeof` means you can't return anything. Also, your particular workaround allows `const`ness errors since `==` is symmetrical.
You can't safely omit `payload` since you need it to know the correct size. Consider a `List(int64_t)` and you try to add an `int32_t` to it - this should be fine, but you can't `sizeof` the `int32_t`. Your code is actually lacking quite a bit to make this work.
=====
There are 2 major limitations to generics in C right now:
* Delegating to a vtable (internal or external) is limited in functionality, since structs cannot contain macros, only functions.
* Delegating to an external vtable (mandatory to avoid overhead) means that you have to forward-declare all of the types you'll ever use a vtable with. So far the best approach I've found is to declare (but not define) static functions in the same forwarding header I declare the typedefs in; note that GCC and Clang differ in what phase the "undefined static" warning appears in for the case where you don't actually include that particular type's header in a given TU.
(think about writing a function that accepts either `struct SizedBuffer {void *p; size_t len;};` or `struct BoundedBuffer {void *begin; void *end;};`, and also const versions thereof - all from different headers).
[+] [-] rectang|8 months ago|reply
We went down the rabbit hole of writing a compiler for this as part of a project I used to work on (Apache Clownfish[1], a subproject of the retired Apache Lucy project). We started off parsing .h files, but eventually it made sense to create our own small header language (.cfh "Clownfish Header" files).
Here's some generated code for invoking the CharBuf version of the "Clone" method defined in parent class "Obj":
Usage: We had our reasons for going to these extremes: the point of Clownfish was to provide a least-common-denominator object model for bindings to multiple dynamic languages (similar problem domain to SWIG), and the .cfh files also were used to derive types for the binding languages. But there was truly an absurd amount of boilerplate being generated to get around the issue you identify.This is why almost everybody just uses casts to void* for the invocant, skipping type safety.
[1] https://github.com/apache/lucy-clownfish
[+] [-] kccqzy|8 months ago|reply
In C `int main()` means the function takes an unknown number of arguments. You need `int main(void)` to mean it doesn't take any arguments. This is a fact frequently forgotten by those who write C++.
[+] [-] EPWN3D|8 months ago|reply
[+] [-] n_plus_1_acc|8 months ago|reply
`malloc(sizeof(*node) + data_size);`
[+] [-] unknown|8 months ago|reply
[deleted]
[+] [-] KingLancelot|8 months ago|reply
[deleted]
[+] [-] gritzko|8 months ago|reply
The trick#0 you mention is how I made an entire C dialect. Here is a generic binary heap, for example https://github.com/gritzko/librdx/blob/master/abc/HEAPx.h The syntax is a bit heavyweight, but a huge huge advantage is: you get regular C structs in the end, very plain, very predictable, very optimizable. Compiler would eat them like donuts.
In the other cases, it is void* and runtime memory sizing and you have to define macros anyway.
[+] [-] dhooper|8 months ago|reply
[+] [-] variadix|8 months ago|reply
There are workarounds for at least two of the problems the author mentions. Naming can be changed from Bar_func(args…) to func(Bar)(args…) with a function name macro that just does name mangling. You can avoid some of the binary bloat by using weak symbols, letting the linker deduplicate functions shared between translation units at link time.
There are other problems for generic containers of pointer types however, you can work around them by using a typedef or a type alias.
Intrusive data structures are more convenient in C still, but working with them in a debugger is a pain.
[+] [-] knutwannheden|8 months ago|reply
Made me laugh out loud!
[+] [-] layer8|8 months ago|reply
This casting of the functions to different argument types constitutes the core of the type safety of the generic invocations; I’m not sure it can be fixed.
[0] https://news.ycombinator.com/item?id=44421185
[+] [-] dhooper|8 months ago|reply
[+] [-] cherryteastain|8 months ago|reply
[+] [-] _proofs|8 months ago|reply
however we can set a standard and expectation for new projects to use c++, and we do and set an expectation to target a specific std.
i see this sentiment quite a lot on hackernews -- feels like a lot of people saying "git gud" -- i would expect a lot more nuance applied here.
[+] [-] Kranar|8 months ago|reply
[+] [-] pjmlp|8 months ago|reply
I was really disappointed that Microsoft decided to backtrack on C++'s is the future, after the new found Linux and FOSS love.
https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an...
https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-...
Not that it matters much, as nowadays C and C++ have a new policy at Microsoft due to goverments and cyberlaws.
https://azure.microsoft.com/en-us/blog/microsoft-azure-secur...
https://blogs.windows.com/windowsexperience/2024/11/19/windo...
[+] [-] petabyt|8 months ago|reply
[+] [-] unknown|8 months ago|reply
[deleted]
[+] [-] brunker2|8 months ago|reply
[+] [-] uecker|8 months ago|reply
[+] [-] teo_zero|8 months ago|reply
Beware that only tagged unions are considered the same type under the new rule, provided they have the same structrure and the same tag.
The List(T) macro should be changed to generate a different tag for each different T. Which is trivial (with ##) for simple one-word types, but impossible for even mildly complex ones like pointers to char (strings).
Of course you can force yourself to typedef any type before using it in a List, but it looses much of its versatility. Example:
[+] [-] alcover|8 months ago|reply
I don't get it. Tagged union is just a design pattern.
[+] [-] chuckadams|8 months ago|reply
[+] [-] gizmo686|8 months ago|reply
I've mostly seen in done in Haskell, but have used it in it Scala as well to simulate type hierarchies not present in the actual type system.
In a way, this union trick is kind of like a phantom type, since the secondary type is never actually used.
[+] [-] mfuzzey|8 months ago|reply
[+] [-] nixpulvis|8 months ago|reply
[+] [-] germandiago|8 months ago|reply
[+] [-] WalterBright|8 months ago|reply
[+] [-] eqvinox|8 months ago|reply
e.g. hash table wrapper: https://github.com/FRRouting/frr/blob/master/lib/typesafe.h#...
(cf. https://docs.frrouting.org/projects/dev-guide/en/latest/list...)
[+] [-] hgs3|8 months ago|reply
[+] [-] lhearachel|8 months ago|reply
As a thought experiment, you could certainly have users define their own hash and equality functions and attach them to the table-entries themselves. On first thought, that sounds like it would be rife with memory safety issues.
At the end of the day, it is all just bytes. You could simply say that you will only key based on raw memory sequences.
[+] [-] makz|8 months ago|reply
For my own hashmap implementation I followed a wasteful aproach since I’m. It targeting embedded.
I created a structure called a hashmap object. It has two elements: a void pointer and a char pointer. The first one is the data and the second one is the metadata. The metadata is basically a string were the user can put anything, the type of the data, more data, whatever.
Then I preallocate 10s of thousands of hashmap objects. That way users of my hashmap don’t have to think about aollocating and de allocating hashmap nodes, they just insert, delete and search freely. They still have to care about allocating and de allocating they’re own data though.
[+] [-] HexDecOctBin|8 months ago|reply
> One annoying thing about C is that it does not consider these two variables to have the same type
C23 solves that too: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf
Supported by latest GCC and Clang, but not by MSVC.
[+] [-] unknown|8 months ago|reply
[deleted]
[+] [-] unknown|8 months ago|reply
[deleted]
[+] [-] canyp|8 months ago|reply
[+] [-] david2ndaccount|8 months ago|reply
[+] [-] josephg|8 months ago|reply
[+] [-] tehnub|8 months ago|reply
[+] [-] jayde2767|8 months ago|reply