
Low-Level Optimization with Zig

307 points| Retro_Dev | 9 months ago |alloc.dev

[+] dustbunny|9 months ago|reply
What interests me most about Zig is the ease of the build system, cross compilation, and the goal of high iteration speed. I'm a gamedev, so I have performance requirements, but I think most languages are fast enough for most of my needs, so performance isn't the #1 consideration in language choice for me.

I feel like I can write powerful code in any language, but the goal is to write code for a framework that is most future proof, so that you can maintain modular stuff for decades.

C/C++ has been the default answer because of its omnipresent support. It feels like Zig will be able to match that.

[+] haberman|9 months ago|reply
> I feel like I can write powerful code in any language, but the goal is to write code for a framework that is most future proof, so that you can maintain modular stuff for decades.

I like Zig a lot, but long-term maintainability and modularity is one of its weakest points IMHO.

Zig is hostile to encapsulation. You cannot make struct members private: https://github.com/ziglang/zig/issues/9909#issuecomment-9426...

Key quote:

> The idea of private fields and getter/setter methods was popularized by Java, but it is an anti-pattern. Fields are there; they exist. They are the data that underpins any abstraction. My recommendation is to name fields carefully and leave them as part of the public API, carefully documenting what they do.

You cannot reasonably form API contracts (which are the foundation of software modularity) unless you can hide the internal representation. You need to be able to change the internal representation without breaking users.

Zig's position is that there should be no such thing as internal representation; you should publicly expose, document, and guarantee the behavior of your representation to all users.

I hope Zig reverses this decision someday and supports private fields.

[+] FlyingSnake|9 months ago|reply
I recently, for fun, tried running Zig on an ancient Kindle device running a stripped-down Linux 4.1.15.

It was an interesting experience and I was pleasantly surprised by the maturity of Zig. Many things worked out of the box and I could even debug a strange bug using ancient GDB. Like you, I’m sold on Zig too.

I wrote about it here: https://news.ycombinator.com/item?id=44211041

[+] osigurdson|9 months ago|reply
I've dabbled in Rust, liked it, heard it was bad so kind of paused. Now trying it again and still like it. I don't really get why people hate it so much. Ugly generics - same thing in C# and Typescript. Borrow checker - makes sense if you have done low level stuff before.
[+] wg0|9 months ago|reply
Zig seems to be a simpler Rust and a better Go.

Off topic - One tool built on top of Zig that I really really admire is bun.

I cannot tell you how much simpler my life is after using bun.

Similar things can be said for uv which is built in Rust.

[+] raincole|9 months ago|reply
I wonder how zig works on consoles. Usually consoles hate anything that's not C/C++. But since zig can be transpiled to C, perhaps it's not completely ruled out?
[+] 9d|9 months ago|reply
> C/C++ has been the default

You're not really going to make something better than C. If you try, it will most likely become C++ anyway. But do try anyway. Rust and Zig are evidence that we still dream that we can do better than C and C++.

Anyway I'm gonna go learn C++.

[+] el_pollo_diablo|9 months ago|reply
> In fact, even state-of-art compilers will break language specifications (Clang assumes that all loops without side effects will terminate).

I don't doubt that compilers occasionally break language specs, but in that case Clang is correct, at least for C11 and later. From C11:

> An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate.

[+] tialaramex|9 months ago|reply
C++ says (until the future C++ 26 is published) all loops, but as you noted C itself does not do this, only those "whose controlling expression is not a constant expression".

Thus in C the trivial infinite loop for (;;); is supposed to actually compile to an infinite loop, as it should with Rust's less opaque loop {} -- however LLVM is built by people who don't always remember they're not writing a C++ compiler, so Rust ran into places where they're like "infinite loop please" and LLVM says "Aha, C++ says those never happen, optimising accordingly" but er... that's the wrong language.

[+] saagarjha|9 months ago|reply
> As an example, consider the following JavaScript code…The generated bytecode for this JavaScript (under V8) is pretty bloated.

I don't think this is a good comparison. You're telling the compiler for Zig and Rust to pick something very modern to target, while I don't think V8 does the same. Optimizing JITs do actually know how to vectorize if the circumstances permit it.

Also, fwiw, most modern languages will do the same optimization you do with strings. Here's C++ for example: https://godbolt.org/z/TM5qdbTqh

[+] vanderZwan|9 months ago|reply
In general it's a bit of an apples to fruit salad comparison, albeit one that is appropriate to highlight the different use-cases of JS and Zig. The Zig example uses an array with a known type of fixed size, the JS code is "generic" at run time (x and y can be any object). Which, fair enough, is something you'd have to pay the cost for in JS. Ironically though in this particular example one actually would be able to do much better when it comes to communicating type information to the JIT: ensure that you always call this function with Float64Arrays of equal size, and the JIT will know this and produce a faster loop (not vectorized, but still a lot better).

Now, one rarely uses typed arrays in practice because they're pretty heavy to initialize, so they're only worth it if one allocates a large typed array once and reuses it a lot after that, so again, fair enough! One other detail does annoy me a little bit: the article says the example JS code is pretty bloated, but I bet that a big part of that is that the JS JIT can't guarantee that 65536 equals the length of the two arrays, so it will likely insert a guard. But nobody would write a for loop that way anyway; they'd write it as i < x.length, for which the JIT does optimize at least one array check away. I admit that this is nitpicking though.

[+] Retro_Dev|9 months ago|reply
You can change the `target` in those two linked godbolt examples for Rust and Zig to an older CPU. I'm sorry I didn't think about the limitations of the JS target for that example. As for your link, it's a good example of what Clang can do for C++, although I think that the generated assembly may be sub-par, even if you factor in Zig compiling for a specific CPU here. I would be very interested to see a C++ port of https://github.com/RetroDev256/comptime_suffix_automaton though. It is a use of comptime that can't be cleanly guessed by a C++ compiler.
[+] csjh|9 months ago|reply
> High level languages lack something that low level languages have in great abundance - intent.

Is this line really true? I feel like expressing intent isn't really a factor in the high level / low level spectrum. If anything, more ways of expressing intent in more detail should contribute towards them being higher level.

[+] wk_end|9 months ago|reply
I agree with you and would go further: the fundamental difference between high-level and low-level languages is that in high-level languages you express intent whereas in low-level languages you are stuck resorting to expressing underlying mechanisms.
[+] jeroenhd|9 months ago|reply
I think this isn't referring to intent as in "calculate the tax rate for this purchase" but rather "shift this byte three positions to the left". Less about what you're trying to accomplish, and more about what you're trying to make the machine do.

Something like purchase.calculate_tax().await.map_err(|e| TaxCalculationError { source: e })?; is full of intent, but you have no idea what kind of machine code you're going to end up with.

[+] timewizard|9 months ago|reply
That for loop syntax is horrendous.

So I have two lists, side by side, and the position of items in one list matches positions of items in the other? That just makes my eyes hurt.

I think modern languages took a wrong turn by adding all this "magic" in the parser and all these little sigils dotted all around the code. This is not something I would want to look at for hours at a time.

[+] int_19h|9 months ago|reply
Such arrays are an extremely common pattern in low-level code regardless of language, and so is iterating them in parallel, so it's natural for Zig to provide a convenient syntax to do exactly that in a way that makes it clear what's going on (which IMO it does very well). Why does it make your eyes hurt?
[+] KingOfCoders|9 months ago|reply
I do love the allocator model of Zig, I would wish I could use something like an request allocator in Go instead of GC.
[+] usrnm|9 months ago|reply
Custom allocators and arenas are possible in Go and even do exist, but they are just very unergonomic and hard to use properly. The language itself lacks any way to express and enforce ownership rules; you just end up writing C with a slightly different syntax and hoping for the best. Even C++ is much safer than Go without GC.
[+] WalterBright|9 months ago|reply
> Rust's memory model allows the compiler to always assume that function arguments never alias. You must manually specify this in Zig.

I've avoided such manual specification of aliasing because:

1. few people understand it

2. using it erroneously can result in baffling bugs in your code

[+] WalterBright|9 months ago|reply
> The flexibility of Zig's comptime has resulted in some rather nice improvements in other programming languages.

Compile time function execution and functions with constant arguments were introduced in D in 2007, and resulted in many other languages adopting something similar.

https://dlang.org/spec/function.html#interpretation

[+] flohofwoe|9 months ago|reply
> I love Zig for its verbosity.

I love Zig too, but this just sounds wrong :)

For instance, C is clearly too sloppy in many corners, but Zig might (currently) swing the pendulum a bit too far into the opposite direction and require too much 'annotation noise', especially when it comes to explicit integer casting in math expressions (I wrote about that a bit here: https://floooh.github.io/2024/08/24/zig-and-emulators.html).

When it comes to performance: IME when Zig code is faster than similar C code then it is usually because of Zig's more aggressive LLVM optimization settings (e.g. Zig compiles with -march=native and does whole-program-optimization by default, since all Zig code in a project is compiled as a single compilation unit). Pretty much all 'tricks' like using unreachable as optimization hints are also possible in C, although sometimes only via non-standard language extensions.

C compilers (especially Clang) are also very aggressive about constant folding, and can reduce large swaths of constant-foldable code even with deep callstacks, so that in the end there often isn't much of a difference to Zig's comptime when it comes to codegen (the good thing about comptime is of course that it will not silently fall back to runtime code - and non-comptime code is still of course subject to the same constant-folding optimizations as in C - e.g. if a "pure" non-comptime function is called with constant args, the compiler will still replace the function call with its result).

TL;DR: if your C code runs slower than your Zig code, check your C compiler settings. After all, the optimization heavylifting all happens down in LLVM :)

[+] messe|9 months ago|reply
With regard to the casting example, you could always wrap the cast in a function:

    fn signExtendCast(comptime T: type, x: anytype) T {
        const ST = std.meta.Int(.signed, @bitSizeOf(T));
        const SX = std.meta.Int(.signed, @bitSizeOf(@TypeOf(x)));
        return @bitCast(@as(ST, @as(SX, @bitCast(x))));
    }

    export fn addi8(addr: u16, offset: u8) u16 {
        return addr +% signExtendCast(u16, offset);
    }
This compiles to the same assembly, is reusable, and makes the intent clear.
[+] titzer|9 months ago|reply
Zig has some interesting ideas, and I thought the article was going to be more on the low-level optimizations, but it turned out to be "comptime and whole program compilation are great". And I agree. Virgil has had the full language available at compile time, plus whole program compilation since 2006. But Virgil doesn't target LLVM, so speed comparisons end up being a comparison between two compiler backends.

Virgil leans heavily into the reachability and specialization optimizations that are made possible by the compilation model. For example it will aggressively devirtualize method calls, remove unreachable fields/objects, constant-promote through fields and heap objects, and completely monomorphize polymorphic code.

[+] skywal_l|9 months ago|reply
Maybe with the new x86 backend we might see some performance differences between C and Zig that could definitely be attributed solely to the Zig project.
[+] int_19h|9 months ago|reply
I rather suspect that the pendulum will swing strongly towards more verbose and explicit languages in general in the upcoming years, solely because it makes things easier for AI.

(Note that this is orthogonal to whether and to what extent use of AI for coding is a good idea. Even if you believe that it's not, the fact is that many devs believe otherwise, and so languages will strive to accommodate them.)

[+] Retro_Dev|9 months ago|reply
Ahh perhaps I need to clarify:

I don't love the noise of Zig, but I love the ability to clearly express my intent and the detail of my code in Zig. As for arithmetic, I agree that it is a bit too verbose at the moment. Hopefully some variant of https://github.com/ziglang/zig/issues/3806 will fix this.

I fully agree with your TL;DR there, but would emphasize that gaining the same optimizations is easier in Zig due to how builtins and unreachable are built into the language, rather than needing gcc and llvm intrinsics like __builtin_unreachable() - https://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Other-Builtins....

It's my dream that LLVM will improve to the point that we don't need further annotation to enable positive optimization transformations. At that point though, is there really a purpose to using a low level language?

[+] knighthack|9 months ago|reply
I'm not sure why allowances are made for Zig's verbosity, but not Go's.

What's good for the goose should be good for the gander.

[+] 9d|9 months ago|reply
> People will still mistakenly say "C is faster than Python", when the language isn't what they are benchmarking.

Yeah but some language features are disproportionately more difficult to optimize. It can be done, but with the right language, the right concept is expressed very quickly and elegantly, both by the programmer and the compiler.

[+] kamma4434|9 months ago|reply
I know nothing of Zig, but I worked long enough in lisp to know that the best macros are the ones you don’t write. They are wonderful but they have just as many drawbacks, and don’t compose nicely.
[+] justmarc|9 months ago|reply
Optimization matters, in a huge way. Its effects are compounded by time.
[+] sgt|9 months ago|reply
Only if the software ends up being used.