Musl libc

85 points | pmarin | 13 years ago | etalabs.net

22 comments

[+] dalias|13 years ago|reply
Addressing afhof's comments...

1 & 2. I don't see any reason to believe there will ever be a new machine with non-power-of-two word size, and I'm doubtful that there are even any historical post-standardization C implementations like that. Certainly Linux, which makes much bigger assumptions about word size and the relationship of types, will never support such hypothetical systems. Generality is good if it buys you anything practical, but my approach has been to avoid excess generality unless it has a practical benefit. This approach has been very beneficial in the dynamic linker: by assuming that things that are the same on real-world archs actually are the same, the amount of per-arch code to maintain is only some 30 lines (which might grow to 60-100 once TLS is supported), as opposed to many hundreds or even thousands in other implementations.

At this point, musl does not have an official manual or other documentation. When it does, it will cover these sorts of requirements as well as all the implementation documentation required by ISO C and POSIX.

3. The casts are all necessary (C does not define implicit conversions between distinct object pointer types, other than to and from void *) and correct. Casting to (void *) rather than an explicit type is my preference because it reduces duplication of the type in multiple places, but in any case this is purely a stylistic matter and has nothing to do with the generated code or correctness.

4. The ONES macro was left over from other files on which this code was based (strchr and strcpy). Obviously it's not needed in memcpy, which copies a fixed number of bytes without searching for any terminator character, so it could/should be removed. Thanks for catching this.
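
To make points 1 & 2 concrete, here is a minimal sketch of the masking trick under discussion. It is not musl's actual memcpy; the helper names are hypothetical, and only the ALIGN definition follows the description in this thread. The static_assert spells out the power-of-two assumption so a build fails loudly if it is ever violated.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Mask of the low address bits. This only works as a mask when
   sizeof(size_t) is a power of two, which is the assumption debated here. */
#define ALIGN (sizeof(size_t) - 1)

/* Compile-time check of that assumption. */
static_assert((sizeof(size_t) & (sizeof(size_t) - 1)) == 0,
              "word size must be a power of two");

/* Hypothetical helper: nonzero if both pointers have the same offset
   within a word, so one byte-wise prefix can align both of them. */
static int same_misalignment(const void *d, const void *s)
{
    return (((uintptr_t)d ^ (uintptr_t)s) & ALIGN) == 0;
}

/* Hypothetical helper: number of bytes to copy one at a time before
   p reaches a word boundary (0 if it is already aligned). */
static size_t bytes_to_alignment(const void *p)
{
    return (size_t)((0 - (uintptr_t)p) & ALIGN);
}
```

With a 6-octet word, ALIGN would be 5 (binary 101), and the & would no longer isolate the offset within a word, which is exactly the failure mode afhof describes.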

[+] afhof|13 years ago|reply
Some things come to mind when looking through the source. Perhaps they are pedantic gripes, but perhaps they need some work. I had these observations while looking through memcpy.c:

1. Assumption of power of 2 word size. The ALIGN macro is defined as one less than the size of size_t (which should be enough to hold the length of something in memory). I would be more comfortable with it being defined as the size of a processor WORD rather than as a length of memory.

2. Using the ALIGN macro in calculations of misaligned memory copies. This goes to #1's point: Suppose the word size is not a power of 2? Suppose the word size is 6 octets rather than 4 or 8. The & mask might produce incorrect results since there would be 0's in the mask.

3. There seems to be a lot of casting going on where it either shouldn't be or it looks wrong. A good example of this is the cast to void on line 19. If there is a cast, why isn't it to the (size_t*) type? Even if this is correct, I now have to think a lot more about it, or hunt through the commit log. Yes, I could. No, I shouldn't have to.

4. The ONES macro is unused, and the ALIGN macro is not part of a header somewhere. An alignment macro seems like it might be beneficial in other parts of the library. I recognize that this increases interdependency, but I feel it may be justified since it would be useful in other places, and may be changed in the future.

I am very pleased that this library is being developed. I think competition drives up the quality of code, and libc is one of the places where high quality code is invaluable. (The comparison chart they provided was impressive!) I must admit I don't have a good solution to the issues I found, so I'm afraid my criticism may not be as constructive as I would prefer.

Thanks for making this, it was quite pleasant to look at non Drepper'd libc code. :)

Here is the link I looked at for reference: http://git.etalabs.net/cgi-bin/gitweb.cgi?p=musl;a=blob;f=sr...

[+] huhtenberg|13 years ago|reply
> Assumption of power of 2 word size

You don't say. And I bet they have no intention of supporting middle-endian byte ordering either. What an amateur act :)

[+] archangel_one|13 years ago|reply
I agree that it's conceptually nice to support non-power-of-two word sizes, but how often do those come up these days in practice? If none of the architectures they're targeting actually have such a word size, it seems like it might be a useful simplifying assumption.
[+] revelation|13 years ago|reply
I think the usual benchmark, once you get to the point where you can run Linux and use an extensive libc, is compatibility with existing software, not necessarily a conformant implementation.
[+] GregorR|13 years ago|reply
I've been experimenting with using NetBSD pkgsrc on musl to see how far it goes. Many hundreds of packages compile and pass all their tests, including, for instance, every major language interpreter, various development libraries and resources, and XFCE4. Things that don't compile usually fall into one of two categories: (1) using #ifdef __linux__ to mean #ifdef __GLIBC__, and (2) fundamentally unportable packages that have huge hacks for every single platform they support. There's rumbling about creating a package support wiki where people can document what packages work, don't work, or require patches to work, but it doesn't yet exist.
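
Category (1) can be illustrated with a small sketch. The function names are hypothetical; the only facts relied on are that glibc's headers define __GLIBC__ and that compilers targeting Linux define __linux__. The point is to gate glibc-specific code on the libc macro rather than on the kernel macro.

```c
#include <stdlib.h>  /* any libc header; glibc's headers are what define __GLIBC__ */

/* Returns 1 when compiled against glibc, 0 otherwise. */
static int built_against_glibc(void)
{
#if defined(__GLIBC__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when targeting a Linux kernel, whichever libc is in use. */
static int built_for_linux(void)
{
#if defined(__linux__)
    return 1;
#else
    return 0;
#endif
}

/* Short label showing that the two checks are independent questions. */
static const char *libc_label(void)
{
    if (built_against_glibc())
        return "glibc";
    if (built_for_linux())
        return "Linux with a non-glibc libc";
    return "non-Linux system";
}
```

A package that uses glibc extensions inside #ifdef __linux__ compiles those extensions on any Linux libc, which is where builds against musl would be expected to break.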
[+] nwmcsween|13 years ago|reply
Join #musl on irc.freenode.net to discuss development.
[+] vasco|13 years ago|reply
Don't really get the theory behind changing the default stack size for threads. Feels like they did it just to be different, which might leave someone scratching their head for a bit.
[+] dalias|13 years ago|reply
The glibc default thread stack size is unacceptable/broken for a couple of reasons. It eats memory like crazy (usually 8-10 megs per thread), and by memory I mean commit charge, which is a highly finite resource on a well-configured system without overcommit. Even if you allow overcommit, on 32-bit systems you'll exhaust virtual memory quickly, putting a low cap on the number of threads you can create (just 300 threads will use all 3GB of address space).

With that said, musl's current default is way too low. It's caused problems with several major applications such as git. We're in the process of trying to establish a good value for the default, which will likely end up being somewhere between 32k and 256k. I'm thinking 80k right now (96k including guard page and POSIX thread-local storage) but I would welcome evidence/data that helps make a good choice.
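
Whatever default a libc settles on, an application that knows its own needs can request a stack size explicitly through the standard POSIX thread attributes. The following is a minimal sketch; run_with_stack and mark_done are hypothetical names, and the size passed in the usage note is just the upper figure from the range mentioned above, not a recommendation.

```c
#include <limits.h>   /* PTHREAD_STACK_MIN, where the system defines it */
#include <pthread.h>
#include <stddef.h>

/* Hypothetical example worker: sets the int that arg points to. */
static void *mark_done(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Hypothetical helper: run fn(arg) on a thread with an explicit stack size
   and wait for it. Returns 0 on success or a pthread error number. */
static int run_with_stack(size_t stack_bytes, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;

#ifdef PTHREAD_STACK_MIN
    /* Requests below the system minimum are rejected, so clamp upward. */
    if (stack_bytes < (size_t)PTHREAD_STACK_MIN)
        stack_bytes = (size_t)PTHREAD_STACK_MIN;
#endif

    err = pthread_attr_init(&attr);
    if (err)
        return err;

    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (!err)
        err = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (err)
        return err;

    return pthread_join(tid, NULL);
}
```

Calling run_with_stack(256 * 1024, mark_done, &flag) asks for a 256k stack regardless of the libc default, so the program behaves the same under glibc and musl.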

[+] chj|13 years ago|reply
A grand project! Love the benchmark table.