

acidx | 4 months ago

One thing to note, too, is that `atoi()` should be avoided as much as possible. On error (parse error, overflow, etc), it has an unspecified return value (!), although most libcs will return 0, which can be just as bad in some scenarios.

Also not mentioned is that atoi() can return a negative number, which is then passed to malloc(). malloc() takes a size_t, which is unsigned, so a negative argument is converted to a very large number.
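To make the conversion concrete, here is a minimal sketch. The names `as_size` and `alloc_checked`, and the `max_len` cap, are made up for illustration and don't come from the framework being discussed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The conversion that happens implicitly in malloc(atoi(s)). */
static size_t as_size(int n)
{
    return (size_t)n;   /* negative values wrap around modulo SIZE_MAX + 1 */
}

/* Hypothetical guard: refuse non-positive or oversized lengths up front. */
static void *alloc_checked(int n, int max_len)
{
    if (n <= 0 || n > max_len)
        return NULL;
    return malloc((size_t)n);
}
```

With this, `as_size(-1)` is `SIZE_MAX`, so an unchecked `malloc()` would be asked for the largest possible allocation, while `alloc_checked(-1, 4096)` simply returns NULL.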

It's better to use strtol(), but even that is a bit tricky to use: it doesn't touch errno when there's no error, yet you need to check errno to know whether things like overflow happened, so you have to set errno to 0 before calling it. The man page explains how to use it properly.
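Roughly, the errno dance looks like the sketch below. `parse_long` is a hypothetical helper name, and rejecting trailing characters is my own assumption about what a caller would want, not something the man page requires.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a whole string as a base-10 long. */
static bool parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;                  /* strtol() leaves errno alone on success */
    long v = strtol(s, &end, 10);

    if (end == s)               /* no digits were consumed */
        return false;
    if (errno == ERANGE)        /* overflow or underflow was reported */
        return false;
    if (*end != '\0')           /* trailing garbage such as "12abc" */
        return false;

    *out = v;
    return true;
}
```

Note that strtol() still skips leading whitespace and accepts a sign, so a strict parser may want to check the first character before calling it.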

I think it would be a very interesting exercise for that web framework's author to run their HTTP request parser through a fuzz tester; clang comes with one that's quite good and easy to use (https://llvm.org/docs/LibFuzzer.html), especially if used alongside address sanitizer or the undefined behavior sanitizer. Errors like the one I mentioned will most likely be found by a fuzzer really quickly. :)
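For reference, a libFuzzer harness is just one entry point. In the sketch below, `parse_request` is a hypothetical stand-in for the framework's real parser (I don't know its actual API), so swap in the real function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the framework's parser; replace with the real one. */
static int parse_request(const char *buf, size_t len)
{
    (void)buf;
    return len > 0 ? 0 : -1;
}

/* libFuzzer calls this entry point repeatedly with generated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Copy into an exact-size, NUL-terminated heap buffer so that
     * address sanitizer can flag reads past the end of the input. */
    char *buf = malloc(size + 1);
    if (!buf)
        return 0;
    if (size > 0)
        memcpy(buf, data, size);
    buf[size] = '\0';

    parse_request(buf, size);

    free(buf);
    return 0;   /* libFuzzer expects 0 here */
}
```

Assuming the file is saved as `fuzz.c`, something like `clang -g -O1 -fsanitize=fuzzer,address,undefined fuzz.c -o fuzz` builds it with both sanitizers enabled, and running `./fuzz` starts the fuzzing loop.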


MathMonkeyMan|4 months ago

Unspecified, really? cppreference's [C documentation][1] says that it returns zero. The [OpenGroup][2] documentation doesn't specify a return value when the conversion can't be performed. This recent [draft][3] of the ISO standard for C says that if the value cannot be represented (does that mean over/underflow, bad parse, both, neither?), then it's undefined behavior.

So three references give three different answers.

You could always use sscanf instead, which tells you how many values were scanned (e.g. zero or one).
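A minimal sketch of that idea, with `scan_count` as a made-up helper name:

```c
#include <stdio.h>

/* Hypothetical helper: returns sscanf()'s conversion count for one int.
 * 1 means a value was stored in *out, 0 means the input did not start
 * with a number, and EOF means the input ran out before any conversion. */
static int scan_count(const char *s, int *out)
{
    return sscanf(s, "%d", out);
}
```

Two caveats, as far as I can tell: input like "12abc" still counts as one successful conversion, and the count says nothing about values that don't fit in an int.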

[1]: https://en.cppreference.com/w/c/string/byte/atoi.html

[2]: https://pubs.opengroup.org/onlinepubs/9799919799/functions/a...

[3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf

acidx|4 months ago

The Linux man page (https://man7.org/linux/man-pages/man3/atoi.3.html#VERSIONS) says that POSIX.1 leaves it unspecified. As you found out, pretty much every reference disagrees on how it should behave, so it's really something that should be avoided as much as possible, especially if you value portability.

sscanf() is not a good replacement either! It's better to use strtol() instead. Either do what Lwan does (https://github.com/lpereira/lwan/blob/master/src/lib/lwan-co...), or look (https://cvsweb.openbsd.org/src/lib/libc/stdlib/strtonum.c?re...) at how OpenBSD implemented strtonum(3).
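To give a flavor of the strtonum(3) idea, where the caller states the acceptable range up front, here is a rough sketch. It is neither OpenBSD's code nor Lwan's; `parse_in_range` is a hypothetical name and the bool-returning interface is my own simplification.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, loosely modelled on the strtonum(3) interface. */
static bool parse_in_range(const char *s, long long min, long long max,
                           long long *out)
{
    char *end;

    if (min > max)                  /* nonsensical range */
        return false;

    errno = 0;                      /* so that ERANGE can be detected below */
    long long v = strtoll(s, &end, 10);

    if (end == s || *end != '\0')   /* nothing parsed, or trailing junk */
        return false;
    if (errno == ERANGE || v < min || v > max)
        return false;

    *out = v;
    return true;
}
```

For a length that ends up in malloc(), a call such as `parse_in_range(s, 1, 65536, &len)` rejects negative, zero, and absurdly large values in one place; the 65536 cap is an arbitrary example value.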

For instance, if you try to parse a number that's preceded by a lot of spaces, sscanf() will take a long time going through it. I've been hit by that when fuzzing Lwan.

Even cURL is avoiding sscanf(): https://daniel.haxx.se/blog/2025/04/07/writing-c-for-curl/