top | item 35697869

Strlcpy and Strlcat – Consistent, Safe, String Copy and Concatenation (1999) [pdf]

43 points| todsacerdoti | 2 years ago |openbsd.org | reply

52 comments

order
[+] q3k|2 years ago|reply
A person smarter than me once told me: “Just use snprintf()”.

The tiny performance hit is usually worth the safety and simplicity. Only one error condition to check, covering both concatenation and copies, sensible 'size' argument behaviour (size includes null terminating byte, null terminating byte is always written)...

Just use snprintf().

[+] unwind|2 years ago|reply
I came here to post exactly that; I think using snprintf() is the proper solution to the "I need to build a string out of many parts in C" problem pretty much every time.

Especially since these functions' return values make it hard to use them to step forward in the buffer, so subsequent calls need to walk over the already-inserted characters again and again. That kind of redundant work makes me grit my teeth, which I guess is a sign that I'm a C programmer at heart. :)

Compare:

    char buf[1024];
    snprintf(buf, sizeof buf, "%s%s", "foo", "bar");
and

    char buf[1024];
    strcpy(buf, "foo");
    strlcat(buf, "bar", sizeof buf);
The latter is given a pointer to the base buffer, so it has to walk past the existing characters in order to find the end where the concatenation should start.

I also find the "size" arguments confusing and weird. I think it's idiomatic in C to (tightly) associate a size with a buffer pointer, and speaking in terms of "there are n bytes available at this location" makes sense to me. So the prototype really should have been

    size_t strlcat_unwind(char *buf, size_t buf_size, const char *src);
[+] attractivechaos|2 years ago|reply
At least in glibc, the printf family is much slower than strcpy. It is a huge performance penalty if string copy is frequent (e.g. when you incrementally write to a large string buffer).
[+] krylon|2 years ago|reply
As a bonus, if you pass NULL for the first argument, its return value tells you how much memory it would have used, so you can allocate a buffer that's the right size. Bigger performance hit, but in practice, I found the impact negligible in a codebase that doesn't do a lot of stringy stuff but needs to emit reasonable messages to logfiles etc.
[+] Someone|2 years ago|reply
> The tiny performance hit is usually worth the safety and simplicity

It need not even be a performance hit. The compiler could recognize the idiom of code such as

  snprintf(buffer, bufLen, "%s%s%s", s, t, u);
and optimize it into (example possibly buggy; my C is rusty)

    b = buffer;
    e = b + bufLen:
    while((b < e) && (*b++ = *s++));
    while((b < e) && (*b++ = *t++));
    while((b < e) && (*b++ = *u++));
    *b = 0;
[+] pixelbeat__|2 years ago|reply
There have been various attempts to improve on strcpy() etc. The current thinking is that strscpy() is the best general interface to this functionality. See the "improving ... strcpy" section at

https://www.pixelbeat.org/programming/gcc/string_buffers.htm...

[+] TonyTrapp|2 years ago|reply
The sad part is really that there are several implementation of safe(r) string manipulation functions, they all do mostly the same thing, but have different names and parameter orders, preventing interoperability between major compilers. They really need to get their shit together and finally agree on a single variant to be standardized.
[+] Denvercoder9|2 years ago|reply
One drawback of these functions is that the return value is the length of the source string. This is seldomly useful, as it's mostly a micro-optimization to avoid an extra string scan when using static buffers for small strings, but dynamic buffers for large ones. However, it prevents these functions from being used with input that might not be null-terminated, and causes performance to be bounded by the size of the source (which usually isn't controlled by the caller), instead of the destination (which usually is).
[+] ben_bai|2 years ago|reply
> it prevents these functions from being used with input that might not be null-terminated

Those are not valid C-strings! String functions are(have) "undefined behavior" for non string inputs!

[+] TheNewAndy|2 years ago|reply
I've never understood the circumstances where it is ok for a copy/concatenate operation to also include a bonus "truncate the result" operation at the end. I'm a believer that you should use strcpy and strcat, and also make sure your buffers are large enough. If you can't make them large enough, then you can't do the operation you are trying to do (and I would say that doing it incorrectly via truncation isn't obviously better since this seems like silent corruption to me)
[+] theamk|2 years ago|reply
Truncating strings is a practical alternative that provides useful signal with minimal extra error handling code. Yes, in the ideal world there will be no string truncation, but if there are no alternatives in a lot of cases truncation is better than nothing.

Which of those two error messages would you rather see?

    sprintf: output buffer is to small
or

    fopen: cannot open file /usr/bin/superlong-dist-path/208277409874874/fi
[+] tedunangst|2 years ago|reply
You will be surprised to learn that there were programmers who did not make the buffers large enough.
[+] dwattttt|2 years ago|reply
Someone will likely be able to answer more specifically, but I seem to recall an old common operation was filling fixed width text fields; truncating a message would make sense there (not ideal, but probably familiar).
[+] GuB-42|2 years ago|reply
Before using any function of the strcpy family, try to think about why you are using them.

The first question is: do I need a copy? Obviously, often, you do, but maybe you can avoid it. Copying data can be costly in terms of performance, and of course, without copying, no need to allocate memory or deal with buffers sizes.

Now, that you are sure that you indeed need a copy, maybe you should consider using memcpy() instead. If you already know the sizes of your strings, then it is a no-brainer. If you don't, maybe doing strlen()/strnlen() first on your input data is a good idea, and if it is too big, handle the error appropriately, and if it is not, then do a memcpy() now that you have the size.

The default behavior of "safe" variants of strcpy() it to truncate the string to the maximum size. It is certainly better than a buffer overflow, but it may not be what you want, just ask your ass

isstant.

[+] godshatter|2 years ago|reply
The standard C functions are enough of a mess mentally for me that I try to use them only for simple things.

If I'm dealing with lots of string manipulations I usually create a struct with a count and an alloc size and a char* pointer and write init, cleanup, extend, and append functions specific for it and use that as a string type. It's less of a mental load for me to deal with reallocating an array and keeping count and alloc updated correctly once than it is knowing that character arrays are being recounted all the time and worrying about exactly where the '\0' is going to end up and if my buffer is large enough for every string I'm dealing with.

[+] wongarsu|2 years ago|reply
Null-terminated strings have to be up there with null-pointers as design ideas that seemed great at the time, but turned out to be extremely costly because of the insane number of bugs they produce.

I wonder what an alternative world would look like where C had decided to use pascal-strings instead (prefixing the string with its length). We would undoubtedly have argued about the length of the size prefix, but a mistake there is trivial to spot compared to wrong application of str[nls]*cpy(_s)?.

[+] theamk|2 years ago|reply
I used to program in Pascal a lot, and its counted strings were one of the worst features.

First, you cannot really "argue" about length of the size prefix - it is a part of ABI/calling convention, and unless you have template-like mechanism and overloading/conventions, you will have to use whatever stdlib was compiled with.

And Borland Pascal's stdlib was compiled with 1 byte prefix, making max string length of 255 bytes. This made sense when total RAM was measured in hundreds of kilobytes, but generated so many bugs when a normally short string had to be just a bit longer.

I was somewhat envious of my C-using friends who could put entire help message into a single string. But not envious enough to switch to much slower C tooltchain :)

There were other reasons, too, like using less registers in assembly-level loops (old machines had much fewer registers than modern ones). Or not having to worry about validating strings when reading binary data from disk.

[+] tyingq|2 years ago|reply
That also would have made for a nice performance benefit passing to zero-copy type calls. And reduced complexity for new languages written in C implementing their own string types. Tcl was irritating when it was first invented, as it directly used C strings, and couldn't process binary data at all.
[+] pjmlp|2 years ago|reply
Specially given the alternatives outside Bell Labs in systems programming, since at least 1958 starting with JOVIAL.
[+] david2ndaccount|2 years ago|reply
pascal strings are terrible, the correct solution is to just always have pointer+length and even better to make that a built-in type for all types.
[+] knorker|2 years ago|reply
As other have said, this is no longer considered to be "true".

Yes, they make it harder to get memory corruption. But that's a very narrow definition of "safe".

If you tell the computer to copy a string, or append to a string, it's wrong, and not safe, to only do a partial copy or concatenation.

It's a huge footgun. Smaller than strcpy(), sure, but "safe" it is not.

I don't really have a better answer for C.

[+] pjmlp|2 years ago|reply
The better answer would be to add data types like SDS[0] to the standard library, and use them as ADTs (Abstract Data Types) [1].

Unfortunely WG14 has proven in 30 years of existence, that it isn't something that they care to fix, and while 3rd party solutions exist, without vocabulary types on the stardard library adoption will never take off.

[0] - https://github.com/antirez/sds

[1] - https://en.wikipedia.org/wiki/Abstract_data_type