In principle, I feel like "Ignore the language-provided standard libraries and use this little macro I wrote in all your code" is bad advice.
If the standard libs are that bad, the community should really lobby for the inclusion of a "safe" string copy into the C standard. Failing that, use a battle-tested third-party library like bstrlib[0].
The C standards committee is incredibly conservative, only adding new features if absolutely necessary. Unless the composition and goals of the committee drastically change, a new standard string API will never happen.
If you must write C, you should use a real string library, and not string.h.
strcpy_safe is a template function, not a macro. That immediately makes it orders of magnitude better than you were claiming.
I agree that a standard fix for this would be great. It's been five years since I wrote the post, the solution has been available for many years longer than that, and the standards committee has shown no interest in fixing this issue. In the interim I think it would be irresponsible for developers to continue using strncpy or any of the other unsafe options when strcpy_safe is so simple and effective.
What choice do you have? “Fix the standard” is ideal, but few programmers have the luxury of waiting a decade for the fix to happen and become available in all their target compilers.
Strings are so tiny an abstraction over plain memory that every C programmer ends up doing them their own way. And that can be a good thing. I already proposed a couple of ways to represent or muck with strings in my other comment.
Rather use safeclib. It is C11.
It has a safe version of the wrongly spec'ed truncating functions, in this case strncpy_s. This version guarantees NULL termination.
Two years after this post, the Linux kernel added strscpy, the api of which is equivalent to the safe strncpy in this post. Internally, it stops copying once it reaches the null terminator.
The article's proposed implementation uses a template to guarantee that egregious mistakes will fail to compile. This is clever, but it's C++. The kernel's version isn't quite as safe because it can overrun the destination. In principle it's impossible for it to be safe, because this is C, and neither the compiler nor the programmer can tell how big the destination is. We have to rely, as the article complains, on the programmer calling the function to know how big the buffer is.
I've had patches rejected from the Linux kernel that tried to switch away from strncpy in favor of strlcpy. The behavior that strncpy continues to zero out the rest of the destination in the case where size of source is less than destination was being relied upon to not leak uninitialized memory to userspace. Seems strscpy alone suffers the same.
strscpy seems like a nice interface. Is an implementation outside of the kernel available? Preferably one that is permissively-licensed?
Also, it doesn't appear to be specified what happens if the count argument is too large to be represented as a ssize_t. The destination buffer would have to be extremely large, so it probably doesn't happen in practice, but it'd be good to specify it, or at least explicitly state it's unspecified / undefined.
> strlcpy is designed to solve the null-termination problems – it always null-terminates. It’s certainly an improvement over strncpy, however it isn’t natively available in VC++.
Let's not forget that it's also not natively available in glibc. Though you'll find it in every BSD you can think of.
Like snprintf(3), the strlcpy() and strlcat() functions return the total
length of the string they tried to create. For strlcpy() that means the
length of src. For strlcat() that means the initial length of dst plus
the length of src.
So if you've mmaped in something like the OpenStreetMaps xml dump file and try to strlcpy out the first 100 bytes (because you didn't want to have to null terminate it yourself after a memcpy), you'll notice that your program is running rather slowly.
I feel like there are a lot of good alternatives to C. What we did on our team was get our C compiling under g++, and then refactor as C++14. Now string copies are trivial and safe. I'm sure it would have been even safer to switch to Rust or something, but no one in my company has any experience with that, while there were a few C++ gurus.
What are the good alternatives? You can either copy memory manually, or have a system like C++'s that causes lots of tiny allocations, which is very inefficient. Both in space and time. You can also opt for immutable strings to be able to share strings more conveniently and efficiently, at the cost of O(n^2) build time (where automatic optimizations fail) or decreased convenience (explicit string-stream aka string builder).
If efficiency is not a concern for you and you rather like the convenience, good for you.
Personally I feel plain const char pointers (plus length, depending on expected size) are good enough to emulate the immutable string type. For modifiable strings, I still prefer viewing them as chunks of memory, using char pointer + length + capacity. Nice thing is, you can easily go from the former to the latter in the lifetime of any one string. You can also pool many strings in large chunks if you build only one string at any one time.
I don’t understand why silently truncating strings is considered to be acceptable practice.
If you have the length of both, simply assert the length of the destination is <= the length of the source. Problem solved (bugs become obvious in testing).
The more I think about this, the more sense it makes to say you should check the length of the buffer before they copy. I'm not sure an assert is always right, exactly what should be done if the string doesn't fit may depend on what the program is doing. But silently truncating is generally not the right thing to do.
What about user input, strings coming from files, and in general the diversity of things that can go wrong when your product ships. This is particularly important if your product is attacked by those who would like to exploit it.
strcpy_s does this. It has a template variant to infer the length of the destination buffer and it will terminate the program if the source string is too large.
If the destination buffer doesn't have a fixed length then it will fail to compile.
This is similar to your suggestion with these improvements:
- Template deduction of the destination size is done - this is essential to avoid bugs
- Detection of overwrites is done in release builds as well as during testing
- Detection of overwrites is automatic - it doesn't require manually adding asserts to call sites
- strcpy_s exists in some standard libraries
Because the API is stupid and not intuitive. It's for memory buffers where the string is either NULL terminated or it is the length of the memory buffer. This saves an extra byte. yay?
I'd say "Stop using C and C++ already" or "Stop using null terminated strings already", but I recognize that those are not always practical solutions.
I do have some maintainability concerns though about the promulgation of numerous competing home-brew solutions to a near trivial problem throughout a code base.
Well, again, you have to assume that the caller called you right, i.e. that the pointer you were passed points to a buffer with exactly one zero in it, at the end. Furthermore, unless there's also a (potentially wrong) length argument, you have to read every byte of the destination buffer while you're copying and stop when you see a zero. There's a lot to be said for a reimplementation like bstrlib that wraps char arrays in structs and gives you replacements for most of the string functions, but then you still have all these C APIs to deal with that use char*. And you have the overhead of trying to do things right at runtime. Sibling comment mocks languages that do compile-time safety checking, but I totally see the appeal.
Of course, but everyone wants to overdo the complexity of the time worn solution. OMG you need to null terminate the string after all the _other_ gymnastics..gee C sure does suck! Why don't we use rust|go|c++ ad-nauseam.
bzero(buf,sz); /memset nazis here/
strncpy(buf,src,sz - 1);
See the bug? Of course not, because it's not visible, but I've seen this pattern used where dst is a pointer. This means that four or eight bytes (three or seven if you don't count the null terminator) gets copied, regardless of how many bytes dst points to. With strcpy_safe an attempt to copy to dst would not work and the developer would have to fix their code.
strncp is fine, however it was never meant for null-terminated strings! A safe alternative is using snprintf and checking the return value for truncation.
snprintf is hardly safe. It does guarantee null termination, so that is progress, but it requires the caller to pass the buffer size. As I mention in the "More typing means more errors" section of the article software developers are endlessly creative about ways to pass the wrong buffer size. They usually pass the right buffer size, but 99.9% correct still means a lot of bugs. These bugs can be avoided by using a template function to infer the destination buffer size. A template wrapper around snprintf would work very nicely.
[+] [-] AdmiralAsshat|7 years ago|reply
If the standard libs are that bad, the community should really lobby for the inclusion of a "safe" string copy into the C standard. Failing that, use a battle-tested third-party library like bstrlib[0].
[0]http://bstring.sourceforge.net/
[+] [-] pcwalton|7 years ago|reply
If you must write C, you should use a real string library, and not string.h.
[+] [-] brucedawson|7 years ago|reply
I agree that a standard fix for this would be great. It's been five years since I wrote the post, the solution has been available for many years longer than that, and the standards committee has shown no interest in fixing this issue. In the interim I think it would be irresponsible for developers to continue using strncpy or any of the other unsafe options when strcpy_safe is so simple and effective.
[+] [-] mikeash|7 years ago|reply
[+] [-] jstimpfle|7 years ago|reply
[+] [-] rurban|7 years ago|reply
https://rurban.github.io/safeclib/
[+] [-] lmm|7 years ago|reply
[+] [-] bitwize|7 years ago|reply
[+] [-] gp7|7 years ago|reply
https://lwn.net/Articles/659214/
[+] [-] sevensor|7 years ago|reply
[+] [-] ndesaulniers|7 years ago|reply
[+] [-] iamthad|7 years ago|reply
Also, it doesn't appear to be specified what happens if the count argument is too large to be represented as a ssize_t. The destination buffer would have to be extremely large, so it probably doesn't happen in practice, but it'd be good to specify it, or at least explicitly state it's unspecified / undefined.
https://www.kernel.org/doc/htmldocs/kernel-api/API-strscpy.h...
[+] [-] IChooseYou|7 years ago|reply
[+] [-] wmil|7 years ago|reply
"C strings and string functions aren't designed to be safe. If you want safe strings use a library."
[+] [-] beefhash|7 years ago|reply
Let's not forget that it's also not natively available in glibc. Though you'll find it in every BSD you can think of.
[+] [-] jandrese|7 years ago|reply
[+] [-] umvi|7 years ago|reply
[+] [-] jstimpfle|7 years ago|reply
If efficiency is not a concern for you and you rather like the convenience, good for you.
Personally I feel plain const char pointers (plus length, depending on expected size) are good enough to emulate the immutable string type. For modifiable strings, I still prefer viewing them as chunks of memory, using char pointer + length + capacity. Nice thing is, you can easily go from the former to the latter in the lifetime of any one string. You can also pool many strings in large chunks if you build only one string at any one time.
[+] [-] hedora|7 years ago|reply
If you have the length of both, simply assert the length of the destination is <= the length of the source. Problem solved (bugs become obvious in testing).
[+] [-] gweinberg|7 years ago|reply
[+] [-] brucedawson|7 years ago|reply
Asserts are necessary, but not sufficient.
[+] [-] brucedawson|7 years ago|reply
If the destination buffer doesn't have a fixed length then it will fail to compile.
This is similar to your suggestion with these improvements: - Template deduction of the destination size is done - this is essential to avoid bugs - Detection of overwrites is done in release builds as well as during testing - Detection of overwrites is automatic - it doesn't require manually adding asserts to call sites - strcpy_s exists in some standard libraries
[+] [-] russdill|7 years ago|reply
[+] [-] dang|7 years ago|reply
[+] [-] kodis|7 years ago|reply
I do have some maintainability concerns though about the promulgation of numerous competing home-brew solutions to a near trivial problem throughout a code base.
[+] [-] unknown|7 years ago|reply
[deleted]
[+] [-] hzhou321|7 years ago|reply
[+] [-] sevensor|7 years ago|reply
[+] [-] needlepont|7 years ago|reply
[+] [-] mcguire|7 years ago|reply
How does this work if the size of the destination isn't known at compile time? I.e., it's not "buffer[5]”?
[+] [-] brucedawson|7 years ago|reply
And this is progress. I have fixed code that looked something like this:
See the bug? Of course not, because it's not visible, but I've seen this pattern used where dst is a pointer. This means that four or eight bytes (three or seven if you don't count the null terminator) gets copied, regardless of how many bytes dst points to. With strcpy_safe an attempt to copy to dst would not work and the developer would have to fix their code.Every mistake that can be made, will be made.
[+] [-] yarosv|7 years ago|reply
> The focus of this post is on fixed-size string buffers, but the technique applies to any type of fixed-length buffer.
[+] [-] ahoka|7 years ago|reply
[+] [-] brucedawson|7 years ago|reply
[+] [-] kevindqc|7 years ago|reply