If I understand correctly, everything is working as intended: a fuzzer caught a bug in an unreleased version of clang. The title makes it sound like somebody fucked up pretty badly.
No, the point here is that the fuzzer "caught" a bug in sqlite. The blog post mostly clears things up. To me it seemed pretty neutral just explaining it was actually with clang. No snarkyness our anything.
Does anyone have a link to the reported clang bug?
For example, it might well be that the function that clang believes does not modify pmem->flags "promises" that it does not modify the value. In which case it might be an SQLite bug, that clang just did not happen to "correctly optimize" before.
This can happen, e.g., if you mark your function with __attribute__((const)), for example, which promises that your function does not access memory via arguments that have pointers. It then becomes trivial for clang to make the transformation above.
The bug wouldn't even need to be on the function definition, it might just be on the declaration, or on a declaration somewhere, in which case you have an ODR violation as well.
All that seems very unlikely since it appears to be clear that the function does modify memory, but still, I'd just wait for the clang bug to be filled with a reproducer, and the issue to be tracked down.
It does strongly hint at a bug in clang, but from the linked page, I at least cannot conclude that yet.
A fuzzer should be one of the last lines of defence against human errors. This was also caught by fuzzing sqlite, not the offending compiler clang. When all but the last lines of defence are breached it's absolutely cause for concern and shouldn't be brushed away like all is fine.
Question for any low-level optimizing compiler engineers here: I obviously realize these are all important, but in your judgment, how much of making an error-free compiler would you say is about having a comprehensive test suite, vs. having very careful software engineers, vs. having extremely thorough code reviews, vs. something else? Put another way, if you were to lose one of these, which ones do you think would have the most/least negative impact (or be the easiest/hardest to make up for with other things) in terms of the correctness of the final product?
Somewhat unrelated but I read somewhere that the Rust team test their releases by compiling the entirety of crates.io and all Rust GitHub repositories to find regressions. The tool is called Crater. Source: https://www.pietroalbini.org/blog/shipping-a-compiler-every-...
The test suite is ultra important. Even in the unlikely case that you have an engineer who thoroughly understands the machine and language specs, you have a ton of existing, useful, programs that rely on under-defined behaviour in one or both of them.
These sort of things get very tricky. Years ago in college me and my team was building an operating system and we were 100% sure we found a GCC bug. It turned out to be a gdb bug (we were disassembling with gdb) instead that made us think it was a GCC bug, but it was actually a bug in our code. Very subtle.
I don't understand trying to use the trunk version of a under-development compiler. Would you use the binaries built from an unstable compiler in production?
Given that here we talk about SQLite, what's the advantage to use clang 11 instead of a previous version?
You're looking at the wrong side of sqlite usage. If you develop sqlite, why wouldn't you test it on every compiler/version combination you can get your hands on? Whether someone uses it in production is irrelevant.
There was a chance this miscompilation would not have been found until after the release of clang 11. Surely both the clang and sqlite developers benefit from people trying pre-release versions and reporting bugs?
If I read the article right, the SQLite people don't use clang 11, but OSSFuzz does and someone ran it against SQLite and filed a bug. This bug turned out to not be a big in SQLite but in clang 11.
I'm not sure why OSSFuzz uses a prerelease compiler, but maybe it's purposefully for this reason, to add some extra surface area for finding compiler bugs?
I believe Chromium uses the under-development trunk of Clang, and bumps it forward every couple weeks - and that’s probably one of the most heavily used code bases in the world.
Not working in sqlite, but postgres. We've found numerous bugs in gcc, clang and postgres by building / testing with unreleased compilers. And found usability issues with new compiler warnings.
In which case the C aliasing rules would allow the compiler to assume that the assignment through "some_other_type" does not affect the assignment through whatever type pMem->flags is.
I have seen this bug happen before, when compilers got better at inlining, where it was something like
I don't see any type-punning of the flags going on. (The sqlite3VdbeMemRelease method doesn't touch the flags directly, but it calls vdbeMemClear which in turn calls vdbeMemClearExternAndSetNull which resets flags to MEM_Null.)
It doesn't. The compiler is moving the access to `pMem->flags` to come later in the sequence, after `pMem` has been modified by `sqlite3VdbeMemRelease`.
But for some reason, it's thinking it can just reread the flags from the structure after that function, which has a non-const pointer
void sqlite3VdbeMemRelease(Mem *p);
I'll be waiting for the excuses of the "C standard evangelists" and the extreme optimization people on how the SQLite developers are wrong on this case and how it's funny to pull the rug under them.
Holy cow, this happened at -O1. This doesn't seem like the sort of optimization that should be possible at such a low level. I've run into plenty of trouble with higher level optimization flags in compilers before, but this is wild.
[+] [-] klysm|5 years ago|reply
[+] [-] iforgotpassword|5 years ago|reply
[+] [-] jfkebwjsbx|5 years ago|reply
[+] [-] JDevlieghere|5 years ago|reply
[+] [-] fluffything|5 years ago|reply
Does anyone have a link to the reported clang bug?
For example, it might well be that the function that clang believes does not modify pmem->flags "promises" that it does not modify the value. In which case it might be an SQLite bug, that clang just did not happen to "correctly optimize" before.
This can happen, e.g., if you mark your function with __attribute__((const)), for example, which promises that your function does not access memory via arguments that have pointers. It then becomes trivial for clang to make the transformation above.
The bug wouldn't even need to be on the function definition, it might just be on the declaration, or on a declaration somewhere, in which case you have an ODR violation as well.
All that seems very unlikely since it appears to be clear that the function does modify memory, but still, I'd just wait for the clang bug to be filled with a reproducer, and the issue to be tracked down.
It does strongly hint at a bug in clang, but from the linked page, I at least cannot conclude that yet.
[+] [-] globular-toast|5 years ago|reply
[+] [-] schlupa|5 years ago|reply
[+] [-] mehrdadn|5 years ago|reply
[+] [-] g_airborne|5 years ago|reply
[+] [-] neeeeees|5 years ago|reply
[+] [-] ridiculous_fish|5 years ago|reply
[+] [-] gnulinux|5 years ago|reply
[+] [-] unknown|5 years ago|reply
[deleted]
[+] [-] mister_hn|5 years ago|reply
Given that here we talk about SQLite, what's the advantage to use clang 11 instead of a previous version?
[+] [-] viraptor|5 years ago|reply
[+] [-] Thiez|5 years ago|reply
[+] [-] skrebbel|5 years ago|reply
I'm not sure why OSSFuzz uses a prerelease compiler, but maybe it's purposefully for this reason, to add some extra surface area for finding compiler bugs?
[+] [-] anoncareer0212|5 years ago|reply
combine both to find zen koan, test early and often
[+] [-] asveikau|5 years ago|reply
[+] [-] billti|5 years ago|reply
https://chromium.googlesource.com/chromium/src.git/+/master/...
[+] [-] fulafel|5 years ago|reply
[+] [-] anarazel|5 years ago|reply
That doesn't mean we use such compilers in prod.
[+] [-] zuzun|5 years ago|reply
[+] [-] adr_|5 years ago|reply
[+] [-] loeg|5 years ago|reply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630#c26
[+] [-] saagarjha|5 years ago|reply
[+] [-] Twirrim|5 years ago|reply
[+] [-] aidenn0|5 years ago|reply
I have seen this bug happen before, when compilers got better at inlining, where it was something like
The compiler could reorder the \*p=foo line to be before the getAddressOfSomething call for the same reason.TL;DR: Turn of strict aliasing via compiler flags (-fno-strict-aliasing on gcc) if you ever type-pun anywhere without using a union.
[+] [-] skissane|5 years ago|reply
I don't see any type-punning of the flags going on. (The sqlite3VdbeMemRelease method doesn't touch the flags directly, but it calls vdbeMemClear which in turn calls vdbeMemClearExternAndSetNull which resets flags to MEM_Null.)
[+] [-] chx|5 years ago|reply
That's the topic of Gary Bernhardt's (of JS WAT and Execute Program fame) very important "A Case Study in Not Being A Jerk in Open Source". https://www.destroyallsoftware.com/blog/2018/a-case-study-in...
[+] [-] fctorial|5 years ago|reply
[1]https://github.com/smparkes/sqlite/blob/8caf9219240123fbe6cf...
[+] [-] bilalhusain|5 years ago|reply
from the post by Dr. Richard Hipp "From the -S output, it looks like Clang-11.0.0 is compiling these three lines as if there were written as:"
[+] [-] amake|5 years ago|reply
[+] [-] outsomnia|5 years ago|reply
>> Clang seems to be assuming that the sqlite3VdbeMemRelease() function does not change the value of pMem->flags.
[+] [-] raverbashing|5 years ago|reply
But for some reason, it's thinking it can just reread the flags from the structure after that function, which has a non-const pointer
void sqlite3VdbeMemRelease(Mem *p);
I'll be waiting for the excuses of the "C standard evangelists" and the extreme optimization people on how the SQLite developers are wrong on this case and how it's funny to pull the rug under them.
[+] [-] aidenn0|5 years ago|reply
[edit] I see what you're asking. The point is that the compiler changes:
To something equivalent to: Which is correct only if sqlite3VdbMemRelease doesn't change pMem->flags[+] [-] eyegor|5 years ago|reply
[+] [-] unknown|5 years ago|reply
[deleted]
[+] [-] pwagland|5 years ago|reply
Can anyone point to the bug report (and/or fix) in Clang?
[+] [-] danhor|5 years ago|reply
[+] [-] nurettin|5 years ago|reply
[+] [-] zmodem|5 years ago|reply
[+] [-] hoseja|5 years ago|reply
[+] [-] hoseja|5 years ago|reply
[+] [-] jabl|5 years ago|reply