Clang-11.0.0 Miscompiled SQLite

[+] klysm|5 years ago|reply

If I understand correctly, everything is working as intended: a fuzzer caught a bug in an unreleased version of clang. The title makes it sound like somebody fucked up pretty badly.

[+] iforgotpassword|5 years ago|reply

No, the point here is that the fuzzer "caught" a bug in sqlite. The blog post mostly clears things up. To me it seemed pretty neutral just explaining it was actually with clang. No snarkyness our anything.

[+] jfkebwjsbx|5 years ago|reply

What strikes me as odd is why SQLite is committing changes just to make an unreleased compiler happy...

[+] JDevlieghere|5 years ago|reply

Indeed, the commit hash reported in the version corresponds to something that landed on master at 11PM this Monday.

[+] fluffything|5 years ago|reply

Not really, at least, not given the current info.

Does anyone have a link to the reported clang bug?

For example, it might well be that the function that clang believes does not modify pmem->flags "promises" that it does not modify the value. In which case it might be an SQLite bug, that clang just did not happen to "correctly optimize" before.

This can happen, e.g., if you mark your function with __attribute__((const)), for example, which promises that your function does not access memory via arguments that have pointers. It then becomes trivial for clang to make the transformation above.

The bug wouldn't even need to be on the function definition, it might just be on the declaration, or on a declaration somewhere, in which case you have an ODR violation as well.

All that seems very unlikely since it appears to be clear that the function does modify memory, but still, I'd just wait for the clang bug to be filled with a reproducer, and the issue to be tracked down.

It does strongly hint at a bug in clang, but from the linked page, I at least cannot conclude that yet.

[+] globular-toast|5 years ago|reply

A fuzzer should be one of the last lines of defence against human errors. This was also caught by fuzzing sqlite, not the offending compiler clang. When all but the last lines of defence are breached it's absolutely cause for concern and shouldn't be brushed away like all is fine.

[+] schlupa|5 years ago|reply

It is a fuck up. Ignoring a sequence point (function call) is serious business. Probably a too agressive inlining.

[+] mehrdadn|5 years ago|reply

Question for any low-level optimizing compiler engineers here: I obviously realize these are all important, but in your judgment, how much of making an error-free compiler would you say is about having a comprehensive test suite, vs. having very careful software engineers, vs. having extremely thorough code reviews, vs. something else? Put another way, if you were to lose one of these, which ones do you think would have the most/least negative impact (or be the easiest/hardest to make up for with other things) in terms of the correctness of the final product?

[+] g_airborne|5 years ago|reply

Somewhat unrelated but I read somewhere that the Rust team test their releases by compiling the entirety of crates.io and all Rust GitHub repositories to find regressions. The tool is called Crater. Source: https://www.pietroalbini.org/blog/shipping-a-compiler-every-...

[+] neeeeees|5 years ago|reply

The test suite is ultra important. Even in the unlikely case that you have an engineer who thoroughly understands the machine and language specs, you have a ton of existing, useful, programs that rely on under-defined behaviour in one or both of them.

[+] ridiculous_fish|5 years ago|reply

These sorts of bugs are more common than you may think! Here’s gcc miscompiling fish shell: https://github.com/fish-shell/fish-shell/issues/6962

[+] gnulinux|5 years ago|reply

These sort of things get very tricky. Years ago in college me and my team was building an operating system and we were 100% sure we found a GCC bug. It turned out to be a gdb bug (we were disassembling with gdb) instead that made us think it was a GCC bug, but it was actually a bug in our code. Very subtle.

[+] unknown|5 years ago|reply

[deleted]

[+] mister_hn|5 years ago|reply

I don't understand trying to use the trunk version of a under-development compiler. Would you use the binaries built from an unstable compiler in production?

Given that here we talk about SQLite, what's the advantage to use clang 11 instead of a previous version?

[+] viraptor|5 years ago|reply

You're looking at the wrong side of sqlite usage. If you develop sqlite, why wouldn't you test it on every compiler/version combination you can get your hands on? Whether someone uses it in production is irrelevant.

[+] Thiez|5 years ago|reply

There was a chance this miscompilation would not have been found until after the release of clang 11. Surely both the clang and sqlite developers benefit from people trying pre-release versions and reporting bugs?

[+] skrebbel|5 years ago|reply

If I read the article right, the SQLite people don't use clang 11, but OSSFuzz does and someone ran it against SQLite and filed a bug. This bug turned out to not be a big in SQLite but in clang 11.

I'm not sure why OSSFuzz uses a prerelease compiler, but maybe it's purposefully for this reason, to add some extra surface area for finding compiler bugs?

[+] anoncareer0212|5 years ago|reply

there's 3 versions of this comment floating around and another 2 floating around asking why this wasn't discovered earlier

combine both to find zen koan, test early and often

[+] asveikau|5 years ago|reply

sqlite testing is pretty involved, and impressive. There are articles written on this, such as: https://www.sqlite.org/testing.html

[+] billti|5 years ago|reply

I believe Chromium uses the under-development trunk of Clang, and bumps it forward every couple weeks - and that’s probably one of the most heavily used code bases in the world.

https://chromium.googlesource.com/chromium/src.git/+/master/...

[+] fulafel|5 years ago|reply

OSS-Fuzz wants to get the new sanitizer features fresh from the oven I imagine to fing new bugs in fuzzed sw as early as possible.

[+] anarazel|5 years ago|reply

Not working in sqlite, but postgres. We've found numerous bugs in gcc, clang and postgres by building / testing with unreleased compilers. And found usability issues with new compiler warnings.

That doesn't mean we use such compilers in prod.

[+] zuzun|5 years ago|reply

The Google people do use them in production, it's the clang version they build Chromium with.

[+] adr_|5 years ago|reply

Clang 11 hasn't been released yet, right?

[+] loeg|5 years ago|reply

Right. But we've also observed non-determinism / undefined behavior in Clang 10:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630#c26

  ==120363== Conditional jump or move depends on uninitialised value(s)
  ==120363==    at 0x1634474: llvm::ConstantExpr::getGetElementPtr(llvm::Type*, llvm::Constant*, llvm::ArrayRef<llvm::Value*>, bool, llvm::Optional<unsigned int>, llvm::Type*) (Constants.cpp:2191)
  ==120363==    by 0x112D6D9: getGetElementPtr (Constants.h:1163)
  ==120363==    by 0x112D6D9: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1005)
  ==120363==    by 0x112DF70: (anonymous namespace)::ConstantFoldInstOperandsImpl(llvm::Value const*, unsigned int, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1039)
  ==120363==    by 0x112C165: (anonymous namespace)::ConstantFoldConstantImpl(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*, llvm::SmallDenseMap<llvm::Constant*, llvm::Constant*, 4u, llvm::DenseMapInfo<llvm::Constant*>, llvm::detail::DenseMapPair<llvm::Constant*, llvm::Constant*> >&) [clone .part.0] (ConstantFolding.cpp:1114)
  ==120363==    by 0x112C5CF: llvm::ConstantFoldConstant(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1194)
  ==120363==    by 0x188F410: prepareICWorklistFromFunction (InstructionCombining.cpp:3584)
  ==120363==    by 0x188F410: combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) (InstructionCombining.cpp:3703)
  ==120363==    by 0x189205F: runOnFunction (InstructionCombining.cpp:3789)
  ==120363==    by 0x189205F: llvm::InstructionCombiningPass::runOnFunction(llvm::Function&) (InstructionCombining.cpp:3768)
  ==120363==    by 0x16F4352: llvm::FPPassManager::runOnFunction(llvm::Function&) (LegacyPassManager.cpp:1482)
  ==120363==    by 0x16F4DE8: llvm::FPPassManager::runOnModule(llvm::Module&) (LegacyPassManager.cpp:1518)
  ==120363==    by 0x16F51A2: runOnModule (LegacyPassManager.cpp:1583)
  ==120363==    by 0x16F51A2: llvm::legacy::PassManagerImpl::run(llvm::Module&) (LegacyPassManager.cpp:1695)
  ==120363==    by 0x1FF4CFE: EmitAssembly (BackendUtil.cpp:954)
  ==120363==    by 0x1FF4CFE: clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (BackendUtil.cpp:1677)
  ==120363==    by 0x2C471A8: clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (CodeGenAction.cpp:335)
  ==120363==  Uninitialised value was created by a stack allocation
  ==120363==    at 0x112C653: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.c

[+] saagarjha|5 years ago|reply

Nope, 10.0 was just released recently.

[+] Twirrim|5 years ago|reply

Why is OSSFuzz using such a bleeding edge compiler? That seems a little nuts.

[+] aidenn0|5 years ago|reply

Given that it's SQLite, this is likely a compiler bug. However, the code given is insufficient to demonstrate a compiler bug. Given:

      c = pMem->flags;
      sqlite3VdbeMemRelease(pMem);
      pMem->flags = MEM_Str|MEM_Term|(c&(MEM_AffMask|MEM_Subtype));

You could have say:

    sqlite3VdbeMemRelease(struct foo pMem) {
        *(some_other_type *)&pMem->flags = bar;
    }

In which case the C aliasing rules would allow the compiler to assume that the assignment through "some_other_type" does not affect the assignment through whatever type pMem->flags is.

I have seen this bug happen before, when compilers got better at inlining, where it was something like

   void getAddressOfSomething(intptr_t *address);
   ...
   char *p;
   getAddressOfSomething((intptr_t *)&p)
   *p=foo

The compiler could reorder the \*p=foo line to be before the getAddressOfSomething call for the same reason.

TL;DR: Turn of strict aliasing via compiler flags (-fno-strict-aliasing on gcc) if you ever type-pun anywhere without using a union.

[+] skissane|5 years ago|reply

The code for sqlite3VdbeMemRelease can be found here: https://www.sqlite.org/src/file?name=src/vdbemem.c&ci=4218c7...

I don't see any type-punning of the flags going on. (The sqlite3VdbeMemRelease method doesn't touch the flags directly, but it calls vdbeMemClear which in turn calls vdbeMemClearExternAndSetNull which resets flags to MEM_Null.)

[+] chx|5 years ago|reply

> Turn of strict aliasing via compiler flags (-fno-strict-aliasing on gcc) if you ever type-pun anywhere without using a union.

That's the topic of Gary Bernhardt's (of JS WAT and Execute Program fame) very important "A Case Study in Not Being A Jerk in Open Source". https://www.destroyallsoftware.com/blog/2018/a-case-study-in...

[+] fctorial|5 years ago|reply

      c = pMem->flags;
      sqlite3VdbeMemRelease(pMem);
      pMem->flags = MEM_Str|MEM_Term|(c&(MEM_AffMask|MEM_Subtype));

'pMem->flags' is a u16[1]. Shouldn't it be copied to 'c'. How can 'sqlite3VdbeMemRelease' alter the value of 'c'.

[1]https://github.com/smparkes/sqlite/blob/8caf9219240123fbe6cf...

[+] bilalhusain|5 years ago|reply

'sqlite3VdbeMemRelease' doesn't alter the value of 'c' but may alter the value of 'pmem->flags' which is what the compiler replaced 'c' with.

from the post by Dr. Richard Hipp "From the -S output, it looks like Clang-11.0.0 is compiling these three lines as if there were written as:"

      sqlite3VdbeMemRelease(pMem);
      pMem->flags = MEM_Str|MEM_Term|(pMem->flags&(MEM_AffMask|MEM_Subtype));

[+] amake|5 years ago|reply

It doesn't. The compiler is moving the access to `pMem->flags` to come later in the sequence, after `pMem` has been modified by `sqlite3VdbeMemRelease`.

[+] outsomnia|5 years ago|reply

He said

>> Clang seems to be assuming that the sqlite3VdbeMemRelease() function does not change the value of pMem->flags.

[+] raverbashing|5 years ago|reply

You're right.

But for some reason, it's thinking it can just reread the flags from the structure after that function, which has a non-const pointer

void sqlite3VdbeMemRelease(Mem *p);

I'll be waiting for the excuses of the "C standard evangelists" and the extreme optimization people on how the SQLite developers are wrong on this case and how it's funny to pull the rug under them.

[+] aidenn0|5 years ago|reply

pMem is a pointer; the contents of pMem will not be copied.

[edit] I see what you're asking. The point is that the compiler changes:

      c = pMem->flags;
      sqlite3VdbeMemRelease(pMem);

To something equivalent to:

      sqlite3VdbeMemRelease(pMem);
      c = pMem->flags;

Which is correct only if sqlite3VdbMemRelease doesn't change pMem->flags

[+] eyegor|5 years ago|reply

Holy cow, this happened at -O1. This doesn't seem like the sort of optimization that should be possible at such a low level. I've run into plenty of trouble with higher level optimization flags in compilers before, but this is wild.

[+] unknown|5 years ago|reply

[deleted]

[+] pwagland|5 years ago|reply

I see several references in this thread to "a bug in Clang that is already fixed", but I can't see anywhere where anyone references this bug.

Can anyone point to the bug report (and/or fix) in Clang?

[+] danhor|5 years ago|reply

https://bugs.llvm.org/show_bug.cgi?id=46195, it's now linked in the sqlite bug

[+] nurettin|5 years ago|reply

Why would OSFuzz send the bug report to SQLite instead of Clang?

[+] zmodem|5 years ago|reply

OSSFuzz is an automated system.

[+] hoseja|5 years ago|reply

So... if I'm understanding correctly, you're accessing released memory? How is this anything but undefined behaviour?

[+] hoseja|5 years ago|reply

So... apparently I'm not understanding correctly haha.

[+] jabl|5 years ago|reply

If I understand correctly, sqlite3VdbeMemRelease doesn't free pMem nor pMem->flags, just some other fields in the pMem struct.

207 comments