
Reasons to Prefer Blake3 over Sha256

240 points | ementally | 2 years ago | peergos.org

119 comments

[+] tptacek|2 years ago|reply
I'd probably use a Blake too. But:

SHA256 was based on SHA1 (which is weak). BLAKE was based on ChaCha20, which was based on Salsa20 (which are both strong).

NIST/NSA have repeatedly signaled lack of confidence in SHA256: first by hastily organising the SHA3 contest in the aftermath of Wang's break of SHA1.

No: SHA2 lacks the structure the SHA1 attack relies on (SHA1 has a linear message schedule, which made it possible to work out a differential cryptanalysis attack on it).

Blake's own authors keep saying SHA2 is secure (modulo length extension), but people keep writing stuff like this. Blake3 is a good and interesting choice on the real merits! It doesn't need the elbow throw.

[+] pbsd|2 years ago|reply
While there is more confidence now on the security of SHA-2, or rather the lack of transference of the SHA-1 approach to SHA-2, this was not the case in 2005-2006 when NIST decided to hold the SHA-3 competition. See for example the report on Session 4 of the 2005 NIST workshop on hash functions [1].

[1] https://csrc.nist.gov/events/2005/first-cryptographic-hash-w...

[+] pclmulqdq|2 years ago|reply
Most people who publicly opine on the Blake vs. SHA2 debate seem to be relatively uninformed on the realities of each one. SHA2 and the Blakes are both usually considered to be secure.

The performance arguments most people make are also outdated or specious: the original comparisons of Blake vs SHA2 performance on CPUs were largely done before Intel and AMD had special SHA2 instructions.

[+] ianopolous|2 years ago|reply
Would be interesting to hear Zooko's response to this. (Peergos lead here)
[+] omginternets|2 years ago|reply
What do you mean by "weak" and "strong", here?
[+] gavinhoward|2 years ago|reply
Good, terse article that basically reinforces everything I've seen in my research about cryptographic hashing.

Context: I'm building a VCS meant for any size of file, including massive ones. It needs a cryptographic hash for the Merkle Tree.

I've chosen BLAKE3, and I'm going to use the original implementation because of its speed.

However, I'm going to make it easy to change hash algorithms per commit, just so I don't run into the case that Git had trying to get rid of SHA1.
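A minimal sketch of what that per-commit agility could look like (hypothetical helper names; blake2b stands in because blake3 isn't in the Python stdlib):

```python
import hashlib

# Hypothetical sketch of per-commit hash agility: every stored digest
# carries an algorithm prefix, so migrating off a broken hash only needs
# a new registry entry instead of a Git-style flag day.
HASHERS = {
    "sha256": hashlib.sha256,
    "blake2b": hashlib.blake2b,  # stand-in: blake3 isn't in the stdlib
}

def tagged_digest(algo: str, data: bytes) -> str:
    if algo not in HASHERS:
        raise ValueError(f"unknown hash algorithm: {algo}")
    return f"{algo}:{HASHERS[algo](data).hexdigest()}"

def verify(tag: str, data: bytes) -> bool:
    algo, _, hexdigest = tag.partition(":")
    if algo not in HASHERS:
        # Fail loudly on unknown algorithms rather than corrupting state.
        raise ValueError(f"no implementation available for {algo!r}")
    return HASHERS[algo](data).hexdigest() == hexdigest
```

The prefix makes old and new digests coexist in one repo; a reader without the right library fails with a clear error instead of silently mis-verifying.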

[+] AdamN|2 years ago|reply
Smart idea doing the hash choice per-commit. Just make sure that somebody putting in an obscure hash doesn't mess up everybody's usage of the repo if they don't have a library to evaluate that hash installed.
[+] tromp|2 years ago|reply
For short inputs, Blake3 behaves very similar to Blake2, on which it is based. From Blake's wikipedia page [1]:

BLAKE3 is a single algorithm with many desirable features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE and BLAKE2, which are algorithm families with multiple variants. BLAKE3 has a binary tree structure, so it supports a practically unlimited degree of parallelism (both SIMD and multithreading) given long enough input.

[1] https://en.wikipedia.org/wiki/BLAKE_(hash_function)
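The XOF property mentioned above means shorter outputs are prefixes of longer ones for the same input. blake3 isn't in the stdlib, so here SHAKE-256 (SHA-3's XOF) stands in to illustrate; the third-party blake3 package exposes the same idea via a length parameter on its digest:

```python
import hashlib

# An XOF lets you request output of any length, and a shorter output is
# always a prefix of a longer one for the same input. SHAKE-256 is used
# here as a stdlib stand-in for BLAKE3's extended output.
msg = b"hello xof"
out32 = hashlib.shake_256(msg).digest(32)
out64 = hashlib.shake_256(msg).digest(64)
assert out64[:32] == out32  # the longer output just extends the shorter one
print(out64.hex())
```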

[+] cesarb|2 years ago|reply
While I really like Blake3, for all reasons mentioned in this article, I have to say it does have one tiny disadvantage over older hashes like SHA-256: its internal state is slightly bigger (due to the tree structure which allows it to be highly parallelizable). This can matter when running on tiny microcontrollers with only a few kilobytes of memory.
[+] londons_explore|2 years ago|reply
The internal state is no bigger when hashing small things though right?

I assume most microcontrollers are unlikely to be hashing things much bigger than RAM.

[+] Retr0id|2 years ago|reply
Blake3 is a clear winner for large inputs.

However, for smaller inputs (~1024 bytes and down), the performance gap between it and everything else (blake2, sha256) gets much narrower, because you don't get to benefit from the structural parallelization.

If you're mostly dealing with small inputs, raw hash throughput is probably not high on your list of concerns. In the context of a protocol or application, other costs like IO latency probably completely dwarf the actual CPU time spent hashing.

If raw performance is no longer high on your list of priorities, you care more about the other things - ubiquitous and battle-tested library support (blake3 is still pretty bleeding-edge, in the grand scheme of things), FIPS compliance (sha256), greater on-paper security margin (blake2). Which is all to say, while blake3 is great, there are still plenty of reasons not to prefer it for a particular use-case.
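One rough way to see the small-vs-large gap yourself (blake2b standing in for blake3, since blake3 isn't in the stdlib; the numbers are machine-dependent and the stdlib implementations are single-threaded, so treat this as illustrative only):

```python
import hashlib
import timeit

# Micro-benchmark sketch: time per hash call at several input sizes.
# For tiny inputs, per-call overhead dominates and the algorithms bunch
# together; only at large sizes do throughput differences show up.
for size in (64, 1024, 1 << 20):
    data = b"\x00" * size
    for name in ("sha256", "blake2b"):
        secs = timeit.timeit(lambda: hashlib.new(name, data).digest(), number=200)
        print(f"{name:8s} {size:>8} bytes: {secs / 200 * 1e6:10.1f} us/hash")
```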

[+] zahllos|2 years ago|reply
I agree that if you can, BLAKE3 (or even BLAKE2) are nicer choices than SHA2. However I would like to add the following comments:

* SHA-2 fixes the problems with SHA-1. SHA-1 was a step up over SHA-0 that did not completely resolve flaws in SHA-0's design (SHA-0 was broken very quickly).

* JP Aumasson (one of the B3 authors) has said publicly a few times that SHA-2 will never be broken: https://news.ycombinator.com/item?id=13733069 is an indirect source; I can't seem to locate a direct one from Xitter (thanks Elon).

Thus it does not necessarily follow that SHA-2 is a bad choice because SHA-1 is broken.

[+] gavinhoward|2 years ago|reply
All that may be true.

However, I don't think we can say for sure if SHA2 will be broken. Cryptography is hard like that.

In addition, SHA2 is still vulnerable to length extension attacks, so in a sense, SHA2 is broken, at least when length extension attacks are part of the threat model.
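To make the length-extension point concrete: SHA-256's output is its entire internal state, so anyone holding sha256(key || msg) can keep absorbing blocks and forge a tag for an extended message without the key. The standard mitigation when stuck with SHA-2 is HMAC (BLAKE2/BLAKE3 avoid the problem by construction):

```python
import hashlib
import hmac

key, msg = b"secret", b"amount=100"

# Vulnerable pattern: the digest IS the Merkle-Damgard state, so an
# attacker can extend msg (plus padding) and compute a valid new tag.
naive_tag = hashlib.sha256(key + msg).hexdigest()

# Safe pattern: HMAC hashes twice with the key mixed in, so the outer
# state is never exposed and extension attacks don't apply.
safe_tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# Verification should also be constant-time to avoid timing leaks.
assert hmac.compare_digest(safe_tag, hmac.new(key, msg, hashlib.sha256).hexdigest())
```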

[+] EdSchouten|2 years ago|reply
What I dislike about BLAKE3 is that they added explicit logic to ensure that identical chunks stored at different offsets result in different Merkle tree nodes (a.k.a. the ‘chunk counter’).

Though this feature is well intended, it makes this hash function hard to use for a storage system where you try to do aggressive data deduplication.

Furthermore, on platforms that provide native instructions for SHA hashing, BLAKE3 isn’t necessarily faster, and it is certainly more power hungry.

[+] oconnor663|2 years ago|reply
We go over some of our reasoning around that in section 7.5 of https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blak.... An early BLAKE3 prototype actually didn't include the chunk counter (https://github.com/oconnor663/bao/blob/master/docs/spec_0.9....), so I'm definitely sympathetic to the use cases that wish it wasn't there. However, after publication we found out that something like a chunk counter is necessary for the security of the Bao streaming verification tool: https://github.com/oconnor663/bao/issues/41. It could be that there's a design that's the best of both worlds, but I'm not sure.
[+] lazide|2 years ago|reply
Huh?

The storage system doing this wouldn’t use that part of the hash, it would do it itself so no issues? (Hash chunks, instead of feeding everything in linearly)

Otherwise the hash isn’t going to be even remotely safe for most inputs?

[+] jasonwatkinspdx|2 years ago|reply
Answer: identify chunks via something like rsync's rolling window or GearHash, then name those chunks by Blake3.

Trying to use Blake3's tree structure directly to dedupe is a misunderstanding of the problem you're trying to solve. Removing the counter would not let you use Blake3 alone for this purpose.
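A toy sketch of that two-step approach (a Gear-style rolling hash picks content-defined boundaries, then each chunk is named by its digest; sha256 stands in for blake3 here, and all names and parameters are made up for illustration):

```python
import hashlib
import random

# Content-defined chunking in the spirit of GearHash: a byte-indexed
# random table feeds a rolling hash, and a chunk boundary is declared
# whenever the low bits are all zero. Because boundaries depend only on
# nearby content, identical data produces identical chunks regardless of
# its offset in the file, which is exactly what dedup needs.
random.seed(42)  # fixed table so boundaries are reproducible
GEAR = [random.getrandbits(64) for _ in range(256)]
MASK = (1 << 13) - 1  # ~8 KiB average chunk size

def chunks(data: bytes):
    h, start = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF
        if h & MASK == 0:
            yield data[start : i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

def dedup_names(data: bytes) -> list[str]:
    # Name each chunk by its digest (sha256 as a stdlib stand-in for blake3).
    return [hashlib.sha256(c).hexdigest() for c in chunks(data)]
```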

[+] persnickety|2 years ago|reply
Could you point to how this is implemented and how it can be used? From the sound of it, you're trying to do something like rsync's rolling-window comparison?
[+] ndsipa_pomu|2 years ago|reply
At this rate, it's going to take over 700 years before we get Blake's 7
[+] benj111|2 years ago|reply
I had to scroll disappointingly far down to get to the Blake's 7 reference.

Thank you for not disappointing though.

The downside of that algorithm though is that everything dies at the end.

[+] nayuki|2 years ago|reply
It's an interesting set of reasons, but I prefer Keccak/SHA-3 over SHA-256, SHA-512, and BLAKE. I trust the standards body and public competition and auditing that took place - more so than a single author trumpeting the virtues of BLAKE.
[+] jasonwatkinspdx|2 years ago|reply
Ironic, because the final NIST report explaining their choice mentions that BLAKE has more open examination of cryptanalysis than Keccak as a point in favor of BLAKE.
[+] sylvain_kerkour|2 years ago|reply
At the end of the day, what really matters for most people is

1) Certifications (FIPS...)

2) Speed.

SHA-256 is fast enough for maybe 99.9% of use cases as you will saturate your I/O way before SHA-256 becomes your bottleneck[0][1]. Also, from my experience with the different available implementations, SHA-256 is up to 1.8 times faster than Blake3 on arm64.

[0] https://github.com/skerkour/go-benchmarks/blob/main/results/...

[1] https://kerkour.com/fast-hashing-algorithms

[+] oconnor663|2 years ago|reply
I mostly agree with you, but there are a couple other bullet points I like to throw in the mix:

- Length extension attacks. I think all of the SHA-3 candidates did the right thing here, and we would never accept a new cryptographic hash function that didn't do the right thing here, but SHA-2 gets a pass for legacy reasons. That's understandable, but we need to replace it eventually.

- Kind of niche, but BLAKE3 supports incremental verification, i.e. checking the hash of a file while you stream it rather than only learning whether it was valid at the end of the stream. https://github.com/oconnor663/bao. That's useful if you know the hash of a file but you don't necessarily trust the service that's storing it.
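The idea can be sketched with a plain hash list (much simpler than Bao's actual Merkle-tree encoding, but the same principle): commit to one root hash up front, then check each chunk the moment it arrives instead of only at EOF. All names here are made up for illustration:

```python
import hashlib

CHUNK = 4  # tiny chunk size for the example; Bao/BLAKE3 use 1024-byte chunks

def root_of(data: bytes) -> bytes:
    # The single trusted value: a hash over the per-chunk hashes.
    leaves = [hashlib.sha256(data[i:i + CHUNK]).digest()
              for i in range(0, len(data), CHUNK)]
    return hashlib.sha256(b"".join(leaves)).digest()

def verified_stream(chunks, leaf_hashes, root):
    # First check that the advertised leaf hashes match the trusted root...
    if hashlib.sha256(b"".join(leaf_hashes)).digest() != root:
        raise ValueError("leaf hash list does not match root")
    # ...then every chunk can be verified as soon as it arrives.
    for chunk, expect in zip(chunks, leaf_hashes):
        if hashlib.sha256(chunk).digest() != expect:
            raise ValueError("corrupt chunk detected mid-stream")
        yield chunk
```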

[+] jandrewrogers|2 years ago|reply
I think SHA-256 is still marginal for speed in modern environments unless your I/O is unusually limited relative to CPU. Current servers can support 10s of GB/s combined throughput for network and storage, which is achievable in practice for quite a few workloads. Consequently, you have to plan for the CPU overhead of the crypto at the same GB/s throughput since it is usually applied at the I/O boundaries. The fact that SHA256 requires burning the equivalent of several more cores relative to Blake3 has been a driver in Blake3 anecdotally creeping into a lot of data infrastructure code lately. At these data rates, the differences in performance of the hash functions are not a trivial cost in the cases where you would use a hash function (instead of e.g. authenticated encryption).

The arm64 server case is less of a concern for other reasons. Those cores are significantly weaker than amd64 cores, and therefore tend to not be used for data-intensive processing regardless. This allows you to overfit for AVX-512 or possibly use SHA256 on arm64 builds depending on the app.

There is a strong appetite for as much hashing performance per core as possible for data-intensive processing because it consumes a significant percentage of the total CPU time in many cases. Due to rapidly growing scale, non-cryptographic hash functions are no longer fit for purpose much of the time.

[+] jrockway|2 years ago|reply
Fast hash functions are really important, and SHA256 is really slow. Switching the hash function where you can is enough to result in user-visible speedups for common hashing use cases: verifying build artifacts, seeing if on-disk files changed, etc. I was writing something to produce OCI container images a few months ago, and the 3x SHA256 required by the spec for layers actually takes on the order of seconds. (0.5s to sha256 a 50MB file, on my 2019-era Threadripper!) I was shocked to discover this. (gzip is also very slow, like shockingly slow, but fortunately the OCI spec lets you use Zstd, which is significantly faster.)
[+] adrian_b|2 years ago|reply
SHA256 is very fast on most modern CPUs, i.e. all AMD Zen, all Intel Atom since 2016, Intel Core Ice Lake or newer, Armv8 and Armv9.

I use both SHA-256 and BLAKE3 every day. BLAKE3 is faster only because it is computed by multiple threads using all available CPU cores. When restricted to a single thread, it is slower on CPUs with fast hardware SHA-256.

The extra speed of BLAKE3 is not always desirable. The fact that it uses all cores can slow down other concurrent activities, without decreasing the overall execution time of the application.

There are cases when the computation of a hash like SHA-256 can be done as a background concurrent activity, or when the speed of hashing is limited by the streaming speed of data from the main memory or from an SSD, so spawning multiple threads does not gain anything and only gets in the way of other activities.

So the right choice between SHA-256 and BLAKE3 depends on the application. Both can be useful. SHA-256 has the additional advantage that it needs negligible additional code, only a few lines are necessary to write a loop that computes the hash using the hardware instructions.
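The streaming loop really is only a few lines. A single-threaded sketch in Python (memory stays flat no matter how large the input; a hardware-accelerated C version would have the same shape):

```python
import hashlib
import io

# Stream-hash data in fixed-size blocks: constant memory, one thread,
# no contention with other work on the machine.
def hash_stream(stream, block_size=1 << 20) -> str:
    h = hashlib.sha256()
    while block := stream.read(block_size):
        h.update(block)
    return h.hexdigest()
```

Usage would look like `hash_stream(open(path, "rb"))`; the example below feeds it from an in-memory stream.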

[+] coppsilgold|2 years ago|reply
sha256 is not slow on modern hardware. openssl doesn't have blake3, but here is blake2:

    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    BLAKE2s256        75697.37k   308777.40k   479373.40k   567875.81k   592687.09k   591254.18k
    BLAKE2b512        63478.11k   243125.73k   671822.08k   922093.51k  1047833.51k  1048959.57k
    sha256           129376.82k   416316.32k  1041909.33k  1664480.49k  2018678.67k  2043838.46k
This is with the x86 sha256 instructions: sha256msg1, sha256msg2, sha256rnds2
[+] richardwhiuk|2 years ago|reply
If you want a fast hash function (and don't care about its cryptographic properties), don't use a cryptographic hash function.
[+] dragontamer|2 years ago|reply
> BLAKE3 is much more efficient (in time and energy) than SHA256, like 14 times as efficient in typical use cases on typical platforms.

[snip]

> AVX in Intel/AMD, Neon and Scalable Vector Extensions in Arm, and RISC-V Vector computing in RISC-V. BLAKE3 can take advantage of all of it.

Uh huh... AVX/x86 and NEON/ARM you say?

https://www.felixcloutier.com/x86/sha256rnds2

https://developer.arm.com/documentation/ddi0596/2021-12/SIMD...

If we're talking about vectorized instruction sets like AVX (Intel/AMD) or NEON (aka: ARM), the advantage is clearly with SHA256. I don't think Blake3 has any hardware implementation at all yet.

Your typical cell phone running ARMv8 / NEON will be more efficient with the SHA256 instructions than whatever software routine you need to run Blake3. Dedicated hardware inside the cores is very difficult to beat on execution speed or efficiency.

I admit that I haven't run any benchmarks on my own. But I'd be very surprised if any software routine were comparable to the dedicated SHA256 instructions found on modern cores.

[+] insanitybit|2 years ago|reply
> I don't think Blake3 has any hardware implementation at all yet.

> https://github.com/BLAKE3-team/BLAKE3

> The blake3 Rust crate, which includes optimized implementations for SSE2, SSE4.1, AVX2, AVX-512, and NEON, with automatic runtime CPU feature detection on x86. The rayon feature provides multithreading.

There aren't blake3 instructions, like some hardware has for SHA1, but it does use hardware acceleration.

edit: Re-reading, I think you're saying "If we're going to talk about hardware acceleration, SHA256 still has the advantage because of specific instructions" - that is true.

[+] jonhohle|2 years ago|reply
I just tested the C implementation on a utility I wrote[0] and at least on macOS where SHA256 is hardware accelerated beyond just NEON, BLAKE3 ends up being slower than SHA256 from CommonCrypto (the Apple provided crypto library). BLAKE3 ends up being 5-10% slower for the same input set.

As far as I'm aware, Apple does not expose any of the hardware crypto functions, so unless what exists supports BLAKE3 and they add support in CommonCrypto, there's no advantage to using it from a performance perspective.

The rust implementation is multithreaded and ends up beating SHA256 handily, but again, for my use case the C impl is only single threaded, and the utility assumes a single threaded hasher with one running on each core. Hashing is the bottleneck for `dedup`, so finding a faster hasher would have a lot of benefits.

0 - https://github.com/ttkb-oss/dedup

[+] Godel_unicode|2 years ago|reply
I don’t understand why people use sha256 when sha512 is often significantly faster:

https://crypto.stackexchange.com/questions/26336/sha-512-fas...

[+] oconnor663|2 years ago|reply
A couple reasons just on the performance side:

- SHA-256 has hardware acceleration on many platforms, but SHA-512 mostly doesn't.

- Setting aside hardware acceleration, SHA-256 is faster on 32-bit platforms, like a lot of embedded devices. If you have to choose between "fast on a desktop" vs "fast in embedded", it can make sense to assume that desktops are always fast enough and that your bottlenecks will be in embedded.

[+] garblegarble|2 years ago|reply
This may only be applicable to certain CPUs - e.g. sha512 is a lot slower on M1

    $ openssl speed sha256 sha512
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    sha256          146206.63k   529723.90k  1347842.65k  2051092.82k  2409324.54k  2446518.95k
    sha512           85705.68k   331953.22k   707320.92k  1149420.20k  1406851.34k  1427259.39k
[+] ur-whale|2 years ago|reply
One metric that is seldom mentioned for crypto algos is code complexity.

I really wish researchers would at least pay lip service to it.

TEA (an unfortunately somewhat weak symmetric cipher) was a very nice push in that direction.

TweetNaCl was another very nice push in that direction by djb

Why care about that metric you ask?

Well here are a couple of reasons:

- algo fits in head
- algo is short -> cryptanalysis likely easier
- algo is short -> less likely to have buggy implementation
- algo is short -> side-channel attacks likely easier to analyse
- algo fits in a 100 line c++ header -> can be incorporated into anything
- algo can be printed on a t-shirt, thereby skirting export control restrictions
- algo can easily be implemented on tiny micro-controllers

etc ...

[+] rstuart4133|2 years ago|reply
> One metric that is seldom mentioned for crypto algos is code complexity. ... TEA (an unfortunately somewhat weak symmetric cipher) was a very nice push in that direction.

Speck is also a push in that direction [0]. Its code looks to be about as complex as TEA's (half a page of C), it's blindingly fast, and as far as I know it has no known attacks despite being subject to a fair bit of scrutiny. About the only reason I can see for it being largely ignored is that it was designed by the NSA.

SHA3 is also a simple algorithm. Downright pretty, in fact. It's a pity it's so slow.

[0] https://en.wikipedia.org/wiki/Speck_(cipher)

[+] LegibleCrimson|2 years ago|reply
How does the extended output work, and what's the point of extended output?

From what I can see, BLAKE3 has 256 bits of security, and extended output doesn't provide any extra security. In this case, what's the point of extended output over doing something like padding with 0-bits or extending by re-hashing the previous output and appending it to the previous output (eg, for 1024 bits, doing h(m) . h(h(m)) . h(h(h(m))) . h(h(h(h(m))))). Either way, you get 256 bits of security.

Is it just because the design of the hash makes it simple to do, so it's just offered as a consistent option for arbitrary output sizes where needed, or is there some greater purpose that I'm missing?

[+] oconnor663|2 years ago|reply
> From what I can see, BLAKE3 has 256 bits of security, and extended output doesn't provide any extra security.

128 bits of collision resistance but otherwise correct. As a result of that we usually just call it 128 bits across the board, but yes in an HMAC-like use case you would generally expect 256 bits of security from the 256 bit output. Extended outputs don't change that, because the internal chaining values are 256 bits even when the output is larger.

> extending by re-hashing the previous output and appending it to the previous output

It's not quite that simple, because you don't want later parts of your output to be predictable from earlier parts (which might be published, depending on the use case). You also want it to be parallelizable.

You could compute H(m) as a "pre-hash" and then make an extended output something like H(H(m)|1)|H(H(m)|2)|... That's basically what BLAKE3 is doing on the inside. The advantage of having the algorithm do it for you is that 1) it's an "off the shelf" feature that doesn't require users to roll their own crypto and 2) it's slightly faster when the input is short, because you don't have to spend an extra block operation computing the pre-hash.

> what's the point of extended output?

It's kind of niche, but for example Ed25519 needs a 512 bit hash output internally to "stretch" its secret seed into two 256-bit keys. You could also use a BLAKE3 output reader as a stream cipher or a random byte generator. (These sorts of use cases are why it's nice not to make the caller tell you the output length in advance.)
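The counter-based construction described above can be sketched in a few lines (sha256 standing in for BLAKE3's compression function; this illustrates the idea, not BLAKE3's actual internal format):

```python
import hashlib

# Extended output via a pre-hash plus a block counter: each output block
# depends on (pre-hash, index), so blocks can be computed in parallel and
# a published prefix doesn't let anyone predict later blocks.
def xof(msg: bytes, length: int) -> bytes:
    pre = hashlib.sha256(msg).digest()
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(pre + counter.to_bytes(8, "little")).digest()
        counter += 1
    return out[:length]
```

Note the prefix property falls out for free: asking for 32 bytes gives you the start of what asking for 96 bytes would give.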

[+] aborsy|2 years ago|reply
When it’s said SHA2 will remain secure in foreseeable future, are there estimates on the number of decades?

The quantum computers apparently don’t help much with hash attacks, and SHA2 has received a lot of cryptanalysis.

[+] colmmacc|2 years ago|reply
It's very hard to see Blake3 getting included in FIPS. Meanwhile, SHA256 is. That's probably the biggest deciding factor on whether you want to use it or not.
[+] vluft|2 years ago|reply
I dunno, if your crypto choices were just "the best thing that won't be included in FIPS" you would do pretty well; blake3, chacha20, 25519 sigs & dh...
[+] footlose_3815|2 years ago|reply
I replaced sha with blake in my deduplication needs, and it sped up the comparisons by a factor of 4 at least.

For my use case, it is great