janwas | 7 months ago

Highway author here :) I'm curious what you disagree with, because it all sounds very sensible to me?

jesse__|7 months ago

There's a lot to discuss.

First off, a number of statements are nonsense. Take, for example

> you shouldn't be writing SIMD instructions directly unless you're writing a SIMD library or an optimizing compiler.

Why would writing an optimizing compiler qualify as territory for directly writing SIMD code, but anything else is off the table? That makes no sense at all.

Furthermore, I was writing a library. It's just embedded in my game engine.

> Instead you should reach for one of the many available libraries

This blanket statement is only true in a narrow set of circumstances. In my mind, it requires that you ship on multiple architectures and probably multiple compilers. If you have narrower constraints, it's extremely easy to write your own wrappers (like I did) and not take a dependency. A good trade IMO. Furthermore, someone's got to write the libraries, so doing it yourself as a learning exercise has value.
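To give a sense of what "writing your own wrappers" can look like when you only target one architecture, here is a minimal sketch. The `f32x4` type and the `load4`, `splat`, `store4`, and `scale_add` names are hypothetical and not taken from the article or from any library; it assumes SSE2 on x86-64 and falls back to plain scalar code elsewhere.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE/SSE2 intrinsics
#define F32X4_SSE2 1
#else
#define F32X4_SSE2 0
#endif

// Hypothetical 4-lane float wrapper; names are illustrative only.
struct f32x4 {
#if F32X4_SSE2
    __m128 v;
#else
    float v[4];
#endif
};

inline f32x4 load4(const float* p) {
#if F32X4_SSE2
    return f32x4{_mm_loadu_ps(p)};
#else
    return f32x4{{p[0], p[1], p[2], p[3]}};
#endif
}

inline f32x4 splat(float s) {
#if F32X4_SSE2
    return f32x4{_mm_set1_ps(s)};
#else
    return f32x4{{s, s, s, s}};
#endif
}

inline void store4(float* p, f32x4 a) {
#if F32X4_SSE2
    _mm_storeu_ps(p, a.v);
#else
    for (int i = 0; i < 4; ++i) p[i] = a.v[i];
#endif
}

inline f32x4 operator+(f32x4 a, f32x4 b) {
#if F32X4_SSE2
    return f32x4{_mm_add_ps(a.v, b.v)};
#else
    f32x4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
#endif
}

inline f32x4 operator*(f32x4 a, f32x4 b) {
#if F32X4_SSE2
    return f32x4{_mm_mul_ps(a.v, b.v)};
#else
    f32x4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
#endif
}

// User code written against the wrapper: out[i] = a[i] * s + b[i].
// For brevity this sketch requires n to be a multiple of 4.
inline void scale_add(float* out, const float* a, const float* b,
                      float s, std::size_t n) {
    assert(n % 4 == 0);
    const f32x4 vs = splat(s);
    for (std::size_t i = 0; i < n; i += 4) {
        store4(out + i, load4(a + i) * vs + load4(b + i));
    }
}
```

The user-facing loop never mentions an intrinsic, so widening to another instruction set would only touch the wrapper functions. What a sketch like this leaves out (remainder handling, more targets, compiler quirks) is where the maintenance cost of a real library tends to sit.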

> There are loads of libraries like this [...] and provide targeting for a vast trove of SIMD options without hand-writing for every option.

The original commentor seems to be under the impression that using a SIMD library would somehow have produced a better result. The fact is, the library code is super fucking boring. I barely mentioned it in the article because it's basically just boilerplate an LLM could probably spit out, first try. The interesting part of the series is the observation that you can precompute a matrix of intermediates and look them up, instead of recomputing them in the hot loop, effectively trading memory bandwidth for fewer instructions. A good trade for this algorithm, which saturates the instruction pipelines.
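The precompute-and-look-up idea can be sketched roughly as follows. This is not the article's algorithm: `intermediate` is a made-up stand-in for some value that is costly to compute, and `fill_recompute` and `fill_lookup` are hypothetical names. The only point is that the intermediate depends on fewer loop variables than the hot loop iterates over, so it can be hoisted into a table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for an intermediate that is expensive to compute (hypothetical).
inline std::uint32_t intermediate(std::uint32_t octave, std::uint32_t x) {
    std::uint32_t h = octave * 0x9E3779B9u + x;
    h ^= h >> 16; h *= 0x85EBCA6Bu;
    h ^= h >> 13; h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

// Baseline: recompute the intermediate for every (y, x, octave) triple.
inline std::vector<std::uint32_t> fill_recompute(std::size_t w, std::size_t h,
                                                 std::size_t octaves) {
    std::vector<std::uint32_t> out(w * h, 0);
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t o = 0; o < octaves; ++o)
                out[y * w + x] += intermediate(static_cast<std::uint32_t>(o),
                                               static_cast<std::uint32_t>(x))
                                  ^ static_cast<std::uint32_t>(y);
    return out;
}

// Lookup: the intermediate only depends on (octave, x), so compute an
// octaves-by-w matrix once and read it back inside the hot loop.
inline std::vector<std::uint32_t> fill_lookup(std::size_t w, std::size_t h,
                                              std::size_t octaves) {
    std::vector<std::uint32_t> table(octaves * w);
    for (std::size_t o = 0; o < octaves; ++o)
        for (std::size_t x = 0; x < w; ++x)
            table[o * w + x] = intermediate(static_cast<std::uint32_t>(o),
                                            static_cast<std::uint32_t>(x));

    std::vector<std::uint32_t> out(w * h, 0);
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t o = 0; o < octaves; ++o)
                out[y * w + x] += table[o * w + x]
                                  ^ static_cast<std::uint32_t>(y);
    return out;
}
```

Both functions produce the same output; the second replaces the arithmetic in the inner loop with a load. Whether that is faster depends on the loop being instruction-bound rather than memory-bound, so it is worth measuring on the actual workload.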

The thing the original commentor does get right is the notion that thinking about data layout is important. But, that has nothing to do with the library you're using .. you just have to do it. They seem to be conflating the use of a library with the act of writing wide code, as if you can't do one without the other, which is obviously false.
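As a rough illustration of the data layout point, independent of any library: the sketch below contrasts an array-of-structs with a struct-of-arrays for the same update. The `ParticleAoS`, `ParticlesSoA`, `advance_aos`, `advance_soa`, and `to_soa` names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: the x values of consecutive particles are 24 bytes
// apart, so loading several of them into one register needs gathers or
// shuffles.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
};

// Struct-of-arrays: each field is contiguous, so consecutive x values can
// be loaded with one wide load. Only the two fields used here are shown.
struct ParticlesSoA {
    std::vector<float> x;
    std::vector<float> vx;
};

inline void advance_aos(std::vector<ParticleAoS>& p, float dt) {
    for (std::size_t i = 0; i < p.size(); ++i)
        p[i].x += p[i].vx * dt;  // strided access
}

inline void advance_soa(ParticlesSoA& p, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i)
        p.x[i] += p.vx[i] * dt;  // unit-stride access
}

// Convert once, outside the hot loop.
inline ParticlesSoA to_soa(const std::vector<ParticleAoS>& p) {
    ParticlesSoA s;
    s.x.reserve(p.size());
    s.vx.reserve(p.size());
    for (const ParticleAoS& q : p) {
        s.x.push_back(q.x);
        s.vx.push_back(q.vx);
    }
    return s;
}
```

The two update functions compute the same thing; only the memory access pattern differs. The unit-stride version is the shape that wide code, whether hand-written, wrapped, or auto-vectorized, generally wants.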

> I was going to quickly rewrite the example in Highway ..

Right. I'll believe this when I see it.

I could pick it apart more, but.. I think you get my drift.

janwas|7 months ago

Thanks for expanding on your viewpoint.

> Why would writing an optimizing compiler qualify as territory for directly writing SIMD code, but anything else is off the table?

I understood "directly writing" to mean assembly or even intrinsics. In general, I would advise not touching intrinsics directly, because the intrinsic definitions themselves have in several cases had bugs. Here's one AVX-512 example: https://github.com/google/highway/commit/7da2b760c012db04103....

When using a wrapper library, these can be fixed in one spot, but every direct user of intrinsics has to deal with it themselves.

> it's extremely easy to write your own wrappers (like I did) and not take a dependency. A good trade IMO

I understand wanting to reduce dependencies. The tradeoff is a bit more complex: for example many readers would be familiar with Highway terminology. We have also made efforts to be a lightweight dependency :)

> doing it yourself as a learning exercise has value.

Understandable :) Though it's a bit regrettable to tie your user code to the library prototype - if used elsewhere, it would probably have to be ported.

> The fact is, the library code is super fucking boring.

True for many ops. However, emulating AES or other complex ops is nontrivial. And it is easy to underestimate the sheer toil of keeping things working across compiler versions and their many bugs. We recently hit the 3000 commit mark in Highway :)

llm_nerd|7 months ago

>First off, a number of statements are nonsense.

100% of my original comment is absolutely and completely correct. Indisputably correct.

>Furthermore, I was writing a library.

Little misunderstandings like this pervade your take.

>seems to be under the impression that using a SIMD library would somehow have produced a better result.

To be clear, I wasn't speaking to you or for your benefit, or specifically to your exercise. You'll notice I didn't email a list of recommendations to you, because I do not care what you do or how you do it. I didn't address my comment to you.

I -- and I was abundantly clear on this -- was speaking to the random reader who might be considering optimizing their code with some hand-crafted SIMD. My point was that following the path in this (and an endless chain of similar) submission(s) is usually ill-advised. That is a general statement, not one about this specific project: it applies to the average "I want to take advantage of SIMD in my code" consideration.

HN has a fetish for SIMD code recently and there is almost always a better approach than hand-crafting some SSE3 calls in one's random project.

>The original commentor seems to be under the impression that using a SIMD library would somehow have produced a better result.

Again, I could not care less about your project. But the average developer does care that their code runs on a wide variety of platforms optimally. You don't, but again, you and your project were tangential to my comment, which was general.

>The thing the original commentor does get right is the notion that thinking about data layout is important.

Aside from the entirety of my comment being correct, the point was that many of the SIMD tools and libraries force you down a path where you are coerced into such structures, versus relying upon the compiler to make the best of suboptimal structures. We've seen many times where people complain that their compiler isn't vectorizing things that they think it should. But there is a middle option between endlessly fighting with the compiler and hand-rolling SSE calls, one that not only supports much more hardware but also leads you down the path of best practices.

Which is of course why C++26 is getting std::simd.

Again, you are irrelevant to my comment. Your project is irrelevant to it. I know this is tough to stomach.

>Right. I'll believe this when I see it.

I actually cloned the project but then this submission fell off the front page and it seemed not worth my time. Not to mention that it can't be built on macOS which happened to be the machine I was on at the moment.

Because again, I don't care about you or your project, and my commentary was to the SIMD sideliners considering how to approach it.

>I could pick it apart more, but.. I think you get my drift.

None of your retorts are valid, and my comment stands as completely correct. The drift is that you feel defensive about a general comment because you did something different, which....eh.