Bit-twiddling optimizations in Zed's Rope

rgrmrts|1 year ago

I really like all the blog posts and videos the Zed team has put out, thank you if you’re reading this!

Unrelated to this specific post I’m such a fan of Zed. It’s the first feature complete text editor in recent memory that I’ve truly enjoyed using (i.e. it stays out of the way, is really fast, feels well engineered). I’m coming to Zed after years of Emacs which I still have love for but no longer feels like a competitive piece of software (it does not take full advantage of how good computers are today, e.g. gpu rendering or multicore). I really hope Zed stays a fast and lightweight text editor instead of becoming some bloated growth-at-all-cost VC ware (not that they’ve exhibited any signs of that happening). I’d also happily pay for Zed without a subscription based thing for access to LLM features (which I do not use).

seanw444|1 year ago

> it does not take full advantage of how good computers are today, e.g. gpu rendering or multicore

Why does Emacs need that though? I hear people say this all the time and I don't get it. Multicore kind of works against the structure that Emacs touts as a feature. And GPU rendering? In many applications, I totally agree with these complaints. But it's a text editor.

I tried Zed myself, and it's good. But it doesn't dethrone Emacs (for me personally).

benreesman|1 year ago

I really want to admire the Zed folks even though they are a new faction in the editor wars where I’m an emacs guy: they take mostly all the things I care about seriously.

The are serious about local vs. remote vs. shared. They are serious about hardware acceleration because they care about users who type fast and edit big files. They care about real computer science on how to push the current hardware to serve the user, rather than treating the hardware as a crutch to give the user a slightly worse experience at a fraction of the development cost. They care about code highlighting the snippets on their blog like very similar to the default Zed theme.

These are cool, serious people. If they give me emacs key bindings with full paredit-everywhere, I might switch my daily driver.

And this is about using modern SIMD-style stuff, branch less stuff, Lemire stuff in a modern and relevant context. Other commenters have pointed out that you can do the mask or popcount better with intrinsically, and yeah, but they probably know that, and B. They got the massive win, the remaining factor of K on the pop count is coming.

Long Zed.

eviks|1 year ago

Why not store just a small u8 count of newlines in a chunk instead of their u128 positions and then only loop through the last chunk for precision?

You don't need information about the position of newlines in all the chunks located before the one your offset lands on

teo_zero|1 year ago

As I understand it, they do exactly what you say. TFA is about optimizing the last chunk's loop.

dzaima|1 year ago

SIMD can work quite well here too - with 128-bit SIMD (available on both baseline x86-64 and aarch64) this can be just ≤8 loop iterations checking for the newline character (each iteration counting the number of newline characters encountered, and a lzcnt on the last iteration), and similar for characters (assuming valid UTF-8, it's a single comparison to test if a byte starts a new char).

dmitrygr|1 year ago

  >  // Parallel bit count intermediates
  >  let a = v - ((v >> 1) & (u64::MAX / 3));
  >  let b = (a & (u64::MAX / 5)) + ((a >> 2) & (u64::MAX / 5));
  >  let c = (b + (b >> 4)) & (u64::MAX / 0x11);
  >  let d = (c + (c >> 8)) & (u64::MAX / 0x101);

That "parallel bit count" is almost certainly slower than using two POPCNT instructions on a modern cpu. Should just call __builtin_popcount() and let the compiler do it the most optimal way. Luckily, people do this sort of thing so often that many modern compilers will try (and often succeed) to detect you trying this insanity and convert it to a POPCOUNT (or a pair of POPCOUNTs as the case may be here)

ot|1 year ago

What that code does is a per-byte-pair popcount, which is not what the POPCNT instruction does (it computes the popcount for the whole word).

On processors with BMI2 the whole algorithm reduces to a PDEP as mentioned in another comment, but if you don't have that this is pretty much the best you can do (unless you use lookup tables but those have pros and cons).

akoboldfrying|1 year ago

Which compilers support __builtin_popcount()? From memory, it's a gcc extension. If the compiler selects a CPU POPCOUNT instruction for it, are you sure it will work on all machines that you want to run it on?

The above code is completely source- and binary-portable and reasonably fast -- certainly faster than naively looping through the bits, and within a small constant factor of a CPU POPCOUNT instruction.

camel-cdr|1 year ago

nth_set_bit_u64: wouldn't that be __builtin_ctzll(_pdep_u64(1<<n, v)) with BMI2?

kwillets|1 year ago

That's my guess as well.

Bitstring rank/select is a well-known problem, and the BMI and non-BMI (Hacker's Delight) versions are available as a reference.

SkiFire13|1 year ago

That's assuming you're ok with your program not running on some older cpus.

stouset|1 year ago

I believe the equivalent ARM64 instructions are in SVE2 which isn’t yet supported on Apple’s M-series chips as of M4, sadly.

sapiogram|1 year ago

Is there a way to adjust text contrast in light mode in Zed yet? The editor is unfortunately unusable for me, because of how washed out the colors are.

mattbaker|1 year ago

There are themes now and a UI to install them, I also didn’t like the washed out colors.

veltas|1 year ago

> Turns out CPUs are pretty good with zeros and ones.

I've been saying this a lot recently, that CPU's are powerful 1-bit vector machines!

eliasson|1 year ago

Does anyone know of any good presentations about the Zed architecture and internals around?

I know that they have some recordings from their coding session, but I am looking for something more overall.

m1keil|1 year ago

This is all really cool, but I just want soft wrap to be fixed

Am4TIfIsER0ppos|1 year ago

> opacity: 0;

> filter: blur(1px);

Wonderful styling!

ramon156|1 year ago

Isnt the tab example wrong? Id assume it to be

aa -> -> bb -> -> bb

It only takes up two spaces, after all

50 comments