When can we expect discord not to take 20-30 seconds to launch on a $5000 PC? What exactly is it doing, recompiling the client from source using a single core each time it opens?
It needs to download the 20 distinct update patches they added in the past 4 hours, in series, all of which together change the actual product in precisely no way whatsoever.
The onus is really on Discord, but you can use https://openasar.dev to partially fix the problem for yourself - it's an open source drop-in replacement for the client updater/bootstrapper.
In true JS fashion, it rewrites itself in a new framework every time you open it, and every fifth time, it chooses a different package manager just for fun.
Reading through the post, they seem to have been hyper-focused on compression ratios and reducing payload size/network bandwidth as much as possible, but I don't see a single mention of CPU time or evidence of any actual measurable improvement for the end user. I have been involved with a few such efforts at my own company, and the conclusion was always that the added compression/decompression overhead on both sides resulted in worse performance, especially considering we are talking about packets at the scale of bytes, or a few kilobytes at most.
They explicitly mention compression time. It’s actually lower in the new approach.
> the compression time per byte of data is significantly lower for zstandard streaming than zlib, with zlib taking around 100 microseconds per byte and zstandard taking 45 microseconds
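The tradeoff is easy to sanity-check locally. A rough benchmark sketch (Python; the payload shape is made up for illustration, the third-party `zstandard` package is assumed, and absolute numbers will vary wildly by machine and payload):

```python
import json
import time
import zlib

try:
    import zstandard  # third-party: pip install zstandard
except ImportError:
    zstandard = None

# A made-up MESSAGE_CREATE-shaped payload; field names are illustrative only.
payload = json.dumps({
    "t": "MESSAGE_CREATE",
    "d": {
        "id": "1234", "channel_id": "5678",
        "content": "hello world " * 50,
        "author": {"id": "42", "username": "someone"},
    },
}).encode()

def bench(label, compress):
    # Time a single one-shot compression and report the size change.
    start = time.perf_counter()
    out = compress(payload)
    elapsed_us = (time.perf_counter() - start) * 1e6
    print(f"{label}: {len(payload)} -> {len(out)} bytes, {elapsed_us:.0f} us")
    return out

zlib_out = bench("zlib", lambda data: zlib.compress(data, 6))
if zstandard is not None:
    zstd_out = bench("zstd", zstandard.ZstdCompressor(level=6).compress)
```

Note the blog measures streaming compression with context reuse across a long-lived websocket, which behaves differently from one-shot calls like these.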
I think one thing this blog post did not mention was the historical context of moving from uncompressed to compressed traffic (using zlib), something I worked on in 2017. IIRC, the bandwidth savings were massive (75%). It did use more server side CPU, and negligible client side CPU, so we went for it anyways as bandwidth is a very precious thing to optimize for especially with cloud bandwidth costs.
Either way, the incremental improvements here are great - and it's important to consider optimization both at the transport level (compression, encoding) and at the protocol level (the messages actually sent over the wire).
Also, one thing not mentioned: client-side decompression on desktop moved from a JS implementation of zlib (pako) to a native implementation, exposed to the client via napi bindings.
> Looking once again at MESSAGE_CREATE, the compression time per byte of data is significantly lower for zstandard streaming than zlib, with zlib taking around 100 microseconds per byte and zstandard taking 45 microseconds.
They're going from 2+MB (for some reason) to 300KB - even if decompression is "slow," that's going to be a win for their bandwidth costs and for perceived speed for _most_ users.
I was surprised to see little server-side CPU benchmarking too, though. While I'd expect overall client timing for (transfer + decompress) to be improved dramatically unless the user was on a ridiculously fast network connection, I can't imagine server load not being affected in a meaningful way.
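A back-of-envelope calculation (all numbers assumed, not from the post) shows why the transfer savings dominate on anything but a very fast link:

```python
# Assumed numbers: a ~2 MB READY payload vs ~300 KB compressed,
# received over a 10 Mbit/s link.
link_bytes_per_s = 10e6 / 8        # 10 Mbit/s expressed in bytes/second
uncompressed = 2 * 1024 * 1024     # ~2 MB
compressed = 300 * 1024            # ~300 KB

t_uncompressed = uncompressed / link_bytes_per_s
t_compressed = compressed / link_bytes_per_s
# Transfer alone: ~1.68 s vs ~0.25 s. Decompression would have to cost
# over a second to erase the win on a link this slow.
print(f"{t_uncompressed:.2f}s vs {t_compressed:.2f}s")
```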
Some of those payloads are much larger than a few kilobytes (READY, MESSAGE_CREATE etc.)
There is a section and data on "time to compress". No time to decompress though.
Performance is probably the wrong lens. Mobile data is often expensive in terms of money, whereas compression is cheap in terms of CPU time. More compression is almost always the right answer for users of mobile apps.
Interesting way to approach this (dictionary based compression over JSON and Erlang ETF) vs. moving to a schema-based system like Cap'n Proto or Protobufs where the repeated keys and enumeration values would be encoded in the schema explicitly.
Also would be interested in benchmarks between Zstandard vs. LZ4 for this use case - for a very different use case (streaming overlay/HUD data for drones), I ended up using LZ4 with dictionaries produced by the Zstd dictionary tool. LZ4 produced similar compression at substantially higher speed, at least on the old ARM-with-NEON processor I was targeting.
I guess it's not totally wild but it's a bit surprising that common bootstrapping responses (READY) were 2+MB, as well.
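The train-a-dictionary-then-compress flow described above can be sketched with the `zstandard` Python bindings, which expose the same training machinery as the CLI tool (package and payload shape are assumptions; swapping zstd for LZ4 would use a different compressor but the same training step):

```python
import json

try:
    import zstandard  # third-party: pip install zstandard
except ImportError:
    zstandard = None

if zstandard is not None:
    # Many small, similarly-shaped payloads: exactly the case where a
    # shared dictionary pays off, since every message repeats the same keys.
    samples = [
        json.dumps({"op": 0, "t": "MESSAGE_CREATE",
                    "d": {"id": str(i), "channel_id": str(i % 10),
                          "content": f"message number {i}"}}).encode()
        for i in range(1000)
    ]

    # Train a small dictionary from the samples, then compress with it.
    dictionary = zstandard.train_dictionary(4096, samples)
    compressor = zstandard.ZstdCompressor(dict_data=dictionary)
    decompressor = zstandard.ZstdDecompressor(dict_data=dictionary)

    with_dict = compressor.compress(samples[0])
    without_dict = zstandard.ZstdCompressor().compress(samples[0])
    roundtrip = decompressor.decompress(with_dict)
    print(len(samples[0]), len(without_dict), len(with_dict))
```

Both sides of the connection have to ship the same dictionary, which is the operational cost of this approach.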
>Diving into the actual contents of one of those PASSIVE_UPDATE_V1 dispatches, we would send all of the channels, members, or members in voice, even if only a single element changed.
> the metrics that guided us during the [zstd experiment] revealed a surprising behavior
This feels so backwards. I'm glad that they addressed this low-hanging fruit, but I wonder why they didn't do this metrics analysis from the start, instead of during the zstd experiment.
I also wonder why they didn't just send deltas from the get-go. If PASSIVE_UPDATE_V1 was initially implemented "as a means to scale Discord servers to hundreds of thousands of users", why was this obvious optimization missed?
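The delta idea fits in a few lines. A minimal sketch with a hypothetical state shape (a real implementation would also need to signal removals and diff within the channel/member lists rather than resend them whole):

```python
def shallow_delta(old: dict, new: dict) -> dict:
    """Return only the top-level entries of `new` that differ from `old`."""
    return {k: v for k, v in new.items() if old.get(k) != v}

# Hypothetical PASSIVE_UPDATE_V1-style snapshots: one member joined,
# nothing else changed.
old_state = {"channels": ["general", "dev"], "members": 250, "voice": ["alice"]}
new_state = {"channels": ["general", "dev"], "members": 251, "voice": ["alice"]}

print(shallow_delta(old_state, new_state))  # -> {'members': 251}
```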
Something important that wasn't mentioned, either in the post or in the comments, is whether this is safe in the face of compression oracle attacks[1] like BREACH[2]. Given how much effort Discord seems to have put into the compression rollout, I'd be inclined to believe they must have considered this, and I wish they had written something more specific.
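For readers unfamiliar with the attack class: the precondition is attacker-influenced input being compressed in the same stream as a secret, at which point the compressed *length* itself leaks information. A toy illustration (secret and field names are made up):

```python
import zlib

SECRET = "token=hunter2"  # made-up secret embedded in every response

def response_size(attacker_input: str) -> int:
    # Secret and attacker-controlled input compressed together:
    # the precondition for BREACH-style length leaks.
    body = f"{SECRET};echo={attacker_input}".encode()
    return len(zlib.compress(body, 9))

# A guess sharing a prefix with the secret gets back-referenced by the
# compressor and comes out smaller, so comparing response sizes lets an
# attacker confirm guesses byte by byte.
matching = response_size("token=hunter")
wrong = response_size("xq9fw=kd3lzp")
print(matching, wrong)
```

Whether this applies to Discord's gateway depends on whether attacker-influenced and secret data actually share a compression context, which the post doesn't address.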
Sounds like a problem with your computer. I can have discord, 50 browser tabs, two different games, a JetBrains IDE and various other stuff open at the same time without any trouble at all.
And my computer isn't particularly crazy. Maybe like $1500.
How many servers have you joined and how many of those are large and active? Also relevant, do you need to be in all of them?
Most of the time I have seen people complain about this it is because they have joined a ton of hyperactive servers.
You could argue it shouldn't be an issue, and that the client should load things like messages more dynamically per server. But then you'd have people complaining that switching servers takes so long.
One thing I appreciate very much about this article is that they describe the things they tried that didn't work as well. It's becoming increasingly rare (and understandably so) for articles to describe failed attempts, but it's very interesting and helpful for someone unfamiliar with the space!
Time to compress is a measure of how long the CPU spends compressing, so this is in the blog post.
Protos or a custom wire protocol would be far better suited to the task.
[1]: https://en.wikipedia.org/wiki/Oracle_attack [2]: https://en.wikipedia.org/wiki/BREACH
2024-09-20T13:28:42.946055-07:00 hostname kernel: audit: type=1400 audit(1726864122.944:11828880): apparmor="DENIED" operation="ptrace" class="ptrace" profile="snap.discord.discord" pid=1055465 comm="Utils" requested_mask="read" denied_mask="read" peer="unconfined"