This is perhaps only tangentially related...
but I'm trying to do something to support sharding of websocket connections based on a user_id....
I want the connections to be distributed across separate instances of a WS server using a modulo-based or consistent-hashing algorithm... above that in the stack we have a message broker or queue partitioned using the same sharding algorithm for the number of WebSocket servers running. This queue/message broker publishes messages to each WebSocket service, which in turn pushes the message to the client. This is how I want to handle horizontally scalable WebSocket connections...
The question I have is the best way to do this, and the best way to migrate connections from one service to another when/if the replication factor of WebSocket servers changes (we add/remove servers)...
One way, I would think, would be to send a disconnect message to the WebSocket service when this event happens; it would disconnect its clients, which would then reconnect after a random delay... so as to avoid trying to reinitialize all of those connections at once. This seems sub-optimal, though: with consistent hashing, many of the connections will stay on the same server, and terminating those connections would be unnecessary.
What is the best way to migrate an open TCP connection from one server to another? Or, if the connection must be closed... how can we minimize the number of connections that have to be closed?
BTW: I know I can broadcast to all instances of the WebSocket server, and unless a given server holds the connection it's a no-op... if the server does hold the connection, it pushes the message on to the client... This also seems like it would have a limit to how far this scaling strategy could grow before running into limitations.
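For reference, the "minimize closed connections" property falls out of the ring itself. Here is a minimal consistent-hash ring sketch in C++ (stdlib only; std::hash is illustrative and not stable across processes, so a real cluster would need a fixed hash such as xxHash that every node computes identically):

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <string>

    class HashRing {
        std::map<uint64_t, std::string> ring_;  // ring position -> server id

        static uint64_t hashOf(const std::string &key) {
            return std::hash<std::string>{}(key);
        }

    public:
        // Virtual nodes smooth out the distribution across servers.
        void addServer(const std::string &server, int vnodes = 64) {
            for (int i = 0; i < vnodes; ++i)
                ring_[hashOf(server + "#" + std::to_string(i))] = server;
        }

        void removeServer(const std::string &server, int vnodes = 64) {
            for (int i = 0; i < vnodes; ++i)
                ring_.erase(hashOf(server + "#" + std::to_string(i)));
        }

        // The first ring position clockwise of the key's hash owns it.
        std::string serverFor(const std::string &userId) const {
            if (ring_.empty()) return "";
            auto it = ring_.lower_bound(hashOf(userId));
            if (it == ring_.end()) it = ring_.begin();  // wrap around
            return it->second;
        }
    };

When the server set changes, each node can re-run serverFor() over its own connections and close only those whose owner moved (on average roughly 1/N of them); everything else stays put, which is exactly why blanket disconnects are unnecessary.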
Linux has TCP connection repair, which can be used to transparently migrate an active TCP session between two Linux boxes. The TCP traffic will be paused for the duration of the migration, but the other endpoint will not notice as long as the migration is complete before TCP timeouts kick in.
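For the curious, the checkpoint half of that API looks roughly like the sketch below. It only extracts the send/receive sequence numbers; a complete migration also has to transfer queued data, window state, and negotiated TCP options, and it needs CAP_NET_ADMIN:

    #include <cstdint>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    // Values from <linux/tcp.h>, in case the libc headers predate them.
    #ifndef TCP_REPAIR
    #define TCP_REPAIR       19
    #define TCP_REPAIR_QUEUE 20
    #define TCP_QUEUE_SEQ    21
    #endif
    #ifndef TCP_RECV_QUEUE
    #define TCP_RECV_QUEUE 1
    #define TCP_SEND_QUEUE 2
    #endif

    // Freeze a connected socket and read the sequence numbers needed to
    // recreate it elsewhere. In repair mode the kernel sends nothing,
    // which is why the peer only sees a pause in traffic.
    bool checkpointSeqs(int fd, uint32_t &sndSeq, uint32_t &rcvSeq) {
        int on = 1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on)) < 0)
            return false;  // requires CAP_NET_ADMIN

        socklen_t len = sizeof(uint32_t);
        int q = TCP_SEND_QUEUE;
        if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof(q)) < 0 ||
            getsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, &sndSeq, &len) < 0)
            return false;

        q = TCP_RECV_QUEUE;
        if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof(q)) < 0 ||
            getsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, &rcvSeq, &len) < 0)
            return false;

        // On the destination box: open a socket, enter repair mode, set both
        // queue sequence numbers via TCP_QUEUE_SEQ, bind() to the original
        // local address, connect() to the peer (a no-op on the wire while in
        // repair mode), then clear TCP_REPAIR to resume traffic.
        return true;
    }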
I am one of the co-founders of https://ably.com. We have built a serverless WebSockets platform that is near-infinitely scalable and designed to handle the complexity you're describing: cluster resizing, connection failures, automatic sharding and reallocation using hash rings across our global edge service, etc. I'd be really interested to hear why you wouldn't consider offloading this to a service like ours, which powers the likes of HubSpot, Expedia, Spotify, etc. Feel free to contact me directly @mattheworiordan if you'd prefer not to comment here. I'm really keen to hear different perspectives on what we're doing right and where we can improve!
Can you just store the shard id for each user in a database? Each server could check whether it is authoritative for each incoming connection, and redirect if not.
That way, you can migrate users one by one (sending a 'redirectUserConnections(userId)' to the original server), and even pick a shard that is geographically close to the user. (If you care about race conditions during migrations, you need to be really careful, though.)
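A sketch of that check-on-connect idea; every name here (kMyServerId, the in-memory shardTable standing in for the shared database, the prints standing in for the real accept/redirect paths) is a hypothetical stand-in:

    #include <iostream>
    #include <string>
    #include <unordered_map>

    static const std::string kMyServerId = "ws-2";

    // Mimics the database of user -> shard assignments.
    static std::unordered_map<std::string, std::string> shardTable = {
        {"alice", "ws-1"},
        {"bob",   "ws-2"},
    };

    void onConnection(const std::string &userId) {
        auto it = shardTable.find(userId);
        std::string owner = (it != shardTable.end()) ? it->second : kMyServerId;
        if (owner != kMyServerId) {
            // Not authoritative: tell the client which server to dial instead.
            std::cout << userId << " -> redirect to " << owner << "\n";
            return;
        }
        std::cout << userId << " -> accepted here\n";
    }

    int main() {
        onConnection("alice");  // redirected to ws-1
        onConnection("bob");    // accepted locally
    }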
µWebSockets is a simple to use yet thoroughly optimized, standards compliant and secure implementation of WebSockets (and HTTP). It comes with built-in pub/sub support, URL routing, TLS 1.3, SNI, IPv6, permessage-deflate
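For anyone who hasn't used it, usage looks roughly like the echo-server pattern from the README; treat this as a sketch, since handler signatures can vary between versions:

    #include <cstdio>
    #include <string_view>
    #include "App.h"  // uWebSockets

    struct PerSocketData {};  // per-connection user data

    int main() {
        uWS::App().ws<PerSocketData>("/*", {
            // Echo every message back on the same connection.
            .message = [](auto *ws, std::string_view message, uWS::OpCode opCode) {
                ws->send(message, opCode);
            }
        }).listen(9001, [](auto *listenSocket) {
            if (listenSocket)
                std::printf("Listening on port 9001\n");
        }).run();
    }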
Header-only libraries are a common thing in C++. They allow pretty easy consumption, because the consumer only needs to specify the include path. That's fairly easy with any build system (and C++ has a lot of them).
Any binary library would also require building the library itself using the same toolchain and then linking to it - which requires a lot more configuration on the consuming side. It might be easy if both projects use the same build system (e.g. cmake), but that's far from guaranteed in C++.
I think things have changed, but back when I was doing C++, templates (generics) were implemented in header files, so it was not unusual to have entire programs written in .h files.
Having dealt only a tiny bit with Node, GYP (its garbage C++ interop layer), and the inexplicable dependency on Python (???) being installed on the dev's machine in order to build the C++ code that gets copied into node_modules, I know better than to ever try another library making use of that.
I don't think you have fewer dependencies when building other C++ projects. People also use automake, autoconf, or whatever to generate makefiles. It's just that the author chose Python here. I've never seen anybody write makefiles manually for a moderate or large project; there are always tons of tools involved.
Cool little library, and I'm using it, but the author has an ego the size of Wolfram's. This is just one snippet; there are many if you rummage through the docs and issues:
> In the 1970s, programming was an elite's task. Today programming is done by uneducated "farmers" and as a result, the care for smart algorithms, memory usage, CPU-time usage and the like has dwindled in comparison.
I used to vehemently agree with stuff like this, but now I just cringe. Yeah, software today sucks, but comparing it with the 70s/80s/90s is just stupid. We don't ship software for one piece of incredibly simple hardware and a few thousand customers anymore. Elitism isn't going to make the situation any better, and frankly, unproductive whining like this is just getting exhausting.
> In the 1970s, programming was an elite's task. Today programming is done by uneducated "farmers" and as a result, the care for smart algorithms, memory usage, CPU-time usage and the like has dwindled in comparison.
What's wrong with that statement? In the 1970s, the number of people who dealt with programming was significantly smaller than today. That's a matter of fact. And the people who worked with computers back then were highly educated pioneers of the technology; there was no room for half-hearted effort or "let's copy the first result from stackoverflow.com".
Today, programmers who are meticulous, detail-oriented, and who genuinely understand the problem first are in the minority, and that's how I read the above.
The author might have stated it bluntly, but that does not mean he's wrong. You're using the "cool little library", yet you have this kind of resentment for the person who made it. Have you even tried to look at the situation from their perspective before passing judgement?
I'm asking out of curiosity, because I can relate to having to work with under-skilled individuals who spend more time sorting out their CV and online presence than their work.
As an uneducated farmer this gave me a giggle; it suggests that if I filed a ticket about a performance problem in the project, it'd likely be fixed instantaneously. That quote is basically an advert: if the author wishes to establish such a high bar for themselves, it'd be foolish not to avail of it.
To be fair to the author, that quote is part of a longer passage about how one should not default to free-form ASCII first when fixed-size buffers are an option. At least that's how I read it; it's clearly opinionated writing.
I find it interesting that everybody in this thread seems to take exception to the comparison to "farmers". I mean, farming is brutal hard work but it also does not require a high degree of thinking, so I can see his point even if he has made it in an offensive way. Personally, I'm much happier sitting in a warm comfortable place thinking hard about code instead of being out in a cold field 20 hours a day, but meh.
Much more offensive (to me) is this gem from later on:
> Designing clever, binary and minimally repetitive protocols saves enormous amounts of CPU-time otherwise lost to compression.
Well yes, but also, and more importantly no. This is technically true - not compressing the data is more efficient in terms of computation per bit on the wire. But it is also wrong in a much more subtle way.
The advantage of using JSON (or an equivalent) and then compressing it down is that we inject a large amount of redundancy into the format. The CPU is taking that redundancy back out for us to save bandwidth, but we get to exploit it in much more valuable ways. The encoder/decoder are generic, simple and highly optimized. We can use them everywhere and trust them to work. This has value in itself, and it means that we can spend our limited budget of programmer time in optimizing other code where the payoff is higher. It means less time tracking down bugs in "super 10x programmer's one-off encoder that was only written for this specific application". And lastly, but my favourite - it means that when something goes wrong we have a hope of repairing the damage to the data. Because redundancy gives us options for error checking and recovery, while optimizing the binary format generally does not.
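A quick way to see this effect, sketched with zlib (exact numbers will vary): a payload whose keys repeat in every record deflates to a small fraction of its raw size, because the generic compressor strips the injected redundancy back out.

    #include <cstdio>
    #include <string>
    #include <vector>
    #include <zlib.h>  // link with -lz

    int main() {
        // A redundant JSON-ish payload: the same keys repeat in every record.
        std::string json = "[";
        for (int i = 0; i < 200; ++i)
            json += "{\"userId\":" + std::to_string(i) +
                    ",\"status\":\"online\",\"score\":42},";
        json += "]";

        // compressBound() gives the worst-case deflated size.
        uLongf outLen = compressBound(json.size());
        std::vector<Bytef> out(outLen);
        compress(out.data(), &outLen,
                 reinterpret_cast<const Bytef *>(json.data()), json.size());

        std::printf("raw: %zu bytes, deflated: %lu bytes\n",
                    json.size(), static_cast<unsigned long>(outLen));
    }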
Also, if you want the format to really be optimal it should not be designed as byte streams (as low-level binary formats generally are) and instead should be a stream of variable-sized tokens that are fed through an arithmetic encoder.
The first half of this week I spent repairing data corruption caused by some of my code. Luckily there is a lot of redundancy in how I encoded the data (in a structure that is described in JSON inside the records) so I could write a tool that repaired the damage. I guess I'm lucky that I'm just a "farmer" :)
I mean, you can be critical about performance and a lack of optimization without punching down. People like this have never worked on applications in a normal corporation, I'm sure.
dsiddharth | 3 years ago
You're welcome to check out the docs and get started: https://docs.hathora.dev/#/buildkit/README
If you'd like to get in touch, feel free to shoot me a message: sid [at] hathora.dev
10000truths | 3 years ago
https://lwn.net/Articles/495304/
eximius | 3 years ago
Then you can use whatever logic you want for it. If the client disconnects, it just asks again and reconnects.
Use a token from the directory endpoint to authorize the connection, so clients are forced to go through it.
detaro | 3 years ago
Can't you just send the client an explicit "reconnect to server XYZ" message, and only move the clients that really need to move?
matesz | 3 years ago
[1] https://github.com/uNetworking/uWebSockets/discussions/1466#...
londons_explore | 3 years ago
[1]: https://github.com/uNetworking/uWebSockets#battery-batteries...
commitpizza | 3 years ago
I guess it's a backend library for doing WebSocket communication for Node, and perhaps also Bun?
Deukhoofd | 3 years ago
https://github.com/uNetworking/uWebSockets/blob/master/misc/...
chrisweekly | 3 years ago
Interesting. Bun keeps getting more compelling.
hcayless | 3 years ago
Addendum: though it is clearly a u in the URL. Someone may be confused...
zarzavat | 3 years ago
Stephen Wolfram is going to be so pleased that he finally has his own unit named after him!
Mathematics, physics, computing, and now psychometrics: is there anything that man can't make groundbreaking progress on?