> But we’ve hit the ceiling for SSE. That terrible Claude UI refresh gif is state of the art for SSE. And it sucks.
This has nothing to do with SSE. It's trivial to persist state over disconnects and refreshes with SSE. You can do all the same pub/sub tricks.
None of these companies are even using brotli on their SSE connections, which can give 40-400x compression.
It's just bad engineering, and it's going to be much worse with WebSockets: you have to rebuild what HTTP gives you from scratch, compression is nowhere near as good, bidirectional traffic nukes your mobile battery because of the duplex antenna, etc.
Just to add: the main value of WebSockets was faster upstream events pre-HTTP/2. With multiplexing in HTTP/2, that's no longer the case.
So the only thing you get from WebSockets is bidirectional events (at the cost of all the production challenges WebSockets bring). In practice, most problems don't need that feature.
Dunno, in my Go+HTMX project, it was pretty trivial to add SSE streaming. When you open a new chat tab, we load existing data from the DB and then HTMX initiates SSE streaming with a single tag.

When the server receives an SSE request from HTMX, it registers a goroutine and a new Go channel for this tab. The goroutine blocks and waits for new events on the channel. When something triggers a new message, a dispatcher saves the event to the DB and then iterates over the registered Go channels, sending the event to each. On a new event in the tab's channel, the tab's goroutine unblocks and passes the event from the channel to the SSE stream. HTMX handles inserting the new data into the DOM.

When a tab closes, the goroutine receives the notification via the request's context (another Go primitive), deregisters the channel and exits. If the server restarts, HTMX automatically reopens the SSE stream. It took probably one evening to implement.
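The architecture described above can be sketched roughly like this. This is a minimal, illustrative Go version: the names, buffer sizes, and query-param tab ID are my own assumptions, not from the original project:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// hub tracks one channel per open tab.
type hub struct {
	mu   sync.Mutex
	tabs map[string]chan string // tabID -> event channel
}

func newHub() *hub { return &hub{tabs: make(map[string]chan string)} }

func (h *hub) register(tabID string) chan string {
	h.mu.Lock()
	defer h.mu.Unlock()
	ch := make(chan string, 16)
	h.tabs[tabID] = ch
	return ch
}

func (h *hub) deregister(tabID string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.tabs, tabID)
}

// dispatch fans an event out to every registered tab; in the real system
// this is also where the event would be saved to the DB first.
func (h *hub) dispatch(event string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.tabs {
		select {
		case ch <- event:
		default: // drop rather than block the dispatcher on a full buffer
		}
	}
}

// sseHandler blocks on the tab's channel and writes SSE frames until the
// request context is cancelled (i.e. the tab closes).
func (h *hub) sseHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	tabID := r.URL.Query().Get("tab")
	ch := h.register(tabID)
	defer h.deregister(tabID)
	flusher, _ := w.(http.Flusher)
	for {
		select {
		case <-r.Context().Done(): // tab closed
			return
		case msg := <-ch:
			fmt.Fprintf(w, "data: %s\n\n", msg)
			if flusher != nil {
				flusher.Flush()
			}
		}
	}
}
```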
We resolved this by creating a separate context for the lifecycle of a chat/turn so if the user leaves the page, the process continues on the server. UI calls an RPC to fetch in progress turn, which allows it to resume, or if it's done, simply render the full turn.
Assuming the traditional stateless routing of requests (say, round-robin from load balancers): how do you make sure the returning UI client ends up on the same backend server replica that's hosting the conversation?
Or is it that all your tokens go through a DB anyway?
It's fairly easy to keep an agent alive when a client goes away. It's a lot harder to attach the client back to that agent's output when the client returns, without stuffing every token through the database.
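One way to reattach without pushing every token through the database is to buffer the in-flight turn in memory and replay from the client's `Last-Event-ID` on reconnect. A minimal Go sketch, assuming the reconnect reaches the same replica (sticky sessions) or the buffer lives in shared storage such as Redis; the names are illustrative:

```go
package main

import (
	"strconv"
	"sync"
)

// tokenLog buffers an in-flight turn's tokens in memory. The slice index
// doubles as the SSE event id, so a reconnecting client's Last-Event-ID
// header tells us exactly where to resume.
type tokenLog struct {
	mu     sync.Mutex
	tokens []string
}

// append stores a token and returns its event id.
func (l *tokenLog) append(tok string) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.tokens = append(l.tokens, tok)
	return len(l.tokens) - 1
}

// since returns every token after the given Last-Event-ID value.
// An empty or malformed id means "replay from the start".
func (l *tokenLog) since(lastEventID string) []string {
	last, err := strconv.Atoi(lastEventID)
	if err != nil {
		last = -1
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	if last+1 >= len(l.tokens) {
		return nil // client is already caught up
	}
	out := make([]string, len(l.tokens)-(last+1))
	copy(out, l.tokens[last+1:])
	return out
}
```

The turn still gets persisted once when it finishes; the DB just isn't in the hot path of every token.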
The SSE thing is a symptom of something bigger, imo. These models are stateless, but we often act like context windows are memory. Nothing around them actually remembers anything, and vector search doesn't fix it. I went down this rabbit hole recently: https://philippdubach.com/posts/beyond-vector-search-why-llm...
This is a feature of the web. Browser refreshes SHOULD dump state. Otherwise it can be difficult to recover from system errors. Of course, if you can build a system that is guaranteed to never have bugs, then go ahead and disable this feature. But users may still be confused as to why refreshing hasn't restarted their window.
It's interesting because this is a solved problem with collaborative docs.
CRDT or OT will work great here, and they're arguably even overkill. But so many of the edge cases you'd usually need to think about just disappear.
(I've built an agent / chat that used CRDT to represent the chat. You can have an arbitrary number of tabs, closing/opening at any time. All real time, in sync.)
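For intuition, a chat modeled as a grow-only set of uniquely-keyed messages is already a trivial CRDT: merge is set union, so every tab converges no matter when it connects or disconnects. A minimal sketch of the idea; a production system would more likely use a library such as Yjs or Automerge:

```go
package main

import "sort"

// Msg is one chat message; the ID must be globally unique (e.g. a ULID).
type Msg struct {
	ID   string
	TS   int64 // timestamp used only for display ordering
	Text string
}

// ChatReplica is one tab's copy of the chat: a grow-only set keyed by ID.
type ChatReplica struct{ msgs map[string]Msg }

func NewChatReplica() *ChatReplica { return &ChatReplica{msgs: map[string]Msg{}} }

func (c *ChatReplica) Add(m Msg) { c.msgs[m.ID] = m }

// Merge is set union: commutative, associative, and idempotent, which is
// exactly what makes "arbitrary tabs, opening and closing at any time" safe.
func (c *ChatReplica) Merge(other *ChatReplica) {
	for id, m := range other.msgs {
		c.msgs[id] = m
	}
}

// Render returns the messages in a deterministic order, so every replica
// displays the same history once they've merged.
func (c *ChatReplica) Render() []Msg {
	out := make([]Msg, 0, len(c.msgs))
	for _, m := range c.msgs {
		out = append(out, m)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].TS != out[j].TS {
			return out[i].TS < out[j].TS
		}
		return out[i].ID < out[j].ID // tiebreak so the order is total
	})
	return out
}
```

Edits and deletions need a richer CRDT (hence the libraries), but append-only chat really is this simple.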
t3.chat solves this pretty well. I believe they use Convex DB. I think it works something like this: a backend server process holds the true connection and state of the chat, and the front end syncs and receives updates from it.
Gemini saves your chats, which are presented in a pane on the left that you can expand and search. You can jump back into any chat and continue it, or delete individual chats.
This history is attached to your Google account, not to the chat window. You can pick up an existing chat in another browser on another device where you are authenticated with the same Google identity.
Now, about the specific usage scenario in this article (hitting refresh immediately after submitting a prompt, while the response is coming in): not sure why that would be important?
I just tried it a couple of times. Both times, it initially appeared as if the Gemini interface had lost the chats, since they didn't appear in the chat history section of the left pane. But after another refresh, they appeared. So there is just some delay.
Anyway, in this regard it goes beyond just giving a damn.
Lmao sorry but you completely missed the point of the article.
Yes, of course all chat providers store your chats, and they will be available eventually, once the response has finished streaming and has been dumped to a DB.
This is about live streaming getting lost and not being reconnected (and restreamed) when you refresh the page.
And since chatting with AI and seeing the responses streamed is a major use case, the author was right to question why, e.g., Anthropic wouldn't invest some of the $30B in fixing this glaring problem.
Especially since it looks like your initial message was not received by the backend server at all!
It may not be super critical, but it's like saying "my Ferrari sometimes shows the wrong speed. It's still driving, but the speedometer is stuck. It does get back to the correct speed eventually though, so no biggie".
mrieck|9 days ago
Go to ChatGPT.com while logged in and start typing right away: about 8 words in, it clears the text in the form. Why?
hglaser|9 days ago
Very weird that the foundational LLM companies' own chat pages don't do this.
luxurytent|9 days ago
Wasn't that complex!
kazinator|9 days ago
Some are using Google Gemini.