item 35500279

Xorlev | 2 years ago

Computers are fast. I once worked on a service sitting at the top of a deep RPC stack. Stack traces were often folded into RPC error messages to aid diagnostics. Well, if a failure at one layer was propagated to the next layer, and the next, and sometimes replicated for each looked-up item (to support partial-failure semantics), you could end up with gigabytes of stack traces in memory for some fraction of time. Very hard to figure out until tasks started dying and leaving behind heap dumps during a widespread incident at a lower layer of the stack.
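A minimal sketch of the blow-up mechanism described above: each layer wraps the error it received (stack trace included), and a fan-out layer replicates the wrapped error once per looked-up item. All names, sizes, and layer counts here are hypothetical, not from the original service.

```python
# Hypothetical numbers: a ~4 KB trace, 4 RPC layers, a 10k-item batch.
TRACE = "x" * 4_000          # stand-in for one stack trace

def wrap(err: str) -> str:
    # Each layer folds its own trace on top of everything below it.
    return err + "\nCaused by:\n" + TRACE

def fan_out(err: str, items: int) -> list[str]:
    # Partial-failure semantics: one copy of the error per item.
    return [wrap(err) for _ in range(items)]

err = TRACE                  # failure at the bottom layer
for _ in range(4):           # propagate up through four layers
    err = wrap(err)
batch = fan_out(err, 10_000) # 10k-item lookup at the top layer
total_bytes = sum(len(e) for e in batch)
print(total_bytes)           # hundreds of megabytes from one root failure
```

One modest trace becomes hundreds of megabytes once wrapping and per-item replication compound, which is why the memory cost only shows up during a widespread lower-layer incident.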

dan-robertson | 2 years ago

My back-of-the-envelope estimate for how long it would take to construct the 1.5GB string is about 200ms to 300ms (1.5GB write, 0.75GB read, plus extra from growing a buffer for the result; sequential R/W at 10GB/s, copying to resize the buffer at 20GB/s). Which is somewhat fast, but I’m still surprised that the latency increase wasn’t noticed first.
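The estimate above can be checked directly; the bandwidth figures (10GB/s sequential, 20GB/s for the resize copies) are the comment's assumptions, not measurements.

```python
# Back-of-the-envelope check of the 200-300 ms estimate.
GB = 1e9
write  = 1.5 * GB / (10 * GB)   # 150 ms to write the final 1.5 GB
read   = 0.75 * GB / (10 * GB)  # 75 ms to read the source data
# Doubling the buffer as it grows copies roughly one extra
# final-size's worth of data in total.
resize = 1.5 * GB / (20 * GB)   # 75 ms of extra copying
total_ms = (write + read + resize) * 1000
print(total_ms)  # 300.0
```

The assumed numbers land at the top of the 200-300ms range; slower memory or a less favorable growth strategy would push it higher.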

More importantly, the time and memory double with each further navigation, which means a user shouldn’t have had to navigate much farther to see much worse performance. Maybe this was a known issue that no one had debugged by then.
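A minimal sketch of the bug class implied by "doubles with each navigation": code that appends the entire existing history back into itself on every step, rather than just the new entry. The function and names are hypothetical, for illustration only.

```python
# Hypothetical buggy navigation handler: the history string is
# concatenated with itself on every navigation, so its size
# (and the time to build it) at least doubles per step.
def navigate(history: str, screen: str) -> str:
    return history + history + screen  # bug: should be history + screen

history = "home"
sizes = []
for i in range(10):
    history = navigate(history, f"screen{i}")
    sizes.append(len(history))
print(sizes)  # grows by more than 2x every step
```

Ten navigations of this pattern already yield a string thousands of times the original size, which is why a user would not need to go much deeper to hit multi-gigabyte strings.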

My best guess is that screens were arranged in a tree and users would navigate ‘back’ instead of ‘up’, so it was hard to get a long history.