item 41314116

Brentward | 1 year ago

I agree that the comment sounded pretty hostile, but I also agreed with the assessment that it might be better to avoid allocation in general. I know the code in the post is highly simplified, but you aren't exactly fully using "lazy iterators," at least in the post. refresh/load_hotels is still allocating new Hotels, each of which contains a one-million-element Vec. If you could reuse the existing self.hotels[id].prices Vec allocations, that might help, again at least in the example code.
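In code, the kind of in-place reuse described here might look something like the following sketch. The `Hotel`/`refresh_prices` names are illustrative, not from the post; the point is just that `clear()` keeps the backing allocation, so refilling the `Vec` avoids a fresh one-million-element allocation per refresh.

```rust
// Hypothetical sketch of reusing an existing prices Vec across refreshes.
struct Hotel {
    prices: Vec<u32>,
}

impl Hotel {
    // Overwrite the contents in place: `clear` keeps the backing
    // allocation, so `extend` reuses it as long as the new data fits
    // within the existing capacity.
    fn refresh_prices(&mut self, new_prices: impl IntoIterator<Item = u32>) {
        self.prices.clear();
        self.prices.extend(new_prices);
    }
}

fn main() {
    let mut hotel = Hotel { prices: Vec::with_capacity(1_000_000) };
    let cap_before = hotel.prices.capacity();
    hotel.refresh_prices(0..500_000);
    // No reallocation happened: the capacity is unchanged.
    assert_eq!(hotel.prices.capacity(), cap_before);
    assert_eq!(hotel.prices.len(), 500_000);
}
```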

On second glance, I guess this is what you're getting at with the ArcSwap comment in the post. It sounds like you really do want to reallocate the whole State and atomically swap it with the previous state, which would make reusing the prices Vec impossible, at least without keeping around two States and swapping between them.

Anyway, tone aside, I still think the comment had validity in a general optimization sense, even if not in this specific case.

Patryk27 | 1 year ago

Yeah, the example is not representative - in reality there are a couple of smaller vectors (base prices, meal prices, supplements, discounts, taxes, cancellation policies, ...) plus lots of vectors containing indices (`for 2018-01-01, match prices #10, #20, #21`, `for 2018-01-02, match prices #11, #30`, ...), and they can't actually be updated in place, because that would require using `RwLock`, which would prevent the engine from answering queries during a refresh.

(which is actually how it used to work a couple of years ago - ditching `RwLock` for `ArcSwap` and making the entire engine atomic was one of the coolest changes I've implemented here, probably worth its own post)
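The RwLock-to-ArcSwap change boils down to a snapshot-swap pattern: readers grab a cheap `Arc` clone and work on it lock-free, while the refresher builds a whole new `State` and swaps the pointer. A std-only approximation is sketched below with made-up names (the real engine uses the `arc-swap` crate, which removes even the brief mutex used here):

```rust
use std::sync::{Arc, Mutex};

// Illustrative stand-in for the engine's state; field names are made up.
struct State {
    prices: Vec<u32>,
}

// Std-only approximation of the ArcSwap pattern: queries take a snapshot
// (an Arc clone) and never block on the refresher; the refresher replaces
// the pointer in one short critical section.
struct Engine {
    current: Mutex<Arc<State>>,
}

impl Engine {
    fn snapshot(&self) -> Arc<State> {
        self.current.lock().unwrap().clone()
    }

    fn swap(&self, next: State) {
        *self.current.lock().unwrap() = Arc::new(next);
    }
}

fn main() {
    let engine = Engine {
        current: Mutex::new(Arc::new(State { prices: vec![100; 4] })),
    };

    // A running query keeps its snapshot even while a refresh swaps
    // in a new State underneath it.
    let snap = engine.snapshot();
    engine.swap(State { prices: vec![200; 4] });
    assert_eq!(snap.prices[0], 100); // old snapshot still valid
    assert_eq!(engine.snapshot().prices[0], 200); // new queries see new data
}
```

The trade-off matches what's described above: each refresh allocates an entirely new `State`, but queries never wait on a writer.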

scottlamb | 1 year ago

Makes perfect sense to me for the updates to happen atomically and avoid causing lock contention, even if that makes the loader more allocation-happy than it'd be otherwise. I've done similar things before.

What about the query path? Your post talked about 10% improvement in response latency by changing memory allocators. That could be due to something like one allocator making huge page use more possible and thus vastly decreasing TLB pressure...but it also could be due to heavy malloc/free cycles during the query getting sped up. Is that happening, and if so why are those allocations necessary? Ignoring the tone, I think this is more what akira2501 was getting at. My inclination would be to explore using per-request arenas.