m104 | 2 years ago

One aspect of this type of problem missing from the article is whether the data mutations are applied evenly across transaction time. Data sets like these tend to be very active for recent transactions, while updates fall off quickly as the data ages. If that's the case, a single, uniform query-caching solution may be a poor fit and may always suffer from major tuning/balance issues.

If the data does in fact split into clear hot/warm/cold sets, caching the cold set should be extremely effective, the warm set moderately effective, and the hot set may not be worth caching at all, given the complexity proposed. Additionally, you should be able to offload the cold sets to persistent blob storage, away from your main database, and bulk-load them as needed.
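To make the hot/warm/cold split concrete, here's a minimal sketch of an age-based policy picker. The thresholds (`HOT_WINDOW`, `WARM_WINDOW`) and policy names are hypothetical, they aren't from the article and would need tuning against the real mutation distribution:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical age thresholds for the hot/warm/cold split; tune per workload.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)

def cache_policy(txn_time: datetime, now: datetime) -> str:
    """Pick a caching strategy based on how old a transaction is."""
    age = now - txn_time
    if age <= HOT_WINDOW:
        return "no-cache"   # hot: mutates too often to be worth caching
    if age <= WARM_WINDOW:
        return "short-ttl"  # warm: cache, but expire aggressively
    return "immutable"      # cold: cache indefinitely / offload to blob storage

now = datetime(2023, 9, 20, tzinfo=timezone.utc)
print(cache_policy(now - timedelta(days=2), now))    # no-cache
print(cache_policy(now - timedelta(days=30), now))   # short-ttl
print(cache_policy(now - timedelta(days=365), now))  # immutable
```

The point is that the policy is a pure function of transaction age, so a cache layer can apply it without consulting the database on every read.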

Finally, it can be faster and simpler to track deltas to cold sets (late mutations that "invalidate" previously immutable data): store those updates in a separate table, load the cold-set data, and apply the delta corrections in code as an overlay at query time. A cron job can then read those deltas and fold them back into the cold-set aggregations, producing clean, validated cold-set data again.
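A minimal sketch of that overlay-plus-fold pattern, using in-memory dicts to stand in for the cold-set storage and the delta table (all names here are illustrative, not from the article):

```python
from collections import defaultdict

# Stand-ins for storage: sealed cold-set aggregates plus a small delta table
# holding late mutations that arrived after the cold data was written.
cold_aggregates = {"acct-1": 100.0, "acct-2": 250.0}
delta_table = [("acct-1", -10.0), ("acct-1", +5.0)]  # late corrections

def query_with_overlay(key):
    """Serve cold data with delta corrections applied in code at read time."""
    base = cold_aggregates.get(key, 0.0)
    correction = sum(amt for k, amt in delta_table if k == key)
    return base + correction

def fold_deltas():
    """Cron-style job: fold deltas back into the cold set, then clear them."""
    totals = defaultdict(float)
    for k, amt in delta_table:
        totals[k] += amt
    for k, amt in totals.items():
        cold_aggregates[k] = cold_aggregates.get(k, 0.0) + amt
    delta_table.clear()

print(query_with_overlay("acct-1"))  # 95.0  (100 - 10 + 5)
fold_deltas()
print(query_with_overlay("acct-1"))  # 95.0  (now baked into the cold set)
```

Reads stay correct before and after the fold; the cron job just shrinks the overlay back to empty so query-time work stays cheap.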

Great article, BTW! There are entire database technologies and products dedicated to addressing these use cases, particularly as the data sets grow very large.
