jaksprats | 15 years ago
The process can be thought of as a map-reduce. The first pass figures out where all your SETs are located and how big they are, then intersects all co-located SETs. Then, in parallel, the smaller SETs are MIGRATEd toward the larger SETs, those are intersected, and the process repeats (respecting the new sizes). A pyramid of SET intersections runs until the final two SETs are intersected. A process will need to coordinate all of this, but luckily intersection is idempotent, so if one parallel job finishes quicker, it doesn't need to wait.
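The pyramid described above can be sketched with plain Python sets standing in for Redis SETs. In a real cluster the local intersections would be SINTERSTORE calls and the moves would be MIGRATE calls; the node names and helper functions here are invented for illustration.

```python
# Sketch of the pyramid-style cross-node intersection, simulated locally.
# Python sets stand in for Redis SETs; "migrating" a set is just pairing
# it with a partial result on another (simulated) node.

def intersect_colocated(nodes):
    """First pass: intersect all SETs that already share a node."""
    return [(node, set.intersection(*sets)) for node, sets in nodes.items()]

def pyramid_intersect(partials):
    """Repeatedly move the smallest partial toward the largest one and
    intersect, until a single result remains. Intersection is idempotent
    and commutative, so the pairing order does not change the answer."""
    while len(partials) > 1:
        partials.sort(key=lambda p: len(p[1]))   # cheap size check (SCARD)
        small_node, small = partials.pop(0)      # smallest partial
        big_node, big = partials.pop()           # largest partial
        partials.append((big_node, big & small)) # "MIGRATE" + intersect
    return partials[0][1]

nodes = {
    "node-a": [{1, 2, 3, 4}, {2, 3, 4, 5}],
    "node-b": [{2, 3, 6}],
    "node-c": [{2, 3, 4, 7}, {2, 3, 9}],
}
result = pyramid_intersect(intersect_colocated(nodes))
print(sorted(result))  # -> [2, 3]
```

The coordinator's only job in this sketch is the sort-by-size step; the heavy work (intersections, data movement) happens "on the nodes" and can proceed in parallel within each level of the pyramid.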
W/ Redis' speed, the coordinator's overhead in fetching the sizes of the different SETs at each step should be minimal compared to the time taken to MIGRATE the data (effectively for-free). Also, w/ Redis' speed the MIGRATE itself will be very fast, and the cross-node join (or cross-node intersection) bottlenecks on network I/O. So if a good framework for this Redis map-reduce is created, it will be a pretty optimal setup, and it won't bloat the server w/ tasks that can't be done directly in the server (cross-node intersections require intermediate results, so they are done at something more like a proxy level).
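A minimal sketch of that size-driven planning step: the coordinator queries sizes (SCARD is O(1) in Redis) and always ships the smaller operand, so each MIGRATE moves the fewest members possible. The plan structure and key names are made up for illustration; a real coordinator would issue actual SCARD/MIGRATE/SINTERSTORE commands.

```python
# Hypothetical coordinator planning pass: given SET sizes and locations,
# pair the smallest partial with the largest and move the smaller one.

def plan_round(partials):
    """partials: {set_key: (node, size)}. Returns a list of migration
    steps, each moving the smaller set to the larger set's node, where a
    local intersection (SINTERSTORE) would then run."""
    ordered = sorted(partials.items(), key=lambda kv: kv[1][1])
    migrations = []
    while len(ordered) >= 2:
        small_key, (small_node, small_size) = ordered.pop(0)  # smallest
        big_key, (big_node, big_size) = ordered.pop()         # largest
        migrations.append({"move": small_key, "from": small_node,
                           "to": big_node, "intersect_with": big_key,
                           "members_moved": small_size})
        # the intersection result is no larger than the smaller operand
        ordered.append((big_key, (big_node, min(small_size, big_size))))
        ordered.sort(key=lambda kv: kv[1][1])
    return migrations

plan = plan_round({
    "s:tags":  ("node-a", 10_000),
    "s:users": ("node-b", 2_000_000),
    "s:geo":   ("node-c", 50_000),
})
for step in plan:
    print(step["move"], "->", step["to"])
```

Note how the size bound shrinks every round: after the first intersection, at most 10,000 members ever cross the network again, which is why the network cost stays dominated by the smallest SETs rather than the biggest ones.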
This problem is a hard one: data-analytics stores optimize for it by storing data redundantly w/ a "pre-joined" colocation strategy, which works for star schemas and a limited number of tables but doesn't make sense w/ 1000s of SETs. So this is a real good solution, and classic Redis anti-bloat.