Exa-d: How to store the web in S3

46 points | willbryk | 1 month ago | exa.ai

exa-d is our internal data processing framework that stores the web in S3. It helps manage the complexity of data at web scale through design decisions such as declarative typed dependencies and sparse updates.

4 comments

swyx|1 month ago

hi will! super nicely written, nice look under the hood of your processing. as an orchestration guy i've always wondered why everyone seems to converge on using Ray, and as a secondary thought, how well Anyscale is capturing the Ray market.

if i were doing what you do i might set up a lot of rate limits/anomaly detection in case some unintended invalidation causes a weird spike in your dependency graphs. is there a good practice there for anomaly detection other than "set up a bunch of dashboards and be on call"?

twyxy|1 month ago

Ray is the future

timvdalen|1 month ago

Opening this page makes my (quite beefy) machine grind to a halt! Almost all CPU threads and the GPU jump to 80% usage.

neilv|1 month ago

Would be funny if blog visitors were the distributed compute nodes.