alibero | 2 years ago

I used to work at Yelp, which had something called Data Pipeline that I think is similar to what you are describing (https://engineeringblog.yelp.com/2019/12/cassandra-source-co...).

I remember it being pretty simple (like, run one or two bash commands) to get a source table streamed into a Kafka topic, or to get a Kafka topic streamed into a sink datastore (S3, MySQL, Cassandra, Redshift, etc). Kafka topics could also be filtered/transformed pretty easily.
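To give a rough idea of what I mean by filtering/transforming a topic, here is a minimal sketch in plain Python. This is not Yelp's actual API: the `filter_transform` helper, the record shape, and the field names are all made up for illustration, and in-memory lists stand in for Kafka topics.

```python
from typing import Callable, Iterable, Iterator

Record = dict  # one decoded message from a topic (hypothetical shape)


def filter_transform(
    records: Iterable[Record],
    keep: Callable[[Record], bool],
    transform: Callable[[Record], Record],
) -> Iterator[Record]:
    """Yield transformed versions of the records that pass the filter."""
    for record in records:
        if keep(record):
            yield transform(record)


# Stand-in for messages read from a source topic (hypothetical fields).
source_topic = [
    {"business_id": 1, "messaging_enabled": True},
    {"business_id": 2, "messaging_enabled": False},
    {"business_id": 3, "messaging_enabled": True},
]

# Stand-in for the derived topic: keep enabled rows, project to one field.
derived_topic = list(
    filter_transform(
        source_topic,
        keep=lambda r: r["messaging_enabled"],
        transform=lambda r: {"business_id": r["business_id"]},
    )
)
```

In a real setup the lists would be a consumer and a producer, but the shape of the step (predicate plus mapping function over a stream of records) is the part I remember being easy.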

E.g. in https://engineeringblog.yelp.com/2021/04/powering-messaging-... they run `datapipe datalake add-connection --namespace main --source message_enabledness`, which results in the `message_enabledness` table being streamed into a (daily?) parquet snapshot in S3, registered in AWS Glue.

It is open source, but it's more the "look at how we did this" kind of open source than the "it would be easy to stick this into your infra and use it" kind :(
