alibero | 2 years ago
I remember it being pretty simple (like, run one or two bash commands) to get a source table streamed into a Kafka topic, or to get a Kafka topic streamed into a sink datastore (S3, MySQL, Cassandra, Redshift, etc.). Kafka topics can also be filtered/transformed pretty easily.
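For comparison, the open-source Kafka Connect framework offers a similar "one config, one command" experience for topic-to-sink streaming. A sketch of an S3 sink connector config, assuming Confluent's S3 sink connector is installed and the topic name and bucket are placeholders:

```json
{
  "name": "s3-sink-example",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "topics": "example_topic",
    "s3.bucket.name": "example-datalake-bucket",
    "s3.region": "us-west-2",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "flush.size": "1000"
  }
}
```

You'd register it with a single `curl -X POST -H "Content-Type: application/json" --data @connector.json http://localhost:8083/connectors` against the Connect REST API (Parquet output additionally requires schema'd records, e.g. Avro with a schema registry). Yelp's internal `datapipe` CLI presumably wraps this kind of machinery.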
E.g. in https://engineeringblog.yelp.com/2021/04/powering-messaging-... they run `datapipe datalake add-connection --namespace main --source message_enabledness`, which results in the `message_enabledness` table being streamed into a (daily?) Parquet snapshot in S3, registered in AWS Glue.
It is open source, but it's more of the "look at how we did this" kind of open source vs. the "it would be easy to drop this into your own infra and use it" kind :(