snake_doc | 11 months ago

These just seem like over-engineered solutions trying to guarantee their job security. When the dataflows are this straightforward, just replicate into your OLAP store of choice and transform there.
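A minimal sketch of the "replicate, then transform inside the OLAP store" (ELT) approach. It assumes Python's built-in sqlite3 as a stand-in for a real OLAP engine, and the `raw_orders` and `revenue_by_customer` tables are hypothetical.

```python
import sqlite3

# Hypothetical example: an in-memory sqlite3 database stands in for "your OLAP of choice".
conn = sqlite3.connect(":memory:")

# Step 1 (replicate): land the source rows as-is, with no transformation in flight.
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", 1200, "paid"),
        (2, "bob", 550, "cancelled"),
        (3, "alice", 800, "paid"),
    ],
)

# Step 2 (transform): derive the reporting table with SQL inside the store itself.
conn.execute(
    """
    CREATE TABLE revenue_by_customer AS
    SELECT customer, SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    WHERE status = 'paid'
    GROUP BY customer
    """
)

rows = conn.execute(
    "SELECT customer, revenue FROM revenue_by_customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 20.0)]
```

The appeal is that the pipeline itself only moves rows; all business logic lives in SQL in one place.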

moltar | 11 months ago

I came from traditional engineering into data engineering by accident and had a similar view. But every time I tried to build a pipeline from first principles, it eventually ended up looking something like this, and for a reason. This is especially true when you're trying to bridge many teams and skill sets: everyone wants their favourite tool.

jchandra | 11 months ago

Our approach wasn't about over-engineering. We were trying to leverage our existing investments (like Confluent BYOC) while optimizing for flexibility, cost, and performance. We wanted to stay loosely coupled so we could adapt to cloud restrictions across multiple geographic deployments.

polskibus | 11 months ago

What is the current (open source) state of the art for OLTP-to-OLAP pipelines these days? I don't mean a one-off, ETL-style nightly load, but a continuous process with relatively low latency.

williamdclt | 11 months ago

I don't know what the state of the art is, but I've used change data capture with Debezium and Kafka, sinking into Snowflake. I'm not sure Kafka is the right tool, though, since you don't need its persistence, and replication slots make a lot of operations (e.g. DB engine upgrades) much harder.
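A minimal sketch of what the sink side of a CDC pipeline does with each change event. It assumes the Debezium change-event envelope (`op` of `c`/`u`/`d`/`r`, plus `before`/`after` row images) and leaves out Kafka, schemas, tombstone records, and the actual Snowflake load. A plain dict keyed by primary key stands in for the target table, and `apply_change` is a hypothetical helper, not part of any library.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_change(replica: Dict[int, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory replica (hypothetical helper)."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, or update: upsert the latest row image
        row: Row = event["after"]
        replica[row[key]] = row
    elif op == "d":
        # delete: the "before" image carries the primary key of the removed row
        before: Optional[Row] = event["before"]
        replica.pop(before[key], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


events = [
    {"op": "r", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
     "after": {"id": 1, "email": "a2@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
]

replica: Dict[int, Row] = {}
for e in events:
    apply_change(replica, e)
print(replica)  # {1: {'id': 1, 'email': 'a2@example.com'}}
```

A real warehouse sink would typically batch events and apply them in bulk rather than row by row, but the upsert/delete-by-key logic is the same idea.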

txomon | 11 months ago

I would say ClickHouse.