tingfirst | 5 months ago

Is there a native SQL pipeline tool for ClickHouse that processes real-time data incrementally, with low latency, high throughput, and high efficiency, similar to Snowflake’s Dynamic Tables? [1]

[1] Dynamic Tables: One of Snowflake’s Fastest-Adopted Features: https://www.snowflake.com/en/blog/reimagine-batch-streaming-...
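
For context, a Dynamic Table is a declarative table definition that Snowflake keeps incrementally refreshed. Roughly like this sketch (made-up names, following Snowflake's documented syntax):

    -- Snowflake refreshes this result incrementally, within TARGET_LAG
    CREATE OR REPLACE DYNAMIC TABLE enriched_events
      TARGET_LAG = '1 minute'
      WAREHOUSE = etl_wh
    AS
    SELECT e.ts, e.user_id, u.country, e.item_id
    FROM events e
    JOIN users u ON u.user_id = e.user_id;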

Sep142324 | 5 months ago

Dynamic Tables are interesting for declarative streaming. In the ClickHouse ecosystem, you might want to look at materialized views combined with streaming engines.

For real-time transformations, there are a few approaches:

- Native ClickHouse materialized views with AggregatingMergeTree (sketch below)
- Stream processors that write to ClickHouse (Flink, Spark Streaming)
- Streaming SQL engines that can read/write ClickHouse
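
The first option is plain ClickHouse DDL. A minimal sketch with hypothetical table names: the materialized view fires on every insert and stores partial aggregate states, which you merge at query time.

    -- raw fact table (hypothetical schema)
    CREATE TABLE page_views
    (
        ts DateTime,
        url String,
        user_id UInt64
    )
    ENGINE = MergeTree
    ORDER BY (url, ts);

    -- per-minute rollup holding partial aggregate states
    CREATE TABLE page_views_1m
    (
        minute DateTime,
        url String,
        views AggregateFunction(count),
        uniq_users AggregateFunction(uniq, UInt64)
    )
    ENGINE = AggregatingMergeTree
    ORDER BY (url, minute);

    -- populated incrementally on each insert into page_views
    CREATE MATERIALIZED VIEW page_views_1m_mv TO page_views_1m AS
    SELECT
        toStartOfMinute(ts) AS minute,
        url,
        countState() AS views,
        uniqState(user_id) AS uniq_users
    FROM page_views
    GROUP BY minute, url;

    -- at query time, merge the partial states
    SELECT minute, url, countMerge(views) AS views, uniqMerge(uniq_users) AS uniq_users
    FROM page_views_1m
    GROUP BY minute, url;

The catch relative to Dynamic Tables: the MV only sees newly inserted blocks, so late-arriving or updated rows on the join side aren't re-evaluated.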

We've been working on streaming SQL at Proton (github.com/timeplus-io/proton) which handles similar use cases - continuous queries that maintain state and can write results back to ClickHouse. The key difference from Dynamic Tables is handling unbounded streams vs micro-batches.
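
If it helps to picture it, a minimal Proton pipeline reads roughly like this (illustrative names; the Kafka external stream and tumble window follow the project's README, but treat the ClickHouse external table settings as assumptions and check the docs for the exact option names):

    -- Kafka topic exposed as an unbounded stream
    CREATE EXTERNAL STREAM clicks(user_id uint64, ts datetime64(3))
    SETTINGS type='kafka', brokers='kafka:9092', topic='clicks',
             data_format='JSONEachRow';

    -- ClickHouse sink (setting names are illustrative; see the docs)
    CREATE EXTERNAL TABLE ch_clicks_1m
    SETTINGS type='clickhouse', address='clickhouse:9000', table='clicks_1m';

    -- long-running continuous query: per-minute counts pushed to ClickHouse
    CREATE MATERIALIZED VIEW clicks_1m_mv INTO ch_clicks_1m AS
    SELECT window_start, count() AS c
    FROM tumble(clicks, 60s)
    GROUP BY window_start;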

What's your specific use case? Happy to discuss the tradeoffs.

tingfirst | 5 months ago

Data sources are usually Kafka topics, or operational databases like Postgres or MySQL.

1. Table A: fact events, high throughput (10k~1M eps), high cardinality.

2. Tables B, C, D: a few dimension tables (fast- or slow-changing).

The use case is straightforward: join/enrich/look up everything into one big flattened, analytics-friendly table in ClickHouse.
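
Concretely, the pure-ClickHouse shape I'm picturing is something like this sketch (hypothetical names; dictGet assumes the dimension tables are also loaded as ClickHouse dictionaries, e.g. sourced from Postgres/MySQL):

    -- fact stream consumed from Kafka (hypothetical schema)
    CREATE TABLE events_kafka
    (
        ts DateTime,
        user_id UInt64,
        item_id UInt64
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list = 'events',
             kafka_group_name = 'ch_consumer',
             kafka_format = 'JSONEachRow';

    -- the flattened, analytics-friendly target
    CREATE TABLE events_flat
    (
        ts DateTime,
        user_id UInt64,
        user_country String,
        item_id UInt64,
        item_category String
    )
    ENGINE = MergeTree
    ORDER BY (ts, user_id);

    -- enrich at insert time; dictGet is a per-row in-memory lookup,
    -- so no JOIN runs inside the materialized view
    CREATE MATERIALIZED VIEW events_flat_mv TO events_flat AS
    SELECT
        ts,
        user_id,
        dictGet('users_dict', 'country', user_id) AS user_country,
        item_id,
        dictGet('items_dict', 'category', item_id) AS item_category
    FROM events_kafka;

(The known tradeoff of this pattern: rows already flattened won't reflect later dimension changes.)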

What’s the best pipeline approach to achieve this in real time, efficiently?