tejasmanohar | 4 years ago
Our average customer runs Hightouch syncs roughly every hour, but we can run syncs as often as every minute. Hightouch has a lot of optimizations, like sending only changed rows to destinations instead of all data on every run.
On the warehouse side, we're seeing a lot of improvements. BigQuery has streaming insert APIs [0], implemented with a parallel storage system on the backend that's joined with the main table at read time. Combined with timestamp-partitioned tables (which are sortable) and our in-warehouse diffing, you can effectively build a streaming pipeline in Hightouch. Some companies, like JetBlue, are doing cool stuff with lambda views on top of Snowflake [1]. Our power users are already running syncs as fast as every minute.
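To make the "only send changes" idea concrete, here's a minimal sketch in Python (hypothetical names; the real diffing runs as SQL inside the warehouse): compare the previously synced snapshot with the current query result, keyed by primary key, and emit only the adds, changes, and removes.

```python
def diff_snapshots(previous, current):
    """Compute the delta between two snapshots keyed by primary key.

    previous/current: dict mapping primary key -> row (itself a dict).
    Returns only the rows that need to be sent to the destination.
    """
    added = {k: v for k, v in current.items() if k not in previous}
    changed = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    removed = [k for k in previous if k not in current]
    return added, changed, removed

# Example: only user 3 (new) and user 2 (changed) get synced; 4 is removed.
prev = {1: {"plan": "free"}, 2: {"plan": "free"}, 4: {"plan": "pro"}}
curr = {1: {"plan": "free"}, 2: {"plan": "pro"}, 3: {"plan": "free"}}
added, changed, removed = diff_snapshots(prev, curr)
```

Because each run only ships the delta, a minute-level schedule stays cheap even on large tables.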
For wider context, we find 90%+ of business use cases to be just fine in batch. It's amazing to see how many people are still replacing... manual CSV workflows... with Hightouch :)
That said, there are some use cases for truly real-time workflows (e.g. a post-checkout email). For those, customers either implement them outside of Hightouch or, lately, try something we've been experimenting with: letting customers plug directly into streams like Kafka, Kinesis, and Pub/Sub - though they lose the power of SQL aggregations _for now_.
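A truly real-time workflow like that post-checkout email is event-driven rather than query-driven. As a rough sketch (the event shape and `send_email` callable are hypothetical, and a real system would iterate over a Kafka/Kinesis/Pub/Sub consumer rather than a list):

```python
def handle_events(events, send_email):
    """Dispatch a confirmation email for each completed checkout.

    events: iterable of event dicts, e.g. messages from a stream consumer.
    send_email: callable(address, template) -- hypothetical side effect.
    Returns the number of emails dispatched.
    """
    sent = 0
    for event in events:
        if event.get("type") == "checkout_completed":
            send_email(event["email"], template="order_confirmation")
            sent += 1
    return sent

# Usage: in production this loop would be `for msg in consumer: ...`
outbox = []
events = [
    {"type": "page_view", "email": "a@example.com"},
    {"type": "checkout_completed", "email": "b@example.com"},
]
n = handle_events(events, lambda addr, template: outbox.append(addr))
```

The trade-off mentioned above is visible here: each event is handled on its own, so there's no easy way to express a SQL-style aggregation over the user's full history.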
Streaming SQL databases like Materialize [2] will fix this fundamentally, and Hightouch can connect to them. Email hello@hightouch.io if you want to try any of the new stuff!
[0]: https://cloud.google.com/bigquery/docs/write-api
[1]: https://discourse.getdbt.com/t/how-to-create-near-real-time-...
[2]: https://materialize.com/