(no title)
jdotjdot | 5 years ago
I fully agree with you that for many data science cases, you're likely to stick with batching. Where I see Materialize to be the most useful, and where I'd be inspired to use it and transform how we do things, would be the overlap between when Analytics team are writing definitions (e.g., what constitutes an "active user"?) and are typically doing so on the warehouse, but then I want those definitions to be used, up to date, and available everywhere in my stack, including analytics, my operational database, and third-party tools like marketing tools.
Personally, I'm less interested in one-off migrations like you're suggesting. What I really want is to have something like Materialize embedded in my Postgres. (Such a thing should be doable at minimum by running Materialize + Debezium side-by-side with Postgres and then having Postgres interact with Materialize via foreign data wrappers. It would need some fancy tooling to make it simple, but it would work.) In such a scenario, a Postgres + Materialize combo could serve as the "center of the universe" for all the data AND business definitions for the company, and everything else stems from there. Even if we used a big data warehouse in parallel for large ad hoc queries (which I imagine Materialize wouldn't handle well, not being OLAP), I would ETL my data to the warehouse from Materialize--and I'd even be able to include ETLing the data from the materialized views, pre-calculated. If I wanted to send data to third-party tools, I'd use Materialize in conjunction with Hightouch.io to forward the data, including hooking into subscriptions when rows in the materialized views change.
For what I propose, there are some open questions about data persistence, high availability, the speed to materialize an initial view for the first time, and backfilling data, among other things. But I think this is where Materialize has a good chance of fundamentally changing how analytical and operational data are managed, and I think there's a world where data warehouses would go away and you'd just run everything on Postgres + Materialize + S3 (+ Presto or similar for true OLAP queries). I could see myself using Materialize for log management, or log alerting. I'm just as excited to see pieces of it embedded in other pieces of infrastructure as I am to use it as a standalone product.
arjunnarayan|5 years ago
> Personally, I'm less interested in one-off migrations like you're suggesting. What I really want is to have something like Materialize embedded in my Postgres.
We're about to launch "Materialize as a Postgres read-replica" where it connects to a Postgres leader just as a Postgres read-replica would - using the built-in streaming replication in newer versions of postgres. It's currently in final testing before being released in the next month or two.
https://github.com/MaterializeInc/materialize/issues/5370
Also on our roadmap for Materialize Cloud is plug-and-play connections with Fivetran, Hightouch, and Census (and more) to bring in the business data, and allow you to, as you put it, make Materialize the central collection point for keeping updated all your views.
gbrits|5 years ago
frafra|5 years ago