top | item 26311633

(no title)

jdotjdot | 5 years ago

I should clarify; I don't think that for the general case you have to go all-in on Materialize, but for the case in my comment--where you are effectively using business logic within Materialize views as the "source of truth" of logic across all of both your analytics and your operation--that requires buy-in. Additionally, if I'm _already_ sending all of my data to a database or to my data warehouse, ETLing all of that data to Materialize also is rather burdensome. Just because I technically could run Materialize side by side with something doesn't mean I necessarily want to, especially given the streaming use case requires a lot more maintenance to get right and keep running in production.

I fully agree with you that for many data science cases, you're likely to stick with batching. Where I see Materialize to be the most useful, and where I'd be inspired to use it and transform how we do things, would be the overlap between when Analytics team are writing definitions (e.g., what constitutes an "active user"?) and are typically doing so on the warehouse, but then I want those definitions to be used, up to date, and available everywhere in my stack, including analytics, my operational database, and third-party tools like marketing tools.

Personally, I'm less interested in one-off migrations like you're suggesting. What I really want is to have something like Materialize embedded in my Postgres. (Such a thing should be doable at minimum by running Materialize + Debezium side-by-side with Postgres and then having Postgres interact with Materialize via foreign data wrappers. It would need some fancy tooling to make it simple, but it would work.) In such a scenario, a Postgres + Materialize combo could serve as the "center of the universe" for all the data AND business definitions for the company, and everything else stems from there. Even if we used a big data warehouse in parallel for large ad hoc queries (which I imagine Materialize wouldn't handle well, not being OLAP), I would ETL my data to the warehouse from Materialize--and I'd even be able to include ETLing the data from the materialized views, pre-calculated. If I wanted to send data to third-party tools, I'd use Materialize in conjunction with Hightouch.io to forward the data, including hooking into subscriptions when rows in the materialized views change.

For what I propose, there are some open questions about data persistence, high availability, the speed to materialize an initial view for the first time, and backfilling data, among other things. But I think this is where Materialize has a good chance of fundamentally changing how analytical and operational data are managed, and I think there's a world where data warehouses would go away and you'd just run everything on Postgres + Materialize + S3 (+ Presto or similar for true OLAP queries). I could see myself using Materialize for log management, or log alerting. I'm just as excited to see pieces of it embedded in other pieces of infrastructure as I am to use it as a standalone product.

discuss

arjunnarayan|5 years ago

Thank you very much for the elaboration, I really appreciate the thinking!

> Personally, I'm less interested in one-off migrations like you're suggesting. What I really want is to have something like Materialize embedded in my Postgres.

We're about to launch "Materialize as a Postgres read-replica" where it connects to a Postgres leader just as a Postgres read-replica would - using the built-in streaming replication in newer versions of postgres. It's currently in final testing before being released in the next month or two.

https://github.com/MaterializeInc/materialize/issues/5370

Also on our roadmap for Materialize Cloud is plug-and-play connections with Fivetran, Hightouch, and Census (and more) to bring in the business data, and allow you to, as you put it, make Materialize the central collection point for keeping updated all your views.

gbrits|5 years ago

Exciting! Does this already have the concept of backfilling as mentioned by GP. Interested to know as well

frafra|5 years ago

I agree with you. If you are focused on PostgreSQL, you could find interesting the work being done on IVM: https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc... - https://github.com/sraoss/pgsql-ivm