mavam | 2 years ago
1. Acquire bytes (void → unstructured)
2. Parse bytes to events (unstructured → structured)
3. Transform events (structured → structured)
4. Print events (structured → unstructured)
5. Send bytes (unstructured → void)
The "Publish" part is a combination of (4) and (5). Sometimes they are fused because not all APIs differentiate those steps. We're currently focusing on building blocks (engine, connectors, formats) as opposed to application-level integrations, so turnkey Reverse ETL is not near. But the main point is that the symmetry reduces cognitive effort for the user, because they worked that muscle on the "E" side already and now just need to find the dual in the docs.
[1] https://docs.tenzir.com/blog/five-design-principles-for-buil...
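The five-stage symmetry above can be sketched as a chain of typed functions. This is a minimal illustration in Python, not Tenzir's actual API; the NDJSON format and the field names are made up for the example:

```python
import json
import sys

def load() -> bytes:
    """1. Acquire bytes (void -> unstructured), e.g. from a file or socket."""
    return b'{"src": "10.0.0.1", "bytes": 512}\n{"src": "10.0.0.2", "bytes": 64}\n'

def parse(raw: bytes) -> list[dict]:
    """2. Parse bytes to events (unstructured -> structured)."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

def transform(events: list[dict]) -> list[dict]:
    """3. Transform events (structured -> structured): keep only large flows."""
    return [e for e in events if e["bytes"] > 100]

def print_events(events: list[dict]) -> bytes:
    """4. Print events (structured -> unstructured): render back to NDJSON."""
    return b"".join(json.dumps(e).encode() + b"\n" for e in events)

def save(raw: bytes) -> None:
    """5. Send bytes (unstructured -> void), e.g. to stdout or a socket."""
    sys.stdout.buffer.write(raw)

# A full pipeline is the composition of all five stages:
save(print_events(transform(parse(load()))))
```

Stages (2)–(4) are the duals the comment describes: parse/print mirror each other across the structured core, just as load/save do at the byte level.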
code_biologist | 2 years ago
The primitives of many of these ETL systems are structured tables (Snowflake, Parquet, pandas dataframes, whatever), and I don't think I'd ever choose bytes over structured tables. The unstructured parts of data systems I've worked on have always chewed up an outsize portion of labor, with difficult-to-diagnose failure modes. The biggest cognitive-effort win of reverse ETL solutions has been to make external systems and applications "speak table".
mavam | 2 years ago
In security, binary artifacts are common, e.g., scanning malware samples with YARA rules to produce a structured report (“table”). Turning packet traces into structured logs is another example. Typically you have to switch between a lot of tools for that, which makes the process complex.
(The “void” type exists only for symmetry, so that every operator has both an input and an output type. The presence of void makes an operator a source or a sink. A “closed” pipeline is one with both a source and a sink, and only closed pipelines can execute in our mental model.)
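The closed-pipeline invariant amounts to a simple type check: adjacent operators must agree on their intermediate type, and the whole chain must begin and end in void. A minimal sketch (again, not Tenzir's actual implementation; operator names are illustrative):

```python
from dataclasses import dataclass

# The three element types flowing between operators.
VOID, BYTES, EVENTS = "void", "bytes", "events"

@dataclass
class Op:
    """An operator with a declared input and output type."""
    name: str
    input: str
    output: str

def is_closed(pipeline: list[Op]) -> bool:
    """A pipeline is closed (executable) iff it starts and ends in void
    and every operator's output type matches its successor's input type."""
    if not pipeline:
        return False
    if pipeline[0].input != VOID or pipeline[-1].output != VOID:
        return False
    return all(a.output == b.input for a, b in zip(pipeline, pipeline[1:]))

# load (source) -> parse -> where -> print -> save (sink)
ops = [
    Op("load", VOID, BYTES),     # source: void input
    Op("parse", BYTES, EVENTS),
    Op("where", EVENTS, EVENTS),
    Op("print", EVENTS, BYTES),
    Op("save", BYTES, VOID),     # sink: void output
]
assert is_closed(ops)
assert not is_closed(ops[:-1])  # open pipeline: sink is missing
```

Dropping either endpoint leaves an open pipeline, which in this model cannot execute on its own.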