top | item 29246468

turk- | 4 years ago

I don’t understand what capability you are saying Databricks lacks. This capability is literally the entire premise of the Data Lakehouse. With Snowflake you need to export data or pipe it over JDBC/ODBC to an external tool. With Databricks you can use SQL for data warehousing and, when you need to, work with that same data in Python to train an ML model without piping it out over JDBC (using the Spark engine). One security model, one dataset, multiple use cases (AI/ML/BI/SQL) on one platform.

buttaphingas | 4 years ago

They're still lacking things in the SQL space. For example, Databricks says it's ACID compliant, but only on a single-table basis. Snowflake offers multi-table ACID consistency, which is something you would expect by default in the data warehousing world. If I'm loading, say, 10 tables in parallel, I want to be able to roll back or commit the complete set of transactions as one unit to maintain data consistency. I'm sure you could work around this limitation, but it would feel like a hack, especially if you're coming from a traditional DWH world (Teradata, Netezza etc.).
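The multi-table semantics being described can be sketched with Python's built-in sqlite3 module: this is not Snowflake, but it demonstrates the same behaviour, several tables loaded inside one transaction that either commits whole or rolls back whole. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT)")

try:
    with conn:  # one transaction spanning both tables
        conn.execute("INSERT INTO orders VALUES (1, 99.0)")
        conn.execute("INSERT INTO order_items VALUES (1, 'SKU-1')")
        raise RuntimeError("simulated failure mid-way through the load")
except RuntimeError:
    pass  # the `with` block rolled back BOTH inserts

orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
items = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
print(orders, items)  # both 0: the failed batch left no partial state
```

With single-table ACID only, a failure partway through the batch could leave some tables loaded and others not, which is exactly the inconsistency the comment is worried about.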

Snowflake now offers Scala, Java and Python support, so the two platforms' capabilities seem to be converging even more, though each retains its own strengths rooted in its history.

doppelganger1 | 4 years ago

Actually, you would expect that in an OLTP world. Data warehouses have long recommended disabling transactions to get better load performance; even Oracle does. The consistency logic is implemented in the ETL layer instead. Very rarely do you need multi-table transactions in a large-scale DW.
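One common way the ETL layer provides that consistency without multi-table transactions is a batch-publishing pattern: loads are tagged with a batch id, and readers only see batches flagged complete. The sketch below uses sqlite3 and made-up table names purely to illustrate the idea; it is one of several such patterns (staging-and-swap is another), not a claim about any specific vendor's tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE load_batches (batch_id INTEGER, complete INTEGER DEFAULT 0);
CREATE TABLE fact_sales   (batch_id INTEGER, amount REAL);
CREATE TABLE fact_returns (batch_id INTEGER, amount REAL);
""")

def load_batch(batch_id, sales, returns, fail=False):
    conn.execute("INSERT INTO load_batches (batch_id) VALUES (?)", (batch_id,))
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                     [(batch_id, a) for a in sales])
    if fail:
        return  # simulated crash mid-load: batch is never flagged complete
    conn.executemany("INSERT INTO fact_returns VALUES (?, ?)",
                     [(batch_id, a) for a in returns])
    # Publish: only now do readers start seeing this batch.
    conn.execute("UPDATE load_batches SET complete = 1 WHERE batch_id = ?",
                 (batch_id,))

load_batch(1, [10.0, 20.0], [5.0])
load_batch(2, [99.0], [1.0], fail=True)  # partial load, never published

visible = conn.execute("""
    SELECT SUM(s.amount) FROM fact_sales s
    JOIN load_batches b ON b.batch_id = s.batch_id AND b.complete = 1
""").fetchone()[0]
print(visible)  # 30.0: the half-loaded batch 2 is invisible to readers
```

Readers filtering on the completeness flag never observe a partially loaded set of tables, which is how the ETL layer substitutes for multi-table transactions.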

Snowpark is still inferior.