drawturkey | 4 years ago

Not entirely true. There is a bi-directional Spark connector for Snowflake written by Databricks. And exporting your data in bulk out of Snowflake into any number of open formats is incredibly easy using the COPY INTO command. You can also use Snowflake on top of Parquet and even Delta Lake.
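To make that concrete, here is a minimal sketch of a bulk unload using the snowflake-connector-python driver; the connection parameters, stage, and table names are hypothetical placeholders, not a prescription:

    # Hedged sketch: unload a whole table to Parquet files in an
    # external stage (i.e. your own cloud bucket). All names and
    # credentials below are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="...",
        warehouse="my_warehouse",
        database="my_database",
        schema="public",
    )
    try:
        conn.cursor().execute("""
            COPY INTO @my_export_stage/events/   -- stage backed by your own bucket
            FROM events
            FILE_FORMAT = (TYPE = 'PARQUET')
            HEADER = TRUE                        -- keep real column names in the files
        """)
    finally:
        conn.close()

Once the files land in your bucket, any Parquet-aware tool can read them; Snowflake is out of the loop.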

This is the problem. Both Snowflake and Databricks are spreading FUD, and otherwise smart people are falling for it.

feqgmmr2 | 4 years ago

It is not a "small" cost. The cost is proportional to the size of the data exported.

For all intents and purposes, large amounts of data are locked into Snowflake. Is it theoretically possible to export a petabyte out of Snowflake? Sure.

Do I want to spend money on it? Not really. That is what I mean by the "data doesn't come out".
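(Back-of-the-envelope, with hedged assumptions: if that petabyte has to leave the hosting cloud, typical egress pricing of roughly $0.09/GB works out to about 1,000,000 GB × $0.09 ≈ $90,000 in bandwidth alone, before counting the Snowflake compute that runs the unload.)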

"Exporting" a petabyte out of Databricks is a no-op. I can already read Deltalake from other open source tools.

glogla | 4 years ago

"Exporting PB from Snowflake" is only ever relevant if you want to move from Snowflake to something else. In that case, all other migration costs (recoding, redocumenting and especially revalidating everything, if in regulated environment) are going to make any cost of data movement irrelevant.

This is just FUD.

drawturkey | 4 years ago

So if I stop paying Databricks, I can no longer use their proprietary query engine (Photon), right? I have to fall back on something else, like open source Spark SQL, which is slower and will therefore cost a lot more money.

There are different ways to lock customers in and both Databricks and Snowflake are playing the game.