buttaphingas's comments
buttaphingas | 1 year ago | on: Snowflake Arctic Instruct (128x3B MoE), largest open source model
buttaphingas | 2 years ago | on: Most companies do not need Snowflake or Databricks
buttaphingas | 2 years ago | on: BlazingMQ: High-performance open source message queuing system
buttaphingas | 3 years ago | on: Why is Snowflake so expensive
Would love to know the TCO trade-off between procuring, securing and deploying on your own clusters vs having them managed via SaaS.
buttaphingas | 3 years ago | on: Why is Snowflake so expensive
That said, they've kind of introduced it with the Search Optimization Service, which acts like an index across the whole table for fast point lookups, but even that is automatically maintained on your behalf.
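To give a flavour, enabling it is a one-liner of DDL (table and column names here are made up for illustration; note it does incur extra storage and maintenance costs):

```sql
-- Hypothetical table: turn on Search Optimization so equality lookups
-- can avoid a full scan; Snowflake maintains the structure automatically.
ALTER TABLE orders ADD SEARCH OPTIMIZATION;

-- A point lookup like this can then skip most micro-partitions.
SELECT * FROM orders WHERE order_id = 'ABC-123';
```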
buttaphingas | 4 years ago | on: Databricks response to Snowflake's accusation of lacking integrity
Snowflake now offers Scala, Java and Python support, so it would seem their capabilities are converging even more, but both with their own strengths due to their respective histories.
buttaphingas | 4 years ago | on: Databricks response to Snowflake's accusation of lacking integrity
Snowflake uses the Arrow data format in its drivers, so it's plenty fast when retrieving data in general. But it would be far less efficient if a data scientist just does a SELECT * to pull an entire table back into a notebook.
Snowflake has had Scala support since earlier in the year, along with Java UDFs, and also just announced Python support - not a Python connector, but executing Python code directly on the Snowflake platform. Not GA yet though.
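For anyone who hasn't seen it, the shape of an in-platform Python UDF is roughly this (a sketch only; the function name, runtime version and handler are illustrative) - the Python body is defined inline in SQL and executed on Snowflake's side rather than in a client:

```sql
-- Sketch of a Python UDF running inside Snowflake (names illustrative):
CREATE OR REPLACE FUNCTION add_one(i INT)
  RETURNS INT
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.8'
  HANDLER = 'add_one_py'
AS $$
def add_one_py(i):
    return i + 1
$$;

-- The handler executes on the warehouse, not in the client driver.
SELECT add_one(41);
```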
buttaphingas | 4 years ago | on: Databricks response to Snowflake's accusation of lacking integrity
Databricks say their solution is better because it's open (though they keep the optimizations you need to run it at scale to themselves, i.e. it's ultimately proprietary). Snowflake say theirs is better because it's a fully managed service: no infrastructure to procure or manage, fully HA across multiple data centers by default, etc.
Databricks push 'open' but really still want you to use their proprietary tech for first transforming into something usable (Parquet/Delta) and then querying with Photon/SQL, though you can also use other tech. With Snowflake you can just ingest and query, but it has to be through their engine.
Customers should do their own validation and see which one fits their needs best.
buttaphingas | 4 years ago | on: Databricks response to Snowflake's accusation of lacking integrity
buttaphingas | 4 years ago | on: Snowflake’s response to Databricks’ TPC-DS post
[Edit] Highly Available would be a better description per region, as that's out of the box with no configuration. e.g. if a node dies, your cluster will automatically heal and resubmit your query. If there's an entire AZ outage, your query should be resubmitted in another AZ. I think this is why failover/back is called out separately, as that's not automatic, incurs additional costs etc. Here's a link with an explanation: www.snowflake.com/blog/how-to-make-data-protection-and-high-availability-for-analytics-fast-and-easy
I didn't know DB did MVs, masking etc., so yes, that makes sense. Maybe a better idea would be to have a minimum offering comparison, and then a maximum offering comparison (with multi-AZ failover, masking feature costs etc. included) - the reality for a customer would be somewhere between those extremes.
buttaphingas | 4 years ago | on: Snowflake’s response to Databricks’ TPC-DS post
If I'm reading what Databricks published correctly, it seems they've only used one driver node for this benchmark; in other words, it's a dev set-up. If they want to compare apples-to-apples, they should configure, and price, a multi-AZ HA set-up.
I'm not sure if this is still applicable to Photon, however - can anyone confirm?
buttaphingas | 4 years ago | on: Snowflake’s response to Databricks’ TPC-DS post
The higher editions of Snowflake include features like materialised views, dynamic data masking, BYOK, and PCI & HIPAA compliance, none of which are required for the benchmark.