veritas3241 | 5 years ago

Snowflake is the go-to data warehouse in my opinion. Redshift and BigQuery are fine, but Snowflake is head and shoulders above. There's a good community around it and good tools for it (dbt, though that works on other warehouses too). They have the mindshare in the data warehouse market.

There's so much they can do from a user experience perspective to make it even better. The integration with Numeracy was a trainwreck, but the fundamentals of the DB are there.

Interesting to see they lose so much money, but I bet their margins have to be so thin running on the cloud. I wonder if they'll ever have to go bare metal to make it work.

FridgeSeal|5 years ago

I could not disagree more.

Working with it was fraught with issues. Performance was mediocre at best, it was horribly expensive, and the Python and JS client libs had recurring issues with disconnecting and reconnecting. The advice given to us around scaling concurrent connections was bizarre. Teammates had numerous issues where it was clear corners had been cut in handling edge cases around certain Unicode characters. Their Snowpipe "streaming" implementation was...not good. The idea of having compute workers that "spun up and down" sounded good in theory, but in practice led to more bottlenecks and delays than anything else.

The AWS outage last year that prevented you from provisioning new instances essentially crippled our Snowflake DB.

I almost go out of my way to recommend people _not_ use it. I keep seeing it pop up, but mostly because it seems they're doing what MongoDB did in the early days: throwing marketing money at capturing mindshare as opposed to being an actually good product.

We changed to ClickHouse and the difference was night and day. The performance especially was far superior.

veritas3241|5 years ago

Sorry you had a bad experience with it. Certainly compared to Redshift it's a dream. I use it every day and it's been great for us.

dataminded|5 years ago

I can't believe that they will succeed in the long run as an independent player IN the cloud.

They are always going to be less integrated and less infrastructure-cost-efficient than the native options (Redshift and BigQuery), without the R&D budgets and with incremental friction (sales) and risk (data privacy and cybersecurity).

AWS really should get around to buying them, like they should have bought Looker or Tableau or Mode or Fivetran or dbt, etc.

bpodgursky|5 years ago

Snowflake is wildly better than Redshift, no matter how you want to look at it -- integrations, cost, performance, etc.

Like, in a sane world I agree with you -- Redshift SHOULD have a crazy competitive advantage. But somehow they've been unable to execute on that goal for half a decade, and I don't see that changing quickly, given Snowflake's mindshare and growth.

hodgesrm|5 years ago

You don't need to own the public cloud infrastructure to build a better product.

Example: you can play inside ball on storage infrastructure costs to get a 2x cost benefit at the expense of a lot of extra engineering. Better DBMS storage organization, which is available to any implementation, gets you 10x (or greater) improvement. Which would you rather have?

In fact, products like Redshift don't even really game the infrastructure prices. Costs to customers are comparable with Snowflake for equivalent resources as far as I can tell. They both charge what the market will bear.

manigandham|5 years ago

For them to be an attractive acquisition target means they are succeeding, otherwise what would a cloud vendor gain from buying them?

manigandham|5 years ago

Snowflake is better than Redshift, but BigQuery has improved greatly in the last 2 years and filled in a lot of the gaps. I find Snowflake is the best at dealing with unstructured/JSON data and handling interactive results on smaller datasets, while BQ is great with serverless scaling and very large computations.

deepGem|5 years ago

"Our business benefits from powerful network effects. The Data Cloud will continue to grow as organizations move their siloed data from cloud-based repositories and on-premises data centers to the Data Cloud. The more customers adopt our platform, the more data can be exchanged with other Snowflake customers, partners, and data providers, enhancing the value of our platform for all users. We believe this network effect will help us drive our vision of the Data Cloud."

I fail to understand this network effect. Is there some conflation here? How does data sharing equate to a network effect? Something is fundamentally not adding up. If I share my data with 10 other customers, it should inherently enhance my experience. How does this happen with Snowflake?

suhel|5 years ago

This is one hypothetical way they could capture this value:

1) Building a common platform where anyone can upload datasets, e.g. weather data, retail data, govt data, other open data, or closed data (copyrighted etc). They gave the example of COVID cases in their S-1 doc.

2) Providing a mechanism for others to find data through a marketplace; some data is free, other data is available only for payment (with different monetisation models, e.g. per consumption, per month). Allow other customers to consume it as and when needed. Note, based on their S-1 doc, data is never copied when shared with others, so the cost of sharing with a wide audience is limited.

3) The more data on the platform, the more data is shareable in the 'marketplace' and the more data is used by everyone. This increases the value of the whole platform through network effects.

4) It also opens up alternative revenue streams, e.g. more revenue through storage (more data on the platform from different people) and revenue from shared data that is consumed (maybe).

Here is a company that is doing something similar in Australia. https://www.datarepublic.com/solutions/use-cases/data-collab...

veritas3241|5 years ago

I'm a little skeptical of this as well, but I think there is a path. At my previous company we would take in a lot of data from other companies and do analysis for them. If we'd had a really easy way to share the transformed and analyzed data after it had been modelled in the warehouse, that would have been great. The question is, are you going to get companies to create a Snowflake account just so they can access data in this way? Maybe, if it's easy to export or do further analysis.

tyingq|5 years ago

One of the barriers for Snowflake is that while it's better than what AWS offers, very few customers start out needing everything Snowflake does. They grow into that. So they stick with AWS, hoping that the features/capabilities there grow fast enough to keep up.

afpx|5 years ago

But it's also very expensive. You can run queries on a Spark cluster for a tiny fraction of what they charge. But Snowflake makes things easy for the "decision makers" (who know SQL). So, all good.

jwatte|5 years ago

Having run a medium-sized Spark cluster, I'm not sure I agree.

If you have 80-100% utilization for a month, perhaps, but the beauty of Snowflake is that you can spin up a 3XL warehouse for a few MINUTES to get answers fast, then shut it down again and pay nothing while it's off.

Saying "you could run it on self-managed Spark/Oracle/Hive/SQLite" is approximately the same argument as saying "I can run a web server cheaper myself than paying Amazon for an EC2 instance" -- there are cases where that is true, but there are many, many cases where the "on demand capacity" is the bigger benefit.
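
To make the utilization point concrete, here is a rough back-of-the-envelope sketch in Python. The hourly rates are made-up placeholders, not real Snowflake or Spark prices, and the function names are hypothetical. It assumes the always-on cluster bills for every hour of the month while the on-demand warehouse bills only for the hours it actually runs.

```python
HOURS_IN_MONTH = 730.0  # approximate average number of hours in a month


def monthly_cost_always_on(hourly_rate: float) -> float:
    """Cost of a cluster that runs (and bills) for the whole month."""
    return hourly_rate * HOURS_IN_MONTH


def monthly_cost_on_demand(hourly_rate: float, busy_hours: float) -> float:
    """Cost of a warehouse that bills only while it is running."""
    return hourly_rate * busy_hours


def break_even_utilization(always_on_rate: float, on_demand_rate: float) -> float:
    """Fraction of the month above which the always-on cluster is cheaper.

    Derived from on_demand_rate * u * H = always_on_rate * H, capped at 1.0.
    """
    return min(1.0, always_on_rate / on_demand_rate)


if __name__ == "__main__":
    # Placeholder numbers: the on-demand option is 4x pricier per hour.
    always_on_rate = 10.0
    on_demand_rate = 40.0
    print(monthly_cost_always_on(always_on_rate))
    print(monthly_cost_on_demand(on_demand_rate, busy_hours=20.0))
    print(break_even_utilization(always_on_rate, on_demand_rate))
```

With these placeholder rates, the self-managed cluster only wins once you are busy more than a quarter of the month; below that, paying a premium per hour and shutting down is cheaper.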