Snowflake S-1

170 points| kressaty | 5 years ago |sec.gov

100 comments

[+] veritas3241|5 years ago|reply
Snowflake is the go-to data warehouse in my opinion. Redshift and BigQuery are fine, but Snowflake is head and shoulders above. Good community around it and tools for it (dbt - works on other warehouses though). They have the mindshare in the data warehouse market.

There's so much they can do from a user experience perspective to make it even better. The integration with Numeracy was a trainwreck, but the fundamentals of the DB are there.

Interesting to see they lose so much money, but I bet their margins have to be so thin running on the cloud. I wonder if they'll ever have to go bare metal to make it work.

[+] FridgeSeal|5 years ago|reply
I could not disagree more.

Working with it was fraught with issues. Performance was mediocre at best, it was horribly expensive, and the Python and JS client libs had recurring issues with disconnecting and reconnecting. The advice given to us around scaling concurrent connections was bizarre at best. Teammates had numerous issues where it was clear corners had been cut in handling edge cases around certain unicode characters. Their Snowpipe "streaming" implementation was...not good. The idea of having compute workers that "spun up and down" sounded good in theory, but in practice led to more bottlenecks and delays than anything else.

The AWS outage last year that prevented you from provisioning new instances essentially crippled our snowflake DB.

I almost go out of my way to recommend people _not_ use it. I keep seeing it pop up, but mostly because it seems they're doing what MongoDB did in the early days and just throwing marketing money at capturing mindshare as opposed to being an actually good product.

We changed to ClickHouse and the difference was literally night-and-day. The performance especially was far superior.

[+] dataminded|5 years ago|reply
I can't believe that they will succeed in the long run as an independent player IN the cloud.

They are always going to be less integrated and less infrastructure-cost-efficient than the native options (Redshift and BigQuery), without the R&D budgets and with incremental friction (sales) and risk (data privacy and cybersecurity).

AWS really should get around to buying them, like they should have bought Looker or Tableau or Mode or Fivetran or DBT, etc.

[+] manigandham|5 years ago|reply
Snowflake is better than Redshift but BigQuery has improved greatly in the last 2 years to fill in a lot of the missing gaps. I find Snowflake is the best at dealing with unstructured/JSON data and handling interactive results on smaller datasets while BQ is great with serverless scaling and very large computations.
[+] deepGem|5 years ago|reply
"Our business benefits from powerful network effects. The Data Cloud will continue to grow as organizations move their siloed data from cloud-based repositories and on-premises data centers to the Data Cloud. The more customers adopt our platform, the more data can be exchanged with other Snowflake customers, partners, and data providers, enhancing the value of our platform for all users. We believe this network effect will help us drive our vision of the Data Cloud."

I fail to understand this network effect. Is there any conflation here? How does data sharing equate to a network effect? Something is fundamentally not adding up here. If I share my data with 10 other customers, it should inherently enhance my experience. How does this happen with Snowflake?

[+] tyingq|5 years ago|reply
One of the barriers for Snowflake is that while it's better than what AWS offers, very few customers start out needing everything Snowflake does. They grow into that. So they stick with AWS, hoping that the features/capabilities there grow fast enough to keep up.
[+] afpx|5 years ago|reply
But, also very expensive. You can do queries on a spark cluster for tiny fractions of what they charge. But, snowflake makes things easy for the "decision makers" (who know SQL). So, all good.
[+] supernova87a|5 years ago|reply
Wow, Sales + Marketing pretty much 1.5x their revenue ($265M), swinging them from +50% operating margins to -150-200% net margins. They are really trying to cram this product down people's throats, huh?
[+] knes|5 years ago|reply
We at Census (https://getcensus.com) are super excited by this S-1 filing. Before Snowflake (and BigQuery and Redshift), data was seen as something only the Fortune 500 could afford by buying Hadoop clusters and throwing an army of scientists and engineers at it.

But Snowflake has really led the way in democratizing the data warehouse over the past few years and educating the market. You can start on a $50/month plan, and in our experience, the pricing scales nicely with the value you are getting out of the data. Snowflake (and BigQuery) also made it a lot less scary to get started by having an easy way to ETL data from 3rd parties (Google Ads, Salesforce, prod DB, etc.) to your warehouse.

Thank you, Snowflake, for paving the path for startups like Census, Fivetran, DBT, Mode to help (data) engineers and analysts do more with their data.

[+] mint2|5 years ago|reply
If I hear census, especially with a capital “C”, I don’t think a business or startup. Are you in the US? Do a lot of people express confusion? How will you trademark your name?
[+] ganoushoreilly|5 years ago|reply
Now I know why our teams internally have been hammered by sales at Snowflake for the past 4 months. Like, relentless, to the point where I doubt we'd entertain their solution even if we had a need. Sorta like Datadog..
[+] chrisjc|5 years ago|reply
What are you using atm? After dealing with Redshift for a few years, Snowflake was and continues to be a breath of fresh air.
[+] rotten|5 years ago|reply
Their spam is so annoying I would never consider their product. I've had to block them from all of my email accounts.
[+] aqme28|5 years ago|reply
They wouldn't stop calling me even when I told them to stop. I wouldn't use them either just out of frustration.
[+] iblaine|5 years ago|reply
Snowflake is the best all around DW product out there. It commonly gets compared to Redshift, but Amazon built Redshift on top of ParAccel's technology. Snowflake built its database from scratch. Most of the Snowflake founders have PhDs with an emphasis on distributed systems and I think you see that in the product.

I can't say enough good things about snowflake, and I have plenty of criticism to throw at hadoop, redshift, asterdata & vertica.

[+] mabbo|5 years ago|reply
Maybe I'm not really wise to the world of finance and what-have-you, but how many S-1s do I need to see in short succession before I start to ask: "What's going on, guys?"

Like, is this an indication that a lot of people are trying to exchange their companies for hard cash as quickly as possible? It kind of looks a lot like that. This is what, the 3rd or 4th one to hit the HN front-page lately?

Is... is that a bad sign?

[+] dang|5 years ago|reply
When one post is on HN's front page it's common for there to be a rush of follow-up posts. Usually we downweight these since there's a power-law dropoff in how interesting they are. Some of the moderation principles relating to this:

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

I'm wondering which of these S-1s are particularly interesting to discuss in their own right, vs. which are just follow-up/copycat-style threads? Unity is getting a specific discussion (https://news.ycombinator.com/item?id=24261559) but the other ones seem pretty generic. Oh yeah, that's another relevant principle: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor....

[+] jwatte|5 years ago|reply
Business was at a standstill in 2008. It takes 8-10 years to go from embryo to IPO these days. Thus, a lot of companies got started in 2010-2012, and are now reaching IPO maturity. I think this is a "ketchup bottle" effect rather than a sign of the end times.

(That being said, the end times may ALSO be upon us, but not because of this particular sign.)

[+] outworlder|5 years ago|reply
5 from companies in the Bay Area alone. There were more.
[+] TallGuyShort|5 years ago|reply
5 S-1 filings on the front page right now, mostly from tech companies I've heard of. Why the sudden spike?
[+] grey-area|5 years ago|reply
We're near the top of a huge and unprecedented bubble in tech stocks. The top 5 stocks are tech and are 23% of the S&P 500, at all-time highs in the middle of a pandemic with the global economy stuttering. That's a greater concentration than the year 2000. So it's a good time to IPO for tech stocks.
[+] nemo44x|5 years ago|reply
Seems like a warning sign...
[+] pupdogg|5 years ago|reply
> "Our net loss was $178.0 million and $348.5 million for the fiscal years ended January 31, 2019 and 2020, respectively"

Is that supposed to sound attractive to institutional investors?

[+] corford|5 years ago|reply
We're hitting some perf issues with Snowflake at work (not necessarily due to Snowflake itself but possibly more what we're trying to do with it: data warehouse storage needs but also a need for close to real-time analytical querying over that data). Has anyone here had any good/bad experiences with MemSQL?
[+] AdamProut|5 years ago|reply
This use case sounds like a good match for MemSQL at a high level (analytics with an SLA is our bread and butter).

We'll also have a lot of elasticity features of snowflake shortly without sacrificing our performance advantages. (https://www.memsql.com/blog/the-future-is-bottomless)

(disclosure: MemSQL CTO).

[+] turk-|5 years ago|reply
Spark/Databricks + Delta lake would be a good solution for combining streaming and batch analytics. MemSQL is good for low latency streaming.
[+] chrisjc|5 years ago|reply
Have you considered using change streams to keep a "view" up to date? Do you have a good cluster-key defined?

Very curious about your performance issues?

[+] flyinglizard|5 years ago|reply
What is the difference between a "Data Warehouse", a "Data Lake" and a plain old managed SQL Server instance I run on Azure?
[+] manigandham|5 years ago|reply
Data Warehouse is usually a relational database designed for large OLAP analysis with features like column-oriented storage, vectorized processing, and distributed scale-out architecture. Since it's a database, the focus is on strong schemas and structured data, although all major systems also support JSON datatypes now.

Data Lake is usually object storage or another large storage pool with raw files. These can be in different formats like JSON, AVRO, or Parquet, containing data with strong schemas or unstructured data. Processing can be done by engines like Spark, Presto, Drill, etc. that support less advanced SQL but more robust access across data files and storage locations. The point is to serve as a general dumping ground or "lake" of all the data and then manage it afterwards (including cleaning and moving important records to a data warehouse).

SQL Server is a single-node OLTP relational database but most database engines are fast enough now that you can do everything you need up to hundreds of millions of rows. Best SQL and feature support with full update capabilities. Some DBs like SQL Server have also added OLAP features like columnstore tables to further delay or eliminate the need for a data warehouse.
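To make the column-oriented storage point concrete, here is a minimal sketch in plain Python. The table, field names, and helper functions are all hypothetical and only illustrate the layout difference; no real warehouse stores data as Python lists.

```python
# Toy illustration (hypothetical data): the same table in row and column layout.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)


def total_amount_row_store(rows):
    # Row layout: every whole record is visited even though one field is needed.
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    # Column layout: only the "amount" column is read; other columns are untouched.
    return sum(columns["amount"])
```

An aggregate over one field touches a single contiguous list in the column layout, which is roughly why OLAP engines favor it, while fetching or updating one full record is simpler in the row layout that OLTP databases use.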

[+] tormeh|5 years ago|reply
Mostly how much data there is, and how structured it is. Not really sure what the difference between data lake and warehouse is, but either of them will typically have less structured data and more of it than an SQL server. We're talking petabyte-scale. Sure, you can get 16TB drives, but it's still a stretch to put it all on a single machine. Data should ideally be stored as parquet or similar, but there's probably a lot of JSON out there. Couple it with something like Athena, and you can query in SQL. Spark for more complicated stuff.
[+] trumpeta|5 years ago|reply
data lake is where all your messy historical timestamped immutable data goes so it's not lost. data warehouse is where you make sense of it. and your old sql server is just the current snapshot.
[+] kerng|5 years ago|reply
SQL instance is fast and meant for transactional systems, like a stock exchange or purchasing something. That's called schema on write.

DW is for analytics and reporting.

Data Lake is like many DWs together and other, often "garbage" data, which "might" be useful in future analysis, ML and stuff. It's the unstructured graveyard of data (joking). Schema is defined on read.
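A rough sketch of the schema-on-write vs. schema-on-read distinction, in plain Python. The schema, field names, and function names are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical schema used for illustration only.
SCHEMA = {"user_id": int, "amount": float}


def validate(record):
    """Raise if the record does not match SCHEMA exactly."""
    if set(record) != set(SCHEMA):
        raise ValueError("unexpected or missing fields")
    for field, expected_type in SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise TypeError(f"{field} must be {expected_type.__name__}")
    return record


def write_to_table(table, record):
    # Schema on write: bad records are rejected before they are stored.
    table.append(validate(record))


def write_to_lake(lake, raw_line):
    # Schema on read: anything is accepted at write time.
    lake.append(raw_line)


def read_from_lake(lake):
    # The schema is applied only now; unparseable lines are skipped.
    for raw_line in lake:
        try:
            record = json.loads(raw_line)
            yield {
                "user_id": int(record["user_id"]),
                "amount": float(record["amount"]),
            }
        except (ValueError, KeyError, TypeError):
            continue
```

The trade-off shows up in where the failure happens: the table refuses a malformed record immediately, while the lake keeps everything and leaves it to each reader to decide what counts as valid.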

[+] foota|5 years ago|reply
My impression? Data warehouses are more fully featured than a data lake, whereas a data lake implies primarily the storage, with other systems querying it. SQL Server is orthogonal in that you need neither if all your data fits in a single SQL database (or alternatively, SQL Server is a small-scale data warehouse).
[+] eganist|5 years ago|reply
Snowflake's put in more effort around security than I've seen from other data warehouses (that have offered me e.g. AWS SOC3s rather than their own SOC2type2s).

Just my experience. Glad to see them reaching for cash. They're effective at what they do.

[+] alexbanks|5 years ago|reply
Snowflake is by far the best data warehouse I have ever used. I would use it at any job where data warehousing was a keystone of our work. Really 10/10, not even close.
[+] FridgeSeal|5 years ago|reply
Out of curiosity, have you used ClickHouse?

Because I had the opposite experience - you literally couldn't pay me to use Snowflake again.

[+] mobileexpert|5 years ago|reply
Dumb question: given the filing today, what is the earliest date it will be listed on the NYSE? Googling "time between S-1 and IPO" gives a bunch of wishy-washy answers.
[+] veritas3241|5 years ago|reply
I had the exact same question. Pg 127 seems to give the answer.

> Date Available for Sale in the Public : The 91st day after the date of this prospectus (First Release).

Edit: Seems like I was wrong. This is for current shareholders. I saw somewhere on the internet that it would be sometime in October. That's a wild guess though.

[+] publiccomps|5 years ago|reply
I just wrote a S-1 teardown of Snowflake: https://blog.publiccomps.com/snowflake-s1-ipo-teardown/

Would love feedback! Included some helpful quotes from this thread too on why Snowflake vs Redshift.

[+] yrotsih|5 years ago|reply
With all the hype over the last few years, I thought they had half a billion in revenue; instead it's a paltry $265MM in 2020 (per their chart) and a loss of $365MM. In comparison, Teradata had $2B revenue in 2019 (market cap < $3B). Just another VC-fueled company. Wait for a year after IPO. The real value will be clear.
[+] huac|5 years ago|reply
They doubled revenue over the last year and expanded gross margin by 25% ... seems good. Looks like most of the incremental loss comes from expanded marketing costs - but their net revenue retention rate is between 150% and 200%, which is extremely strong. Unfortunately no reported cohort metrics.
[+] swalsh|5 years ago|reply
They're going to have to add a new section "daily S-1 filings" if this keeps up, we can put it next to the show hn link.
[+] yrotsih|5 years ago|reply
Isn't it unusual for the VCs to own close to 67%? Bankers made money! Not the actual engineers who toiled away!
[+] staysaasy|5 years ago|reply
Absolutely phenomenal product and company. I have a huge enterprise SaaS crush on these folks. Very solid, thoughtful team as well in my experience.

Their massive marketing spend is interesting. I suspect that they perceive themselves to be the first (or at least, strongest) mover in a once-in-a-generation land grab.

[+] Konohamaru|5 years ago|reply
Wow the styling of that legal document is just gorgeous!