Analyzing database trends through 1.8M Hacker News headlines

175 points | vercantez | 7 months ago | camelai.com

93 comments

codeulike|7 months ago

MS Sql Server not even mentioned. This tells us there is a whole world almost totally omitted from discussion on HN: "Enterprise"

thewebguyd|7 months ago

Oracle isn't in there either, which goes to show how much of a bubble HN actually is considering MSSQL and Oracle are #1 and #2 in market share.

diggan|7 months ago

> This tells us there is a whole world almost totally omitted from discussion on HN

It doesn't though, all it tells you is that it's missing from the headlines in the submissions.

"Enterprise" is discussed on HN too, but inside submissions that aren't exclusively about MS Sql Server. Try searching for some terms on the Algolia HN search, order by date and filter by comments and you'll find the subthreads/submissions where it's discussed :)

fullstackchris|7 months ago

There is a reason it is not even mentioned

conradkay|7 months ago

There's an online playground with the data here: https://play.clickhouse.com/

Wrote up this query:

  -- Count mentions of each database name in HN items since 2022.
  SELECT
    db_name,
    sum(if(type = 'comment', 1, 0)) AS comment_mentions,
    sum(if(type = 'story', 1, 0)) AS post_mentions,
    count(*) AS total_mentions,
    sum(score) AS total_score
  FROM hackernews
  ARRAY JOIN
    -- Lowercase and strip spaces so "SQL Server" matches 'sqlserver';
    -- every regex match in the text becomes its own row.
    extractAll(replaceAll(LOWER(text), ' ', ''), '(sqlite|postgres|mysql|mongodb|redis|clickhouse|mariadb|oracle|sqlserver|duckdb)') AS db_name
  WHERE toYear(time) >= 2022
  GROUP BY
    db_name
  ORDER BY
    post_mentions DESC;

Imustaskforhelp|7 months ago

Very interesting. Where does play.clickhouse.com get its Hacker News data from, though? I don't see any link to the source it fetches from.

Does play.clickhouse contain all the HN data so that we can play with it?

xnx|7 months ago

More unsolicited feedback: Month-by-month is kind of noisy. You might do 3 month average to smooth it a little and make the trend clearer.
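
A minimal sketch of the 3-month smoothing suggested here, assuming the monthly headline counts are already available as a plain list. The numbers and the `rolling_mean` helper are hypothetical and are not taken from the post.

```python
def rolling_mean(values, window=3):
    """Trailing moving average.

    The first window-1 points are averaged over however many
    values are available so the output has the same length.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical monthly mention counts for one database (not real data).
monthly_counts = [12, 30, 9, 27, 15, 33]
smoothed_counts = rolling_mean(monthly_counts, window=3)
print(smoothed_counts)
```

A trailing window keeps every month on the chart; a centered window would track turning points more closely but leaves the final month undefined.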

Aachen|7 months ago

Is MariaDB included in MySQL? I see no mention of it in the post, but MySQL trending downwards would make sense as people upgrade and switch over. Besides of course novelty wearing off as posited for all engines further down the post

evanelias|7 months ago

> Is MariaDB included in MySQL?

I was wondering the same, but I'm not sure if it would make a major change in the graphs. MySQL and MariaDB have both been unpopular on Hacker News for many years. Submissions on either topic rarely get much traction, which then leads to fewer submissions.

> MySQL trending downwards would make sense as people upgrade and switch over.

No, most large MySQL users are still using MySQL; there hasn't been a widespread migration to MariaDB. They're both actively developed and have grown in slightly different directions. Among corporations, MySQL's usage still far outstrips MariaDB by a significant degree. Lately MariaDB has better product velocity though, and their commercial enterprise finally seems to have stable footing.

tonymet|7 months ago

is anyone seriously using it? even their own brand facepile is pretty weak

Tepix|7 months ago

Sqlite seems to be growing recently which matches my perception, but it's not listed among the growing databases. Weird.

vercantez|7 months ago

Yeah I found a mistake in the analysis. I'm updating the post to reflect SQLite's popularity.

vercantez|7 months ago

SQLite is now reflected in the growth table

vercantez|7 months ago

UPDATE: Added a weighted average analysis based on story points and comments. SQLite ranks highest in points per story and Redis ranks highest in comments per post. Also added SQLite to the growth table. I had accidentally deleted this row in the original post.
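
For readers wondering what "points per story" and "comments per post" amount to, here is a rough sketch. It assumes each matched story is a record with a database name, a score, and a comment count; the field names, the `per_story_averages` helper, and the sample rows are all hypothetical and do not reproduce the post's actual method or data.

```python
from collections import defaultdict


def per_story_averages(stories):
    """Average points and comments per story, grouped by database name."""
    totals = defaultdict(lambda: {"stories": 0, "points": 0, "comments": 0})
    for story in stories:
        bucket = totals[story["db"]]
        bucket["stories"] += 1
        bucket["points"] += story["points"]
        bucket["comments"] += story["comments"]
    return {
        db: {
            "points_per_story": bucket["points"] / bucket["stories"],
            "comments_per_story": bucket["comments"] / bucket["stories"],
        }
        for db, bucket in totals.items()
    }


# Hypothetical rows, not real HN data.
sample = [
    {"db": "sqlite", "points": 100, "comments": 40},
    {"db": "sqlite", "points": 50, "comments": 20},
    {"db": "redis", "points": 30, "comments": 60},
]
averages = per_story_averages(sample)
print(averages)
```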

kwillets|7 months ago

Snowflake seems to have peaked; 2023 was hellish dealing with roomfuls of inexperienced devs and even architects convinced it was the fastest cheapest thing ever.

redwood|7 months ago

Well, as pointed out above, Oracle and SQL Server don't even show up, so this simply does not reflect enterprise, and Snowflake and Databricks both lean enterprise.

Aachen|7 months ago

The data query tool linked at the bottom of the post doesn't work for me. Cloudflare shows error 600010, whatever that means. It's nice that there is "no login required", but if it did require one, or at least offered the option, maybe it wouldn't need an algorithm to decide whether my traffic is abusive, because you could block abusive accounts instead.

98codes|7 months ago

Interesting to see SQL Server not listed here, am curious whether it didn't have enough signal, or suffered from being a two-word product, with "SQL" being far too generic on its own.
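
To illustrate the two-word problem, here is a hedged sketch of how a headline matcher could catch "SQL Server" variants without counting every bare "SQL". The pattern table and the `databases_in` helper are assumptions made for illustration; the post does not say how it matched names.

```python
import re

# Hypothetical patterns; word boundaries keep a bare "SQL" from matching.
PATTERNS = {
    "SQL Server": re.compile(
        r"\b(?:ms\s*sql(?:\s*server)?|mssql|sql\s*server)\b", re.IGNORECASE
    ),
    "PostgreSQL": re.compile(r"\bpostgres(?:ql)?\b", re.IGNORECASE),
    "SQLite": re.compile(r"\bsqlite\b", re.IGNORECASE),
}


def databases_in(headline):
    """Return the sorted names of databases mentioned in a headline."""
    return sorted(
        name for name, pattern in PATTERNS.items() if pattern.search(headline)
    )


print(databases_in("Migrating from MS SQL Server to Postgres"))
print(databases_in("Show HN: A SQL formatter"))
```

Matching on the full product name (plus common abbreviations) trades some recall for precision, which matters when the generic token would otherwise swamp the counts.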

jiggawatts|7 months ago

I also don’t remember SAP HANA, Oracle, or DB2 being mentioned even once here, but believe me, along with MSSQL these account for most of the top ten database deployments worldwide.

Something that I’ve been thinking about a lot recently is that all of the proprietary vendors are quietly strangling their flagship products.

Free and open source database engines were always “nipping at their heels” but weren’t a serious threat for decades. Only other proprietary engines were.

Now that PostgreSQL has more features than SQL Server and better performance, it’s a serious competitor.

But Microsoft is holding MSSQL’s face under water with core-based licensing. It means that per dollar you get dozens of times less compute available for your data than with open-source systems. That ratio is growing exponentially, because they haven’t redone their pricing in… ever.

Oracle and DB2 are being similarly choked off at the same rate, so looking left and right at their direct competition their respective product managers haven’t noticed the problem, which is akin to Fuji and Kodak raising film prices in lockstep just as digital photography is taking off.

We’re entering the era of “kilocores”: single servers becoming available that have over a thousand cores. You can’t imagine what per-core licensing costs for something like that!

PS: I saw a similar dynamic play out in the network space with load balancers and “web accelerators” like NetScaler sold “by bandwidth” with a starter SKU as small as 2 Mbps. I kept trying to politely explain to the reps that the smallest cloud VMs can cheerfully put out 10 Gbps, and hence their product is a 5,000x decelerator. They eventually listened to someone and made it bandwidth-unlimited. Too late. Everyone uses NGINX now.

RadiozRadioz|7 months ago

It is also less mentioned on the site in general, owing to it being a proprietary Microsoft product in an audience of people who primarily go for Free / Open Source non-Microsoft products.

There are some people here who are interested in corporate Europe or <insert Microsoft foothold place/industry here>, but most are aligned with Silicon Valley hackers.

Cthulhu_|7 months ago

That's really interesting; I knew postgres was the most popular database on here, but also looking at that chart, SQLite had a burst of popularity on HN last year.

jeffbee|7 months ago

Is it just me, or is it weird that BigQuery is mentioned but Bigtable and Spanner are not? The article presents a grab-bag of database concepts that do not seem related. BigQuery and PostgreSQL are just fundamentally different things.

It all makes me wonder what is the biggest "dark" database, the one nobody on HN wants to talk about, but it's out there serving the most transactions.

Imustaskforhelp|7 months ago

I really wanted to see the chat with HN data option or something https://camelai.com/hackernews/

But I am stuck at the Cloudflare Turnstile challenge, and when I do click on it and it passes, it shows "error occurred, try again".

So frustrating, since I was so curious.

Imustaskforhelp|7 months ago

I almost knew that postgresql would be the winner just because of how much people recommend it here or literally anywhere. Postgres is cool.

My personal favourites, depending on the situation, are Postgres (technically Supabase is Postgres too), SQLite, DuckDB, and (Valkey?).

I'm just curious: what are your favourite options, and why?

chickenzzzzu|7 months ago

the funniest thing about this graph is that it proves there was a raw drop off in all popularities in the last 2 years, which of course directly coincides with the great layoffening that has been happening for almost 3 years now.

this shows that people are definitely rotating out of "web technologies" in general, not because they aren't useful, but because the money isn't there anymore.

perhaps a large chunk have switched to AI hype trains, and it would be interesting to compare raw results of different AI headlines, but i suspect maybe 30% of people have left tech altogether.

redwood|7 months ago

I think it's attention and mindshare going to AI

123yawaworht456|7 months ago

>a ClickHouse database of every HN story

I remember downloading it a few years ago, but the bookmark I have is dead. where is it now? is it still public?

jabart|7 months ago

Still public, and it still chews through millions to billions of rows in seconds. Their Cloud version has some Cloud-specific features. A few vendors have built custom things on top of it, or custom builds off the open source project, too.

bix6|7 months ago

Any commentary on DuckDB from users? I keep hearing about it but am not a user myself. Is it a fad or here to stay?

xnx|7 months ago

Would be great to share the queries. Are these results weighted for storypoints and/or number of comments?

vercantez|7 months ago

Purely based on headline occurrence, but weighting based on story points and comments is a great idea. I'll update the blog, thanks.

vercantez|7 months ago

Updated with weighted analysis.

esafak|7 months ago

How are you handling sanitization? Anything interesting?

xnx|7 months ago

Confusingly, I just came across the unrelated https://www.camel-ai.org/ today.

bellareed|7 months ago

Sooo confusing. We've debated changing our name but can't bring ourselves to break up with our cute camel logo lol.

nsbk|7 months ago

Some of the insights match my personal experience and preferences. At $dayjob we're migrating from Mongo to TimescaleDB (now TigerData ¯\_(ツ)_/¯) which is basically a PostgreSQL extension for time series data and couldn't be happier. We are getting better performance and massive storage savings.

On the analytics side of things we are starting to use DuckDB for some development efforts, but we are keen on potentially replacing some or all of our Snowflake usage with DuckDB.

throw_m239339|7 months ago

Can you tell me what scenarios you used MongoDB for? I'm still curious why anyone would use MongoDB after all these years.

RS-232|7 months ago

No SQLite?

vercantez|7 months ago

Mistake in the analysis. Fixing now.

markwclancy|7 months ago

Absolute drivel. Comparing operational/transactional databases like MongoDB and Postgres to analytics/columnar datastores like Redshift and Snowflake is meaningless. You might as well say "...the popularity of hammers is way up, with screwdrivers appearing to be in decline...". If this is the type of data analysis that AI is supporting, we're all in trouble.