gnur | 4 years ago
This is actually a take most SREs would (or should) agree with. Every added 9 of reliability increases the price exponentially. Finding the correct level of reliability is something most companies should focus more on, because sometimes a single physical machine that could go down once a year for a few hours is perfectly capable of providing all the resources a medium-sized business could need. Proper backups, monitoring and recovery runbooks can even decrease the downtime of such a simple system to minutes, while easily saving you maybe thousands per month.
sokoloff|4 years ago
Slowing your young company down in order to turn 0.9995 to 0.9998 is almost always a terrible trade. Even turning 0.995 to 0.999 is hard to justify in most places. (That improvement saves about 35 hours of downtime per year.)
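For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion from an availability fraction to hours of downtime per year. The function names are made up for illustration, and it assumes a 365-day year with downtime spread evenly.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime in hours per year for an availability fraction such as 0.999."""
    return (1.0 - availability) * HOURS_PER_YEAR


def hours_saved(before: float, after: float) -> float:
    """Downtime avoided per year by moving from `before` to `after` availability."""
    return downtime_hours_per_year(before) - downtime_hours_per_year(after)


# The two improvements mentioned in the comment above.
for before, after in [(0.995, 0.999), (0.9995, 0.9998)]:
    saved = hours_saved(before, after)
    print(f"{before} -> {after}: saves about {saved:.1f} hours/year")
```

Going from 0.995 to 0.999 works out to roughly 35 hours per year, matching the figure above, while 0.9995 to 0.9998 is under 3 hours per year.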
auggierose|4 years ago
rglullis|4 years ago
So we needed a web server, a database, a queue system to run these heuristics and we needed to host/distribute ~100GB worth of content, most of it video.
We were bootstrapping, so I was trying to (1) save as much as possible on operational costs and (2) punt on all the "scaling issues" that would require more of my devops time that would be better spent developing and adding more features. I deployed the whole system on a single server from Hetzner: Django app, PostgreSQL, Redis for caching and session management, RabbitMQ for Celery. All in one machine with 32GB of RAM and a RAID system with enough capacity to hold the data. I think it was costing us less than 50€/month. That is all we needed to (easily) serve ~800 students and the staff who would author new content.
In the end we delivered everything we promised to our first customer, but we were not able to grow our revenue as much as we expected, so by the end of 2013 we just put the whole company on the backburner, got a small maintenance contract with the main customer and went on to find other jobs.
From the end of 2013 until 2018, I needed only to make sure that our domains and SSL certificates were up-to-date every six months, upgrade Django packages in case of security issues and deal with ONE incident (in 2016 IIRC) where a disk failure put the array in degraded mode. I solved that by getting a new server at Hetzner (better specs and cheaper, after all those years), warning the customer that the service would be taken offline for a couple of hours later in the day, rsyncing the content, restoring the database and redeploying the application with the fabric script.
This is one of the projects I am most proud of, given what was accomplished under all the constraints, and it made me realize the difference between a Software Developer and an Engineer. Yet, it translates to a very poor entry on a CV. We are too used to asking in interviews what people have done and what technologies they have used, but we rarely ask about the moments where it was best to avoid doing something.
y4mi|4 years ago
The speed is incredible compared to EC2 or root server performance from other vendors, even when those come with dedicated resources.
_3u10|4 years ago
Why anyone would run their pointer chasing code in a heavy cache eviction environment is beyond me. The code is slow to start with, and then you make sure that none of your data is in the cache. Why you'd pay 10x for slower hardware makes no sense.
What people should be doing is running on bare metal and turning off all the garbage meltdown protections that kill performance. If you're not a cloud provider and you're allowing people to execute arbitrary code on your hardware, you've got much bigger problems than meltdown.
KronisLV|4 years ago
That does sound like a really good deal!
Until now I've only been using VPSes (apart from homelab servers as CI nodes etc.) because they're cheaper for the smaller sizes, but for comparison's sake, the cheapest offering from a VPS provider that I know of and trust, with 64 GB of RAM and 640 GB of storage, would cost ~260 euros a month: https://www.time4vps.com/?affid=5294
Well, I guess there are also other VPS providers out there that can nearly match the price, like Contabo, though they do have mixed reviews: https://contabo.com/en/ (personally I just found their UI to be extremely dated and there are setup fees, but otherwise they were decent), though even then they'd cost anywhere from 30 - 90 euros a month.
bennyp101|4 years ago
kuon|4 years ago
goodpoint|4 years ago
On small platforms we are still stuck in the 1990s approach of having one reliable system.
We need distributed[1] systems and protocols even in small applications. Easy to use and self-healing.
[1] No, I'm not talking about blockchains
Spooky23|4 years ago
handrous|4 years ago
Three nines also means you can't afford to intentionally take a system down to work on it, or you'll burn all your "oopsie" downtime. That means a ton more work in infrastructure and deployment processes than two nines.
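A rough sketch of that budget math, assuming a 30-day window and a hypothetical single two-hour planned maintenance per window (both numbers are made up for illustration, as are the function names):

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43200 minutes; the 30-day window is an assumption


def downtime_budget_minutes(availability: float) -> float:
    """Total downtime allowed in a 30-day window at the given availability."""
    return (1.0 - availability) * MINUTES_PER_30_DAYS


def budget_left(availability: float, planned_minutes: float) -> float:
    """Budget remaining for unplanned outages after planned maintenance is subtracted."""
    return downtime_budget_minutes(availability) - planned_minutes


planned = 120  # hypothetical: one two-hour maintenance window per 30 days

for label, availability in [("two nines", 0.99), ("three nines", 0.999)]:
    left = budget_left(availability, planned)
    status = "fits" if left >= 0 else "blows the budget"
    print(f"{label}: budget {downtime_budget_minutes(availability):.1f} min, "
          f"left after maintenance {left:.1f} min ({status})")
```

At two nines the window allows about 432 minutes, so a two-hour maintenance still leaves room for accidents. At three nines it allows about 43 minutes, so the same maintenance alone overshoots the budget.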
jamespwilliams|4 years ago