234dd57d2c8dba | 9 years ago

No surprise here. Honestly, their team seems to be lower-skilled or less experienced than I'd have thought.

I would never approve a production system as expansive as Gitlab's to only have two databases in a cluster. That is asking for trouble, and any {sys,db}admin worth their salt will tell you the same. As soon as you need to do anything on one database, you've just lost your cluster policy.

The lack of automation is just as bad, especially around validating db backups and failover. Not having the failover process scripted and tested is _begging_ for a nightmare at 2am where you're reading documentation on how to fail over a db.
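
To make the backup point concrete, here is a rough Python sketch of the cheapest possible automated check. Everything in it is an assumption for illustration: the function name `check_backup`, the thresholds, and the idea that backups land as plain files on disk. It only catches missing, tiny, or stale files; a real validation would also restore the dump somewhere and query it.

```python
import os
import time


def check_backup(path, max_age_hours=24.0, min_bytes=1024, now=None):
    """Return a list of problems; an empty list means the basic checks passed.

    Hypothetical helper: thresholds are placeholders, tune them per database.
    """
    if not os.path.isfile(path):
        return [f"{path}: missing"]

    st = os.stat(path)
    now = time.time() if now is None else now
    age_hours = (now - st.st_mtime) / 3600

    problems = []
    # An almost-empty dump usually means the backup job failed silently.
    if st.st_size < min_bytes:
        problems.append(
            f"{path}: too small ({st.st_size} bytes, expected at least {min_bytes})"
        )
    # An old file means the job stopped running at some point.
    if age_hours > max_age_hours:
        problems.append(
            f"{path}: stale ({age_hours:.1f}h old, limit {max_age_hours}h)"
        )
    return problems
```

Run something like this from cron or your monitoring system and page someone whenever the returned list is non-empty.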

The simple thing of having your hostname / $PS1 say the machine's purpose could have stopped this. All prod machines have a bright red PS1 and a clear name of <type>-<service>-<prod/dev/etc>-<region>-<dc>.corpnet in my setups.
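
A minimal sketch of that idea in Python, assuming hostnames follow the scheme above with no hyphens inside a field. The helper name `build_ps1`, the colour choices, and the example hostname are all made up for illustration; the escape sequences are the usual bash prompt ones.

```python
import re

# Assumed layout: <type>-<service>-<env>-<region>-<dc>.corpnet
HOST_RE = re.compile(
    r"^(?P<type>[a-z0-9]+)-(?P<service>[a-z0-9]+)-(?P<env>[a-z0-9]+)"
    r"-(?P<region>[a-z0-9]+)-(?P<dc>[a-z0-9]+)\.corpnet$"
)

# Bash prompt colour codes, wrapped in \[ \] so line editing stays correct.
RED = r"\[\e[1;37;41m\]"     # bold white on red: production
GREEN = r"\[\e[1;32m\]"      # bold green: everything else
YELLOW = r"\[\e[1;33m\]"     # bold yellow: hostname did not parse
RESET = r"\[\e[0m\]"


def build_ps1(hostname):
    """Return a bash PS1 string that shows the machine's environment."""
    match = HOST_RE.match(hostname.lower())
    if match is None:
        # Unknown machines get a loud tag too, rather than a plain prompt.
        return f"{YELLOW}[UNKNOWN]{RESET} " + r"\u@\h:\w\$ "
    env = match["env"]
    color = RED if env == "prod" else GREEN
    tag = f"[{env.upper()} {match['service']}]"
    return f"{color}{tag}{RESET} " + r"\u@\h:\w\$ "


# Hypothetical hostname, just to show the call.
print(build_ps1("db-postgres-prod-use1-dc2.corpnet"))
```

Assign the printed string to PS1 in the machine's shell profile, ideally from your config management so nobody has to remember to do it.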

All of this is reflected in their discussion style, meeting style, etc. Ad-hoc, not very carefully designed, random off-hand comments. Obviously a young team with a lot to learn. Nothing wrong with that, but a lot of customers are relying on their skills! Learn quick!

kogepathic|9 years ago

> I would never approve a production system as expansive as Gitlab's to only have two databases in a cluster. That is asking for trouble, and any {sys,db}admin worth their salt will tell you the same. As soon as you need to do anything on one database, you've just lost your cluster policy.

I do agree with you that it reflects poorly on GitLab to have only a primary and a replica, with broken backups.

BUT, they are a startup, and they need to be laser focused on growing the company and securing funding for the next quarter so they can keep the lights on.

This kind of pants-on-fire growth in a startup often (always?) comes at the cost of redundancy and best practices. If you stop to make your platform bulletproof instead of building the new features you promised customers/investors, you die.

I'm not saying this is an excuse for them to permanently shrug off making their platform more redundant and engaging in best practices. But, as with many startups, they're focused on delivering features as fast as possible to grow their user base.

I think, and hope, that their recent outage has been the experience they needed to prioritize their shift toward more redundancy and best practices.

lightedman|9 years ago

"BUT, they are a startup, and they need to be laser focused on growing the company and securing funding for the next quarter so they can keep the lights on."

Having run and sold a few startups and turnkey operations, let me tell you - if you don't focus on your core product first and foremost and demonstrate the utmost competency in it, you're just pissing money away and are likely to fail.