On the flip side, when something DOESN'T need HA, sub-millisecond query response time, or Lynx and IE Edge compatibility, people need to know when "good enough" is good enough.
On my ops team, we've gotten some flak for not building a robust enough request queue for some tasks. But it's only been down several hours in the past year. The server almost never needs maintenance. None of the workload is real-time. App restarts are acceptable if memory leaks occur.
If we did everything by the enterprise book, we'd still be 70% of the way to a deployed product, instead of 15 months into its completion.
Sure. Except if this Ops team is like most Ops teams, they'd prefer to do it the (cue dark, brooding orchestral dum-dum-dum musical interlude) 'enterprise' way rather than be woken up by PagerDuty in the middle of the night because a new release push led to the CPU spiking.
Sorry, but I've almost never seen situations where people have skimped on HA and someone else didn't get excoriated during an outage.
Maybe other people's experiences are different, but as a DevOps guy, even in a relatively new environment, my priority is stability. Not faith-based-computing based on someone's positive seat-of-their-pants past experience.
I once had an infrastructure guy who worked in the same company quote me £70K for new hardware to host a single static HTML web page.
£70K of new hardware!
I stuck it on an existing server and nobody noticed, though I might have got into trouble with the Change Prevention Board for not using the right form or something...
Edit: He did have a carefully worked out explanation of where the money had to be spent.... which I ignored after hearing "£70K".
Overengineering. It's either the young engineer that wants to use the latest trendy technology or the old "architect astronaut" that believes only "enterprise level architecture" can solve all your problems.
Even when I worked in electronic trading and we were dealing with 10 million events/second on combined feeds, we kept our architecture simple. Why? Experienced engineers realized that the more moving parts there are, the more things can go wrong (and the harder it is to pinpoint which one did).
Just time your queries, get a P95, and stream data to an analytics database if you don't want your prod to be exposed to added latency. No need to create some fancy distributed consistent system with caching layers and enormous test harnesses if the analytics workload might change next week.
I'm a huge fan of just creating a separate analytics database you stream prod data to and letting those with SQL knowledge play around there. Surprisingly, they rarely break anything. And if they do, it's not going to take down everything else with it
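The "time your queries, get a P95" step can be sketched in a few lines. This is a minimal illustration, not anything from the comment itself: it assumes an in-memory SQLite database standing in for the analytics copy, and the `events` table, the query, and the helper names `time_query` and `percentile` are all hypothetical.

```python
import sqlite3
import time


def time_query(conn, sql, runs=50):
    """Run `sql` `runs` times and return each wall-clock duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the full query cost is measured
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, int(-(-pct * len(ordered) // 100)))  # ceiling of pct/100 * n
    return ordered[rank - 1]


# Hypothetical stand-in for the separate analytics database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("x" * 10,) for _ in range(1000)])

samples = time_query(conn, "SELECT COUNT(*) FROM events WHERE payload LIKE 'x%'")
print(f"p95 = {percentile(samples, 95):.3f} ms over {len(samples)} runs")
```

Nearest-rank is only one of several percentile definitions; it is used here because it always returns a latency that was actually observed.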
>The server almost never needs maintenance. None of the workload is real-time. App restarts are acceptable if memory leaks occur.
Ah! Classic duct tape programming!
Something like this is fine as long as the requirements don't change.
But we all know that at some point the requirements will change. And then you have to tell your customer that the "little new requirement" can't be accomplished with a cheap little change but requires instead a complete rewrite that will cost even more than the previous system did.
The only question is whether you are an honest company that told the customer beforehand that this would most likely happen, or whether you planned for that inevitable outcome in the first place...
Yeah, you have to read the piece as a kind of parable
-- emphasizing the most extreme cases of what can go wrong when the manager says "I don't understand, we had this other guy who said it could be done in 10 minutes / 1 hour / 1 day" or whatever. (And who hasn't been through that drill, like, a thousand times?)
But as you said, knowing how and when to negotiate between said extremes is exactly where the "art" lies.
"Maybe you ran your SQL query a few times, fixed the errors you got, eyeballed the results, and called it a day. That’s not good enough for code integrated with a product."
Biggest lie ever. That's pretty much production-quality code right there.
You definitely need to think through your requirements or you end up with this:
Asked to get some kind of analytics query to someone, and they need it fast and want it in some kind of visualization tool.
1. You open Zeppelin, take a bunch of database tables and whip out a query that is basically what they want and export it to an Elasticsearch + Kibana instance.
Now come the edge cases:
Oh I forgot, it needs to be on the internet
2. Need to set up a public IP, DNS, Nginx server and a series of rules to make it read-only (and it's still dangerous mind you)
Why isn't this password protected?
3. Add nginx basic auth with a single password
It needs to be available to admins and sales managers only.
4. Set up ngx_http_auth_request_module to hit our authentication server (the cookies should be present). OHH, the DNS name doesn't match the cookies. Set up a /subpath on the existing application.
It needs to work as an embedded view in the company's mobile app.
5. That uses tokens, not cookies, so the auth-request module no longer works for this. Need to come up with an SSO solution with a cookie and a place to store the cookie in a database. That requires a REST service on the existing app server, which will require a redeploy.
I just added a product to our system and it's not in Kibana
6. Need to modify the Spark code to use Spark Streaming
The Spark server restarted and my new products aren't showing up!
7. Need to set up a service on the system to auto-start the spark job.
It is feasibly an actual product feature at this point. Only (1) was asked for, but they really wanted 1-7. I would argue that (1) would only take 25% of the time that 2-7 take to do. Not every product change is like that, obviously, but sometimes people think all changes are just so simple. Often it's the history and unstated features that make a huge difference.
> Nothing ever works as soon as an engineer writes code; it is an iterative process.
One of my complaints about working in an environment where I am the only person who can program is that nobody else understands this. I've taken the time to explain this point to people, and they seem to grasp it fairly well, but once they see something that somewhat resembles the final product they get really impatient.
The last 25% of functionality takes far more time than the first 75%, especially once you consider handling edge cases, etc.
Yes, edge cases can make features pathological. That "simple feature X" might be simple in isolation, but when crossed with "simple feature Y" it may cause your edge cases, test code, and line count to explode. Not all features are a simple additive amount of work. Usually the work to add a feature is relative to the other features that already exist within the system.
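One rough way to see why the work is not additive is to count how many feature pairs could interact. The sketch below assumes, as a simplification the comment does not claim, that any two features may need their own edge-case handling; the feature names and the helper `pairwise_interactions` are hypothetical.

```python
from itertools import combinations


def pairwise_interactions(features):
    """Return every unordered pair of features that might need its own edge-case handling."""
    return list(combinations(features, 2))


features = ["auth", "search", "export", "mobile view"]  # hypothetical feature names
for n in range(1, len(features) + 1):
    pairs = pairwise_interactions(features[:n])
    print(f"{n} features -> {len(pairs)} possible pairwise interactions")
```

Under that assumption the number of pairs grows as n(n-1)/2, so each new feature has to be checked against every feature already in the system, and three-way interactions would grow faster still.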
Pro tip: Do the user interface last. Or, if you need to do it first for prototyping / mockup reasons, don't show it to anyone you don't have to.
Learned that one in my first ever project. We mocked up the entire application in Visual Basic (just windows and buttons, no actual functionality implemented, we just wanted to know if it'd work for them) and then the client got really upset when we couldn't come round and install it the next day. Classic case of failing to manage expectations.
Because your "5 minutes" is pure bullshit and your engineer knows it.
Your engineer knows that if she writes your "5 minute" query without careful analysis, peer review and documentation and the query ever produces a questionable result --- whether it was anticipated by your requirements or not --- it's your engineer's ass; you'll throw your engineer under the bus _instantly_.
Your engineer knows that if she writes your "5 minute" query and it produces any actual value you'll be back the next day with a "5 minute" enhancement. Anything you ask for that might matter the next day has to be built to be maintained by others because if she happens to take the day off when you show up and demand a revision to your "5 minute" wonder query and there is nothing for the other engineers to go on (revision controlled work, documentation, etc.) then that's her ass; she knows you won't stand up for her.
Your engineer didn't just fall out of the boat and is in no hurry to obligate herself to take responsibility for your ad hoc miracle queries and the questions that will emerge when you go waving the output under everyone's nose, and she knows that's exactly what you'll do with it. Your little query is your view of the world and that view is highly unlikely to survive the first bit of scrutiny that's applied by anyone other than yourself, much less the second.
I had a veteran DBA close to retirement (she was my cube-mate) warn me about this when I was an intern. I had a very simple request come in to update some data. Didn't even take 5 minutes. She told me: never do the simple tasks immediately. Sit on them for a bit. Business users don't understand complexity of different tasks and think they should all take the same amount of effort. So, if you do one thing in just a few minutes, they'll expect everything to take the same amount of time. In my experience, it's proved to be true.
Had a similar thing a while ago. Very complicated IPSEC tunnels, routers, switches, firewalls and whatnot between us and our customer. The customer's techs were - not great - so we needed to make sure port 443 worked through the tunnel. Our side was mainly Linux, theirs was mainly Windows 10.
Asked a programmer our side (.NET) who said it'd take a couple of hours to write a simple webserver, package it up into an .msi and give it to us. I got annoyed, did it in Golang in about 10 lines of code.
I then realised I'd compiled a Linux executable on Linux, remembered it did cross-compiling, and 10 seconds later I had a Windows .exe. All for a simple webserver that printed "cock", not useful but it proved the tunnels worked.
Sometimes we overcomplicate the simplest shit.
EDIT: As it wasn't clear (my fault) - we were trying to get to port 443 at their end - they were on Windows, we were mainly on Linux, but the guys I asked were .NET programmers.
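For a sense of scale, a throwaway server of that kind is about as short in other languages too. The sketch below uses Python's standard library rather than the Go the commenter used, so it is a comparable stand-in and not their code. It assumes plain HTTP with no TLS (enough to prove a port is reachable) and a neutral response body; `PingHandler` and `make_server` are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Answer every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet; remove to see one line per request


def make_server(port=443, host="0.0.0.0"):
    """Bind the listening socket; call .serve_forever() on the result to start answering."""
    return HTTPServer((host, port), PingHandler)
```

Calling `make_server().serve_forever()` on the far end and fetching the page from the near end is the whole test. Binding to port 443 normally needs administrator or root rights, and unlike a compiled Go binary this needs a Python interpreter on the target machine.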
It might also be possible that we don't know the simplest methods of doing something. I'm not sure if there is a simpler way to do it from the Windows side of things than what this person told you.
I wouldn't know how to do it in Golang because I don't use it. I would default to my most comfortable language, which is almost guaranteed to not be the most efficient method to do anything.
Unless I catch myself breaking my own rule, I never second guess my developer's time estimate unless I think it's too short. I've been a developer (still am one even though I manage people now too), and I've learned that I can't factor in all the things going through that person's mind.
I have cleaned up too many messes because of negligence from people who do not understand how hard programming actually is. That's what most people, especially us developers at times, fail to recognize. That 5 minutes someone took to write an "easy" query against the CI and deployment server almost brought it down (true story, luckily I was monitoring it to look at another issue).
The ability to write code is taken for granted, because anyone can learn it. Some programming is easy, and some is extremely difficult, and the real trick is knowing which. What scares me most about the code being written are the one-off queries, etc. The ones that will "only be used once" or "only for low transaction instances". That's never true; someone will always have it lying around for that time when "we just really needed to make that update".
An old boss of mine used to say "the perfect is the enemy of the good." This is true; there are a number of times you need to get something up and running and worry about fixing it along the way. There are other times when that "good little app" got used in the wrong way and cost us hours of downtime because of a mistake due to rushing. Now the perfect solution doesn't look like it was such a bad choice after all. I can wait a day or so for a query that I could write in 5 minutes. In the long term, waiting a few extra hours isn't going to impact anything that much. I'd rather the developer be thorough than explain why no one went home that night because we had to clean up a mess.
This is a little off-topic, but I'm not entirely sure I understand who this article was written for. It reads as a bit condescending to people who might benefit from it (namely the hypothetical MBA who wrote the query), and most of this is what engineers already know about.
Great Post! Most people underestimate the depth of simple-looking tasks: it's not just managers that make this mistake. Even experienced engineers can end up not realizing all the complexity in the beginning.
Nice. I wrote something similar [1] after noticing that it's easy for developers to make this mistake too (especially more junior developers) -- once your proof of concept works, you still are far from shipping.
seems to be apples and oranges. Ad hoc query vs. a modification/new feature of a product. Depending on the product and dev process around it, it may take a month even for the minor among minor modifications/features.
>Why does my engineer say it will take a month?
if you don't know the answer to that, you're a bad manager. Either you hired bad engineers or you have no idea how your dev process works.
Adam, great to see you on HN. I assume you're the same ABS that founded calc.org. I always love seeing people from the TI Calculator community doing well.
If I was a CEO, and wanted to create a way for me to analyze data and I don't know much about efficiencies of SQL for ad-hoc analysis If instead I did have an issue where I did create this issue while doing my ad-hoc analysis I would try to search for solutions that would solve the problem. If this occurs then you probably want a data lake or even a local version/snapshot of production if possible.
[+] [-] unethical_ban|8 years ago|reply
[+] [-] mancerayder|8 years ago|reply
[+] [-] arethuza|8 years ago|reply
[+] [-] alfalfasprout|8 years ago|reply
[+] [-] mistermann|8 years ago|reply
"Good enough, done in a week" is a perfectly viable option that is rarely offered by timesheet-padding developers.
[+] [-] still_grokking|8 years ago|reply
[+] [-] kafkaesq|8 years ago|reply
[+] [-] ktRolster|8 years ago|reply
So has AWS, so you can kind of feel good.
[+] [-] technion|8 years ago|reply
I ask because, last project I was on, I "introduced" Edge compat by removing the section of JavaScript that sniffed the UA and kicked people out.
[+] [-] AnonNo15|8 years ago|reply
[+] [-] bykovich2|8 years ago|reply
[+] [-] coding123|8 years ago|reply
Also this for fun: http://outofmymind.scanlen.com/wp-content/uploads/2011/04/Wh...
[+] [-] Declanomous|8 years ago|reply
[+] [-] sbov|8 years ago|reply
[+] [-] taneq|8 years ago|reply
[+] [-] paulddraper|8 years ago|reply
The first 80% takes 80% of the time. The last 20% takes the remaining 80%.
[+] [-] dasmoth|8 years ago|reply
(Especially for an internal tool.) Is it possible that what they've seen is actually good enough?
Or at least, good enough to solve 90% of the problem while you fill in the missing piece that covers the rest.
[+] [-] cnnsucks|8 years ago|reply
[+] [-] hermitdev|8 years ago|reply
[+] [-] matthewmacleod|8 years ago|reply
[+] [-] sofaofthedamned|8 years ago|reply
[+] [-] kogepathic|8 years ago|reply
Indeed. Why not use nc and telnet on port 443 to test? Linux already has nc and Windows already has telnet.
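If neither tool is handy on one side, the same reachability check is a few lines in most languages. A minimal Python sketch, assuming all that matters is whether a TCP connection to the port completes; `port_is_open` is a hypothetical helper name and the host in the usage note is a placeholder.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts alike.
        return False
```

For example, `port_is_open("customer-gateway.example", 443)` would report whether the far end accepts connections through the tunnel. Something still has to be listening on the other side, which is the part the throwaway web server covered.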
[+] [-] zeeveener|8 years ago|reply
[+] [-] reledi|8 years ago|reply
[+] [-] z3t4|8 years ago|reply
[+] [-] BoorishBears|8 years ago|reply
[+] [-] acchow|8 years ago|reply
[+] [-] retox|8 years ago|reply
[+] [-] NobodyRalph|8 years ago|reply
[+] [-] Almaviva|8 years ago|reply
[deleted]
[+] [-] kemiller2002|8 years ago|reply
[+] [-] bichiliad|8 years ago|reply
[+] [-] pascalxus|8 years ago|reply
[+] [-] philfreo|8 years ago|reply
[1] http://philfreo.com/blog/the-last-20-before-shipping/
[+] [-] trhway|8 years ago|reply
[+] [-] geori|8 years ago|reply
- Harper
[+] [-] kristianc|8 years ago|reply
[+] [-] viraptor|8 years ago|reply
This post is about engineering a future-proof solution rather than fixing bugs.
[+] [-] mikestew|8 years ago|reply
[+] [-] kwillets|8 years ago|reply
[+] [-] SQL2219|8 years ago|reply
[+] [-] wcummings|8 years ago|reply
[+] [-] Profragile|8 years ago|reply
[deleted]
[+] [-] zitterbewegung|8 years ago|reply