
We saved $50k/year with a Go microservice coded in a hackathon

319 points | maloga | 8 years ago | movio.co | reply

254 comments

[+] attaboyjon|8 years ago|reply
I've come to the conclusion that the problem in tech is that all the people doing the work are in their early twenties and have no idea what they are doing. Once they get some experience they are quickly promoted to the CTO position. Rinse and repeat.

What we have here is a classic DBMS problem and no one at Movio seems to know how to deal with it. Instead of migrating from MySQL to something serious (Postgres) they move to some columnar DB no one has heard of. Never mind that Postgres, a reasonably priced DBA, and a little thought put into their data model/queries could probably handle all their issues.

Sorry for the snark, cheers on a successful product.

[+] endymi0n|8 years ago|reply
Can't help but also think "WTF are they doing there..." - we're doing exactly the same (user segmentation, targeting and campaign execution for cinema & movie users; disclaimer: we're more or less their only competitor, albeit indirect), but our solution runs at ~$30k/year total at 10 times their user base. No magic in there, just good architecture and solid computer science. Boring technology (Go/Redshift/Postgres/S3).

The only thing I'd fully agree on is that using Go saved us a lot of resources as well. It's an awesome choice for stuff like this that needs to be reasonably performant while staying simple, understandable and reasonably fast to build.

[+] indigochill|8 years ago|reply
Well, a problem in tech, certainly. An alternative possibility (and another problem in tech): some mid-level developer could have figured out the problem at the start, but because of artificial time pressure to deliver they didn't have the time to, and so went with the first bad idea that popped into their head without taking the necessary time to evaluate it.

The fact this had to happen in a hackathon suggests a typical disconnect between management and development (and probably poor prioritization by management). Because development knew this was a problem and how to fix it (evidenced by the fact they fixed it), but it took removing management (aka a hackathon) to give development the space to fix it. And now the company pats itself on the back for having the vision to host a hackathon instead of structuring and prioritizing correctly in the first place so this would just get fixed on the clock.

I do think the author's takeaway about the value of simplicity and pragmatism are on point, but that applies not just to code but to management as well.

[+] smoe|8 years ago|reply
I think it is not necessarily the age but the mindset of focusing on solutions instead of understanding the problem first.

It often goes like this: Oh snap, we encountered a problem! Let's find a tool, framework, or language that promises to solve a similar-sounding problem. Now we have a problem with a layer of abstraction on top. Soon to be two problems. Let's find a tool, framework, or language to solve both of them ...

It is a spaghetti-at-the-wall approach, where you just throw a bunch of things at your problem hoping that something sticks. And who cares how long it will stick.

Secondly, as a developer I think dedicated DB experts are way underrated in start-ups. Sure, your fullstack devs can cobble together some tables, changing them 15 times a day to accommodate business requests and slapping indexes on everything that gets slow. That is also the way to get into trouble once you scale, and instead of reflecting on why this is, people reach for the bowl of pasta.

I was no different when just starting out. I thought my biggest strength was how quickly I could come up with easy "solutions" for any problem the company had. It took me years to realize how silly an approach this is.

[+] peburrows|8 years ago|reply
Overall, I agree with the sentiment of your comment, but the assertion that MySQL isn’t a serious database is just flat out wrong.
[+] collyw|8 years ago|reply
Agreed.

I have 15 years of experience and can build a decent, clean system using "boring" technologies. But all the decently paid work where I live is maintaining big balls of mud with tech that was obviously peak hype when it was chosen, and nothing done according to best practices, because that would require sticking with a tech and learning it properly. It's quite frustrating.

Then we have the interview process, where people expect me to give up my weekend for their coding test and can't even be bothered to give feedback afterwards. Or some ridiculous algorithmic nonsense that has no relevance to the job. I'm getting bored of it all.

[+] ksec|8 years ago|reply
If I'm ever the CEO of a company / startup, one criterion I'd insist on is that either I decide on all the technologies we use, or there is no CTO, so I make those decisions.

And that criterion could be summed up in one sentence: use something boring. No hyped programming languages / DBs / tools allowed.

Of course some would argue you could be doing it wrong even while using old tech / programming languages / tools. Well yes, but there you have a sea of resources and expertise to ask for help, instead of spending energy and time figuring it out yourself.

Of course, if your company is all about tech innovation, AI or something cutting edge, then surely you will have to try something new. But 80% of those startups aren't.

[+] w8rbt|8 years ago|reply
"In June 1970, E. F. Codd of IBM Research published a paper [1] defining the relational data model and introducing the concept of data independence. Codd's thesis was that queries should be expressed in terms of high-level, nonprocedural concepts that are independent of physical representation."

The key, the whole key, and nothing but the key so help me Codd.

Also said as... "In Codd we trust."

If none of these DB jokes mean anything to you, take a DB concepts class at a CS university. There's a lot of great research going back 50 years and you can learn a great deal about why things are the way they are (relational algebra and tuple calculus). And before changing anything for something you think may be better, you should fully understand what you are giving up.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86....

[+] davidp|8 years ago|reply
I like to trot out this old gem[1] when people wonder why there's so much hate for MySQL.

Nontransactional DDL alone is sufficient to classify it as a toy DB for me. Yes, I've been personally bitten by it.

[1]: https://grimoire.ca/mysql/choose-something-else

[+] orf|8 years ago|reply
I don't quite get this. How fast was running this query:

    SELECT loyaltyMemberID
    FROM table
    WHERE gender = x
    AND (age = y OR censor = z)
Why the random complexity with individual unions and a group? Of course that's going to be dog slow.

Sure, the filters can be arbitrary but with an ORM it's really really simple to build them up from your app code. The Django ORM with Q objects is particularly great at this.
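To sketch the idea (this is illustrative plain Python mimicking the Q-object pattern, not Django itself; the column names are made up), composable filter objects can compile arbitrary and/or combinations into one parameterized WHERE clause:

```python
# A minimal sketch (not Django) of Q-object-style composable filters
# that compile down to a single parameterized WHERE clause.
class Q:
    def __init__(self, **kwargs):
        # each keyword argument is a column = value condition; values are
        # kept as parameters, never interpolated into the SQL string
        self.sql = " AND ".join(f"{col} = %s" for col in kwargs)
        self.params = list(kwargs.values())

    def _combine(self, other, op):
        combined = Q()
        combined.sql = f"({self.sql}) {op} ({other.sql})"
        combined.params = self.params + other.params
        return combined

    def __and__(self, other):
        return self._combine(other, "AND")

    def __or__(self, other):
        return self._combine(other, "OR")

# gender = x AND (age = y OR censor = z), built up programmatically:
f = Q(gender="F") & (Q(age=30) | Q(censor="G"))
print(f.sql)     # (gender = %s) AND ((age = %s) OR ((censor = %s)))
print(f.params)
```

The point is that arbitrary user-chosen filters need not mean arbitrary hand-written SQL; the app can assemble one query from small predicates.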

Obviously I'm armchairing hard here but it smells like over engineering from this post alone. Stuff like this is bread and butter SQL.

Edit: I've just read the query in the post again and I really can't understand why you would write it like that. Am I missing something here?

Seems like a fundamental misunderstanding of SQL rather than a particularly hard problem to solve.

[+] owenmarshall|8 years ago|reply
> Stuff like this is bread and butter SQL.

Ten or fifteen years ago, sure - a DBA would look at a query plan and figure out how to do it properly. Worst case, you'd slap a materialized view in and query that.

But this is 2018! Programmers don't want to treat the database as anything but one big key value store ;)

[+] nemothekid|8 years ago|reply
>Seems like a fundamental misunderstanding of SQL rather than a particularly hard problem to solve.

Without knowing the rest of their stack, or what their data ingestion looks like, I think your query is oversimplified. If they are doing a union, then it's likely they aren't querying one table, but they are querying multiple tables. The article mentions that individual customers had as many as 500 million rows. Likely each customer has their own set of data they also pipe into the system. Next their own custom query language may support more complex algebra than standard equality.

IMO, the article doesn't sufficiently describe the problem for us to understand why their solution works. To you and I there are 100 other solutions they could have tried that seem simpler than the one they presented.

It's less likely that they overengineered - we are probably just underinformed.

[+] any626|8 years ago|reply
The user data is most likely in rows instead of columns. Instead of having

    id, name, age, gender
    1213, fake, 60, female
they would have

    property_id, user_id, value
    1 (assume age), 1213,    60
    2 (gender),     1213,    female
This gives them the freedom to add more properties to the user without always having to add a column to the users table. When querying the database you'll have to do unions or joins.
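A runnable sketch of why that layout forces extra work (using SQLite from the standard library; table and property names are invented for illustration): matching two properties for the same user takes a self-join or a GROUP BY/HAVING trick, where a plain users table would need a single WHERE.

```python
import sqlite3

# EAV ("properties in rows") layout: one row per (property, user, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_properties (property TEXT, user_id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO user_properties VALUES (?, ?, ?)",
    [("age", 1213, "60"), ("gender", 1213, "female"),
     ("age", 1214, "25"), ("gender", 1214, "female")],
)

# Users who are female AND aged 60: each matched property contributes one
# row, so HAVING COUNT(*) = 2 keeps only users matching both conditions.
rows = conn.execute("""
    SELECT user_id FROM user_properties
    WHERE (property = 'gender' AND value = '60' IS NOT NULL AND value = 'female')
       OR (property = 'age'    AND value = '60')
    GROUP BY user_id
    HAVING COUNT(*) = 2
""").fetchall()
print(rows)  # [(1213,)]
```

Also note the values all become strings, so even "age = 60" is really a text comparison unless you cast.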
[+] ThrustVectoring|8 years ago|reply
The UI they showed in the blog post looks like it has enough data available to generate that kind of query, too. Like, the ands/ors/nots are right there on the page, the filters are already there too getting translated to SQL as well, just mash them together and you get the same "algebra of sets" stuff right in the WHERE clause.

As it stands the SQL query is quite silly. It gets a list of every user ID that is included by each filter and compares which ones are in the filters you want and not the filters you don't want. Much better is to pass the filters into SQL, let it figure out which users match the filters you want and not the filters you don't, and just use that result.
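The "pass the filters into SQL" idea can be sketched as a recursive compiler over the UI's and/or/not tree (names and tree shape here are hypothetical, and a real version would use bound parameters rather than string interpolation):

```python
# Sketch: compile an and/or/not filter tree straight into one WHERE
# clause, instead of UNIONing per-filter ID lists and regrouping them.
def compile_filter(node):
    op = node["op"]
    if op == "and":
        return "(" + " AND ".join(compile_filter(c) for c in node["children"]) + ")"
    if op == "or":
        return "(" + " OR ".join(compile_filter(c) for c in node["children"]) + ")"
    if op == "not":
        return "NOT " + compile_filter(node["child"])
    # leaf filter: column/value stand in for whatever a real filter emits;
    # a production version must parameterize values, not inline them
    return f"{node['column']} = '{node['value']}'"

tree = {"op": "and", "children": [
    {"op": "eq", "column": "gender", "value": "F"},
    {"op": "or", "children": [
        {"op": "eq", "column": "age", "value": "60"},
        {"op": "not", "child": {"op": "eq", "column": "censor", "value": "R"}},
    ]},
]}
print("SELECT loyaltyMemberID FROM members WHERE " + compile_filter(tree))
```

Each nesting level in the UI maps one-to-one onto a parenthesized group in the WHERE clause.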

[+] wetha|8 years ago|reply
I’m not the author of the post. Your comment assumes a well known schema. My understanding from the post is that this solution can join and filter on “custom” datasets of arbitrary schema that each of their customers upload.
[+] TimPC|8 years ago|reply
I think the point is they don't know in advance what the query is, and they didn't think they had a good solution for optimizing all user-entered variants across the range of possible groupings, so they wanted a solution that was easier to optimize globally.

The general form of this is:

    SELECT loyaltyMemberID FROM table
    WHERE (V1_1 = x_1_1 OR ... OR V1_n = x_1_n)
      AND (V2_1 = x_2_1 OR ... OR V2_n = x_2_n)
      AND ...
      AND (Vn_1 = x_n_1 OR ... OR Vn_n = x_n_n)

(some of these n's should really be m_i's, but I was lazy)

There may be some ability to optimize this in a number of ways but optimizing one example is not optimizing the general form. I can easily see how technology change could be a cleaner solution.

[+] rypskar|8 years ago|reply
I blame ORMs. If you don't understand SQL and how databases work, you should not be allowed to use an ORM. And if you do know how databases work, in many cases you will not use an ORM except for the simplest CRUD operations.
[+] flukus|8 years ago|reply
> Edit: I've just read the query in the post again and I really can't understand why you would write it like that. Am I missing something here?

Oh, I've seen this happen a lot. Somewhere along the line, often from a DBA, it is decided that SQL in an app is evil and that everything must be in a stored proc. Then instead of some simple string concatenation you have to jump through hoops like this.

[+] est|8 years ago|reply
I am building a rule engine quite similar to this. An AST parser runs over all the Python DSLs and generates the list of tables for INNER JOINs, then SELECTs all the table data out with filters in one pass, then runs all results through the Python code.

It's quite fun.
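The first step of that pipeline can be sketched with the standard-library `ast` module (the DSL shape and the table/field names here are invented for illustration): walk the expression's AST and collect which tables it touches, so the JOINs can be generated before the rule runs.

```python
import ast

def tables_used(dsl_source):
    """Collect table names referenced as `table.field` in a Python DSL expression."""
    tree = ast.parse(dsl_source, mode="eval")
    tables = set()
    for node in ast.walk(tree):
        # treat `transactions.amount`-style attribute access on a bare
        # name as a table reference
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            tables.add(node.value.id)
    return sorted(tables)

rule = "transactions.amount > 100 and members.age >= 18"
print(tables_used(rule))  # ['members', 'transactions']
```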

[+] benmmurphy|8 years ago|reply
age/gender are probably simple, but I'm guessing censor is probably derived from a transaction table. If they are letting users select arbitrary time ranges to filter the transactions, then you can't store a precomputed censor value for each user. But seeing that they are talking about caching, maybe a lot of stuff can be precomputed.
[+] BinaryIdiot|8 years ago|reply
Damn, you're not kidding. I wonder why they needed more than one query here; plus, UNION is slowwwwwwww. They never mention how frequently this query needs to run either, only the amounts of data involved in some aspects of this table.
[+] toddBarkus|8 years ago|reply
I was going to ask how to optimize the SQL in your post as it seems like the obvious/naive implementation of the query. If you're missing something, so am I. I can only imagine it was built up over time from googling specific terms that already missed the point, e.g. "rds mySQL union query"
[+] silveroriole|8 years ago|reply
Gotta agree with others and say that they’re clearly skimming over the facts that:

- they didn’t have the expertise to actually fix the SQL. That query smells bad. The data model smells bad. For some reason HN is always superstitiously afraid of letting developers touch the database, but if you don’t let devs touch the database enough you end up with this sort of thing; or that crap data model with properties in rows instead of columns, because oh god, we can’t let devs actually do DDL so we’d better make it all really flexible (and incredibly slow because it’s a misuse of the database). I mean, implementing your own result caching mechanism? I don’t know about MySQL but surely it has its own caching mechanism (Oracle does) that isn’t being used because the query is bad.

- project management probably had no interest in fixing the performance/incorrect data problems, and devs were expected to do it in their own time.

In a way though this makes me feel better, other people are dealing with these problems too and their overengineered solutions work and keep the company running, I guess mine will too :)

[+] hippich|8 years ago|reply
What this also shows is how expensive AWS hardware is vs. hosting your own. I guess you have to consider how often you have to scale, but Hetzner offers dedicated servers with 64 GB and NVMe drives starting from 54 euros per month - https://www.hetzner.com/dedicated-rootserver?country=us - compare that to the $580 per month these guys were paying for an i3.2xlarge instance.
[+] bobwaycott|8 years ago|reply
I’ve heard mention of Hetzner no less than a dozen times in the last couple days. What’s their deal? I’m not quite sure I grok this server auction thing they do, or how they’re so cheap.
[+] rbranson|8 years ago|reply
... and then you have to clone all the AWS services yourself. If you just need a few boxes, there are definitely cheaper options than AWS.
[+] voltagex_|8 years ago|reply
What if you need boxes closer to where your customers are? Latency is a big deal, especially in Australia / New Zealand.
[+] partycoder|8 years ago|reply
Unless you can directly see how a query can be optimized, the first thing you do is get the execution plan (e.g. EXPLAIN <query>).

The execution plan will tell you how expensive each bit of your query is and help you adjust it.

From there, if things are not getting better, you have a lot of alternatives:

- Consider creating an index

- If the value doesn't change often, consider writing it into another table or caching it.

- Replication, partitioning, sharding, changing the schema.

- Reconsider the requirement being implemented in order to have a more scoped query or to perform the query less often.
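A concrete before/after illustration of reading a plan when you add an index (using SQLite here because it's self-contained; MySQL's EXPLAIN output has a different shape but tells the same story, and the table/index names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, gender TEXT, age INTEGER)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); the
    # human-readable detail string is what we care about
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT id FROM members WHERE gender = 'F'"
print(plan(query))  # full table scan, e.g. ['SCAN members']
conn.execute("CREATE INDEX idx_members_gender ON members(gender)")
print(plan(query))  # e.g. ['SEARCH members USING INDEX idx_members_gender (gender=?)']
```

Same query, same data; only the plan changed, which is exactly the signal EXPLAIN gives you before reaching for heavier tools.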

Then... OLAP is not OLTP. If you can, do reporting in another database.

Finally, creating your own project may not save you $50,000 in the end. How about maintenance? Tooling built around it? Integration costs? Documentation? Usability? New hires having to learn about it? You can hire people that already know SQL without having to incur that cost yourself. All the tooling is built, battle-tested and readily available. Plus, skills related to internal tools are harder to trade in the market because they're harder to verify and less transferable.

[+] hnov|8 years ago|reply
Elasticsearch works for this use-case quite well, you'd store a fairly straightforward representation of the MySQL row as a document, query by the fields you're interested in and ask for aggregations on the matching documents. Common bitsets get cached automatically.
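The kind of request body this suggests might look like the following sketch (a standard Elasticsearch bool query plus a terms aggregation; the index fields and aggregation name are invented for illustration):

```python
import json

# gender = F AND (age = 60 OR censor = G), with an aggregation over
# the matching documents; size 0 skips the hits since only the
# aggregation result is wanted.
body = {
    "query": {
        "bool": {
            "filter": [{"term": {"gender": "F"}}],
            "should": [{"term": {"age": 60}}, {"term": {"censor": "G"}}],
            "minimum_should_match": 1,
        }
    },
    "aggs": {"members_per_city": {"terms": {"field": "city"}}},
    "size": 0,
}
print(json.dumps(body, indent=2))
```

The `filter` clauses are what get cached as bitsets, which is why repeated segment queries over the same filters get fast.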
[+] mikeryan|8 years ago|reply
This is cool but it seems strange (to me) that this was a “Hackathon” project as opposed to just a stand-alone problem to be addressed as a normal course of doing business. It doesn’t make the solution less cool. It just seems like a strange distinction on what a Hackathon is.
[+] neuromantik8086|8 years ago|reply
A hackathon is something that used to be a cool party for geeks (e.g., a Mathletics competition or an ACM programming contest) until the corporate overlords bastardized it and converted a good thing into unpaid overtime with free beer.
[+] qw|8 years ago|reply
Depending on the type of company, it could have been blocked by the manager or PM. It is easy for them to reject "tech stuff", when your time could be spent adding a feature from the product owner.

A proposal like this could easily have been seen as the developers wanting to test out a technology that was not approved or that lacked a good business case. A business case is usually something that only sales/product can sell. The barrier to listening to developers is higher because they are assumed to not know enough about the business.

You may have an "agile" environment, but you often need a very good reason to not pick the next item from the backlog, which was not created and maybe not even prioritized by you.

In those companies, the hackathon may be the only time developers can present their ideas.

[+] btb|8 years ago|reply
My guess would be a dev that knew something was wrong with the performance of their old system, but for whatever reason it wasn't something that anyone higher up prioritized. If that's the case, I can see why one might come up with the idea of trying to solve it outside the bounds of the normal production/SQL environment. At least in my company, the only time we devs ever get to specifically look at the performance of our website is when we have "hackathon" days where we choose our own projects. I often feel like my regular time would be much better spent trying to optimize our 2-second-plus initial pageload times, instead of all the other small tasks/tweaks/bugfixes that get sent my way. But performance is something very few people higher up seem to care about. Or maybe it's a case of users and managers becoming so accustomed to something being slow they don't notice anymore.
[+] pbnjay|8 years ago|reply
Cool post - Definitely an improvement and a good fit for Go services. I'm curious - did you run performance comparisons on optimizing the SQL itself as compared to adding this additional service?

Maybe I'm crazy, but just looking at that query it seems like there's definitely room for improvement with the SQL alone. Unless the "..." is hiding something I'm missing?

[+] TeeWEE|8 years ago|reply
It's not that a Go microservice solved their problem. It's the different algorithm they use for querying. That has nothing to do with Go, or microservices.
[+] hajile|8 years ago|reply
This seems a lot less about go and much more about changing the approach to the problem.
[+] misja111|8 years ago|reply
But it is the 'Go' in the headline that probably got this article promoted to the frontpage of HN ..
[+] manigandham|8 years ago|reply
This is a great example of how to do things wrong. I'm surprised they couldn't find any other columnstore database to take the place of InfiniDB in 2018.

The numbers they quote (5M members, 100M transactions) are tiny for any modern data warehouse. Many solutions would run these in sub-second speeds without changing the SQL at all, and it would be far better than building a quasi-SQL engine in Go.

Actually, for the occasional querying + caching that they have, something like BigQuery or Snowflake would be even cheaper, with basically zero operational effort.

[+] wiradikusuma|8 years ago|reply
Just to be sure, any language will do right? Because I thought it was about Go vs (put your slow programming language here).
[+] twic|8 years ago|reply
Okay, before I've even read the article, I'm going to guess that they were doing something egregiously expensive in the cloud, and the microservice helped them do it more efficiently - but still much more expensively than doing it in a simple, old-fashioned way.

Now I'll read the article ...

EDIT: I would say I'm no more than 30% right. They were doing heavyweight data crunching in the cloud, and so paying more for it than if they were doing it on rented hardware. But that's a constant-factor thing; it's not like they were downloading gigabytes of CSVs from S3 on every request or some such. Their query looks suspect to me: couldn't it be written to do one big scan, rather than unioning a load of things? Or is this the right way to write queries on column stores? Still, there is no glaringly obvious (to me) old-school fix for this.

[+] wasd|8 years ago|reply
Great story, thanks for sharing. I wanted to ask a quick question about something:

> Refreshing caches automatically

How do people usually handle this? Is this something done on the application layer or database layer? Where is the cache stored?

[+] maloga|8 years ago|reply
Blogpost author here. Thank you so much for all the attention, comments, upvotes, likes, retweets, etc! I've done a pass over the comments and can't really answer them all but I'd like to clarify a few things:

There seems to be a general opinion trend that the queries generated by the group builder algorithm are very inefficient, that it'd be easy to come up with a solution with much better response times, and that that would be achievable in any reasonable programming language in roughly the same time with similar results.

The language argument will always be controversial and I won't address it here; we have a point of view that is expressed in the Conclusion and on this blogpost: https://movio.co/en/blog/migrate-Scala-to-Go/

I can imagine that seeing a query with JOINs, subqueries, GROUP BYs and UNIONs can raise some eyebrows, but there is some lacking context in that story, and that's on me. Here's some of that context:

* The schema that the group builder algorithm operates on is not uniform in nature or composed of simple yes/no fields; it's an incredibly complex legacy schema that to a large degree wasn't even up to Movio: it's been up to the film industry as a whole, and it has evolved over the years, as is the case everywhere. Note that every different kind of filter translates to a very different kind of query, and we have more than 120 different filters, sometimes with dynamic parameters, and sometimes even bespoke for a particular customer!

* The group builder algorithm predates the team that built this service (myself included), as well as predating the first commercial release of Elasticsearch, MariaDB, mainstream Go success, etc. Nevertheless, it's still very fast and is being used today by ~88% of our customers (i.e. all the non-behemoths). It's been successful for many years, and continues to be, for the most part.

* But I don't like it because it's fast: I like it because it's simple and flexible. It allows our customers to build a really complex (and arbitrary) tree of filters to segment their loyalty member base, and it compiles all of that into one big SQL query, that in most cases is quite performant. That's pretty awesome. But yes; it doesn't scale to several million members.

* Migrating the very engine of the main product of a company is not a decision that is taken lightly. As is the case with every big company I can remember (e.g. Twitter, SoundCloud), behind a big success story there's always a legacy monolith, and our case is no exception. From that standpoint, achieving such a breakthrough (i.e. cost reduction + significant response time improvement) within one hackathon day is really not all that common in my experience. Definitely something worth sharing, IMO.

Hopefully that clarifies some of the questions :) Cheers.

[+] chupy|8 years ago|reply
We had the same issue where I work and we are doing a very similar thing but at a way larger scale (adtech) for audience building, and we actually resorted to compressed bitmaps since Postgres was not cutting it. It's fairly easy to just come on a forum and say "hey, just use Postgres/MySQL/SQL Server" without reading the full article and understanding what you guys are dealing with.
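The bitmap idea can be sketched with plain Python ints as bitsets (real systems use compressed structures like Roaring bitmaps, and the filter names here are invented): one bitmap per filter, one bit per member ID, and segment algebra becomes bitwise operations.

```python
def bitmap(member_ids):
    """Pack a set of member IDs into an int used as a bitset."""
    bits = 0
    for mid in member_ids:
        bits |= 1 << mid
    return bits

def members(bits):
    """Unpack a bitset back into a sorted list of member IDs."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

female = bitmap([3, 5, 8, 13])
age_60 = bitmap([5, 8, 21])
opted_out = bitmap([8])

# female AND age 60 AND NOT opted out — pure bitwise ops, no SQL at all
segment = female & age_60 & ~opted_out
print(members(segment))  # [5]
```

Once each filter's bitmap is precomputed, evaluating an arbitrary and/or/not tree over millions of members is a handful of word-level operations per 64 members.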
[+] pryelluw|8 years ago|reply
I once built a micro service in Go that saved about as much. Took me a couple of days (meetings and all that) from start to deployed. It still runs on the same cheap aws instance. Go is really good at that sort of thing.
[+] sankyo|8 years ago|reply
What did the Go solution replace? I only read about DB changes, and cannot draw any conclusions.
[+] progval|8 years ago|reply
Short version: they used the right data structure for their problem instead of a generic one.