top | item 29459724

Ask HN: Running production server on M1 mini?

23 points| groundthrower | 4 years ago | reply

We have a performance-intensive application running on an AMD Epyc dedicated instance with 32 cores (our application is highly parallelizable).

We just noticed locally on our dev environment that our M1 is actually performing better performance wise (don’t ask me how).

We are now considering switching our production servers to M1 minis, which are also offered by our cloud provider. Do you have any experience running M1s / Macs in a production environment regarding stability / uptime etc?

Edit: it’s a Rust application which uses the Rayon crate. The application gets on average one request a minute which crunches some numbers for an average of 2 seconds - so it’s mostly idle. No disk IO.

63 comments

[+] al2o3cr|4 years ago|reply

    don’t ask me how

_You_ should be asking you how - there are lots of reasons why this could be happening and knowing which one is important if you're changing stuff.

Based on a "highly parallelizable" application performing better on 8 cores than 32, I'd guess you're running out of something else: memory or disk bandwidth.

[+] gwbas1c|4 years ago|reply

Probably the hardest thing to clean up is a codebase where very complicated "optimizations" were built because someone didn't understand some very basic bottlenecks.

I recently inherited an app that makes heavy use of Redis caching because someone didn't first try optimizing the SQL. The complexity that Redis caching adds is insane to maintain compared to spending a few minutes optimizing SQL.

The original poster really needs to hook up a profiler.

Also: having written lots of parallel code: Parallelization isn't a magic way to make things faster. If the codebase is breaking up tasks into lots of tiny tasks that run in parallel, there might be more overhead in parallelization than needed. Sometimes the fastest (performance and implementation) way to parallelize is to keep most of the codebase serial, but only parallelize at the highest level and never share data among operations.
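A minimal sketch of that "parallelize only at the highest level" idea, using only the Rust standard library rather than Rayon. The `crunch` function is a hypothetical stand-in for the real work, and the worker count of 8 is an arbitrary assumption. Each worker gets one contiguous chunk, runs it serially, and nothing is shared until the partial results are combined at the end:

```rust
use std::thread;

// Hypothetical stand-in for one independent unit of number crunching.
fn crunch(x: u64) -> u64 {
    (0..1_000).fold(x, |acc, i| acc.wrapping_mul(31).wrapping_add(i))
}

// Split the input into one contiguous chunk per worker, run each chunk
// serially on its own thread, and combine the partial sums at the end.
// Workers share no mutable data while they run.
fn crunch_all_coarse(inputs: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every input lands in some chunk; at least 1
    // because `chunks(0)` would panic.
    let chunk_len = ((inputs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = inputs
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk.iter().fold(0u64, |sum, &x| sum.wrapping_add(crunch(x)))
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .fold(0u64, |a, b| a.wrapping_add(b))
    })
}

fn main() {
    let inputs: Vec<u64> = (0..10_000).collect();
    let serial = inputs.iter().fold(0u64, |sum, &x| sum.wrapping_add(crunch(x)));
    let parallel = crunch_all_coarse(&inputs, 8);
    assert_eq!(serial, parallel);
    println!("checksum: {parallel}");
}
```

With only a handful of spawns per request, the scheduling overhead stays small relative to the work, which is the opposite of splitting into lots of tiny tasks.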

[+] gjsman-1000|4 years ago|reply

If his application is running better on 8 cores instead of 32, that reeks to me of a dependency on single-core performance somewhere. An example of this would be Minecraft, which performs worse on heavily multi-core systems compared to a few fast cores (like M1).

[+] Matthias247|4 years ago|reply

+1. They should start profiling their application. If it's running on Alpine Linux, for example, the default memory allocator is extremely bad and would degrade performance, but it could also be tons of other things. Taking random actions without understanding what the current bottleneck is will never be great long term.

[+] groundthrower|4 years ago|reply

It does not consume much memory but does lots of allocations/deallocations. No disk operations whatsoever.
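One way to put a number on "lots of allocations" is to wrap the system allocator and count requests. This is only a sketch using the standard `GlobalAlloc` trait; `CountingAlloc` and `count_allocations` are names made up for illustration, and the boxed-integer workload is a placeholder for the real request handler:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Wraps the system allocator and counts every allocation request.
// The trait's default `realloc` and `alloc_zeroed` go through `alloc`,
// so those are counted too.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

// Run `f` and report how many allocations happened process-wide meanwhile
// (other threads allocating at the same time are included in the count).
fn count_allocations<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}

fn main() {
    // Placeholder workload: one heap allocation per element plus the Vec.
    let (len, allocations) = count_allocations(|| {
        let boxes: Vec<Box<u64>> = (0..1_000u64).map(Box::new).collect();
        std::hint::black_box(&boxes).len()
    });
    println!("{len} boxes took {allocations} allocations");
}
```

If the count per request turns out to be very high, the allocator on each platform becomes a plausible suspect for the difference between the two machines.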

[+] hvgk|4 years ago|reply

They are perfectly stable machines for running batch jobs. I have had one running a bunch of build automation as a Jenkins slave for about 9 months now. Never skipped a beat. It just works and the thing is damn fast.

If it’s doing it offline it’s probably cheaper to buy one and chuck it in your office than borrow one from a cloud provider. The ass end ones are really really really cheap. Much cheaper than just the CPU in an equivalent server machine. If they blow up, just go down to the Apple Store and buy another one.

Disclaimers of course: (1) it doesn’t have ECC RAM (2) it doesn’t have redundant power. We ignore (1) and solve (2) by running a Prometheus node exporter on it and seeing if it disappears.

[+] Someone1234|4 years ago|reply

No currently offered M1 Mini has redundant fail-over power or storage. Also, without knowing how your cloud provider has cooling set up, it is unclear how well it will operate under heavy load for extended periods of time (blade servers are designed for that specific workload and have cooling solutions to match).

My point is: If your workload is time critical, and you cannot afford downtime/outages then it may not be for you. If your workload can afford the time it would take to adopt a new M1 Mini when the old one dies, then maybe?

[+] jagger27|4 years ago|reply

> No currently offered M1 Mini has redundant fail-over power or storage.

It's kind of funny, but an M1 MacBook does. In fact it comes with a solid >12 hour UPS built-in.

[+] bpicolo|4 years ago|reply

Does that include the M1 instances AWS launched last week?

[+] groundthrower|4 years ago|reply

Well, it waits for calculations which take about 2 seconds to complete on average - the vast majority of the time it’s idle

[+] joshdev|4 years ago|reply

Have you considered looking at Amazon for their ARM offering (Graviton)? I'd be hesitant to use M1 minis for a production workflow as they are not really production grade (lacking ECC memory, not sure how long they are rated to run at high CPU, lack of user-replaceable disks, no RAID, etc.).

[+] svacko|4 years ago|reply

How do you actually compare performance/benchmark the app? Are you testing/benchmarking both prod and dev directly on the box itself? I'm thinking there might be other infrastructure shielding production, like load balancers, proxies, and observability/security tooling running on and slowing the prod server, compared to accessing the dev M1 directly.

[+] crankyadmin|4 years ago|reply

Knowing what the development language is as well would help a lot - but the first thing you want to do is get some instrumentation on both your 32 Core AMD box and your M1 and compare the two.

The M1 is very fast at doing certain things and your application may just be making good use of the M1 instruction set... but without knowing a bit more it's difficult to tell.
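As a rough idea of what comparing the two could look like, here is a sketch that times the same fixed amount of CPU-bound work at several thread counts, using only the Rust standard library. Everything in it is an assumption for illustration: `busy_work` is a hypothetical stand-in for the real request handler, and the thread counts and iteration total are arbitrary. Running the same binary on both boxes should show whether the workload scales with cores at all:

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical CPU-bound stand-in for the real request handler.
fn busy_work(iterations: u64) -> u64 {
    (0..iterations).fold(0u64, |acc, i| {
        acc.wrapping_mul(6364136223846793005).wrapping_add(i)
    })
}

// Split `total_iterations` of work evenly across `threads` threads and
// return the wall-clock time until all of them have finished.
fn time_with_threads(threads: usize, total_iterations: u64) -> Duration {
    let threads = threads.max(1);
    let per_thread = total_iterations / threads as u64;
    let start = Instant::now();
    // Scoped threads are joined automatically when the scope ends.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(move || black_box(busy_work(black_box(per_thread))));
        }
    });
    start.elapsed()
}

fn main() {
    for threads in [1, 2, 4, 8, 16, 32] {
        let elapsed = time_with_threads(threads, 400_000_000);
        println!("{threads:>2} threads: {elapsed:?}");
    }
}
```

If the time stops dropping well before the core count is reached, the bottleneck is probably somewhere other than raw core count, and a profiler is the next step.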

[+] krageon|4 years ago|reply

If you do not understand why your performance profile is as it is, how do you know next week's patch won't make it perform better on AMD machines suddenly? You should understand your problem before you solve it.

[+] DrBenCarson|4 years ago|reply

I don’t think any amount of historical or present-state analysis will shed light on next week’s patch.

That being said, it would prepare one to better analyze next week’s patch.

[+] poulsbohemian|4 years ago|reply

>our M1 is actually performing better performance wise

I did performance analysis work for a long span of my career. While I'm reading between the lines of what you wrote, my first question is - what do you mean by performing better? As in, is it somehow able to process more of these tasks over a given timeframe? If so, I'd want to understand more about the workloads you are running to make sure it's a proper comparison.

There are a whole lot more questions we need to answer here to understand the results you are seeing before we can have any kind of discussion of whether M1s would be "better."

[+] marban|4 years ago|reply

I have one sitting on my desk that generates videos 24/7 and hasn't been down in a year.

https://imgur.com/a/VAxpGCL

[+] nobbis|4 years ago|reply

We use MacStadium's M1 mini servers for Metascan's photogrammetry batch processing. They've only been running a few months, but no downtime yet and I'm impressed with MacStadium's customer support, responsiveness, and price.

[+] toast0|4 years ago|reply

Unless it's changed recently, OS X has essentially no protection from synfloods. The TCP stack predates FreeBSD's syncache, and it was never ported. It doesn't have syncookies either. The pf port's synproxy stuff doesn't seem to work either.

You've got to put some sort of firewall or something in front; don't let it accept TCP connections directly. You might be OK, but not great, if you just set the listen queue really short; at least that should prevent the machine from falling over when it's flooded, but without syncookies, chances are you won't be able to make new connections either.

[+] DarthNebo|4 years ago|reply

Feels like your provisioned disk or IOPS could be the missing factor instead of core counts.

[+] groundthrower|4 years ago|reply

We do not do any disk operations at all

[+] gjsman-1000|4 years ago|reply

No - but I can give a few suggestions.

One would be to look, if you haven’t, at MacStadium and what they’ve got there. You can get an M1 Mini there and it will be run by experts who know all about using M1 minis for servers. Considering your application is highly parallelizable, this would also make it easy to upgrade to the M1 Pro with double the performance cores down the line.

Secondly, if your application is running better on M1, that reeks of an application which is somehow greatly benefiting from single-threaded performance somewhere, which the M1 excels at and the Epyc is poor at. That probably needs some investigation.

[+] errcorrectcode|4 years ago|reply

Terrible idea. I supported a dozen Xserves back in the day. They were crap because they weren't designed for production use. They used nonswappable, commodity retail IDE drives not meant for 100% duty cycle operation. Fixed power supplies. Real enterprise servers were cheaper.

Mac minis don't have redundant power or ECC. You might as well run a bunch of RPis or PICs. Get yourself some real enterprise servers or rent some via a VPS.

Disclaimer: I use a Mac mini as my living room HTPC. I wouldn't run anything real on it. That's what I have a 96 thread EPYC virtualized box for.

[+] caeril|4 years ago|reply

No personal experience other than a slightly different one: running production services (involving money!) on another box without ECC DRAM (to save money!) and experiencing random permission-flag flips and actual balance/amount flips. Only a small handful over many years, but it does happen, and when it matters, it REALLY matters.

My advice is to always use ECC DRAM in production unless you're serving cat photos, porn, social media posts, or other societally useless applications. For anything that actually matters, please use ECC.

[+] groundthrower|4 years ago|reply

Yes, this is one concern. Are you sure it was a result of using non-ECC memory, and how did you find out it was because of that?

[+] skw-hn|4 years ago|reply

Scaleway is also providing M1 Mac minis. The price is around 0.10€/hr, which is quite a bit cheaper.

As for stability, my Scaleway M1 has never had any issues. It works just fine for some CI.

[+] jagger27|4 years ago|reply

I'd be curious to know if your application scales even further onto an M1 Pro/Max. If that's the case, then something about Apple silicon makes your application scream.

[+] throwaway4good|4 years ago|reply

macOS will require updates from time to time. Otherwise the minis will run 24/7 with no problem. You can consider building a hybrid setup where you leave the stuff that requires no / little downtime at your cloud provider.

[+] maksimpiriyev|4 years ago|reply

I was thinking the same about switching to an M1 server. The next version of the M1 Mac mini will probably be 2x faster than the current one, so buying a Mac mini next year would be a double benefit :)

[+] tyingq|4 years ago|reply

Have you tried using taskset or similar to force the production application onto fewer cores? Perhaps something about thread/ipc/locking overhead?

[+] usefulcat|4 years ago|reply

This is a good thing to check. However, do be aware that a lot of apps check the number of physical processors in the system rather than the CPU affinity mask for the process, even though the latter is almost always what they ought to be using.
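For a Rust program, a small sketch of sizing a worker pool from what the process may actually use rather than from the raw processor count. `std::thread::available_parallelism` is documented as an estimate that may take process affinity masks and cgroup limits into account on some platforms, so the exact behaviour is platform-dependent; `worker_count` is a made-up helper name:

```rust
use std::num::NonZeroUsize;
use std::thread;

// Pick a worker count from what this process is allowed to use, falling
// back to a single worker if the platform cannot report it.
fn worker_count() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}

fn main() {
    println!("workers: {}", worker_count());
}
```

On Linux, running this under something like `taskset -c 0-7` would be expected to report 8 rather than the machine's full core count, though that is worth verifying on the actual box. Whether a given thread pool library sizes itself the same way is something to check in its own documentation.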