The best cold starts are the ones the user never notices. For my blog search (which runs on Lambda), I found a nice way of achieving that [1]: as soon as a user focuses the search input field, the page already fires a "ping" request at the Lambda. Then, when they submit the actual query, they hit an already-running Lambda most of the time.
And, as others said, assigning more RAM to your Lambda than it actually needs will also help with cold start times, as this increases the assigned CPU share, too.
My experience with cold starts in Azure Functions' serverless offering is pretty awful. Like most other Azure services, their affordable consumer-grade offerings are designed from the ground up not to be good enough for "serious" use.
Cold start times compared to Lambda are worse, and in addition, we would get random 404s which do not appear in any logs; inspecting these 404s indicated they were emitted by nginx, leading me to believe that the ultimate container endpoint was killed for whatever reason but that fact didn't make it back to the router, which attempted and failed to reach the function.
Of course, the cold starts and 404s are mitigated if you pay for the premium serverless tier or just host your middleware on an App Service plan (basically VMs).
Same experience with Firebase. I just joined a team that has been using it. I've never worked with serverless before, and it boggles my mind how anyone thought it would be a good idea.
The cold starts are horrendous. In one case, it's consistently taking about 7 seconds to return ~10K of data. I investigated the actual runtime of the function and it completes in about 20ms, so the only real bottleneck is the fucking cold start.
I like Azure in general, but Function cold start times are really awful.
I regularly see start up times exceeding 10s for small, dotnet based functions. One is an auth endpoint for a self-hosted Docker registry, and the Docker CLI often times out when logging in if there is a cold start. I'm planning on moving these functions to Docker containers hosted in a VM.
I have other issues with Functions too. If you enable client certificates, the portal UI becomes pretty useless, with lots of stuff inaccessible. I have one such endpoint in production just now, and it's even worse than that, as every now and then it just... stops working until I manually restart it. Nothing useful in the logs either.
Azure Functions cold start times also depend on the underlying tech stack. I was using Python on a Linux host for Slack-related Azure Functions and they sometimes ran into timeouts (the Slack API's limit is 3s, I think). After I switched to Node.js on Windows I never got a timeout again.
For the Azure Functions consumption plan this can be mitigated to an extent by just having a keep alive function run inside the same function app (set to say a 3-5 minute timer trigger).
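As a sketch of that keep-alive idea (the URL below is made up; in Azure you would wire this to a timer trigger on, say, a 5-minute schedule), the body can be as simple as an HTTP ping against the function app's own endpoints:

```python
import urllib.request

def keep_warm(urls, opener=urllib.request.urlopen, timeout=3):
    """Ping each function URL so the Consumption-plan host stays warm.

    `opener` is injectable for testing; by default it is urllib's urlopen.
    """
    statuses = {}
    for url in urls:
        try:
            with opener(url, timeout=timeout) as resp:
                statuses[url] = resp.status
        except OSError as err:
            statuses[url] = repr(err)
    return statuses
```

Strictly speaking, executing any function in the app keeps the host warm, so even a no-op timer function works; pinging over HTTP just also exercises the route you actually care about.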
Azure Functions, in my opinion, should mostly be used when you want to do intermittent work every now and then. It will also probably be cheaper to use something else in your case. In later versions of Azure Functions you can use a real Startup file to mitigate some lifecycle-related issues.
The way Azure Functions scale out is different and not entirely suited to the same goals as Lambda. Lambdas happily scale from 1 to 1000 instances in seconds* (EDIT: not A second), whereas Azure Functions just won't do that.
At my last job we built an entire API on top of serverless. One of the things we had to figure out was this cold start time. If a user hit an endpoint for the first time, it would take twice as long as it normally would. To combat this we wrote a "runWarm" function that kept the API alive at all times.
Sure kind of defeats the purpose of serverless but hey, enterprise software.
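A common shape for that pattern (the names and payload here are illustrative, not the poster's actual code): a scheduled rule invokes the function with a marker payload, and the handler returns early so warm-up pings never touch the business logic:

```python
def do_real_work(event):
    # Placeholder for the actual endpoint logic
    return {"echo": event.get("query")}

def handler(event, context=None):
    # A CloudWatch/EventBridge schedule invokes us with {"warmer": true};
    # answer immediately so the ping only keeps the container alive.
    if isinstance(event, dict) and event.get("warmer"):
        return {"statusCode": 200, "body": "warmed"}
    return {"statusCode": 200, "body": do_real_work(event)}
```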
Something I discovered recently: for my tiny Go Lambda functions it is basically always worth it to run them with at least 256MB of memory, even if they don't need more than 128MB. This is because most of my functions run twice as fast at 256MB as they do at 128MB. Since Lambda pricing is memory_limit times execution time, you get the better performance for free.
Test your lambda functions in different configurations to see if the optimal setting is different than the minimal setting.
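The billing math behind that: Lambda charges per GB-second, so doubling memory while halving duration is cost-neutral (the rate below is the published x86 price per GB-second at the time of writing; treat it as illustrative):

```python
def invocation_cost(memory_mb, duration_ms, rate_per_gb_s=0.0000166667):
    """Cost of one invocation: allocated GB x billed seconds x rate."""
    return (memory_mb / 1024) * (duration_ms / 1000) * rate_per_gb_s

# A function that runs twice as fast with twice the memory costs the same...
cost_small = invocation_cost(128, 40)  # 128MB, 40ms
cost_big = invocation_cost(256, 20)    # 256MB, 20ms
# ...so the extra CPU share that comes with 256MB is effectively free.
```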
We run a few .NET Core lambdas, and a few things make a big difference for latency. 1. Pre-JIT the package; this reduces cold start times as the JIT doesn't need to run for most items (it still does later to optimize some of them). 2. Stick to the new .NET JSON serializer. The reference code uses both the new serializer and the old Newtonsoft package. The old package has higher memory allocations as it doesn't make use of the Span type.
AWS Lambda is pretty cool, it just gets used a lot for applications that it was never really designed for. While I wish that Amazon would address the cold start times, if you try to grill your burgers with a cordless drill, you can’t really blame the drill manufacturer when the meat doesn’t cook.
The main downside of Lambda, in particular for user-facing applications, is that your incentives and the cloud provider's are completely opposed. You (the developer) want a bunch of warm lambdas ready to serve user requests, while the cloud provider wants to minimize costs by keeping the number of running lambdas as low as possible. It's this incentive model that fundamentally makes Lambda a poor choice for these types of applications.
Other downsides include the fact that Lambdas have fixed memory sizes. If you have units of work that vary in the amount of memory required, you're basically stuck paying the cost of the largest unit of work unless you can implement some sort of routing logic somewhere else. My company ran into this issue using lambdas to process some data where 99% of requests were fine running in 256MB but a few required more. There was no way to know ahead of time how much memory the computation would require. We ended up finding a way to deal with it, but in the short term we had to bump the lambda memory limits.
That doesn't even get into the problems with testing.
In my experience, Lambdas are best used as glue between AWS components, message processors and cron style tasks.
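One workaround for the fixed-memory problem (the function names here are hypothetical): a small coordinator that tries the cheap worker first and re-invokes a larger one only when the small one fails, e.g. because it ran out of memory. With the invoker injected as a callable, the routing logic is trivial to test:

```python
def run_with_fallback(invoke, payload,
                      workers=("worker-256mb", "worker-1024mb")):
    """Try workers in order of memory size; escalate on failure."""
    last_err = None
    for name in workers:
        try:
            return invoke(name, payload)
        except RuntimeError as err:  # e.g. an out-of-memory function error
            last_err = err
    raise last_err
```

In production, `invoke` would wrap something like boto3's Lambda `invoke` call and raise when the response reports a function error; here it is any callable.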
> the incentives of the cloud provider and you are completely opposed
I think this is a little overstated. The cloud provider wants their customers to be happy while minimizing costs (and therefore costs to the customer). It's not truly a perverse incentive scenario.
Disagree with "completely opposed". Cloud providers want to make money, sure, but in general everyone in the ecosystem benefits if every CPU cycle is used efficiently. Any overhead goes out of both AWS's and your pockets and instead to the electricity provider, server manufacturer, cooling service.
I just want to appreciate the article. Starting with the non-clickbait title, the upfront summary, detailed numbers, code for reruns, great graphs, no dreamy story and no advertisement of any kind.
It is hosted on Medium but the author has done a banging great job, so gets a pass. If he is reading, excellent work!
I recently discovered that uWSGI has a "cheap mode" that will hold the socket open but only actually spawn workers when a connection comes in (and kill them automatically after a timeout without any requests).
If you already have 24/7 compute instances going and can spare the CPU/RAM headroom, you can co-host your "lambdas" there, and make them even cheaper :)
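For reference, the relevant uWSGI options look roughly like this (values are illustrative; check the linked piku source and the uWSGI docs for your version):

```ini
[uwsgi]
http-socket = :8080
module = myapp:app
; "cheap" keeps the socket open but spawns no workers until a request arrives
cheap = true
; tear workers down again after 60s without any requests
idle = 60
workers = 2
```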
It makes Lambda look like a product with a much narrower niche than what AWS wants to sell it as. For many people knowing beforehand that cold start times are > 500ms with 256MB (quite extravagant for serving a single web request) would disqualify Lambda for any customer-serving endpoint. As it stands many get tricked into that choice if they don't perform these tests themselves.
afaik AWS doesn't publish benchmarks on runtimes; but if they did, I am sure it'd result in a lot of finger-pointing and wasted energy if they were not to normalize the process first (something like acidtests.org).
They don't want to provide it themselves because then they'd have to admit that the performance is abysmal. Instead they let random bloggers provide this data so they can just sit back and say "you're doing it wrong."
They have never cared about this cold-start metric or the devs who do. The hope is that the first user's degraded experience helps the next 1,000 users that minute have a perfect experience.
To AWS it's like complaining about the end bit of crust in an endless loaf of sliced white bread that was baked in under 2 seconds.
A pattern I have implemented is to have my API code on both ECS/Fargate and Lambda at the same time, and send traffic to the appropriate one using an Elastic Load Balancer. I flag specific endpoints as "cpu intensive" and have them run on lambda.
Implemented by
- Duplicating all routes in the API with the "/sls/" prefix (this is a couple of lines in FastAPI)
- Setting up a rule in ELB to route to Lambda if the route starts with /sls, or to ECS otherwise.
- Set up the CPU intensive routes to automatically respond with a 307 to the same route but prefixed with /sls.
Boom, with that the system can handle bursts of CPU intensive traffic (e.g. data exports) while remaining responsive to the simple 99% of requests all on one vCPU.
And the same dockerfile, with just a tiny change, can be used both in ECS and Lambda.
If anyone is running into cold start problems on Firebase, I recently discovered you can add .runWith({minInstances: 1}) to your cloud functions.
It keeps 1 instance running at all times, and for the most part completely gets rid of cold starts. You have to pay a small cost each month (a few dollars), but it's worth it on valuable functions that result in conversions, e.g. loading a Stripe checkout.
I'm surprised Node has cold-start issues. I had it in my mind that JS was Lambda's "native" language and wouldn't have cold start issues at all. Did it used to be like that? Didn't Lambda launch with only support for JS, and maybe a couple other languages that could compile to it?
I thought nodejs/v8 or any javascript runtime would have some kind of startup cost since it has to parse and compile the javascript code first. See a simple hello world execution time comparison:
# a Go hello world
$ time ./hello
hi
real 0m0.002s
$ time echo 'console.log("hello")' | node -
hello
real 0m0.039s
The ~25ms of cold start noted in this article feels acceptable and impressive to me, given what node is doing under the hood.
Cold start has been a problem with Lambda since day 1, and in fact has massively improved in recent years.
Node.js is optimized for request throughput rather than startup time. The assumption is that you will have a "hot" server running indefinitely. The Lambda pattern is, in general, a very recent invention, and not something that languages/runtimes have specifically considered in their design yet.
With Node.js, the cold start problem is caused by how Node loads files. For each file it does about 10 IO operations (to resolve the file from the module name), then loads, parses and compiles it.
If using any file system that is not super fast, this amounts to long delays.
There are ways to get around that, but those are not available on lambda
I wonder how much time was spent requiring all of aws-sdk. The v3 SDK is modular and should be quicker to load. Bundlers like esbuild save space and reduce parsing time.
Container-based Lambda image configurations (vs zip-based) would be a good addition to this comparison. People use them e.g. to get over the zip-based Lambda size limit.
Also maybe mention provisioned concurrency (where you pay AWS to keep one or more instances of your lambda warm).
Both of these are supported by Serverless framework btw.
Definitely. In my experience, Docker image based Lambdas had consistently poor (>3s) cold starts regardless of memory. I hope it will eventually improve, as it is a much nicer packaging approach than ZIP files.
Also, it would have been nice to include ARM vs x86 now that ARM is available.
Slightly off topic, but what's the deal with Azure Functions cold start times in the Consumption (i.e. serverless) plan? I get cold start times in the multi seconds range (sometimes huge values, like 20s). Am I doing something wrong? Or is this expected?
I think if you get to this point with lambda you're probably overthinking it. Language runtime choice is important because some choices do have a cost, but likewise, choosing lambda is a tradeoff -- you don't have to manage servers, but some of the startup and runtime operations will be hidden from you. If you're okay with the possible additional latency and don't want to manage servers, it's fine. If you do, and want to eke out performance, it might not be.
Larger lambdas mean a higher likelihood of concurrent access, which will result in cold starts when there is contention. Your cold starts will be slower with more code (It's not clear how much the size of your image affects start time, but it does have SOME impact).
It's best to just not worry about these kinds of optimizations -- that's what lambda is for. If you *want* to worry about optimizing, the best optimization is running a server that is actively listening.
Scope your lambda codebase in a way that makes sense. It's fine if your lambda takes multiple event types or does routing, but you're making the test surface more complex. Just like subnets, VPCs and everything else in AWS, you can scope them pretty much however you want, and there's no hard and fast rule saying "put more code in one" or "put less code in one", but there are patterns that make sense, and generally lots of individual transactions are easier to track and manage unless you have an explicit use case that requires scoping it to one lambda, in which case do that.
There are a few cases where I've advocated for bigger lambdas vs smaller ones:
* GraphQL (there still isn't a very good GraphQL router and data aggregator, so just handling the whole /graphql route makes the most sense)
* Limited concurrency lambdas. If you have a downstream that can only handle 10 concurrent transactions but you have multiple lambda interactions that hit that service, it might be better to at least bundle all of the downstream interactions into one lambda to limit the concurrency on it.
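When one lambda does take multiple event types, a small dispatch table keeps the routing explicit and testable (the event names below are made up):

```python
HANDLERS = {}

def on(detail_type):
    """Register a handler for one EventBridge detail-type."""
    def register(fn):
        HANDLERS[detail_type] = fn
        return fn
    return register

@on("user.created")
def user_created(detail):
    return {"welcomed": detail["id"]}

@on("order.placed")
def order_placed(detail):
    return {"queued": detail["order_id"]}

def handler(event, context=None):
    # One lambda, multiple event types: dispatch on the detail-type field
    fn = HANDLERS.get(event.get("detail-type"))
    if fn is None:
        raise ValueError("unhandled event: %r" % event.get("detail-type"))
    return fn(event.get("detail", {}))
```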
> NodeJs is the slowest runtime, after some time it becomes better(JIT?) but still is not good enough. In addition, we see the NodeJS has the worst maximum duration.
The conclusion drawn about NodeJS performance is flawed due to a quirk of the default settings in the AWS SDK for JS compared to other languages. By default, it opens and closes a TCP connection for each request. That overhead can be greater than the time actually needed to interact with DDB.
I submitted a pull request to fix that configuration[0]. I expect the performance of NodeJS warm starts to look quite a bit better after that.
In addition, the NodeJS cold start time can be further optimized by bundling into a single file artifact to reduce the amount of disk IO needed when requiring dependencies. Webpack, Parcel, ESBuild, and other bundlers could achieve that, I'm sure.
EDIT: That may already be happening here in the build.sh file. I see it runs `sam build --use-container NodeJsFunction -b nodejs`.
I was surprised by the quality of this one. That said...
Cold starts are a FaaS learning subject, but they almost never matter much in practice. What workloads are intermittent and also need extremely low latencies? Usually when I see people worrying about this it is because they have architected their system with call chains, and the use case, if it really matters, can be re-architected so that the query result is prepared ahead of time. This is much like search results... Search engines certainly don't process the entire web to service your queries. Instead, they pre-calculate the result for each query and update those results as content CRUD happens.
I've been getting around 20ms cold starts, 1ms warm exec on the 128MB ARM Graviton2 using Rust for the most basic test cases. Graviton2 was slightly slower on cold starts than X86 for me (1-2ms) but who doesn't want to save $0.0000000004 per execution? Adding calls to parameter store/dynamo DB bumps it up a little but still < 120ms cold, and any added latency comes from waiting on the external service calls.
Memory usage is 20-30MB, and I haven't done anything to optimise memory. I know I can get rid of a few allocations I'm doing for simplicity if I want to.
I've not always been the greatest fan of Lambdas, seeing as they have hidden orchestration complexity and are a black box for debugging. Revisiting them a few years on, and with Rust, you get an excellent language, excellent runtime characteristics and substantial cost savings, unless you really need more than 128MB of memory, i.e. processing large volumes of data per execution in memory, or transcoding. Any asynchronous/event-driven service I write, I'll just package as a Rust lambda going forward and pay fractions of a cent per month. I am still on the fence about HTTP-exposed services, as that's a big plumbing exercise with hidden gateway costs, but I'm not as averse to it as I was.
[1] https://www.morling.dev/blog/how-i-built-a-serverless-search...
1 - https://aws.amazon.com/blogs/aws/new-provisioned-concurrency...
"serverless" indeed
And I was so happy to leave the clusterfuck of 300 AWS lambdas I was working with at my previous company.
What an expensive fad, and no engineer is ever consulted ...
[1] https://docs.microsoft.com/en-us/dotnet/core/deploying/ready...
I wonder, did you test whether the increased size results in an actual win for startup time?
Well, you could have deployed 2 lambdas: one as a coordinator with 256MB, and another as a worker with the RAM you need, called in that 1% of cases.
And remember that if you had to do that on a server or a virtual machine, you would still need all the RAM deployed 24/7.
I think lambda is the de facto standard now for integration on AWS.
But they are pushing lambda hard for other workloads too.
Pertinent options: https://github.com/piku/piku/blob/master/piku.py#L908
1 - Hey, $SERVICE looks exactly what I need. Neat!
2 - Wait, how do I do $THING with $SERVICE? No way $SERVICE can't do $THING.
3 - Realize $SERVICE is extremely limited and full of weird edge cases, get pissed.
In general their docs are not transparent about the limitations of their products compared to similar non-managed solutions.
I sort of gave up using AWS managed services after a few years DevOpsing, except for the more flexible / battle tested ones: VPC, EC2, etc...
That said, they do publish plenty of guidance. For example, see Chapter 9: Optimizing serverless application performance, from their well-architected series: https://aws.amazon.com/ru/blogs/compute/building-well-archit...
https://web.archive.org/web/20141115183837/http://aws.amazon...
[0]: https://github.com/Aleksandr-Filichkin/aws-lambda-runtimes-p...