You can use lifecycle policies to delete the objects for free, but it's best to confirm that via support. I'm not saying this is a great way (maybe it's intentionally hidden), but at least there is a way.
https://stackoverflow.com/questions/59170391/s3-lifecycle-ex...
Yes, this was mentioned as the preferred method in the article. As the article states, it's free for objects in the standard storage tier but will incur a Transition cost for other tiers. It's not hidden, but it's not exactly advertised as a way to empty a bucket.
"If you create an S3 Lifecycle expiration rule that causes objects that have been in S3 Standard-IA or S3 One Zone-IA storage for less than 30 days to expire, you are charged for 30 days"
It goes on to say 90 days for Glacier and 180 days for Glacier Deep.
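For reference, here's roughly what setting such an expiration rule looks like with aws-sdk-go. This is a minimal sketch, untested; the bucket name is a placeholder and the one-day expiry is an assumption you would tune:

    package main

    import (
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
        svc := s3.New(session.Must(session.NewSession()))

        // Expire every object (the empty prefix matches the whole bucket)
        // one day after creation. S3 then deletes them in the background at
        // no request cost, subject to the minimum-storage-duration charges
        // quoted above for the IA and Glacier tiers. Versioned buckets
        // would also need a NoncurrentVersionExpiration rule.
        _, err := svc.PutBucketLifecycleConfiguration(&s3.PutBucketLifecycleConfigurationInput{
            Bucket: aws.String("my-doomed-bucket"), // placeholder name
            LifecycleConfiguration: &s3.BucketLifecycleConfiguration{
                Rules: []*s3.LifecycleRule{{
                    ID:         aws.String("expire-everything"),
                    Status:     aws.String("Enabled"),
                    Filter:     &s3.LifecycleRuleFilter{Prefix: aws.String("")},
                    Expiration: &s3.LifecycleExpiration{Days: aws.Int64(1)},
                }},
            },
        })
        if err != nil {
            log.Fatal(err)
        }
    }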
Not only is it a problem that deleting a bucket costs money; if you have a big bucket with many deeply nested files, it can also take a really long time to clean it up using the AWS command line.
I ran into this with a bucket full of EMR log files a few years ago and had to come up with some pretty crazy command-line hackery, running on an EC2 machine with lots of cores. Here's a write-up I did, in case anyone else runs into this issue:
https://gist.github.com/michael-erasmus/6a5acddcb56548874ffe...
Per-object costs can be tricky with S3 -- it's easy to mentally round costs less than 1/10th of a penny to zero, and then look up a few years later and realize you have hundreds of millions of things and can't afford to do anything with them.
When this bit us on a project I made a tool to solve our particular problem, which tars files, writes CSV indexes, and can fetch individual files from the tars if need be. [1] Running on millions of files was janky enough that I also ended up scripting an orchestrator to repeatedly attempt each step of the pipeline. [2] It's not tested on data other than ours, but it could be a useful starting point.
[1] https://github.com/harvard-lil/s3mothball
[2] https://github.com/harvard-lil/mothball_pipeline
And AWS will keep billing you after you delete your account [1] if you didn't delete all resources first.
AWS is designed to extract dollars from big enterprise contracts.
Also interesting from the article, this poor soul on StackOverflow was trying to figure out how to delete a bucket that would cost him $20,000 [2]. Can’t delete, can’t close.
From the answers on the SO question, it looks like there are several ways to delete 2 billion objects that are free or cheap. I think? Not sure. Which is part of why I'm commenting here, in case anyone else is wondering.
[1] https://www.reddit.com/r/aws/comments/j5nh4w/ive_deleted_my_...
[2] https://stackoverflow.com/questions/54255990/cheapest-way-to...

When we reactivated the account a few years later, we were retroactively billed for all the files in the S3 bucket. We got the money back though.

Amazon can keep trying to get blood from a stone...
Pricing of AWS services makes me uneasy in general. Just take S3 as an example: you go to the pricing page and you get several tabs with dozens of entries, which makes calculating exactly how much you will pay difficult. I might be simple-minded, but I prefer clearly defined plans with predetermined limits. You know exactly what it costs each month and what you get, and if you need more, you just switch to a higher plan, with no risk of nasty (and often expensive) surprises like the one mentioned in the article.
Yup. And uploading / downloading large objects from S3 incurs tons of requests, because the S3 client does parallel chunking, plus a small number of other control requests. The client works on the same premise as an SFTP client. It's amazing how often it retries. Example from the Go SDK: https://github.com/aws/aws-sdk-go/blob/main/service/s3/s3man....
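To make that concrete, here's a minimal sketch (untested; file name, bucket name, and tuning values are made up) of the s3manager upload path linked above. One logical upload fans out into a CreateMultipartUpload, many parallel UploadPart requests, and a CompleteMultipartUpload, each of which is a billable request:

    package main

    import (
        "log"
        "os"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3/s3manager"
    )

    func main() {
        sess := session.Must(session.NewSession())

        // One Upload call; under the hood this issues CreateMultipartUpload,
        // then up to Concurrency parallel UploadPart calls of PartSize each,
        // then CompleteMultipartUpload. A 1 GiB file at 16 MiB parts is
        // roughly 66 requests, before any retries.
        uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
            u.PartSize = 16 * 1024 * 1024 // 16 MiB per part
            u.Concurrency = 8             // 8 parts in flight at once
        })

        f, err := os.Open("training-data.bin") // hypothetical file
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        _, err = uploader.Upload(&s3manager.UploadInput{
            Bucket: aws.String("my-bucket"), // placeholder name
            Key:    aws.String("blobs/training-data.bin"),
            Body:   f,
        })
        if err != nil {
            log.Fatal(err)
        }
    }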
If there are still those who don't use the cloud, it's because the big three have taken a lot of advantage of their position. The pricing of Hetzner, CloudFlare, Linode, OVH, ... seems cheaper and more transparent.

This post finally got my ass in gear to cancel an account that I thought I had closed but was still charging me a few dollars a month.
I spun up an AWS instance to practice, and once I was done I thought I had closed everything down.
Turns out I had just stopped my micro instances, I hadn't terminated them. I also hadn't released my IP address, and there was also a snapshot of the tiny DB I had created still floating around. The documentation was a little confusing, so after I went through it I spent half an hour chatting with a support rep to make sure everything was completely cleaned up. After next month my last bill should go through and I should be free and clear. Unfortunately I have to wait for next month's bill, as I can't just pay it all now.
This was mostly my fault for letting it go on for so long, but I hate how if you don't do some very specific steps you can still be charged. And I think if an account is closed, it should absolutely terminate all services that are still running on that account, and then send you the final bill.
I think in practice, S3 data is often indexed using other DBs, e.g. DynamoDB, Postgres, MySQL, etc. Can't this index be used to enumerate all S3 URLs? I am of course simplifying this a lot.
This specific issue probably isn't a very big problem.
The issue of Amazon repeatedly coming up on HN as a service that will bill you when you're not expecting it, for things that are moderately hard to understand, and that might refund you later, probably costs them tens or even hundreds of millions in lost revenue every year from developers being cautious about deploying things to their services.
Developers are pretty sloppy about maintaining references to objects, in my experience. It generally only becomes a problem when you need to clean up, and that's usually around the time of IPO, when there are petabytes of data in S3.
Stories like this make me extremely hesitant to try AWS. I was about to try S3 for a static site I was working on this weekend, but I think I'm gonna stick with Netlify or DigitalOcean instead after reading this.
In theory, I'm sure you could do that with the free tier.
In practice, activating that free tier requires a valid card, and I'd highly advise never giving them your own. Whatever alternative you can think of is 100% better for your sanity.
> .5¢ per 1000 items LISTed seems insanely expensive considering how cheaply you can transfer terabytes of data with S3.
Correction: I misread - .5¢ per 1,000,000 items LISTed
.5¢ per 1000 LIST operations
LIST operations max out at 1000 items
Still a little pricey, but way less so than I'd imagined.
Do they make a lot of money off of charging for basic operations? It seems like you could make the whole pricing structure a lot more friendly by only charging for bandwidth use. I guess when you're as dominant as S3, you don't need to care about friendly pricing structures.
Charging for basic operations like that is weird; it's akin to a service charging people per click on a website.
There's money in confusion... I'm terrified of using the existing cloud services for personal projects. For business projects you can mostly just get an idea of how much your month-to-month bill will increase with certain actions, but it's sure easy to blow a budget by accident.
Listing is an expensive operation. I don't know the exact economics of it, but it's very plausible to me that serving 1000 LIST requests has a comparable resource cost to transferring a couple GB of data cross-region. (It should be noted that this definitely isn't a market dominance thing - every S3 competitor I'm familiar with also charges per-operation, and charges 10 times more for LIST than GET.)
> .5¢ per 1000 items LISTed seems insanely expensive
Note it’s $0.005 per 1k requests, not per 1k items, and it’s important to point out that one request can list up to 1k items. So if you list in full 1k batches, it’s $0.005 per million items listed, i.e. about $5 per billion.
According to the article the Empty button still calls LIST once per 1000 objects. So if the guy in the SO thread has 2B objects, that's 2 million LIST requests, which at $0.005 per 1k requests means this one click would still cost him ~$10 in LIST charges? (His $20k figure came from transition fees, not LISTs.)
source: https://stackoverflow.com/a/67834172
> Within the last year, AWS added a handy Empty button to the S3 console when viewing a bucket. You can click that button and watch the S3 console make API calls on your behalf.
> Here's what it does: It calls LIST on the bucket, paginating through the objects in the bucket 1000 at a time. It calls the DeleteObjects API method, deleting 1000 at a time.
> The cost is 1 LIST API call per 1000 objects in the bucket. Delete operations are free, so there's no extra cost there.
source: I read the article.
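If you'd rather script it than click it, the same loop is short with aws-sdk-go. A minimal sketch, untested; the bucket name is a placeholder, and it ignores versioned buckets, which would also need version IDs deleted:

    package main

    import (
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    // emptyBucket mirrors the console's Empty button: page through the bucket
    // 1000 keys at a time (one billable LIST per page) and issue a free
    // DeleteObjects call for each batch.
    func emptyBucket(svc *s3.S3, bucket string) error {
        var delErr error
        err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
            Bucket: aws.String(bucket),
        }, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
            if len(page.Contents) == 0 {
                return false
            }
            ids := make([]*s3.ObjectIdentifier, 0, len(page.Contents))
            for _, obj := range page.Contents {
                ids = append(ids, &s3.ObjectIdentifier{Key: obj.Key})
            }
            // DeleteObjects takes up to 1000 keys, conveniently one LIST page.
            _, delErr = svc.DeleteObjects(&s3.DeleteObjectsInput{
                Bucket: aws.String(bucket),
                Delete: &s3.Delete{Objects: ids, Quiet: aws.Bool(true)},
            })
            return delErr == nil // stop paging on the first failed batch
        })
        if err != nil {
            return err
        }
        return delErr
    }

    func main() {
        svc := s3.New(session.Must(session.NewSession()))
        if err := emptyBucket(svc, "my-doomed-bucket"); err != nil { // placeholder name
            log.Fatal(err)
        }
    }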
> you can also get an export of all objects in a bucket using S3 Inventory and run the output through AWS Batch in order to delete those objects
"S3 Batch Operations" sends S3 requests based on a csv file, which can but does not have to be from S3 Inventory. But S3 Batch Operations supports only a subset of APIs and this does not include DeleteObject(s). [0]
An AWS Batch job could run a container which sends DeleteObjects requests but only when triggered by a job queue which seems redundant here.
If I can't use an expiration lifecycle policy because I need a selection of objects not matching a prefix or object tags, I would run something with `s5cmd rm` [1]. Alternatively roll your own golang which parses the CSV and sends many DeleteObjects requests in parallel goroutines.
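A minimal sketch of that last option (untested; assumes a one-column CSV of keys on stdin, e.g. cut down from an S3 Inventory report; the bucket name and worker count are made up):

    package main

    import (
        "encoding/csv"
        "io"
        "log"
        "os"
        "sync"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
        const bucket = "my-doomed-bucket" // placeholder name
        svc := s3.New(session.Must(session.NewSession()))

        batches := make(chan []*s3.ObjectIdentifier)
        var wg sync.WaitGroup

        // A handful of workers, each sending 1000-key DeleteObjects requests.
        for i := 0; i < 16; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for batch := range batches {
                    _, err := svc.DeleteObjects(&s3.DeleteObjectsInput{
                        Bucket: aws.String(bucket),
                        Delete: &s3.Delete{Objects: batch, Quiet: aws.Bool(true)},
                    })
                    if err != nil {
                        log.Println("delete batch:", err)
                    }
                }
            }()
        }

        // Read keys from the CSV and group them into batches of 1000,
        // the maximum DeleteObjects accepts.
        r := csv.NewReader(os.Stdin)
        var batch []*s3.ObjectIdentifier
        for {
            rec, err := r.Read()
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
            batch = append(batch, &s3.ObjectIdentifier{Key: aws.String(rec[0])})
            if len(batch) == 1000 {
                batches <- batch
                batch = nil
            }
        }
        if len(batch) > 0 {
            batches <- batch
        }
        close(batches)
        wg.Wait()
    }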
They have an example of some person almost paying $20k in transition fees. In my early days of AWS, I racked up $90k in S3 transition fees. Thankfully, AWS forgave it.
Stories of forgiven fees are an example of survivorship bias. Developers who rack up thousands in AWS charges by mistake and aren't forgiven probably don't tell too many people about the time they screwed up and cost their company a lot of money.
Would S3 Inventory help here? That would allow you to get the list of all files (albeit on a delay, similar to the lifecycle rule approach), which you could process offline to generate the DELETEs.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/storag...

I guess one could spam DELETE calls while bruteforcing filenames to make it free.

If your use-case is storing random things you don't know the path of, maybe it's the wrong product to use.
A common pattern is to dump log files into a bucket, or to use one as a dead letter queue for failed event processing. These things are typically named arbitrarily, e.g. with a prefix and a Unix timestamp.
One of the great features of S3 is that it has an arbitrary prefix index on keys! And the list API paginates in batches of 1000, which is useful.
You can retrieve all objects with a given prefix, which is great for storing content-addressed files, and being able to iterate on them. You can also partition on arbitrary prefixes too.
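For example, here's what iterating one prefix partition looks like with aws-sdk-go. A minimal sketch (untested; the bucket and prefix names are made up), where each page costs one LIST request and returns up to 1000 keys:

    package main

    import (
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
        svc := s3.New(session.Must(session.NewSession()))

        // Walk every key under one prefix "partition", 1000 per page.
        err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
            Bucket: aws.String("my-content-store"), // placeholder name
            Prefix: aws.String("sha256/ab/"),       // e.g. one shard of a content-addressed store
        }, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
            for _, obj := range page.Contents {
                fmt.Println(*obj.Key, *obj.Size)
            }
            return true // keep paging
        })
        if err != nil {
            log.Fatal(err)
        }
    }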
Here's an example. You're doing a big machine learning workflow and you've got a gazillion large files representing training data. You run a large training job in Amazon's cloud somewhere that is composed of one worker that lists the files and delegates work to various learning jobs, and then those jobs each stream in one training file, process it, and then grab the next item and repeat. That's a pretty common type of work.
Nowadays, it (¿almost?) is. https://aws.amazon.com/s3/consistency/:
“After a successful write of a new object, or an overwrite or delete of an existing object, any subsequent read request immediately receives the latest version of the object”
I think that says that deletes are immediately visible too, though they phrase it weirdly, since after a delete there is no latest version of the object.
Also, I don’t think buckets are objects in this sense, so the caveat in the article stands.

I think last time I did this, the wait time was pretty much exactly 60 minutes.