You can use lifecycle policies to delete the objects for free, but it's best to confirm that via support. I'm not saying this is a great way (maybe it's intentionally hidden), but at least there is a way.
https://stackoverflow.com/questions/59170391/s3-lifecycle-ex...
Yes, this was mentioned as the preferred method in the article. As the article states, it's free for objects in the standard storage tier but will incur a Transition cost for other tiers. It's not hidden, but it's not exactly advertised as a way to empty a bucket.
"If you create an S3 Lifecycle expiration rule that causes objects that have been in S3 Standard-IA or S3 One Zone-IA storage for less than 30 days to expire, you are charged for 30 days"
It goes on to say 90 days for Glacier and 180 days for Glacier Deep.
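For reference, here's roughly what setting such an expiration rule looks like with aws-sdk-go. This is a minimal sketch, untested; the bucket name is a placeholder and the one-day expiry is an assumption you would tune:

    package main

    import (
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
        svc := s3.New(session.Must(session.NewSession()))

        // Expire every object (the empty prefix matches the whole bucket)
        // one day after creation. S3 then deletes them in the background at
        // no request cost, subject to the minimum-storage-duration charges
        // quoted above for the IA and Glacier tiers. Versioned buckets
        // would also need a NoncurrentVersionExpiration rule.
        _, err := svc.PutBucketLifecycleConfiguration(&s3.PutBucketLifecycleConfigurationInput{
            Bucket: aws.String("my-doomed-bucket"), // placeholder name
            LifecycleConfiguration: &s3.BucketLifecycleConfiguration{
                Rules: []*s3.LifecycleRule{{
                    ID:         aws.String("expire-everything"),
                    Status:     aws.String("Enabled"),
                    Filter:     &s3.LifecycleRuleFilter{Prefix: aws.String("")},
                    Expiration: &s3.LifecycleExpiration{Days: aws.Int64(1)},
                }},
            },
        })
        if err != nil {
            log.Fatal(err)
        }
    }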
Not only is it a problem that deleting a bucket costs money; if you have a big bucket with many deeply nested files, it can also take a really long time to clean it up using the AWS command line.
I ran into this with a bucket full of EMR log files a few years ago and had to come up with some pretty crazy command-line hackery, running on an EC2 machine with lots of cores. Here's a write-up I did, in case anyone else runs into this issue:
https://gist.github.com/michael-erasmus/6a5acddcb56548874ffe...
Per-object costs can be tricky with S3 -- it's easy to mentally round costs less than 1/10th of a penny to zero, and then look up a few years later and realize you have hundreds of millions of things and can't afford to do anything with them.
When this bit us on a project I made a tool to solve our particular problem, which tars files, writes CSV indexes, and can fetch individual files from the tars if need be. [1] Running on millions of files was janky enough that I also ended up scripting an orchestrator to repeatedly attempt each step of the pipeline. [2] It's not tested on data other than ours, but it could be a useful starting point.
[1] https://github.com/harvard-lil/s3mothball
[2] https://github.com/harvard-lil/mothball_pipeline
And AWS will keep billing you after you delete your account [1] if you didn't delete all resources first.
AWS is designed to extract dollars from big enterprise contracts.
Also interesting from the article, this poor soul on StackOverflow was trying to figure out how to delete a bucket that would cost him $20,000 [2]. Can’t delete, can’t close.
From the answers on the SO question, it looks like there are several ways to delete 2 billion objects that are free or cheap. I think? Not sure. Which is part of why I'm commenting here, in case anyone else is wondering.
[1] https://www.reddit.com/r/aws/comments/j5nh4w/ive_deleted_my_...
[2] https://stackoverflow.com/questions/54255990/cheapest-way-to...

When we reactivated the account a few years later, we were retroactively billed for all the files in the S3 bucket. We got the money back though.

Amazon can keep trying to get blood from a stone...
Pricing of AWS services makes me uneasy in general. Just take S3 as an example: you go to the pricing page and you get several tabs with dozens of entries, which makes calculating exactly how much you will pay difficult. I might be simple-minded, but I prefer clearly defined plans with predetermined limits. You know exactly what it costs each month and what you get, and if you need more, you just switch to a higher plan, with no risk of nasty (and often expensive) surprises like the one mentioned in the article.
Yup. And uploading / downloading large objects from S3 incurs tons of requests, because the S3 client does parallel chunking, plus a small number of other control requests. The client works on the same premise as an SFTP client. It's amazing how often it retries. Example from the Go SDK: https://github.com/aws/aws-sdk-go/blob/main/service/s3/s3man....
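To make that concrete, here's a minimal sketch (untested; file name, bucket name, and tuning values are made up) of the s3manager upload path linked above. One logical upload fans out into a CreateMultipartUpload, many parallel UploadPart requests, and a CompleteMultipartUpload, each of which is a billable request:

    package main

    import (
        "log"
        "os"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3/s3manager"
    )

    func main() {
        sess := session.Must(session.NewSession())

        // One Upload call; under the hood this issues CreateMultipartUpload,
        // then up to Concurrency parallel UploadPart calls of PartSize each,
        // then CompleteMultipartUpload. A 1 GiB file at 16 MiB parts is
        // roughly 66 requests, before any retries.
        uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
            u.PartSize = 16 * 1024 * 1024 // 16 MiB per part
            u.Concurrency = 8             // 8 parts in flight at once
        })

        f, err := os.Open("training-data.bin") // hypothetical file
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        _, err = uploader.Upload(&s3manager.UploadInput{
            Bucket: aws.String("my-bucket"), // placeholder name
            Key:    aws.String("blobs/training-data.bin"),
            Body:   f,
        })
        if err != nil {
            log.Fatal(err)
        }
    }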
If there are still those who don't use the cloud, it's because the big three have taken a lot of advantage of their position. The pricing of Hetzner, CloudFlare, Linode, OVH, ... seems cheaper and more transparent.

This post finally got my ass in gear to cancel an account that I thought I had closed but was still charging me a few dollars a month.
I spun up an AWS instance to practice, and once I was done I thought I had closed everything down.
Turns out I had just stopped my micro instances, I hadn't terminated them. I also hadn't released my IP address, and there was also a snapshot of the tiny DB I had created still floating around. The documentation was a little confusing, so after I went through it I spent half an hour chatting with a support rep to make sure everything was completely cleaned up. After next month my last bill should go through and I should be free and clear. Unfortunately I have to wait for next month's bill, as I can't just pay it all now.
This was mostly my fault for letting it go on for so long, but I hate how if you don't do some very specific steps you can still be charged. And I think if an account is closed, it should absolutely terminate all services that are still running on that account, and then send you the final bill.
I think in practice, S3 data is often indexed using other DBs, e.g. DynamoDB, Postgres, MySQL, etc. Can't this index be used to enumerate all S3 URLs? I am of course simplifying this a lot.
This specific issue probably isn't a very big problem.
The issue of Amazon repeatedly coming up on HN as a service that will bill you when you're not expecting it, for things that are moderately hard to understand, and that might refund you later, probably costs them tens or even hundreds of millions in lost revenue every year from developers being cautious about deploying things to their services.
Developers are pretty sloppy about maintaining references to objects, in my experience. It generally only becomes a problem when you need to clean up, and that's usually around the time of IPO, when there are petabytes of data in S3.
Stories like this make me extremely hesitant to try AWS. I was about to try S3 for a static site I was working on this weekend, but I think I'm gonna stick with Netlify or DigitalOcean instead after reading this.
In theory, I'm sure you could do that with the free tier.
In practice, activating that free tier requires a valid card, and I'd highly advise never giving them your own. Whatever alternative you can think of is 100% better for your sanity.
> .5¢ per 1000 items LISTed seems insanely expensive considering how cheaply you can transfer terabytes of data with S3.
Correction: I misread - .5¢ per 1,000,000 items LISTed
.5¢ per 1000 LIST operations
LIST operations max out at 1000 items
Still a little pricey, but way less so than I'd imagined.
Do they make a lot of money off of charging for basic operations? It seems like you could make the whole pricing structure a lot more friendly by only charging for bandwidth use. I guess when you're as dominant as S3, you don't need to care about friendly pricing structures.
Charging for basic operations like that is weird; it's akin to a service charging people per click on a website.
There's money in confusion... I'm terrified of using the existing cloud services for personal projects. For business projects you can mostly just get an idea of how much your month-to-month bill will increase with certain actions, but it's sure easy to blow a budget by accident.
Listing is an expensive operation. I don't know the exact economics of it, but it's very plausible to me that serving 1000 LIST requests has a comparable resource cost to transferring a couple GB of data cross-region. (It should be noted that this definitely isn't a market dominance thing - every S3 competitor I'm familiar with also charges per-operation, and charges 10 times more for LIST than GET.)
> .5¢ per 1000 items LISTed seems insanely expensive
Note it’s $0.005 per 1k requests, not per 1k items, and it’s important to point out that one request can list up to 1k items. So if you list in full 1k batches, it’s $0.005 per million items listed, i.e. about $5 per billion.
According to the article the Empty button still calls LIST once per 1000 objects. So if the guy in the SO thread has 2B objects, that's 2 million LIST requests, which at $0.005 per 1k requests means this one click would still cost him ~$10 in LIST charges? (His $20k figure came from transition fees, not LISTs.)
source: https://stackoverflow.com/a/67834172
> Within the last year, AWS added a handy Empty button to the S3 console when viewing a bucket. You can click that button and watch the S3 console make API calls on your behalf.
> Here's what it does: It calls LIST on the bucket, paginating through the objects in the bucket 1000 at a time. It calls the DeleteObjects API method, deleting 1000 at a time.
> The cost is 1 LIST API call per 1000 objects in the bucket. Delete operations are free, so there's no extra cost there.
source: I read the article.
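If you'd rather script it than click it, the same loop is short with aws-sdk-go. A minimal sketch, untested; the bucket name is a placeholder, and it ignores versioned buckets, which would also need version IDs deleted:

    package main

    import (
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    // emptyBucket mirrors the console's Empty button: page through the bucket
    // 1000 keys at a time (one billable LIST per page) and issue a free
    // DeleteObjects call for each batch.
    func emptyBucket(svc *s3.S3, bucket string) error {
        var delErr error
        err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
            Bucket: aws.String(bucket),
        }, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
            if len(page.Contents) == 0 {
                return false
            }
            ids := make([]*s3.ObjectIdentifier, 0, len(page.Contents))
            for _, obj := range page.Contents {
                ids = append(ids, &s3.ObjectIdentifier{Key: obj.Key})
            }
            // DeleteObjects takes up to 1000 keys, conveniently one LIST page.
            _, delErr = svc.DeleteObjects(&s3.DeleteObjectsInput{
                Bucket: aws.String(bucket),
                Delete: &s3.Delete{Objects: ids, Quiet: aws.Bool(true)},
            })
            return delErr == nil // stop paging on the first failed batch
        })
        if err != nil {
            return err
        }
        return delErr
    }

    func main() {
        svc := s3.New(session.Must(session.NewSession()))
        if err := emptyBucket(svc, "my-doomed-bucket"); err != nil { // placeholder name
            log.Fatal(err)
        }
    }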
> you can also get an export of all objects in a bucket using S3 Inventory and run the output through AWS Batch in order to delete those objects
"S3 Batch Operations" sends S3 requests based on a csv file, which can but does not have to be from S3 Inventory. But S3 Batch Operations supports only a subset of APIs and this does not include DeleteObject(s). [0]
An AWS Batch job could run a container which sends DeleteObjects requests but only when triggered by a job queue which seems redundant here.
If I can't use an expiration lifecycle policy because I need a selection of objects not matching a prefix or object tags, I would run something with `s5cmd rm` [1]. Alternatively roll your own golang which parses the CSV and sends many DeleteObjects requests in parallel goroutines.
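A minimal sketch of that last option (untested; assumes a one-column CSV of keys on stdin, e.g. cut down from an S3 Inventory report; the bucket name and worker count are made up):

    package main

    import (
        "encoding/csv"
        "io"
        "log"
        "os"
        "sync"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
        const bucket = "my-doomed-bucket" // placeholder name
        svc := s3.New(session.Must(session.NewSession()))

        batches := make(chan []*s3.ObjectIdentifier)
        var wg sync.WaitGroup

        // A handful of workers, each sending 1000-key DeleteObjects requests.
        for i := 0; i < 16; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for batch := range batches {
                    _, err := svc.DeleteObjects(&s3.DeleteObjectsInput{
                        Bucket: aws.String(bucket),
                        Delete: &s3.Delete{Objects: batch, Quiet: aws.Bool(true)},
                    })
                    if err != nil {
                        log.Println("delete batch:", err)
                    }
                }
            }()
        }

        // Read keys from the CSV and group them into batches of 1000,
        // the maximum DeleteObjects accepts.
        r := csv.NewReader(os.Stdin)
        var batch []*s3.ObjectIdentifier
        for {
            rec, err := r.Read()
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
            batch = append(batch, &s3.ObjectIdentifier{Key: aws.String(rec[0])})
            if len(batch) == 1000 {
                batches <- batch
                batch = nil
            }
        }
        if len(batch) > 0 {
            batches <- batch
        }
        close(batches)
        wg.Wait()
    }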
They have an example of some person almost paying $20k in transition fees. In my early days of AWS, I racked up $90k in S3 transition fees. Thankfully, AWS forgave it.
Stories of forgiven fees are an example of survivorship bias. Developers who rack up thousands in AWS charges by mistake and aren't forgiven probably don't tell too many people about the time they screwed up and cost their company a lot of money.
Would S3 Inventory help here? That would allow you to get the list of all files (albeit on a delay, similar to the lifecycle rule approach), which you could process offline to generate the DELETEs.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/storag...

I guess one could spam DELETE calls while bruteforcing filenames to make it free.

If your use-case is storing random things you don't know the path of, maybe it's the wrong product to use.
A common pattern is to dump log files into a bucket, or to use one as a dead letter queue for failed event processing. These things are typically named arbitrarily, e.g. with a prefix and a Unix timestamp.
One of the great features of S3 is that it has an arbitrary prefix index on keys! And the list API paginates in batches of 1000, which is useful.
You can retrieve all objects with a given prefix, which is great for storing content-addressed files, and being able to iterate on them. You can also partition on arbitrary prefixes too.
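For example, here's what iterating one prefix partition looks like with aws-sdk-go. A minimal sketch (untested; the bucket and prefix names are made up), where each page costs one LIST request and returns up to 1000 keys:

    package main

    import (
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
        svc := s3.New(session.Must(session.NewSession()))

        // Walk every key under one prefix "partition", 1000 per page.
        err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
            Bucket: aws.String("my-content-store"), // placeholder name
            Prefix: aws.String("sha256/ab/"),       // e.g. one shard of a content-addressed store
        }, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
            for _, obj := range page.Contents {
                fmt.Println(*obj.Key, *obj.Size)
            }
            return true // keep paging
        })
        if err != nil {
            log.Fatal(err)
        }
    }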
Here's an example. You're doing a big machine learning workflow and you've got a gazillion large files representing training data. You run a large training job in Amazon's cloud somewhere that is composed of one worker that lists the files and delegates work to various learning jobs, and then those jobs each stream in one training file, process it, and then grab the next item and repeat. That's a pretty common type of work.
Nowadays, it (¿almost?) is. https://aws.amazon.com/s3/consistency/:
“After a successful write of a new object, or an overwrite or delete of an existing object, any subsequent read request immediately receives the latest version of the object”
I think that says that deletes are immediately visible too, though they phrase it weirdly, since after a delete there is no latest version of the object.
Also, I don’t think buckets are objects in this sense, so the caveat in the article stands.

I think last time I did this, the wait time was pretty much exactly 60 minutes.