fendale | 4 years ago
This is the wider issue with small files. On HDFS each file uses up some NameNode memory, and jobs that need to touch 100k+ files (which I have seen plenty of) put a real strain on the NameNode too.
I don't have enough experience with S3 to know how it behaves for metadata queries over lots of small objects.
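To put a rough number on the NameNode memory point above, here is a back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of heap per namespace object (file or block); that figure is an approximation, not something I have measured, and the function name is just for illustration.

```python
# Rough NameNode heap estimate for a population of small files.
# ASSUMPTION: ~150 bytes per namespace object (file or block). This is a
# commonly quoted rule of thumb, not a measured value for any given cluster.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate heap used by file and block objects (directories ignored)."""
    # Each file contributes one file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    for n in (100_000, 10_000_000, 100_000_000):
        gb = namenode_heap_bytes(n) / 1e9
        print(f"{n:>11,} small files -> ~{gb:.2f} GB of NameNode heap")
```

Under that assumption 100k single-block files cost only tens of megabytes of heap, so the heap cost only really adds up across the whole cluster once file counts reach the tens of millions.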
dikei | 4 years ago