jshchnz | 1 month ago

Emerge Tools has an old thread on why it's actually so big: https://x.com/emergetools/status/1810790280922288617

simonw|1 month ago

Thanks, that thread is great!

They have a neat treemap breakdown here: https://www.emergetools.com/app/example/ios/com.google.Gmail

130MB is localization data.

This detail was interesting too: https://twitter.com/emergetools/status/1810790291714314706

> There's over 20k files in the app, 17k of which are under 4 kB. In iOS, the minimum file size allocation is 4 kB, so having many small files causes unnecessary size bloat. Gmail could save 56.4 MB by moving their small files to an Asset catalog
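To make the quoted 4 kB point concrete, here is a minimal sketch of how rounding every file up to whole allocation blocks adds overhead. The 700-byte file size is a made-up illustration, not Gmail's real size distribution, and the helper names are hypothetical.

```python
BLOCK_SIZE = 4096  # assumed allocation unit, taken from the quoted 4 kB figure


def allocated_size(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes reserved on disk if sizes round up to whole blocks.

    Assumption: an empty file reserves no data blocks.
    """
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size


def wasted_bytes(file_sizes, block_size: int = BLOCK_SIZE) -> int:
    """Total slack space: allocated bytes minus actual content bytes."""
    return sum(allocated_size(size, block_size) - size for size in file_sizes)


# Hypothetical sample: 17,000 small files of 700 bytes each.
sample = [700] * 17_000
print(f"slack: {wasted_bytes(sample) / 1_000_000:.1f} MB")
```

Packing the same bytes into one container file (which is roughly what an Asset catalog does) leaves at most one partially filled block instead of 17,000.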

trevor-e|1 month ago

Yep, localization is a huge size bloat for enterprisey apps that support many locales. There is no Apple-provided way to dynamically download select localization packs based on the device locale. Meta came up with their own solution: https://engineering.fb.com/2022/05/09/android/language-packs...

The small-file issue is something we commonly see in games; I was surprised to see it in Gmail.

And btw we open-sourced much of our analysis after being acquired by Sentry: https://github.com/getsentry/launchpad

crazygringo|1 month ago

130 MB for localization? At 50 languages that would be 2.6 MB/language. If we assume an average of 50 bytes per string and another 50 for an identifier, that's roughly 27,000 strings per language.

That doesn't seem right. Localization feels like it should add a few MB. Not over 100. (Plus shouldn't it be compressed, and locally uncompressed the first time a language gets used?)
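The back-of-the-envelope estimate above can be reproduced in a few lines. The 50-language count and the 50 + 50 bytes per entry are the comment's own assumptions; reading "MB" as MiB is an extra assumption made here, and it is what lands near 27,000 (decimal megabytes give exactly 26,000).

```python
def strings_per_language(total_bytes: float, languages: int, bytes_per_entry: int) -> float:
    """Average number of string entries each language could hold."""
    return total_bytes / languages / bytes_per_entry


LANGUAGES = 50             # assumed language count from the comment
BYTES_PER_ENTRY = 50 + 50  # assumed: 50 bytes of text plus 50 bytes of identifier

binary_estimate = strings_per_language(130 * 1024 * 1024, LANGUAGES, BYTES_PER_ENTRY)
decimal_estimate = strings_per_language(130 * 1000 * 1000, LANGUAGES, BYTES_PER_ENTRY)

print(f"MiB reading: {binary_estimate:,.0f} strings per language")
print(f"MB reading:  {decimal_estimate:,.0f} strings per language")
```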

lynndotpy|1 month ago

4 kB is also the default block size on common Linux filesystems such as ext4, so I imagine a similar issue could exist on Android.

tonyplee|1 month ago

I wonder if it would be better to create separate localized app downloads, such as gmail-japanese, etc.

thefilmore|1 month ago

Author here. Thanks for sharing this. It seems they released an updated version of this analysis last year [1], and it matches what I saw when analyzing the IPA. I tried to do a deeper analysis of the code itself using several tools: Google's own bloaty [2], which was not very useful without symbols; classdumpios [3], which revealed something like 50k interfaces starting with "ComGoogle"; and Ghidra [4], which I left running for a day to analyze the binary, but it kept hanging and freezing, so I gave up on it. Perhaps comparing the Android and iOS code could lead to something more fruitful.

[1] https://x.com/emergetools/status/1943060976464728250

[2] https://github.com/google/bloaty

[3] https://github.com/lechium/classdumpios

[4] https://github.com/NationalSecurityAgency/ghidra

jonny_eh|1 month ago

Looks like it's mostly strings, probably due to localization. They should consider compressing each localization/language, and decompressing the needed bundle on first startup (or language change). Even better: Download the language bundle when needed.
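A minimal sketch of the "compress each language, decompress on first use" idea, assuming each language's strings fit in a simple key-to-text mapping. The `pack_language` and `LanguageStore` names, the JSON encoding, and the choice of zlib are all illustrative assumptions, not how Gmail or iOS actually store localizations.

```python
import json
import zlib


def pack_language(strings: dict) -> bytes:
    """Serialize one language's strings and compress them as a single bundle."""
    raw = json.dumps(strings, ensure_ascii=False, sort_keys=True).encode("utf-8")
    return zlib.compress(raw, 9)


class LanguageStore:
    """Keeps every language compressed; inflates one only when first requested."""

    def __init__(self, packed: dict):
        self._packed = packed  # locale -> compressed bundle
        self._loaded = {}      # locale -> decompressed strings (cache)

    def strings_for(self, locale: str) -> dict:
        if locale not in self._loaded:
            raw = zlib.decompress(self._packed[locale])
            self._loaded[locale] = json.loads(raw.decode("utf-8"))
        return self._loaded[locale]


# Hypothetical bundles for two locales.
store = LanguageStore({
    "en": pack_language({"inbox": "Inbox", "compose": "Compose"}),
    "ja": pack_language({"inbox": "受信トレイ", "compose": "作成"}),
})
print(store.strings_for("ja")["inbox"])
```

Only the languages a user actually selects are ever decompressed; downloading a bundle on demand would swap the in-memory `packed` mapping for a network fetch.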

Pxtl|1 month ago

Well, that's a question for the OS level. If the OS itself doesn't require a download to change languages, so that switching to a new language works offline, I could see it being frustrating that switching the app to a new language must be done online.

So compression/deduplication is probably the better option. Rather than storing one zip per language, though, you'd probably want a compression format that also eliminates duplication between languages if you're storing all of them compressed on the system. That means the compressor would need to handle the entire set of languages as one massive compressed blob, from which you'd extract just the languages you needed. I assume some archive formats do this better than others.
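As a rough illustration of per-language archives versus one shared ("solid") blob, the sketch below compresses synthetic catalogs both ways with Python's `lzma` module. The catalogs are invented: every locale shares the same hash-like identifiers, which is the kind of cross-language duplication described above. Real localization data will behave differently, so treat the printed sizes as a demonstration of the effect, not a measurement.

```python
import hashlib
import json
import lzma

LOCALES = ["en", "ja", "de", "fr", "es"]  # hypothetical locale list


def make_catalog(locale: str, count: int = 200) -> dict:
    """Synthetic catalog: identical identifiers per locale, locale-specific text."""
    catalog = {}
    for i in range(count):
        key = hashlib.sha1(f"string-{i}".encode("utf-8")).hexdigest()
        catalog[key] = f"{locale} text {i}"
    return catalog


def encode(catalog: dict) -> bytes:
    return json.dumps(catalog, sort_keys=True).encode("utf-8")


blobs = {locale: encode(make_catalog(locale)) for locale in LOCALES}

# One compressed stream per language: shared identifiers are stored once per stream.
separate_total = sum(len(lzma.compress(blob)) for blob in blobs.values())

# One stream over all languages: later languages can reference earlier identifiers.
solid_total = len(lzma.compress(b"".join(blobs[locale] for locale in LOCALES)))

print(f"separate streams: {separate_total} bytes")
print(f"single solid blob: {solid_total} bytes")
```

The trade-off is access cost: pulling one language out of a solid blob generally means decompressing everything before it in the stream, whereas separate archives can be opened independently.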

giancarlostoro|1 month ago

So is the extra space that's unaccounted for between then and now from AI-related pieces?