bcardarella | 3 months ago

Just a small comparison, compiled for release:

Boa: 23M
Brimstone: 6.3M

I don't know whether closing the feature gap with Boa and hardening for production use will also bloat the compiled size. Regardless, passing 97% of the spec at this size is pretty impressive.

jerf|3 months ago

It looks like Boa has Unicode tables compiled inside of itself: https://github.com/boa-dev/boa/tree/main/core/icu_provider

Brimstone does not appear to.

That covers the vast bulk of the difference. The ICU data is about 10.7MB in the source (boa/core/icu_provider) and may grow or shrink by some amount during compilation.

I'm not saying it's all the difference, just the bulk.
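Rough arithmetic behind "the bulk", assuming the ~10.7MB of ICU source data ends up in the binary at about the same size (which, as noted above, may not hold exactly) and treating M and MB as the same unit:

```latex
\text{gap} = 23\,\mathrm{M} - 6.3\,\mathrm{M} = 16.7\,\mathrm{M},
\qquad
\frac{10.7\,\mathrm{M}}{16.7\,\mathrm{M}} \approx 0.64
```

So under that assumption the ICU data alone would account for a bit under two thirds of the size gap.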

There are a few reasons why svelte little executables with small library backings aren't possible anymore, and it isn't just ambient, undefined "bloat". Unicode is a big one. Correct handling of Unicode involves megabytes of tables and data that have to live somewhere, whether it's a linked library, compiled in, tables on disk, whatever. If a program touches text and needs to handle it correctly rather than just passing it through, there's a minimum size for that now.
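A small illustration of why correct text handling needs data tables (a sketch, not from the thread): even Rust's standard library `str::to_uppercase` and `str::to_lowercase` rely on Unicode case-mapping data compiled into std, because the mappings aren't a fixed per-character offset. Only std is used; the example strings are arbitrary.

```rust
fn main() {
    // German sharp s uppercases to two characters, so a simple char-to-char table is not enough.
    let upper = "straße".to_uppercase();
    println!("{upper}"); // STRASSE

    // Greek capital sigma lowercases to final sigma at the end of a word:
    // a context-sensitive rule layered on top of the case tables.
    let lower = "ὈΔΥΣΣΕΎΣ".to_lowercase();
    println!("{lower}"); // ὀδυσσεύς

    // ASCII-only case mapping needs no tables, but it leaves non-ASCII text untouched.
    let ascii = "straße".to_ascii_uppercase();
    println!("{ascii}"); // STRAßE
}
```

The ASCII variant is the "just pass it through" approach: tiny and table-free, but wrong for anything outside ASCII.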

ambicapter|3 months ago

Unicode is everywhere though. You'd think there'd be much greater availability of those tables and data and that people wouldn't need to bundle it in their executables.

rixed|3 months ago

I was curious to see what that data consisted of, and apparently it's a lot of translations, like the names of all possible calendar formats in all possible languages, etc. This seems useless in the vast majority of use cases, including that of a JS interpreter. Looks to me like the typical output of a committee that's trying too hard to extend its domain.

Disclaimer: I never liked unicode specs.

miki123211|3 months ago

I just wish we could use system tables for that, instead of bloating every executable with its own outdated copy.

I have no issue with my system using an extra 10mb for Ancient Egyptian capitalization to work correctly. Every single program including those rules is a lot more wasteful.

jancsika|3 months ago

If someone builds, say, a Korean website and needs sort(), does the ICU monolith handle 100% of the common cases?

(Or substitute for Korean the language that has the largest amount of "stuff" in the ICU monolith.)
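For context on what sort() does without collation data (a sketch covering Latin script only, not Korean): Rust's default ordering for `str` compares UTF-8 bytes, which is the same as Unicode code point order. Uppercase letters land before all lowercase ones and accented letters land after 'z'. A locale-aware collator driven by data tables such as ICU's would instead put "apple" before "Banana" and "éclair" before "zebra". The helper name `sort_by_code_point` is hypothetical.

```rust
/// Hypothetical helper: sorts with Rust's default `Ord` for `str`,
/// i.e. byte-wise UTF-8 comparison (code point order). No locale data is used.
fn sort_by_code_point(words: &mut [&str]) {
    words.sort();
}

fn main() {
    let mut words = vec!["zebra", "éclair", "apple", "Banana"];
    sort_by_code_point(&mut words);
    // 'B' (U+0042) sorts before 'a' (U+0061), and 'é' (U+00E9) sorts after 'z' (U+007A).
    println!("{words:?}"); // ["Banana", "apple", "zebra", "éclair"]
}
```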

twoodfin|3 months ago

As well-defined as Unicode is, it's surprising that no one has tried to replace ICU with a better mousetrap.

Not to say ICU isn’t a nice bit of engineering. The table builds in particular I recall having some great hacks.

martin-t|3 months ago

I was gonna say the last few percent might increase the size disproportionately, as the last few percent tend to do[0], but it looks like Boa passes fewer tests (~91%).

This is something I notice in small few-person or one-person projects. They don't have the resources to build complex architectures so the code ends up smaller, cleaner and easier to maintain.

The other way to look at it is that cooperation has an overhead.

[0]: The famous 80:20 rule. Or another rule claiming that each additional 9 in reliability (and presumably other aspects) takes the same amount of work.

embedding-shape|3 months ago

Is that with any size optimizations? I think most of the defaults (codegen-units, panic handling, etc.) are tuned for performance, not binary size, so it might be worth checking whether the results differ if you change them (e.g. codegen-units=1, removing panic unwinding).

LtdJorge|3 months ago

Stripping can save a huge amount of binary size: there's lots of formatting code added for println! and family, stacktrace printing, etc. However, you lose those niceties if you strip at that level.

bcardarella|3 months ago

I only ran both with `cargo build --release`