kion | 9 months ago
This is partly due to how we've distributed software over the last 40 years. In the 80s, the idea of a library of functionality was something you paid for, and you painstakingly included parts of it into your size-constrained environment (it had to fit on a floppy). You probably picked apart that library and pulled out the bits you needed, integrating them into your builds to be as small as possible.
Today we pile libraries on top of libraries on top of libraries. It's super easy to say `import foolib`, then call `foolib.do_thing()` and just start running. Who knows or cares what all 'foolib' contains?
At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.
In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file. Adding optional functionality can get ugly when it would require creating new modules, but if you only want to use a tiny part of the module, what do you do?
The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.
It's a terrible idea and I'd hate it, but how else do you address the current setup of effectively building the whole universe of code branching out from your dependencies and then dragging it around like a boat anchor of dead code?
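The graph construction described above can be sketched. This is a hedged toy in Rust, not any real tooling: symbol names are hypothetical, and the "dependency declarations" are just a map. It walks outward from one entry symbol and keeps only what's reachable, discarding the rest of the library.

```rust
use std::collections::{HashMap, HashSet};

// Toy symbol graph: each symbol declares the symbols it needs directly.
// A real system would derive this from the compiler, not hand-written data.
fn reachable(deps: &HashMap<&str, Vec<&str>>, root: &str) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut stack = vec![root.to_string()];
    while let Some(sym) = stack.pop() {
        if seen.insert(sym.clone()) {
            for next in deps.get(sym.as_str()).into_iter().flatten() {
                stack.push(next.to_string());
            }
        }
    }
    seen
}

fn main() {
    let mut deps = HashMap::new();
    deps.insert("format_number", vec!["digit_table"]);
    deps.insert("digit_table", vec![]);
    deps.insert("parse_xml", vec!["xml_lexer"]); // never needed by our entry point
    deps.insert("xml_lexer", vec![]);

    // Depending on "format_number" pulls in exactly two symbols;
    // the XML half of the "library" is dropped entirely.
    let needed = reachable(&deps, "format_number");
    assert_eq!(needed.len(), 2);
    println!("{needed:?}");
}
```

The hard part in practice isn't this traversal, it's producing the per-symbol dependency declarations in the first place.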
WuxiFingerHold|9 months ago
Go and C# (.NET) are counterexamples. They both have great ecosystems and package management just as simple and effective as Rust's or JS's (Node). But neither Go nor C# has the dependency-hell issues of Rust, or even more so JavaScript, because they have exceptional std libs and even large frameworks like ASP.NET or EF Core.
A great std lib is obviously the solution. Some Rust defenders talk it down by giving Python as a counterexample. But again, Go and C# prove them wrong. A great std lib is a solution, but one that requires a huge effort that can only be made by large organisations like Google (Go) or Microsoft (C#).
athrowaway3z|9 months ago
A large stdlib solves the problems the language is focused on. For C# and Go that is web hosts.
Try using them outside that scope and the dependencies start to pile in (games, desktop), or the languages go essentially unused (embedded, phones, wasm).
zahlman|9 months ago
Python's standard library is big. I wouldn't call it great, because Python is over 30 years old and it's hard to add things to a standard library and even harder to remove them.
fiedzia|9 months ago
They also have a much narrower scope of use, which makes it easier to create a stdlib usable for most people. You can't do that with a more general-purpose language.
slashdev|9 months ago
iTokio|9 months ago
Compared to Go and C#, the Rust std lib is mostly lacking:
- a powerful HTTP lib
- serialization
But the Rust approach (no runtime, no GC, no reflection) makes it very hard to provide those libraries.
Within these constraints, some high-quality solutions emerged: Tokio, Serde. But they pioneered some novel approaches that would have been hard to try in the std lib.
The whole async ecosystem still has a beta vibe, giving the feeling of programming in a different language. Procedural macros are often synonymous with slow compile times and code bloat.
But what we gained is fewer runtime errors, more efficiency, and a more robust language.
TL;DR: trade-offs everywhere; it's unfair to compare with Go/C#, as they are languages with a different set of constraints.
PoignardAzur|9 months ago
I'm not convinced that happens that often.
As someone working on a Rust library with a fairly heavy dependency tree (Xilem), I've tried a few times to see if we could trim it by tweaking feature flags, and most of the time it turned out that the flagged features were downstream of things we needed: Vulkan support, PNG decoding, unicode shaping, etc.
When I did manage to find a superfluous dependency, it was often something small and inconsequential like once_cell. The one exception was serde_json, which we could remove after a small refactor (though we expect most of our users to depend on serde anyway).
We're looking to remove, or at least decouple, larger dependencies like winit and wgpu, but that requires some major architectural changes; it's not just "remove this runtime option and win 500MB".
nullc|9 months ago
sseagull|9 months ago
Just so you can multiply matrices or something.
bilbo-b-baggins|9 months ago
kion|9 months ago
Part of the issue I have with dependency bloat is how much effort we currently go through to download, distribute, compile, lint, typecheck, whatever, thousands of lines of code we don't want or need. I want software that allows me to build exactly as much as I need and never have to touch the things I don't want.
nosianu|9 months ago
For example, you have a function calling XML or PDF or JSON output functions depending on some output format parameter. That's three very different paths and includes, but if you don't know which values that parameter can take during runtime you will have to include all three paths, even if in reality only XML (for example) is ever used.
Or there may be higher-level causes beyond the reach of any analysis, even a dynamic one. In a GUI, for example, some functionality may only ever be seen by a few users with certain roles, but if there is only one app, everything has to be bundled. Similar scenarios are possible with all kinds of software, for example an analysis application that supports various input and output scenarios. It's a variation of the first example, except the parameter is no longer internal: it's external data unavailable to any analysis, because it becomes known only when the software is actually used.
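The first example can be made concrete with a minimal Rust sketch (function names and formats are hypothetical): because the format is a runtime value, no static analysis can prove that only one branch is ever taken, so all three must be compiled in.

```rust
// All three writers must ship in the binary, because `format` is a runtime
// value: the compiler cannot prove that only "xml" ever arrives.
fn write_report(format: &str, data: &str) -> String {
    match format {
        "xml" => format!("<report>{data}</report>"),
        "json" => format!("{{\"report\": \"{data}\"}}"),
        "pdf" => format!("%PDF-stub {data}"), // placeholder, not a real PDF
        _ => String::from("unsupported"),
    }
}

fn main() {
    // In this deployment only "xml" is ever requested,
    // but the JSON and PDF paths are still carried along.
    let out = write_report("xml", "42");
    assert_eq!(out, "<report>42</report>");
    println!("{out}");
}
```

If `format` were instead a compile-time constant or a Cargo feature, the unused branches could be eliminated; it's the runtime indirection that defeats the optimizer.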
sitkack|9 months ago
It worked great, but it took diligence; it also forces you to interact with your deps in ways that adding a line to a deps file does not.
saagarjha|9 months ago
cmrdporcupine|9 months ago
It feels like torture until you see the benefits, and the opposite ... the tangled mess of multiple versions and giant transitive dependency chains... agony.
I would prefer to work in shops that manage their dependencies this way. It's hard to find.
silon42|9 months ago
Alternatively, for some projects it might be enough to only depend on stuff provided by Debian stable or some other LTS distro.
ardit33|9 months ago
Kids today don't know how to do that anymore...
tester756|9 months ago
So, what is the compiler doing that it doesn't remove unused code?
ak_111|9 months ago
For example you know you will never use one of the main functions in the parsing library with one of the arguments set to "XML", because you know for sure you don't use XML in your domain (for example you have a solid project constraint that says XML is out of scope).
Unfortunately the code dealing with XML is 95% of the library, and you can't tell your compiler "I won't need this, I promise never to call that function with the argument set to XML."
sph|9 months ago
amiga386|9 months ago
It's effectively an end-run around the linker.
It used to be that you'd create a library by having each function in its own compilation unit: you'd create a ".o" file, then bunch them together in a ".a" archive. When someone else compiled their code and needed the do_thing() function, the linker would see it was unfulfilled and pluck it out of the foolib.a archive. For namespacing you'd probably call the functions foolib_do_thing(), etc.
However, object-orientism with a "god object" is a disease. We go in through a top-level object like "foolib" that holds pointers to all its member functions like do_thing(), do_this(), do_that(), then the only reference the other person's code has is to "foolib"... and then "foolib" brings in everything else in the library.
It's not possible for the linker to know whether, for example, foolib needed the reference to do_that() just to initialise its members and nobody else ever needed it (so it could be eliminated), or whether foolib or the user's code will somehow need it.
> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.
I can say that, at least for Go, it has excellent dead code elimination. If you don't call it, it's removed. If you even have a const feature_flag = false and have an if feature_flag { foobar() } in the code, it will eliminate foobar().
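Rust's optimizer handles the same pattern; here is a hedged sketch of the idiom in Rust (names hypothetical). In an optimized build the compiler can constant-fold the branch away and then drop the gated function as unreachable, though this sketch only demonstrates the pattern, not the elimination itself.

```rust
const FEATURE_FLAG: bool = false;

// With FEATURE_FLAG known at compile time, an optimizing build can fold
// the branch below and, if nothing else calls it, discard foobar() entirely.
fn foobar() -> &'static str {
    "expensive feature"
}

fn run() -> &'static str {
    if FEATURE_FLAG { foobar() } else { "feature disabled" }
}

fn main() {
    assert_eq!(run(), "feature disabled");
    println!("{}", run());
}
```

Inspecting the release binary's symbols (e.g. with nm or objdump) is the way to actually confirm the elimination.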
immibis|9 months ago
It also happens to be an object, but that's just because python is a dynamic language and libraries are objects. The C++ equivalent is foolib::do_thing(); where foolib is not an object.
xlii|9 months ago
Clarification: Go allows very simple multi-file packages. It's one feature I really like, because it allows splitting an otherwise coherent module into logical parts.
dcow|9 months ago
tialaramex|9 months ago
Historically Rust wanted that foo.rs to be renamed foo/mod.rs, but that's no longer idiomatic, although of course it still works if you do that.
mseepgood|9 months ago
Aurornis|9 months ago
It’s getting hard to take these conversations seriously with all of the hyperbole about things that don’t happen. Nobody is producing Rust binaries that hit 500MB, or even 50MB, from adding a couple of simple dependencies.
You’re also not ending up with mountains of code that never gets called in Rust.
Even if my Rust binaries end up being 10MB instead of 1MB, it doesn’t really matter these days. It’s either going on a server platform where that amount of data is trivial or it’s going into an embedded device where the few extra megabytes aren’t really a big deal relative to all the other content that ends up on devices these days.
For truly space-constrained systems there’s no-std and an entire, albeit small, separate universe of packages that operate in that space.
For all the doom-saying, in Rust I haven’t encountered this excessive bloat problem some people fret about, even in projects with liberal use of dependencies.
Every time I read these threads I feel like the conversations get hijacked by the people at the intersection of “not invented here” and nostalgia for the good old days. Comments like this that yearn for the days of buying paid libraries and then picking them apart anyway really reinforce that idea. There’s also a lot of the usual disdain for async and even Rust itself throughout this comment section. Meanwhile it feels like there’s an entire other world of Rust developers who have just moved on and get work done, not caring for endless discussions about function coloring or rewriting libraries themselves to shave a few hundred kB off of their binaries.
galangalalgol|9 months ago
SamuelAdams|9 months ago
https://learn.microsoft.com/en-us/dotnet/core/deploying/trim...
https://learn.microsoft.com/en-us/dotnet/core/deploying/nati...
dathinab|9 months ago
which get reinvented all the time, like in dotnet with "trimming" or in JS with "tree-shaking".
C/C++ compilers have been doing that since before .NET was a thing, and the same goes for Rust, which has done it since its 1.0 release (because it's done by LLVM ;) )
The reason it gets reinvented all the time is that while it's often quite straightforward in statically compiled languages, it isn't for dynamic languages, where finding out what actually is unused is hard (for fine-grained code elimination) or at least unreliable (pruning submodules). It's even worse for scripting languages.
Which also brings us to one area where it doesn't work out of the box: if you build a .dll/.so in one build process and then use it in another. Here additional tooling is needed to prune the dynamically linked libraries. But luckily that's not a common problem to run into in Rust.
In general, most code-size problems in Rust aren't caused by the sheer LOC of dependencies but by an overuse of monomorphization. The problem of tons of LOC in dependencies is one of supply-chain trust and reviewability more than anything else.
CBLT|9 months ago
The comment you're replying to is talking about not pulling in dependencies at all, before compiling, if they would not be needed.
dietr1ch|9 months ago
It should be easy to build and deploy profiling-aware builds (PGO/BOLT) and to get good feedback around time/instructions spent per package, as well as a measure of the ratio of each library that's cold or thrown away at build time.
taeric|9 months ago
I'll note that it isn't just PGO/BOLT style optimizations. Largely, it is not that at all, oddly.
Instead, the problem is one of stability. In a "foundation that doesn't move and cause you to fall over" sense of the word. Consider if people made a house where every room had a different substructure under it. That, largely, seems to be the general approach we use to building software. The idea being that you can namespace a room away from other rooms and not have any care on what happens there.
This gets equally frustrating when our metrics for determining the safety of something largely discourage inaction on any dependencies. They have to keep adding to it, or people think it is abandoned and not usable.
Note that this isn't unique to software, mind. Hardware can and does go through massive changes over the years. They have obvious limitations that slow down how rapidly they can change, of course.
throwaway462663|9 months ago
It's a terrible idea because you're trying to reinvent section splitting + `--gc-sections` at link time, which Rust (which the article is about) already does by default.
kion|9 months ago
Things like --gc-sections feel like a band-aid: a very practical and useful band-aid, but a band-aid nonetheless. You're building a bunch of things you don't need, then selectively throwing away parts (or selectively keeping parts).
IMO it all boils down to granularity: the granularity of text source files, the granularity of the units of distribution for libraries. It all contributes to a problem of large, unwieldy dependency growth.
I don't have any great solutions here; it's just an observation of the general problem, drawn from the horrifying things that happen when dependencies grow uncontrolled.
jiggawatts|9 months ago
If each layer of “package abstraction” is only 50% utilised, then each layer multiplies the total size by 2x over what is actually required by the end application.
Three layers — packages pulling in packages that pull their own dependencies — already gets you to 88% bloat! (Or just 12% useful code)
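The arithmetic behind that figure, as a quick sanity check: if each layer keeps 50% of what it pulls in, the useful fraction after n layers is 0.5^n.

```rust
fn main() {
    let utilisation_per_layer: f64 = 0.5;
    let layers = 3; // packages pulling in packages pulling in packages

    let useful = utilisation_per_layer.powi(layers);
    // 0.5^3 = 0.125: roughly 12% useful code and 88% bloat.
    assert!((useful - 0.125).abs() < 1e-12);
    println!(
        "useful: {:.1}%, bloat: {:.1}%",
        useful * 100.0,
        (1.0 - useful) * 100.0
    );
}
```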
An example of this is the new Windows 11 calculator that can take several seconds to start because it loads junk like the Windows 10 Hello for Business account recovery helper library!
Why? Because it has currency conversion, which uses a HTTP library, which has corporate web proxy support, which needs authentication, which needs WH4B account support, which can get locked out, which needs a recovery helper UI…
…in a calculator. That you can’t launch unless you have already logged in successfully and is definitely not the “right place” for account recovery workflows to be kicked off.
But… you see… it’s just easier to package up these things and include them with a single line in the code somewhere.
aeonik|9 months ago
nicoburns|9 months ago
poincaredisk|9 months ago
This is an extreme example, but the same thing happens very often at a smaller scale. Optional functionality can't always be removed statically.
dathinab|9 months ago
It's also not always a fair comparison: if you include Tokio in LOC counting, then surely you would also include V8 LOC when counting for Node, or the JRE for Java projects (but not the JDK), etc.
samus|9 months ago
kccqzy|9 months ago
foresto|9 months ago
I wouldn't say completely. People still sometimes struggle to get this to work well.
Recent example: (Go Qt bindings)
https://github.com/mappu/miqt/issues/147
kion|9 months ago
The analogy I use is cooking a huge dinner, then throwing out everything but the one side dish you wanted. If you want just the side-dish you should be able to cook just the side-dish.
mkj|9 months ago
ruraljuror|9 months ago
kibwen|9 months ago
What? I don't know about Go, but this certainly isn't true in Rust. Rust has great support for fine-grained imports via Cargo's ability to split up an API via crate features.
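A sketch of what that gating looks like in library code (the "xml" feature and function names here are hypothetical): the library's Cargo.toml would declare the feature under `[features]`, and consumers who don't enable it never compile the gated code at all.

```rust
// Only compiled when a consumer enables the hypothetical "xml" feature,
// e.g. foolib = { version = "1", features = ["xml"] } in their Cargo.toml.
#[cfg(feature = "xml")]
pub fn write_xml(data: &str) -> String {
    format!("<report>{data}</report>")
}

// Always available; carries no XML machinery for consumers who skip it.
pub fn write_json(data: &str) -> String {
    format!("{{\"report\": \"{data}\"}}")
}

fn main() {
    assert_eq!(write_json("42"), "{\"report\": \"42\"}");
    println!("{}", write_json("42"));
}
```

Unlike dead-code elimination after the fact, feature gates prevent the unused code (and any optional dependencies behind it) from being compiled in the first place.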
panstromek|9 months ago
Functions are defined by AST structure and are effectively content-addressed. Each function is then keyed by hash in a global registry, from which you can pull it for reuse.
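A toy sketch of the idea (Unison works this way by hashing the actual AST; here plain whitespace normalization stands in for real AST canonicalization, so this is only illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Address a function definition by the hash of its normalized source.
// Collapsing whitespace is a stand-in for AST hashing: two layouts of
// the same code should map to one key.
fn address(definition: &str) -> u64 {
    let normalized = definition.split_whitespace().collect::<Vec<_>>().join(" ");
    let mut hasher = DefaultHasher::new();
    normalized.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let mut registry: HashMap<u64, &str> = HashMap::new();
    let one_line = "fn inc(x: i64) -> i64 { x + 1 }";
    let multi_line = "fn inc(x: i64) -> i64 {\n    x + 1\n}"; // same code, different layout

    registry.insert(address(one_line), one_line);

    // The differently formatted copy resolves to the same registry entry.
    assert_eq!(address(one_line), address(multi_line));
    assert!(registry.contains_key(&address(multi_line)));
    println!("registry entries: {}", registry.len());
}
```

The appeal for dependencies: pulling one function by hash pulls exactly its transitive closure, nothing else.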
zahlman|9 months ago
Or you have ultra-fine-grained modules, and rely on existing tree-shaking systems.... ?
scripturial|9 months ago
Aeolun|9 months ago
That’s literally the JS module system? It’s how we do tree shaking to get those bundle sizes down.
frontfor|9 months ago
thayne|9 months ago
I think that is much more of a problem in ecosystems where it is harder to add dependencies.
When it is difficult to add dependencies, you end up with large libraries that do a lot of stuff you don't need, so you only need to add a couple of dependencies. On the other hand, if dependency management is easy, you end up with a lot of smaller packages that just do one thing.
dev_l1x_be|9 months ago
rapnie|9 months ago
johannes1234321|9 months ago
akshitgaur2005|9 months ago
whstl|9 months ago
A lot of the bloat comes from functionality that can be activated via flags, methods that set a variable to true, environment variables, or even via configuration files.
whalesalad|9 months ago
But I think the best defense against this problem at the moment is to be extremely defensive/protective of system dependencies. You need to not import that random library that has a 10-line function. You need to just copy that function into your codebase. Don't just slap random tools together. Developing libraries in a maintainable and forward-looking manner is the exception, not the rule. Some ecosystems excel here, but most fail. Ruby and JS are probably among the worst. Try upgrading a Rails 4 app to modern tooling.
So… be extremely protective of your dependencies. Very easy to accrue tech debt with a simple library installation. Libraries use libraries. It becomes a compounding problem fast.
Junior engineers seem to add packages to our core repo with reckless abandon, and I have to immediately come in and ask why it was needed. Do you really want to break prod some day because you needed a way to print a list of objects as a table in your CLI for dev?
zozbot234|9 months ago
If anything, the 1980s is when the idea of fully reusable, separately-developed software components first became practical, with Objective-C and the like. In fact it's a significant success story of Rust that this sort of pervasive software componentry has now been widely adopted as part of a systems programming language.
mysterymath|9 months ago
spullara|9 months ago
never_inline|9 months ago
The more pressing issue with dependencies is supply chain risks including security. That's why larger organizations have approval processes for using anything open source. Unfortunately the new crop of open source projects in JS and even Go seem to suffer from "IDGAF about what shit code from internet I am pulling" syndrome.
Unfortunately granularity does not solve that as long as your 1000 functions come from 1000 authors on NPM.
KennyBlanken|9 months ago
andrepd|9 months ago
hedora|9 months ago