Here [1] is the "dep ensure -v" output for a project of mine. It takes almost 12 seconds even when there are no changes to the actual file. I don't know why, or whether it's actually the solver (though the output seems to indicate it).
TheDong|7 years ago
As you can see from the output, it is not the solver per se but the weird idiosyncrasies of Go imports and GOPATH layout. 'satisfy', 'select-atom' and the like are the solver part, and they take about 20ms altogether. A SAT solver takes 20ms, MVS might take 1ms, but who cares about that difference, right?
The top 3 items there are slow because they're:
1. 'source-exists' (~6s) which will do network traffic to find if a project exists to be downloaded or is in the cache; it's network io heavy in most cases.
2. list-packages (~3s) which parses the downloaded source code for import statements to find further dependencies; disk-io heavy + go loader has to do some work
3. gmal - GetManifestAndLock (~2s) which looks for lock files, including of other dependency solvers; disk io mostly I think
Any system designed under those constraints (it cannot use a centralized registry or list, it must be compatible with packages that don't use it and so must parse their code, etc.) will have these problems regardless of the algorithm.
Those steps are all doing network/disk-io/go-parsing, and none of that is SAT solving.
I don't think vgo has these problems, because vgo is built by the Go team and can dictate far more, such as requiring a centralized repo or requiring that all dependencies use vgo, etc.
lobster_johnson|7 years ago
The fact that dep parses import statements (as does Glide) is something I've never liked. It means that if you run "dep ensure --add" on something not yet imported, it will complain, and the next "ensure" will remove it. This is never in line with how I actually work. I need the dependencies before I can import them! There's no editor/IDE in existence that lets you autocomplete libraries that haven't been installed yet.
It also means that "dep ensure" parses my code to discover things not yet added to Gopkg.toml. That's upside down to me. I want it to parse its lockfile and nothing else; the lockfile is what should inform its decisions about what to install so that my code works, my code shouldn't be driving the lockfile! If I try to compile my code and it imports stuff that isn't in the lockfile, it should fail, and dep shouldn't try to "repair" itself with stuff that I didn't list as an explicit dependency.
I'm sure there are edge cases where the current behaviour can be considered rational, but I don't know what they are. As you point out, dep has to do a lot of work -- but why? Running "dep ensure" when the vendor directory is in perfect sync with the lockfile should take no time at all, and certainly shouldn't need to access the network. Yet it takes the same amount of time with or without a lockfile.
steveklabnik|7 years ago
Small note, this isn’t something that you’ve said, but since we’re comparing the two in this sub thread overall, Cargo doesn’t require a central registry either. You can pull straight from version control, and the lock file will even keep track of what HEAD is at the time, maintaining reproducibility. Or from the file system. Etc.
Thanks for your comments here, there’s a lot of stuff I wasn’t aware of. Very illuminating.
kibwen|7 years ago
That's quite weird. When I run `rm Cargo.lock && cargo generate-lockfile` on the Servo repo (test performed on the cheapest VPS that money can buy) it exits near-instantly (after first spending three seconds trying to git-fetch new versions of the dozen custom dependencies that live on GitHub rather than crates.io). For reference, here's what Servo's dependency graph looked like two years ago (July 2016): https://dirkjan.ochtman.nl/files/servo-graph.svg ; the number of transitive dependencies is quite large and yet the runtime of version selection is negligible.
TheDong|7 years ago
Because third-party Go packages may not have a dep file, and because Go programmers expect vendor directories to be minimal and not include unused imports, dep parses all of the Go code of the project and of all the project's transitive dependencies.
It has to parse every .go file to find all 'import' statements, and it also has to find remote versions by making multiple network requests per dependency (typically 1 http-get + 1 git pull operation).
This is obviously going to be much slower than Cargo, where it's assumed every dependency also uses Cargo and all needed information is present in metadata files... and there's one single fast API to download data from and cache (crates.io).
If cargo had to do the equivalent of `cargo check`-style parsing to find all 'extern crate' and 'use' statements before it could spit out a valid lock, and it couldn't use only 1 request to update all crates.io data, it would probably be closer to the speed of dep.
I think the speed difference is thus largely a result of Go's lack of a central repository and of a unified packaging solution.