top | item 14286383

I Broke Rust's Package Manager for Windows Users

351 points| sasheldon | 9 years ago |sasheldon.com | reply

162 comments

order
[+] bluejekyll|9 years ago|reply
This is a great example of something else about software. As software grows in usage and use cases, it starts bumping up against edge conditions which need to be handled for various reasons.

Cargo now is becoming stronger and more stable because of bugs like this being discovered. All software goes through this growth cycle. It's great to see these things worked out in the various projects that support Rust.

There is another point here though; anytime the question comes up to just rewrite a piece of software, throw out all the technical debt, it's not as straightforward as it seems. Remember, together with that technical debt lies a lot of valuable learnings written into the code. I haven't worked on Windows directly in years, but I never knew that NUL was a reserved word as a file. I would, and probably still will make this mistake in the future.

Which makes me wonder, has anyone written a file name validation crate that guarantees that you're not writing to any reserved words on a filesystem of the host OS? A quick search of crate.io doesn't turn anything up.

[+] curun1r|9 years ago|reply
It also shows how necessary it is to have some sort of deprecation process. Maintaining nonsensical landmine features for compatibility with an operating system released 36 years ago is putting the interests of MS's lazy long-term users ahead of the interests of its current users. Even if MS maintained a policy of only removing functionality after a 10-year deprecation period, this "feature" would have been gone long ago. Transitions must be orderly, but they should still happen.

It's nice that Rust's toolchain is better able to live Windows crazy ecosystem, but that doesn't make Windows any less crazy.

[+] akira2501|9 years ago|reply
> I haven't worked on Windows directly in years, but I never knew that NUL was a reserved word as a file.

It's not. It's a reserved word through the MS-DOS file redirection facilities. If you use the newer file API or you use the \\?\[path] convention; the reserved words are not an issue and you can create files named for them.

[+] pjc50|9 years ago|reply
> I never knew that NUL was a reserved word as a file. I would, and probably still will make this mistake in the future

While we're here: NUL, COM<n>, LPT<n> and AUX are reserved.

[+] Macha|9 years ago|reply
I've done con.py on a Linux system a few times for net code in different projects and then realised I couldn't clone it on windows. It comes up infrequently enough that you can forget
[+] garaetjjte|9 years ago|reply
Other magic aliases include CON, PRN, AUX, COM1-9 and LPT1-9. They are aliased to respective devices in Win32 namespace "\\.\". COMs and LPTs above 9 don't have aliases in global namespace and must be accessed explictly in Win32 namespace, eg. "\\.\COM10" (which itself is symlink to NT native "\Device\Serial9")

In fact, it is possible to create files named NUL, COM1, etc. using \\?\ (eg. "\\?\C:\NUL" is valid path) prefix which disables parsing arcane Win32 magic files. Unfortunately these files are causing strange behaviour in applications that don't use that prefix, Explorer included.

source: https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...

[+] monochromatic|9 years ago|reply
I still remember using "copy con foo.txt" and ending with ctrl-z to quickly create a file. It was years before I understood how that actually worked.
[+] tatterdemalion|9 years ago|reply
As the blog post mentioned, we solve the issue by deleting the crate from the package repository and reserving these problematic names. The incident lasted about 2 and a half hours.

Crate names have to be one or more valid idents connected by hyphens, so no other clever names like `/home` would be possible to upload. We already had some crate names reserved and we just needed to add these to the list.

[+] kibwen|9 years ago|reply
> The incident lasted about 2 and a half hours.

And because it was a weekend, much of that time involved me trying to figure out who had the proper credentials for crates.io, and then texting those people until one of them responded. :)

[+] derefr|9 years ago|reply
Reserving just the crate names won't cover your bases, though, no? I'm not clear on what exists as part of a crate—but if there's any user control over the filenames of the contents of the crate (e.g. if the crate's source code is in there) then any crate might contain a file named e.g. "nul.rs", triggering the same problem.
[+] slobotron|9 years ago|reply
There was a bug in Windows 95 (98 too?) where if you tried to open 'nul\nul' or 'con\con' etc, it would BSOD instantly. Provided lots of drive-by fun in computer labs... (got really good at typing win+r con\con)
[+] tonyarkles|9 years ago|reply
For more fun, you could also target other machines with SMB shares. \\thevictim\foo\nul\nul would BSOD that machine. Good times.
[+] johnfn|9 years ago|reply
This is like the 90s era "undefined is not a function."

null is not a problem, but null.null, on the other hand...

[+] captn3m0|9 years ago|reply
What I don't understand is why cargo fetches the entire crate list and create a directory for every crate (even if you never install it). Why not just have a single file with the entire list? The issue mentions they use a trie, but why use the filesystem as the trie store? Why not have a single file?
[+] kibwen|9 years ago|reply
The original authors of cargo, wycats and carllerche, aren't around today to ask (it's a weekend!) though IRC attempted to answer regardless:

  <foo> to keep the number of files in a single directory down
  <foo> tools become unhappy with hundreds of thousands+ of things in a single dir
  <foo> as do filesytems
  <bar> why not just a flat file
  <bar> or sqlite or whatever
  <qux> right now it uses git's deduplication feature
  <qux> aka, when downloading updates you only download the objects that changed
  <qux> but it mostly works on a per file basis
  <qux> so git hashes each file and if the hash didnt change, it doesnt download an update
  <qux> but if it did, it treats it as completely new file, even if its just a little change
[+] tatterdemalion|9 years ago|reply
There is a very long comment in the cargo source which explains this decision and some of the trade offs involved. https://github.com/rust-lang/cargo/blob/master/src/cargo/sou...

I know that some of the people who worked on cargo originally had experience with other package managers - mainly bundler - and I believe bundler used to use a single file, but ran into performance issues.

[+] nerdponx|9 years ago|reply
AFAIK this is how the BSD Ports system works too.
[+] ddevault|9 years ago|reply
Sounds more like a problem with stupid Windows design choices than with anything you did.
[+] dagw|9 years ago|reply
Windows is, for better or worse, fiercely proud of its backwards compatibility. So it's not so much a stupid Windows design choice as a 'stupid' DOS 1.0 design choice (and not even so much a choice as simply a quirk of how the DOS 1.0 file system worked) that Windows doesn't want to break backwards comparability with.
[+] alkonaut|9 years ago|reply
Because there aren't any of those in posix...
[+] poizan42|9 years ago|reply
People could also stop using CreateFile without a \\? prefix and all the problems would go away. There's not even a MAX_PATH limit on any NT based Windows version if you do that.
[+] brianberns|9 years ago|reply
To me, it sounds more like a problem with Rust that a single misnamed package could bring down the whole system. It's essentially a SQL injection attack (without the SQL).
[+] codys|9 years ago|reply
Yep, just not allowing (directly) user controlled file names seems ideal. Maybe just hash the crate names and use the hash as a file name? No more silly restrictions due to platforms. Eliminates issues with some file systems having a length limit too.
[+] pvg|9 years ago|reply
In the Mac System 7-ish days, people used to earnestly warn each other not to name a file '.Sony' (a special name reserved for the floppy driver) as it supposedly trashed your HD. Although I've never heard of anyone reproducing it.
[+] Xylakant|9 years ago|reply
Trying to name a folder "trash" led to the error message "The name 'trash' is reserved for the operating system."
[+] yrashk|9 years ago|reply
Wouldn't it make sense for Cargo not to use crate names in file names, and use hexadecimally encoded hashes instead?
[+] brianberns|9 years ago|reply
Yes, or some other identifier that's unique to that crate. Assuming that the crate name is also a valid file name seems risky.
[+] nomercy400|9 years ago|reply
Or have a prefix. Once cost me exam points trying to optimize a prefix away. The examinators were right.
[+] wwalexander|9 years ago|reply
You could just hex encode the crate name. No need to hash it too.
[+] ziikutv|9 years ago|reply
What did you end up calling the new crate?

Edit: I suggest "terminated"

[+] sasheldon|9 years ago|reply
I haven't published it again because I hadn't thought of a name (and nothing published is using it, so no urgent need).

I like terminated! Good suggestion :D

[+] Strilanc|9 years ago|reply
Urgh, this "nul" filename / reserved filename bug is probably in a lot of software.
[+] roryisok|9 years ago|reply
I was working on a video project for a local comics convention, and named the project file "con.proj". That file hung around until I upgraded my hard drive because no file manager could delete it.
[+] alkonaut|9 years ago|reply
It's very tricky to do cross platform file handling stuff, and only the most mature projects have ironed out this. Just look at your pet project and see if it handles

- Windows and unix line breaks in text files

- Windows and unix path separators

- BOM and non-BOM marked files if parsing UTF

- Forbidden filenames such as in this article

By "handling" I mean it should accept or fail nicely on unexpected input - e.g. say that line breaks should be unix style, or paths should be backslashes etc. Very few projects actually do this well. Even fewer will do even more complex things like handling too long paths with nice error messages etc.

[+] encryptThrow32|9 years ago|reply
I recommend ':?', as it will work in POSIX, but not Windows.
[+] msimpson|9 years ago|reply
While I know nothing of Rust, Diesel, or CrateDB, I do know that Windows uses a case-insensitive file system and this fix doesn't seem to take that into consideration. However, the author of the fix does note:

> I believe crates.io's namespace is case insensitive let me know if that's wrong

Someone should probably validate that.

[+] toabi|9 years ago|reply
I tried `npm install nul` on my win7 VM and it created a folder called nul which I can't get rid of ¬_¬
[+] hmottestad|9 years ago|reply
Makes me wonder if I can make a crate called "../.." and have it overwrite some user files.
[+] kibwen|9 years ago|reply
Crate identifiers are required to be [a-zA-Z] for the first character and [a-zA-Z0-9_-] for the rest. So, no. :P
[+] callumjones|9 years ago|reply
It looks like the files were managed by Git (see the output where the checkout errors out), so no that won't work.
[+] lsiebert|9 years ago|reply
Hmm... I feel like someone should stick the reserved names into a json somewhere for easy reference for the next package manager.
[+] joshu|9 years ago|reply
Remember not to name anything pr#6 for either...
[+] HedleyLamar|9 years ago|reply
Don't they have a continuous integration system where they run the unit tests on all platforms for all checkins to master?