This is a great example of something else about software. As software grows in usage and use cases, it starts bumping up against edge conditions which need to be handled for various reasons.
Cargo now is becoming stronger and more stable because of bugs like this being discovered. All software goes through this growth cycle. It's great to see these things worked out in the various projects that support Rust.
There is another point here though; anytime the question comes up to just rewrite a piece of software, throw out all the technical debt, it's not as straightforward as it seems. Remember, together with that technical debt lies a lot of valuable learnings written into the code. I haven't worked on Windows directly in years, but I never knew that NUL was a reserved word as a file. I would, and probably still will make this mistake in the future.
Which makes me wonder, has anyone written a file name validation crate that guarantees that you're not writing to any reserved words on a filesystem of the host OS? A quick search of crates.io doesn't turn anything up.
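For what it's worth, the core check such a crate would need is small. Here is a rough sketch, assuming the reserved list is the one given elsewhere in this thread (CON, PRN, AUX, NUL, COM1-9, LPT1-9) and that a name followed by an extension, such as "nul.rs", counts as reserved too. The function name `is_reserved_windows_name` is made up for illustration and is not from any existing crate.

```rust
/// Device names this thread lists as reserved in the Win32 namespace.
/// Assumption: the list is taken from the discussion, not from an
/// authoritative source, so it may be incomplete.
const RESERVED: [&str; 4] = ["CON", "PRN", "AUX", "NUL"];

/// Returns true if `file_name` (a single path component, not a full path)
/// matches one of the legacy device names, ignoring case and extension.
pub fn is_reserved_windows_name(file_name: &str) -> bool {
    // Only the part before the first dot matters, so "nul.rs" is checked as "nul".
    let stem = file_name.split('.').next().unwrap_or("");
    let upper = stem.to_ascii_uppercase();

    if RESERVED.contains(&upper.as_str()) {
        return true;
    }

    // COM1-COM9 and LPT1-LPT9: a three-letter prefix plus one digit from 1 to 9.
    let bytes = upper.as_bytes();
    bytes.len() == 4
        && (upper.starts_with("COM") || upper.starts_with("LPT"))
        && (b'1'..=b'9').contains(&bytes[3])
}
```

With this sketch, `is_reserved_windows_name("NUL.rs")` and `is_reserved_windows_name("com1")` are true, while "null" and "com10" pass. A real crate would also have to decide which host OS rules to apply, which this does not attempt.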
It also shows how necessary it is to have some sort of deprecation process. Maintaining nonsensical landmine features for compatibility with an operating system released 36 years ago is putting the interests of MS's lazy long-term users ahead of the interests of its current users. Even if MS maintained a policy of only removing functionality after a 10-year deprecation period, this "feature" would have been gone long ago. Transitions must be orderly, but they should still happen.
It's nice that Rust's toolchain is better able to live with Windows' crazy ecosystem, but that doesn't make Windows any less crazy.
> I haven't worked on Windows directly in years, but I never knew that NUL was a reserved word as a file.
It's not. It's a reserved word through the MS-DOS file redirection facilities. If you use the newer file API or the \\?\[path] convention, the reserved words are not an issue and you can create files named for them.
I've created a con.py on a Linux system a few times for net code in different projects and then realised I couldn't clone the repository on Windows. It comes up infrequently enough that you can forget.
Other magic aliases include CON, PRN, AUX, COM1-9 and LPT1-9. They are aliased to the respective devices in the Win32 namespace "\\.\". COMs and LPTs above 9 don't have aliases in the global namespace and must be accessed explicitly in the Win32 namespace, e.g. "\\.\COM10" (which itself is a symlink to the NT native "\Device\Serial9").
In fact, it is possible to create files named NUL, COM1, etc. using the \\?\ prefix (e.g. "\\?\C:\NUL" is a valid path), which disables parsing of the arcane Win32 magic files. Unfortunately, these files cause strange behaviour in applications that don't use that prefix, Explorer included.
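To make the prefix trick concrete, here is a small sketch that only does the string manipulation: it turns an absolute drive-letter path into its \\?\ form. The helper name `to_verbatim` is hypothetical, and the sketch deliberately refuses relative paths, since the prefix is only meaningful on a fully qualified path. It does not touch the filesystem, so it says nothing about how any particular application will treat the resulting file.

```rust
/// Prefixes an absolute drive-letter Windows path with `\\?\`.
/// Returns None for inputs this sketch does not handle (relative paths, UNC shares).
/// Assumption: the caller has already resolved "." and ".." components,
/// because prefixed paths are passed through without normalization.
pub fn to_verbatim(path: &str) -> Option<String> {
    // Already prefixed: return unchanged.
    if path.starts_with(r"\\?\") {
        return Some(path.to_string());
    }

    // Require something like `C:\` or `C:/` at the start.
    let b = path.as_bytes();
    let has_drive = b.len() >= 3
        && b[0].is_ascii_alphabetic()
        && b[1] == b':'
        && (b[2] == b'\\' || b[2] == b'/');
    if !has_drive {
        return None;
    }

    // Forward slashes are not translated once the prefix is present,
    // so convert them to backslashes here.
    Some(format!(r"\\?\{}", path.replace('/', "\\")))
}
```

For example, `to_verbatim(r"C:\NUL")` yields `\\?\C:\NUL`, the path form quoted above.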
As the blog post mentioned, we solved the issue by deleting the crate from the package repository and reserving these problematic names. The incident lasted about two and a half hours.
Crate names have to be one or more valid idents connected by hyphens, so no other clever names like `/home` would be possible to upload. We already had some crate names reserved and we just needed to add these to the list.
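As an illustration of that rule, here is a sketch of the check. It assumes that an "ident" means an ASCII letter or underscore followed by ASCII letters, digits, or underscores, and that the reserved list is compared case-insensitively. Both are assumptions for the example, as are the function names; this is not the actual crates.io code.

```rust
/// Assumed definition of an identifier: ASCII letter or underscore first,
/// then ASCII letters, digits, or underscores.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        // Empty segment or an invalid first character.
        _ => false,
    }
}

/// A name is accepted if every hyphen-separated segment is an identifier
/// and the whole name is not on the (caller-supplied) reserved list.
pub fn is_valid_crate_name(name: &str, reserved: &[&str]) -> bool {
    name.split('-').all(is_ident)
        && !reserved.iter().any(|r| r.eq_ignore_ascii_case(name))
}
```

Under these assumptions `/home` fails on its first character, and `NUL` fails once "nul" is on the reserved list.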
And because it was a weekend, much of that time involved me trying to figure out who had the proper credentials for crates.io, and then texting those people until one of them responded. :)
Reserving just the crate names won't cover your bases, though, no? I'm not clear on what exists as part of a crate—but if there's any user control over the filenames of the contents of the crate (e.g. if the crate's source code is in there) then any crate might contain a file named e.g. "nul.rs", triggering the same problem.
There was a bug in Windows 95 (98 too?) where if you tried to open 'nul\nul' or 'con\con' etc, it would BSOD instantly.
Provided lots of drive-by fun in computer labs... (got really good at typing win+r con\con)
What I don't understand is why cargo fetches the entire crate list and creates a directory for every crate (even if you never install it). Why not just have a single file with the entire list? The issue mentions they use a trie, but why use the filesystem as the trie store? Why not have a single file?
The original authors of cargo, wycats and carllerche, aren't around today to ask (it's a weekend!) though IRC attempted to answer regardless:
<foo> to keep the number of files in a single directory down
<foo> tools become unhappy with hundreds of thousands+ of things in a single dir
<foo> as do filesystems
<bar> why not just a flat file
<bar> or sqlite or whatever
<qux> right now it uses git's deduplication feature
<qux> aka, when downloading updates you only download the objects that changed
<qux> but it mostly works on a per file basis
<qux> so git hashes each file and if the hash didnt change, it doesnt download an update
<qux> but if it did, it treats it as completely new file, even if its just a little change
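The filesystem-as-trie layout discussed here can be sketched as a function from crate name to index path. The bucketing scheme below (short names under "1", "2", "3", longer names under two two-character directories) is how I understand the crates.io index to be laid out, but treat the details as an assumption; `index_path` is a hypothetical name.

```rust
/// Maps a crate name to a relative path in a sharded index directory.
/// Assumption: names are ASCII; anything else is rejected so that
/// byte slicing below cannot split a character.
pub fn index_path(name: &str) -> Option<String> {
    if name.is_empty() || !name.is_ascii() {
        return None;
    }
    let n = name.to_ascii_lowercase();
    Some(match n.len() {
        1 => format!("1/{}", n),
        2 => format!("2/{}", n),
        // Three-letter names are bucketed by their first character.
        3 => format!("3/{}/{}", &n[..1], n),
        // Everything else: first two characters, then the next two.
        _ => format!("{}/{}/{}", &n[..2], &n[2..4], n),
    })
}
```

Note that the last path component is still the raw crate name, so a crate called "nul" ends up as a file literally named "nul", which is how a reserved name can reach the filesystem at all.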
I know that some of the people who worked on cargo originally had experience with other package managers - mainly bundler - and I believe bundler used to use a single file, but ran into performance issues.
Windows is, for better or worse, fiercely proud of its backwards compatibility. So it's not so much a stupid Windows design choice as a 'stupid' DOS 1.0 design choice (and not even so much a choice as simply a quirk of how the DOS 1.0 file system worked) that Windows doesn't want to break backwards compatibility with.
People could also stop using CreateFile without a \\?\ prefix and all the problems would go away. There's not even a MAX_PATH limit on any NT-based Windows version if you do that.
To me, it sounds more like a problem with Rust that a single misnamed package could bring down the whole system. It's essentially a SQL injection attack (without the SQL).
Yep, just not allowing (directly) user controlled file names seems ideal. Maybe just hash the crate names and use the hash as a file name? No more silly restrictions due to platforms. Eliminates issues with some file systems having a length limit too.
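A minimal sketch of the hashing idea, using a hand-rolled 64-bit FNV-1a so the example needs nothing beyond the standard library. FNV-1a is my choice for illustration only; it is not collision resistant, and a real registry would presumably want a cryptographic hash instead. The function names are hypothetical.

```rust
/// 64-bit FNV-1a over a byte slice (standard offset basis and prime).
pub fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in data {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Derives an on-disk file name from a crate name.
/// The name is lowercased first so that case-insensitive filesystems
/// and case-sensitive ones agree on the result.
pub fn hashed_file_name(crate_name: &str) -> String {
    // Always 16 lowercase hex digits, so it can never equal a device name
    // and never exceeds a path-component length limit.
    format!("{:016x}", fnv1a64(crate_name.to_ascii_lowercase().as_bytes()))
}
```

The trade-off is that the directory listing stops being human readable, so you need a separate mapping from hash back to name.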
In the Mac System 7-ish days, people used to earnestly warn each other not to name a file '.Sony' (a special name reserved for the floppy driver) as it supposedly trashed your HD. Although I've never heard of anyone reproducing it.
I was working on a video project for a local comics convention, and named the project file "con.proj". That file hung around until I upgraded my hard drive because no file manager could delete it.
It's very tricky to do cross-platform file handling, and only the most mature projects have ironed this out. Just look at your pet project and see if it handles:
- Windows and unix line breaks in text files
- Windows and unix path separators
- BOM and non-BOM marked files if parsing UTF
- Forbidden filenames such as in this article
By "handling" I mean it should accept or fail nicely on unexpected input - e.g. say that line breaks should be unix style, or paths should be backslashes etc. Very few projects actually do this well. Even fewer will do even more complex things like handling too long paths with nice error messages etc.
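To show what "accept or fail nicely" could look like for two of the items above, here is a small sketch: it sniffs a byte-order mark, accepts UTF-8 with or without one, rejects UTF-16 with a readable error, and normalizes Windows line breaks. The type and function names are invented for the example, and the policy (UTF-8 only, unix line breaks internally) is just one possible choice.

```rust
#[derive(Debug, PartialEq)]
pub enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
    NoBom,
}

/// Looks only at the leading bytes. Note that a UTF-32LE file also starts
/// with FF FE and would be reported as UTF-16LE by this sketch.
pub fn detect_bom(bytes: &[u8]) -> Bom {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Bom::Utf8
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Bom::Utf16Le
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Bom::Utf16Be
    } else {
        Bom::NoBom
    }
}

/// Decodes UTF-8 input (BOM optional) and converts CRLF to LF.
/// Anything else produces an error message instead of mangled text.
pub fn read_text(bytes: &[u8]) -> Result<String, String> {
    let body = match detect_bom(bytes) {
        Bom::Utf8 => &bytes[3..],
        Bom::NoBom => bytes,
        other => return Err(format!("unsupported encoding {:?}; expected UTF-8", other)),
    };
    let text = std::str::from_utf8(body).map_err(|e| format!("invalid UTF-8: {}", e))?;
    Ok(text.replace("\r\n", "\n"))
}
```

The point is less the specific policy than that unexpected input ends in a clear error rather than silent corruption.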
Here is a character encoding issue that I ran into about a year ago.
git does not support UTF-16LE[0]. The result is that UTF-16LE encoded files will be mangled[1] by the line ending conversion. There is at least one generated Visual Studio file (GlobalSuppressions.cs) that is saved in UTF-16 by default.
While I know nothing of Rust, Diesel, or CrateDB, I do know that Windows uses a case-insensitive file system and this fix doesn't seem to take that into consideration. However, the author of the fix does note:
> I believe crates.io's namespace is case insensitive let me know if that's wrong
bluejekyll | 9 years ago
curun1r | 9 years ago
akira2501 | 9 years ago
pjc50 | 9 years ago
While we're here: NUL, COM<n>, LPT<n> and AUX are reserved.
Macha | 9 years ago
Vinnl | 9 years ago
garaetjjte | 9 years ago
source: https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...
monochromatic | 9 years ago
tatterdemalion | 9 years ago
kibwen | 9 years ago
derefr | 9 years ago
slobotron | 9 years ago
tonyarkles | 9 years ago
johnfn | 9 years ago
null is not a problem, but null.null, on the other hand...
protomyth | 9 years ago
Kenji | 9 years ago
captn3m0 | 9 years ago
kibwen | 9 years ago
tatterdemalion | 9 years ago
nerdponx | 9 years ago
ddevault | 9 years ago
dagw | 9 years ago
alkonaut | 9 years ago
poizan42 | 9 years ago
brianberns | 9 years ago
codys | 9 years ago
pvg | 9 years ago
Xylakant | 9 years ago
db48x | 9 years ago
I'd try it myself, but I've only got my phone with me.
yrashk | 9 years ago
brianberns | 9 years ago
nomercy400 | 9 years ago
wwalexander | 9 years ago
ziikutv | 9 years ago
Edit: I suggest "terminated"
sasheldon | 9 years ago
I like terminated! Good suggestion :D
filleokus | 9 years ago
I can't find it on crates.io though.
Strilanc | 9 years ago
roryisok | 9 years ago
alkonaut | 9 years ago
dbremner | 9 years ago
[0] https://github.com/libgit2/libgit2/issues/1009
[1] https://github.com/Microsoft/Windows-classic-samples/issues/...
encryptThrow32 | 9 years ago
msimpson | 9 years ago
> I believe crates.io's namespace is case insensitive let me know if that's wrong
Someone should probably validate that.
toabi | 9 years ago
hmottestad | 9 years ago
kibwen | 9 years ago
callumjones | 9 years ago
lsiebert | 9 years ago
joshu | 9 years ago
HedleyLamar | 9 years ago
steveklabnik | 9 years ago