Cygwin and MinGW utilities may lose files

[+] jcoffland|10 years ago|reply

TL;DR: if you have two files in a tar archive, one with an .exe extension and one without but otherwise identically named, e.g. test and test.exe, then both Cygwin's and MSYS2's tar will overwrite one of the files with the other on extraction. The developers insist that this is the correct behavior, refuse to even consider changing it, and will barely discuss it.
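To make the setup concrete, here is a minimal sketch (Python, standard library only) that builds such an archive in memory and flags member names that differ only by a trailing .exe. The helper names `make_archive` and `exe_collisions` are made up for illustration, and this only detects the risky pairs; it does not reproduce what Cygwin's or MSYS2's tar does internally.

```python
import io
import tarfile
from collections import defaultdict


def make_archive(names):
    """Build an in-memory tar archive containing one small file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = f"contents of {name}\n".encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def exe_collisions(archive):
    """Return groups of member names that differ only by a trailing .exe."""
    groups = defaultdict(list)
    with tarfile.open(fileobj=archive, mode="r") as tar:
        for name in tar.getnames():
            # Strip a trailing ".exe" so "test" and "test.exe" share a key.
            key = name[:-4] if name.lower().endswith(".exe") else name
            groups[key].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


archive = make_archive(["test", "test.exe", "README"])
collisions = exe_collisions(archive)
print(collisions)  # [['test', 'test.exe']]
```

Running a check like this before extracting under Cygwin or MSYS2 would at least tell you which members are at risk.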

I've often run into cases like this where the developers insist on some ridiculous behavior, will not budge, and even get angry if you question their decision. Another good example I found recently was with makepkg, a tool for creating packages for the pacman package system on Arch Linux. makepkg refuses to run as root. It used to have an option, --asroot, which overrode this behavior, but the devs decided to remove the option in recent versions. They insist that there is absolutely no reason you would ever need to run makepkg as root and that it is just too dangerous for you to be allowed to do so. This makes it extremely difficult to run makepkg inside a Docker container. They say you can just sudo to the nobody user, but in Docker this takes many nontrivial steps.

Another good example is Bitcoin, where the "core" devs currently have a stranglehold on the community. They've repeatedly made decisions that many disagree with. Now the community has split into warring factions, yet "core" still retains its power.

Sure you can fork an Open-Source project but the power is held by those who control the most popular outlet for that software.

[+] ikeboy|10 years ago|reply

Bitcoin is not a good example, because there can't be competing forks on the protocol level. Once one fork "wins", the other automatically dies. Therefore, it's much more important to get the right design, because individuals in the minority who don't like the choice made by the others have no way to change anything.

Whereas a typical open-source project can, in theory, be forked with absolutely no consequences for those who remain on the original fork. Any change introduced in bitcoin harms those who disagree with it. The analogue in other software would be auto-updating software whose updates couldn't be blocked.

Edit: another difference is that if you fork bitcoin and mess up, it can be unrecoverable. It might crash, or all funds might become unspendable, or a couple of other things. If it shakes confidence, then you might not be able to go back.

Also, you can't fully test bitcoin releases in advance because they interact with the network, especially for forks. So you might not know something's wrong until it's too late.

By contrast, a mistake in other software can just be reverted, and can be tested independently, because instances are localized.

(Disclaimer: I am simplifying a bit here.)

[+] barrkel|10 years ago|reply

It's not ridiculous behaviour; it's the correct behaviour for Cygwin in almost all cases.

Windows executables are identified by their extension. That's in direct conflict with POSIX-style executables. You wouldn't want to have to execute every command-line tool like ls.exe, cp.exe, bash.exe, rm.exe etc. So Cygwin looks on the filesystem twice: both with and without the exe extension.

Similar things need to happen when building code from tarballs. It's not just finding executables to actually execute; compilers and linkers need to work correctly too, and random build scripts aren't going to always be checking for the .exe version of files. So the behaviour applies to reading and writing as well.

Tar is already surprising in that it overwrites by default, potentially including the archive you're extracting from. It's usually a bad idea to extract tar archives into anything other than empty directories, unless you know exactly what you're doing.

[+] interurban|10 years ago|reply

Unless the dev's statement (quoted in TFA) is a blatant lie, this _has_ been discussed multiple times in the past. He also doesn't refuse to discuss the issue, but recommends reading the past discussions before arguing over previously resolved points.

In fact the dev and other users continue to discuss the reasoning for this behavior for quite a lengthy email chain.

Always assume the best of someone until proven otherwise; it saves a lot of stress :)

[+] zokier|10 years ago|reply

I hate that sense of entitlement. The whole point of FOSS is that people can mold software to fit their vision of how stuff should work without caring about what anyone else thinks. More pointedly, the point of free software specifically is to guarantee end users the power to control the software.

[+] ohm|10 years ago|reply

Just yesterday I ran into a problem where the Xfce file manager Thunar would show a warning message about using it as root. Every single discussion thread about this issue was closed with a comment saying that Thunar should not be used as root and that there will be no feature changes to disable this message.

[+] JonathonW|10 years ago|reply

> I am confused as to why this behavior also carries over to MinGW versions of these tools. After all, I thought the whole point of MinGW was not to try to provide a POSIX layer.

Not really.

MinGW's compilation environment doesn't try to provide a POSIX layer-- if you build something with MinGW, you get a native Windows executable that runs exclusively against the APIs provided by Windows (it's a replacement for the MSVC toolchain).

The MinGW/MSYS/MSYS2 toolchain itself, though (including gcc, binutils, and the shell and coreutils that it's usually installed with), requires a POSIX environment. They got this by forking Cygwin-- which explains why you see the same behavior between Cygwin's tar and MSYS's tar. They share the same behavior because they share code. (The old MSYS was a pretty old fork of Cygwin that they didn't keep up-to-date with upstream, but MSYS2 tries to stay pretty close to upstream Cygwin when they can.)

[+] nanis|10 years ago|reply

Thank you for that explanation.

[+] acqq|10 years ago|reply

It seems that there's a way to work around it: the order in which the files are packed matters!

https://sourceware.org/ml/cygwin/2009-08/msg00293.html

    > tar -xvzf test.tar.gz
    > mydir/myexe.exe
    > mydir/myexe
    >
    > ls mydir
    > myexe

"if a file foo.exe exists, and an application calls stat("foo"), it will get told that, yes, "foo" exists."

But:

"if the order of the files in the tar archive is reversed, both files are unpacked. Or, unpack mydir/myexe.exe explicitly afterwards. The reason that this works is that Cygwin does not check for a file "foo", if the name of the file is explicitly given as "foo.exe"."

Anybody tried?

Also, it's a relatively recent change (as in, within the last few years): "Cygwin always handled the .exe suffix transparently in terms of stat(2) calls, but Cygwin 1.7 also handles them transparently in terms of open(2) and any other call."
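The order dependence in the quotes above can be reproduced with a toy model in Python. To be clear about the assumptions: the directory is just a dict, `resolve` and `extract` are hypothetical names, and the "remove whatever the name resolves to, then create the new name" step is my guess at a rule that matches the quoted listing, not Cygwin's or tar's real code.

```python
def resolve(directory, name):
    """Return the entry that `name` refers to under the quoted rule, or None.

    "foo" also matches an existing "foo.exe", but an explicit "foo.exe"
    never falls back to a plain "foo".
    """
    if name in directory:
        return name
    if not name.lower().endswith(".exe") and name + ".exe" in directory:
        return name + ".exe"
    return None


def extract(members):
    """Unpack (name, data) pairs in archive order into a fresh toy directory."""
    directory = {}
    for name, data in members:
        existing = resolve(directory, name)
        if existing is not None:
            # Assumption: the extractor replaces whatever the name resolves to.
            del directory[existing]
        directory[name] = data
    return directory


exe_first = extract([("myexe.exe", "binary"), ("myexe", "script")])
exe_last = extract([("myexe", "script"), ("myexe.exe", "binary")])
print(sorted(exe_first))  # ['myexe']
print(sorted(exe_last))   # ['myexe', 'myexe.exe']
```

With the .exe member first, the later plain name resolves to it and replaces it; with the .exe member last, the explicit suffix skips the fallback and both survive.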

[+] nanis|10 years ago|reply

This is illustrated in all three screenshots in my blog post.

[+] Nacraile|10 years ago|reply

Having grappled with mapping between posix paths and the windows case-preserving-but-insensitive semantics, I can't say that I find this sort of thing terribly surprising. Unfortunately, this kind of edge-case behaviour is unavoidable when you're trying to bridge semantic differences between OSes.

[+] mark-r|10 years ago|reply

I expect to have problems with filename case. Not with .exe extensions. This would have taken me completely by surprise as well.

[+] spdustin|10 years ago|reply

So it seems to me: ".exe" is essentially translated as the "+x" bit and dropped from the file.

Didn't read TFA, was just breezing through, and thought I'd see if I got the gist of it.

[+] dspillett|10 years ago|reply

So it is trying to map "file" which has its executable bit set to "file.exe", and vice versa, potentially clobbering one file with the other. Or is it doing this irrespective of any posix execute bit that is set?

[+] nanis|10 years ago|reply

In my experiments, I did not explicitly set any executable bits using the corresponding `chmod`. Also, you can see in the Cygwin screenshot that neither `test` nor `test.exe` has the executable bit set in the archive.

[+] colejohnson66|10 years ago|reply

It's probably because Windows implicitly adds the .exe extension when trying to execute something, and Cygwin works around this by just treating files with and without the extension as the same name.

[+] captainmuon|10 years ago|reply

Is there any way to disable this "feature", or to patch it out? This magic handling of ".exe" should really be handled by bash, not by the standard library.

[+] _kst_|10 years ago|reply

It would make more sense for it to be handled by execve() or equivalent.

For example, you need to be able to execute a file named "/bin/sh" -- but for that file to be executable under Windows, it needs to be named "sh.exe". If you have a file with no extension in its name, Windows more or less doesn't know what to do with it.

Cygwin's solution to this is to treat "sh" and "sh.exe" as the same file, by tweaking the filesystem code. If "foo.exe" is the only file in the current directory, "ls" will show it -- but "ls foo" will show the same file, but with the name "foo".

Making any system call or library function that searches $PATH for executable files, or that executes a file, accept "foo.exe" when you specify "foo", would have been another solution, and perhaps a better one (though there could easily be some nasty gotchas that I haven't thought about).
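A rough sketch of that alternative, in Python: only the executable search accepts "foo.exe" for "foo", while ordinary file lookups stay literal. `find_executable` is a made-up helper, and it skips the execute-permission check a real implementation would need.

```python
import os
import tempfile


def find_executable(command, path_dirs, suffixes=("", ".exe")):
    """Search path_dirs for command, also accepting command + '.exe'.

    The suffix fallback lives only in this search; nothing else in the
    program treats "foo" and "foo.exe" as the same file.
    """
    for directory in path_dirs:
        for suffix in suffixes:
            candidate = os.path.join(directory, command + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


# Demo on a throwaway directory that contains only "sh.exe".
with tempfile.TemporaryDirectory() as bindir:
    with open(os.path.join(bindir, "sh.exe"), "w") as handle:
        handle.write("placeholder\n")

    found = find_executable("sh", [bindir])
    missing = find_executable("ls", [bindir])
    found_name = os.path.basename(found) if found else None
    # A plain existence check is untouched by the fallback.
    plain_exists = os.path.exists(os.path.join(bindir, "sh"))

print(found_name, missing, plain_exists)  # sh.exe None False
```

Python's own `shutil.which` does something in this spirit on Windows by consulting PATHEXT, which suggests the search-time approach is at least workable for launching programs; it would not help build scripts that stat() the output file by its suffix-less name.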

There is, as far as I know, no way to implement a POSIX layer on top of windows without some kind of ugly hack for "*.exe" files (short of running a self-contained VM, but the point of Cygwin is that you can still access the Windows filesystem).

[+] truist|10 years ago|reply

Yes, this. The problematic behavior is in the shell, but they "fixed" it by updating the standard library to have it try to auto-guess what the application wanted. The fix should have been in the shell.

[+] pervycreeper|10 years ago|reply

Are there any use cases for Cygwin these days on a modern system where VMs are so easy to set up?

[+] sp332|10 years ago|reply

You don't have to transfer files between Cygwin and Windows. Keeping Cygwin up to date is a lot easier than maintaining a whole Linux distro. And you don't have to wait for it to boot up.

[+] Nacraile|10 years ago|reply

The use cases are certainly narrowing, but cygwin is still occasionally useful, e.g.:

- You want an actually functional ssh server running on windows so you can integrate with a mostly-linux test automation environment.

- You're committed to a (terrible) windows-only VCS, but develop some linux applications.

[+] nwatson|10 years ago|reply

Sure -- I might be constrained to work on Windows development or use Windows tools, and having access to good UNIX/Linux-like command-line utilities for scripting, etc., in that environment is a big win.

[+] auganov|10 years ago|reply

Do you actually do that?

There are a lot of hoops one would have to jump through for a seamless experience with a VM. Most of the time Cygwin just works[0]. I tried running my Emacs and terminal in a separate VM; it wasn't so pleasant. Sharing the file system bidirectionally is the biggest problem.

[0] I'm sure for certain domains, say unix'y systems programming it's not true at all.

[+] ctstover|10 years ago|reply

In any case where one has to mitigate the problems of bringing Windows into the mix. At my last job I had several bridges to Windows-only systems and silliness, exposed to a *nix universe as ssh-invokable CLI tools.

Currently my only use case is that it does count in my book as a different variety of *nix. If you are attempting to write portable software, it is another platform you can use for testing that can be brought into a CI pipeline.

A more common use case (though still strange) is people using windows natively, who want to use C, C++, Fortran, Ada, etc to make windows programs (instead of cross compiling from Linux). Generally this category is students, and this is still far more educationally beneficial than attempting to learn skills with long term value from say visual studio's IDE.

edit note: asterisks make things italics on hn

[+] barrkel|10 years ago|reply

Yes; when you want to operate on your Windows OS filesystem using Cygwin tools; when you want to use Cygwin tools freely in shell pipes with native tools; when you want to run something like Emacs locally with modes that use background executables for semantic / syntax / linting support; when you want to use shell job control with Windows executables; etc.

[+] gkfasdfasdf|10 years ago|reply

In no particular order, using vim, openssh, git, perl, bash, grep, tar, etc etc etc on the entire native file system in a POSIX environment is really fantastic. I also have an ubuntu VM to spin up on hand but I find I rarely need to do so. The cygwin mintty console is also far superior to the windows command console.

[+] Zardoz84|10 years ago|reply

Yes, when you are forced to work on windows and you need sane CLI tools and a decent shell.

[+] gtk40|10 years ago|reply

It's nice having a nearly native Unix-like shell, and it's more convenient than a VM. Even for simple tools like git and ssh. I use babun and it works great.

[+] auxym|10 years ago|reply

I prefer using cygwin and openssh to ssh into VMs rather than putty

[+] julie1|10 years ago|reply

Computer Science is looking like speaking french.

A lot of rules. The first rule being to expect exceptions for every rules.

If a natural language can work this way, why not software industry?

Oh! I forgot: uncertainty is bad, and you may want to get things done in an expected way that everyone understands without equivocation.

Doesn't inconsistent behaviour defeat the purpose of abstractions? It is an exception to a well-established expectation.

Cygwin is supposed to have consistent low-level behaviour. At least, that is what POSIX and Unix are all about.

I don't see anything good coming from Cygwin and MinGW brushing off this criticism in the name of pragmatism.

[+] xorblurb|10 years ago|reply

You don't, but I do. You don't want to replace every exec of e.g. "cp" with "cp.exe"; .exe suffixes are rarely handled in real Unix environments; the result of a build without .exe under a real Unix would already clash with a .exe-less entry; and Cygwin/MSYS/MSYS2 exist mainly for compatibility. In this context, the most compatibility you can achieve (both ways) comes from handling .exe the way they do, and there should be no exception.

This leads to odd behaviors, which is unfortunate, but alternatives would lead to even more and/or worse ones.

[+] dave2000|10 years ago|reply

"Computer Science is looking like speaking french.

A lot of rules. The first rule being to expect exceptions for every rules.

If a natural language can work this way, why not software industry?"

Huh? They do both work the same way in that respect. Do you wish they did, or didn't?
