Path Isn't Real on Linux

113 points| max__dev | 10 months ago |blog.danielh.cc

112 comments

mzajc|10 months ago

Fun fact: if you've ever had bash (or another shell) complain that a file doesn't exist, even though it's on $PATH, check if it's been cached by `hash`. If the file is moved elsewhere on $PATH and bash has the old path cached, you will get an ENOENT. The entire cache can be invalidated with `hash -r`.
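
A toy model of the caching behaviour described above, written in Python rather than shell. `CommandCache` and its method names are made up for illustration, and this is not how bash implements `hash`; it only shows why a remembered path goes stale when the file moves to another directory on PATH.

```python
import os
import shutil


class CommandCache:
    """Remembers where a command was first found, like a shell's hash table."""

    def __init__(self):
        self._paths = {}

    def resolve(self, name, search_path):
        # Only search the PATH-style string on a cache miss; a hit is
        # returned without checking that the file still exists.
        if name not in self._paths:
            found = shutil.which(name, path=search_path)
            if found is None:
                raise FileNotFoundError(name)
            self._paths[name] = found
        return self._paths[name]

    def forget_all(self):
        # Plays the role of `hash -r`: drop every remembered location.
        self._paths.clear()
```

If a file is moved after the first `resolve()` call, later calls keep returning the old location until `forget_all()` is called.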

JohnMakin|10 months ago

you just solved a bug I couldn't explain like 6 years ago

vlovich123|10 months ago

Is this an old behavior? I would think ENOENT would invalidate the cache entry at least.

ninkendo|10 months ago

I think bash has an alias “rehash” that does the same as hash -r too. But zsh doesn’t have it, so “hash -r” has entered my muscle memory, as it works in both shells.

Edit: wrong shell, zsh has rehash, bash does not.

bobbylarrybobby|10 months ago

Ah, so that's where sudo texhash -r comes from when installing a latex package!

OptionOfT|10 months ago

I'm not sure in which situation exactly, but I've had cases where a script didn't have the right shebang, and I had to resort to `alias --save hash='#'` to make sure the script worked.

eternauta3k|10 months ago

The other typical cause is when an interpreter or library is compiled with the wrong libc version.

noman-land|10 months ago

Wtf. TIL about hash.

blcknight|10 months ago

Path globbing, pipes, redirection, job control (fg/bg), and all shell variables -- not just $PATH -- are all handled by the shell.

The kernel has no idea what the current process' environment $PATH is, and doesn't even parse any process environment variables at all.

thayne|10 months ago

PATH isn't just handled by the shell though. Many (but not all!) of the exec* family of functions in libc respect PATH.

wpollock|10 months ago

Why would strace cat be useful here? By the time cat runs, it was obviously already found.

It is basic knowledge that PATH is used by a command interpreter to locate the pathname of binaries. This is true for Windows' cmd.exe as well. I never heard of a system where locating files for execution was performed by a kernel.

V99|10 months ago

True... `strace bash -c cat` would show more of the series of stat calls they intend to see:

  newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
  newfstatat(AT_FDCWD, "/usr/local/sbin/cat", 0x7fffcec2f3b8, 0) = -1 ENOENT (No such file or directory)
  newfstatat(AT_FDCWD, "/usr/local/bin/cat", 0x7fffcec2f3b8, 0) = -1 ENOENT (No such file or directory)
  newfstatat(AT_FDCWD, "/usr/sbin/cat", 0x7fffcec2f3b8, 0) = -1 ENOENT (No such file or directory)
  newfstatat(AT_FDCWD, "/usr/bin/cat", {st_mode=S_IFREG|0755, st_size=68536, ...}, 0) = 0

userbinator|10 months ago

> I never heard of a system where locating files for execution was performed by a kernel.

Also true for MS/PC-DOS... which also holds the distinction of having some rare "truly monolithic" API-compatible variants that put the kernel, drivers, and shell in a single binary, so that may satisfy your criteria.

MathMonkeyMan|10 months ago

In the [exec][1] family of POSIX functions, if the command path doesn't contain a slash, then it's looked up in the PATH.

> If the file argument contains a slash character, the file argument shall be used as the pathname for this file. Otherwise, the path prefix for this file is obtained by a search of the directories passed as the environment variable PATH [...]

[1]: https://pubs.opengroup.org/onlinepubs/009695399/functions/ex...
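
A rough Python sketch of the lookup rule quoted above. The function name `search_path` is hypothetical, and a real execvp(3) simply attempts execve(2) on each candidate rather than checking permissions first; the explicit check here stands in for that attempt.

```python
import os


def search_path(file, env):
    """Return the pathname an execvp-style lookup would try to run, or None."""
    # A name containing a slash is used as the pathname, with no PATH search.
    if "/" in file:
        return file
    # Otherwise walk the PATH directories in order; first match wins.
    for directory in env.get("PATH", os.defpath).split(os.pathsep):
        # An empty PATH entry traditionally means the current directory.
        candidate = os.path.join(directory or ".", file)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

All of this happens in user space; the kernel only ever sees the final pathname.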

HelloNurse|10 months ago

The kernel's job is to execute executable files, while the shell's job is to bridge the gap between a user-facing command name ("cat") and an executable file (/usr/bin/cat). The PATH environment variable provides such a good general and transparent way to control this task that most shells on most operating systems work that way.

inlets|10 months ago

Why would the author think that the PATH environment variable is being used by the kernel? What an odd assumption.

MisterTea|10 months ago

Ignorance leading to assumptions. Their eureka moment: "The shell, not the Linux kernel, is responsible for searching for executables in PATH!" makes it obvious they haven't read up on operating systems. Shame because you should know how the machine works to understand what is happening in your computer. I always recommend reading Operating Systems: Three Easy Pieces. https://pages.cs.wisc.edu/~remzi/OSTEP/

quotemstr|10 months ago

Well, execve(2) and execvp(3) are both "system" functions. C (which is already black magic for some people) invokes both by calling into functions exported from libc. If you're not super dorky^Wfamiliar with low-level systems stuff, you might guess that the two functions are implemented in the same place and in the same way. That the latter is just a libc wrapper around the former that does a PATH search is arcane detail you don't have to care about 99% of the time.

It's hard to appreciate how the world looks before you learn a fact. You can't unsee things.

LegionMammal978|10 months ago

One thing I was surprised to learn a couple years ago is that users and groups aren't really tracked much by the Linux kernel: they're just numeric IDs that track process and file ownership. So if you setuid() to a user ID that doesn't exist in /etc/passwd or anywhere else, the kernel won't stop you.
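
A small Python illustration of that split, with made-up helper names: the kernel side (`os.stat`) only ever yields a number, and turning it into a name is a separate user-space lookup that may find nothing.

```python
import os
import pwd


def owner_label(uid):
    """Map a numeric uid to a user name if the user database has one."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # The number is still a valid owner as far as the kernel cares.
        return f"uid {uid} (no passwd entry)"


def file_owner(path):
    # st_uid is just an integer stored with the file.
    return owner_label(os.stat(path).st_uid)
```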

rafram|10 months ago

Unnecessarily rude. There was also a time when you didn’t know this. I can guarantee it!

klysm|10 months ago

I don't think it's an odd assumption at all! The lines between shell, exec calls, globbing, etc, are very blurry if you don't already know how it all fits together.

Joker_vD|10 months ago

Why not? Every executable is started with the execve(2) syscall, which takes an array of environment variables that the kernel uses to reset the environment the process inherited from its parent, so obviously the kernel has full knowledge of the environment variables of all of the processes in the system.

Now, there is a reason why kernel actually does not have such knowledge, but it's not at all unreasonable to assume that the kernel has it.
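
A minimal Python sketch of what the envp array means in practice: the child starts with whatever its parent hands over and nothing else is inherited. The helper name is hypothetical, and the child interpreter may add a variable or two of its own at startup (locale settings, for example), so the example only looks at specific keys.

```python
import json
import subprocess
import sys


def environment_seen_by_child(env):
    """Start a child Python with exactly `env` and return what it sees."""
    code = "import json, os; print(json.dumps(dict(os.environ)))"
    result = subprocess.run(
        [sys.executable, "-c", code],  # absolute path, so no PATH lookup is needed
        env=env,                       # becomes the child's envp
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Passing `{"GREETING": "hello"}` gives a child that sees GREETING but has no PATH at all, even though the parent had one.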

mynegation|10 months ago

You and I and a bunch of other people know it and take it to be self-evident, but someone discovered it (maybe recently, maybe they have known it for a while) and did a nice write-up for people who had not known that yet. https://xkcd.com/1053/

dzaima|10 months ago

That's not a truth that'd come from first principles, never mind a trivial truth; it's extremely trivial to imagine a kernel that does parse PATH where it wouldn't be true.

As such, it's a thing one has to explicitly look up to know, which the author did.

jakogut|10 months ago

The Linux kernel also doesn't have any concept of shared libraries, which are resolved by ld.so, a program that's usually shipped as part of libc.

I like this approach of shunting off functionality that's important, necessary, and omnipresent across all OSes to userspace, rather than giving in to the temptation to put everything and the kitchen sink into the kernel. It seems to make for a more versatile and future-proof OS that's easy to work with in spite of uncertainty.

userbinator|10 months ago

I've worked with "both sides" and the way ELF shared libraries on Linux work is an absolute bloody mess compared to how Windows' PE works. On Windows the same executable format and dynamic linker are usable in both user and kernel mode.

matheusmoreira|10 months ago

This is even reflected in the ELF format itself. There's this really arcane dichotomy between sections and segments.

Sections are very detailed metadata that all sorts of things use for all sorts of purposes. Compilers use them. Debuggers use them. Static and dynamic linkers use them. Anyone can use them for any purpose whatsoever. You can easily add your own custom sections to any executable using tools like objcopy. It's completely arbitrary, held together by convention.

Segments, on the other hand, don't even have names. They are just a list of file extents required for the program to actually execute and their address space locations. The program header table is essentially a sorted list of arguments for the mmap system call.

This is Linux kernel's ELF loader:

https://github.com/torvalds/linux/blob/master/fs/binfmt_elf....

It basically just mmaps in the PT_LOAD segments of the ELF file, copies stuff like arguments and environment and then starts a thread at the entry point specified in the ELF header.

It's just that when loading dynamic ELFs it jumps into the dynamic linker instead of the actual program. It's as though every single program had a #!/lib/ld.so shebang line. The absolute path is even hardcoded into the executable itself.

  readelf -a $(which cat) | grep -i interpreter
        [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

When an "interpreter" is requested, Linux will load it alongside the actual program and will run it instead of the actual program. This "ELF interpreter" then does an absurd amount of work by recursively loading and linking libraries, linking the actual executable and only then jumping into its entry point.

I'm not kidding about the "absurd amount of work" part. These linkers even have to topologically sort dependencies like a package manager so they can be initialized properly.

https://blogs.oracle.com/solaris/post/init-and-fini-processi...
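
The program header walk described above can be sketched in a few lines of Python. This is an illustration only, assuming a little-endian ELF64 image; the function names are made up, and it reads just the fields needed to find the PT_INTERP segment that readelf reports.

```python
import struct

PT_LOAD, PT_INTERP = 1, 3
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, 56 bytes


def program_headers(image):
    """Yield (p_type, p_offset, p_filesz) for each program header."""
    (ident, _type, _machine, _version, _entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, _shnum, _shstrndx) = EHDR.unpack_from(image)
    # ident[4] == 2 means 64-bit, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    for index in range(e_phnum):
        p_type, _pflags, p_offset, _vaddr, _paddr, p_filesz, _memsz, _align = \
            PHDR.unpack_from(image, e_phoff + index * e_phentsize)
        yield p_type, p_offset, p_filesz


def elf_interpreter(image):
    """Return the requested interpreter path, or None for a static binary."""
    for p_type, p_offset, p_filesz in program_headers(image):
        if p_type == PT_INTERP:
            raw = image[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

Feeding it the bytes of a dynamically linked binary, for example `elf_interpreter(open("/usr/bin/cat", "rb").read())`, should return the same path that readelf prints.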

SuperNinKenDo|10 months ago

I was trying to understand what the lede was here, and it turns out the author assumed that PATH was something understood by the kernel, which is rather an odd assumption, but perhaps one that others make.

I did get one thing out of this though. I had honestly wondered for the longest time why we need to call env to get the same functionality as PATH in a shebang.

Ironically, thanks to either an article I read here (or on the crustacean site) recently, I already knew that the shebang is something which is parsed by the kernel, but had not put two and two together at all.

Much like the author. So goes to show the benefits of exploring and thinking about seemingly "obvious" concepts.
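
A hedged Python sketch of why `#!/usr/bin/env python3` works. It assumes the Linux behaviour of treating everything after the interpreter path as one optional argument (other kernels may split it differently), and the function names are invented for the example.

```python
def parse_shebang(first_line):
    """Split a '#!' line into (interpreter, single_argument_or_None)."""
    if not first_line.startswith("#!"):
        return None
    rest = first_line[2:].strip()
    if not rest:
        return None
    parts = rest.split(None, 1)  # interpreter, then the remainder as ONE string
    argument = parts[1].strip() if len(parts) > 1 else None
    return parts[0], argument


def script_argv(script_path, first_line, args):
    """Build the argv the interpreter would be started with."""
    interpreter, argument = parse_shebang(first_line)
    argv = [interpreter]
    if argument is not None:
        argv.append(argument)
    # The script's own path is appended, followed by the original arguments.
    return argv + [script_path] + list(args)
```

No PATH search happens on the interpreter itself, which is why it is written as a full path. With `/usr/bin/env` as the interpreter, it is env, an ordinary user-space program, that looks `python3` up in PATH.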

khrbtxyz|10 months ago

Another bit of trivia about the shebang support in Linux is that it is possible to build the kernel without it. https://github.com/torvalds/linux/blob/master/fs/Kconfig.bin...

  config BINFMT_SCRIPT
  tristate "Kernel support for scripts starting with #!"
  default y
  help
    Say Y here if you want to execute interpreted scripts starting with
    #! followed by the path to an interpreter.

drougge|10 months ago

Using "#!sh" at the top of the file does work, but not predictably. It may execute sh in your current directory, which is what Linux does, but your shell may override that (zsh does if the first attempt fails). So it works, but not the way you want it to.

And I'm sure other kernels do other things too.

teo_zero|10 months ago

I don't get the logical passage from "PATH is handled by the shell" to "isn't real on Linux".

cryptonector|10 months ago

It's real, it's just implemented by the shell -- same as all Unix-like operating systems. Heck, same as Windows.

klysm|10 months ago

Might be more accurate to say "Linux doesn't know about PATH, but your shell does"

taraindara|10 months ago

This actually helps explain some behaviors I’ve encountered. It was never a serious issue, since the answer is to use a full path. But it is slightly annoying nonetheless. Understanding helps a lot.

0xbadcafebee|10 months ago

The title is nonsense. PATH is the name of an environment variable (a Real Thing(TM)) which lists a set of directories to search for an executable. It is used by shells (including those running on Linux) to locate an executable when the full path to the executable is not supplied by the user.

This is needed because the exece()/execve() [2] kernel system call is unaware of things like environment variables so it will not have any idea how or where to execute a program 'cat' unless it is given the full path to 'cat', so the shell has to look it up (again if the user doesn't pass the full path). It's the same on every POSIX system and the original UNIXes. It's been this way for at least 50 years. (edit 60 years, it's from Multics [1])

Kids today really need to learn the fundamentals of computer operating systems. Or do that boring old-person thing we did before StackOverflow, and read all the manual pages, which tell you all this [3] [4].

[1] https://en.wikipedia.org/wiki/PATH_(variable) [2] https://man7.org/linux/man-pages/man2/execve.2.html [3] https://www.man7.org/linux/man-pages/man1/dash.1.html [4] https://www.man7.org/linux/man-pages/man1/intro.1.html https://www.man7.org/linux/man-pages/man2/intro.2.html https://www.man7.org/linux/man-pages/man7/man-pages.7.html https://www.man7.org/linux/man-pages/man7/standards.7.html

Joker_vD|10 months ago

The fact that the Linux kernel does not track environment variables of the processes is not a "fundamental". The setenv/getenv could very well have been syscalls, it's simply a design decision that they are not. One can make a kernel with such tracking, and it'd still be POSIX compliant as long as you supply setenv(3)/getenv(3) wrappers with expected signatures in your system libc.

saagarjha|10 months ago

Reading the code to things is perfectly fine, actually.

ashu1461|10 months ago

Is it right to assume that the PATH env variable and the context in which it is used are two different things?

The PATH variable is fundamentally the same as other env variables like HOME / USER,

but how PATH is interpreted changes from context to context?

Tsiklon|10 months ago

Silly tangentially related question; I like to think of myself as fairly competent in the Linux and unix world.

In the unix systems of the past was it easier to hold a more complete understanding of the system and its components in your head?

jfax|10 months ago

A few others are saying, "well yeah, duh!", but this to me demonstrates a mental fault that arises in calling GNU+Linux, "Linux".

dfedbeef|10 months ago

It's real in GNU/Linux tho...

dfedbeef|10 months ago

legitimately, if you're interested try writing a shell, your own libc, an elf loader even. It's fun! C is good and cool!

self_awareness|10 months ago

I'm telling you, those environment variables you have are NOT real!

i140i485i765|10 months ago

Nobody talks about vfs path resolution here? There are too many layers in the whole process, even the path from strace can be resolved to another path.
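
One narrow slice of that layering, sketched in Python: the pathname handed to the kernel can itself be a symlink to somewhere else. The helper name and the hop cap are arbitrary choices for the example, and it only follows symlinks in the final path component, which is far less than real VFS resolution does.

```python
import os


def resolve_chain(path, max_hops=40):
    """Follow final-component symlinks one hop at a time; return every path visited."""
    chain = [path]
    while os.path.islink(path):
        if len(chain) > max_hops:
            raise OSError("too many levels of symbolic links")
        target = os.readlink(path)
        # A relative target is interpreted relative to the link's directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain
```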

anacrolix|10 months ago

Uh yeah duh. But I kept waiting for him to discover hash in the shell. No such luck. Guess it's in the magic somewhere. (Do man hash or something if you have no idea what I'm talking about)

bawolff|10 months ago

Doesn't that go without saying?

m463|10 months ago

what about rehash?

smcameron|10 months ago

> The shell, not the Linux kernel, is responsible for searching for executables in PATH!

I mean, no shit, Sherlock? The exec family of system calls requires a path to a file, not a filename with an implicit path from the environment, so of course PATH is handled by the shell.

Joker_vD|10 months ago

All members of the exec family of system calls, which consists of only two syscalls, namely, execve(2) and execveat(2), literally have the envp parameter which is supposed to have all the environment variables for the process.

Now, the semantics of this parameter is that kernel does not use it for path resolution when searching for the executable — but it could.
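
A small Python experiment along those lines, assuming a POSIX system with `sh` somewhere on PATH. The helper names are made up. It forks, calls execve(2) directly through `os.execve`, and reports the errno: a bare name fails even though PATH is present in envp, while the full pathname works.

```python
import errno
import os
import shutil
import tempfile


def execve_errno(path, argv, env, cwd):
    """Fork, execve(path) in the child from `cwd`; return errno, or 0 on success."""
    read_fd, write_fd = os.pipe()  # Python creates these close-on-exec
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(read_fd)
            os.chdir(cwd)
            os.execve(path, argv, env)  # does not return on success
        except OSError as exc:
            os.write(write_fd, str(exc.errno).encode())
        finally:
            os._exit(127)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        report = reader.read()  # empty if the exec succeeded
    os.waitpid(pid, 0)
    return int(report) if report else 0


def demo():
    sh_path = os.path.abspath(shutil.which("sh"))
    env = {"PATH": os.environ.get("PATH", os.defpath)}
    argv = ["sh", "-c", "exit 0"]
    with tempfile.TemporaryDirectory() as empty_dir:
        # "sh" has no slash, so the kernel treats it as ./sh in empty_dir.
        bare = execve_errno("sh", argv, env, empty_dir)
        full = execve_errno(sh_path, argv, env, empty_dir)
    return bare, full
```

On a typical Linux system `demo()` is expected to report ENOENT for the bare name and success for the full path.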