Fun fact: if you've ever had bash (or another shell) complain that a file doesn't exist, even though it's on $PATH, check if it's been cached by `hash`. If the file is moved elsewhere on $PATH and bash has the old path cached, you will get an ENOENT. The entire cache can be invalidated with `hash -r`.
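To make the failure mode concrete, here is a toy Python model of a command-location cache. It is only a sketch: `CommandCache`, `lookup`, and `rehash` are names made up for this example and say nothing about how bash implements `hash` internally.

```python
import os
import tempfile


class CommandCache:
    """Toy model of a shell's command-location cache (hypothetical, not bash's code)."""

    def __init__(self, path_dirs):
        self.path_dirs = list(path_dirs)
        self.cache = {}

    def lookup(self, name):
        # Like a shell, trust a remembered location without re-checking PATH.
        if name not in self.cache:
            for directory in self.path_dirs:
                candidate = os.path.join(directory, name)
                if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                    self.cache[name] = candidate
                    break
            else:
                raise FileNotFoundError(name)
        return self.cache[name]

    def rehash(self):
        # Same spirit as `hash -r`: forget every remembered location.
        self.cache.clear()


def demo():
    with tempfile.TemporaryDirectory() as root:
        old_dir = os.path.join(root, "old")
        new_dir = os.path.join(root, "new")
        os.mkdir(old_dir)
        os.mkdir(new_dir)
        tool = os.path.join(old_dir, "tool")
        with open(tool, "w") as f:
            f.write("#!/bin/sh\n")
        os.chmod(tool, 0o755)

        shell = CommandCache([old_dir, new_dir])
        first = shell.lookup("tool")
        # Move the file to another PATH directory behind the cache's back.
        os.rename(tool, os.path.join(new_dir, "tool"))
        stale = shell.lookup("tool")
        stale_exists = os.path.exists(stale)
        shell.rehash()
        fresh = shell.lookup("tool")
        return first == stale, stale_exists, os.path.dirname(fresh) == new_dir
```

`demo()` returns `(True, False, True)`: the cached path is still handed out after the move even though nothing exists there any more, and clearing the cache makes the next lookup find the new location.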
I think bash has an alias “rehash” that does the same as hash -r too. But zsh doesn’t have it, so “hash -r” has entered my muscle memory, as it works in both shells.
I don't remember the exact circumstances, but I've had cases where a script didn't have the right shebang, and I had to resort to `alias --save hash='#'` to make the script work.
Why would strace cat be useful here? By the time cat runs, it was obviously already found.
It is basic knowledge that PATH is used by a command interpreter to locate the pathname of binaries. This is true for Windows' cmd.exe as well. I never heard of a system where locating files for execution was performed by a kernel.
> I never heard of a system where locating files for execution was performed by a kernel.
Also true for MS/PC-DOS... which also holds the distinction of having some rare "truly monolithic" API-compatible variants that put the kernel, drivers, and shell in a single binary, so that may satisfy your criteria.
In the [exec][1] family of POSIX functions, if the command path doesn't contain a slash, then it's looked up in the PATH.
> If the file argument contains a slash character, the file argument shall be used as the pathname for this file. Otherwise, the path prefix for this file is obtained by a search of the directories passed as the environment variable PATH [...]
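The quoted rule is small enough to sketch in Python. This is an illustration, not libc's code: `resolve_command` is a made-up name, and a real execvp-style function attempts the exec on each candidate instead of checking permissions up front. The standard library's `shutil.which` does a similar search.

```python
import os
import tempfile


def resolve_command(file, path):
    """Return the pathname an execvp-style lookup would try to run, or None."""
    if "/" in file:
        # A slash anywhere means "use this pathname as given", with no PATH search.
        return file
    for directory in path.split(":"):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", file)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def demo():
    with tempfile.TemporaryDirectory() as d:
        tool = os.path.join(d, "tool")
        with open(tool, "w") as f:
            f.write("#!/bin/sh\n")
        os.chmod(tool, 0o755)
        path = "/nonexistent-dir:" + d
        return (
            resolve_command("tool", path) == tool,
            resolve_command("./tool", path),
            resolve_command("no-such-command-xyz", path),
        )
```

In `demo()`, the bare name is found in the second PATH entry, `./tool` is returned untouched because it contains a slash, and the unknown name resolves to `None`.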
The kernel's job is to execute executable files, while the shell's job is to bridge the gap between a user-facing command name ("cat") and an executable file (/usr/bin/cat).
The PATH environment variable provides such a good general and transparent way to control this task that most shells on most operating systems work that way.
Ignorance leading to assumptions. Their eureka moment: "The shell, not the Linux kernel, is responsible for searching for executables in PATH!" makes it obvious they haven't read up on operating systems. Shame because you should know how the machine works to understand what is happening in your computer. I always recommend reading Operating Systems: Three Easy Pieces.
https://pages.cs.wisc.edu/~remzi/OSTEP/
Well, execve(2) and execvp(3) are both "system" functions. C (which is already black magic for some people) invokes both by calling into functions exported from libc. If you're not super dorky^Wfamiliar with low-level systems stuff, you might guess that the two functions are implemented in the same place and in the same way. That the latter is just a libc wrapper around the former that does a PATH search is arcane detail you don't have to care about 99% of the time.
It's hard to appreciate how the world looks before you learn a fact. You can't unsee things.
One thing I was surprised to learn a couple years ago is that users and groups aren't really tracked much by the Linux kernel: they're just numeric IDs that track process and file ownership. So if you setuid() to a user ID that doesn't exist in /etc/passwd or anywhere else, the kernel won't stop you.
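A small Python sketch of the same point from the userspace side: `os.stat` only ever reports a number, and turning that number into a name is a separate lookup that is allowed to fail. `describe_owner` and `owner_of` are hypothetical helper names, and the `#uid` fallback format is just a choice made for this example.

```python
import os
import pwd


def describe_owner(uid):
    """Map a numeric UID to a name in userspace, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry: the ID is still a perfectly usable owner for the kernel.
        return f"#{uid}"


def owner_of(path):
    # os.stat() only reports the numeric ID; the name comes from the lookup above.
    return describe_owner(os.stat(path).st_uid)
```

Tools like `ls -l` do the same kind of fallback and print the bare number when no name is known.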
I don't think it's an odd assumption at all! The lines between shell, exec calls, globbing, etc, are very blurry if you don't already know how it all fits together.
Why not? Every executable is started with the execve(2) syscall, which takes an array of environment variables that the kernel uses to replace the environment the process inherited from its parent, so obviously the kernel has full knowledge of the environment variables of all of the processes in the system.
Now, there is a reason why the kernel actually does not have such knowledge, but it's not at all unreasonable to assume that the kernel has it.
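For what it's worth, the "environment is just an array handed over at exec time" part is easy to see from Python. This is a sketch assuming a POSIX system, where `subprocess.run` passes the `env` mapping to the new program explicitly; `child_environment` is a made-up helper name.

```python
import subprocess
import sys


def child_environment(env):
    """Run a child Python with exactly `env` and return the variable names it sees."""
    code = "import os; print('\\n'.join(sorted(os.environ)))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,  # the child's whole environment; nothing is inherited implicitly
        capture_output=True,
        text=True,
        check=True,
    )
    # The child interpreter may add a few variables of its own (e.g. LC_CTYPE).
    return result.stdout.split()
```

Calling `child_environment({"DEMO_VAR": "1"})` yields a list that contains `DEMO_VAR` but not the parent's `PATH`: the child gets whatever array the caller chose to pass, and whatever happens to it afterwards is the child's own business.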
You and I and a bunch of other people know it and take it to be self-evident, but someone discovered it (maybe recently, maybe they have known it for a while) and did a nice write-up for people who had not known that yet. https://xkcd.com/1053/
That's not a truth that'd come from first principles, never mind a trivial truth; it's extremely trivial to imagine a kernel that does parse PATH where it wouldn't be true.
As such, it's a thing one has to explicitly look up to know, which the author did.
The Linux kernel also doesn't have any concept of shared libraries, which are resolved by ld.so, a program that's usually shipped as part of libc.
I like this approach of shunting off functionality that's important, necessary, and omnipresent across all OSes to userspace, rather than giving in to the temptation to put everything and the kitchen sink into the kernel. It seems to make for a more versatile and future-proof OS that's easy to work with in spite of uncertainty.
I've worked with "both sides" and the way ELF shared libraries on Linux work is an absolute bloody mess compared to how Windows' PE works. On Windows the same executable format and dynamic linker are usable in both user and kernel mode.
This is even reflected in the ELF format itself. There's this really arcane dichotomy between sections and segments.
Sections are very detailed metadata that all sorts of things use for all sorts of purposes. Compilers use them. Debuggers use them. Static and dynamic linkers use them. Anyone can use them for any purpose whatsoever. You can easily add your own custom sections to any executable using tools like objcopy. It's completely arbitrary, held together by convention.
Segments, on the other hand, don't even have names. They are just a list of file extents required for the program to actually execute and their address space locations. The program header table is essentially a sorted list of arguments for the mmap system call.
It basically just mmaps in the PT_LOAD segments of the ELF file, copies stuff like arguments and environment and then starts a thread at the entry point specified in the ELF header.
It's just that when loading dynamic ELFs it jumps into the dynamic linker instead of the actual program. It's as though every single program had a #!/lib/ld.so shebang line. The absolute path is even hardcoded into the executable itself.
readelf -a $(which cat) | grep -i interpreter
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
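The same field can be pulled out by hand, which shows how little is involved. This is a minimal Python sketch that only handles little-endian ELF64 and does no bounds checking; `read_interp` is a made-up name, and the offsets are the standard ELF64 header and program header layouts.

```python
import struct

PT_INTERP = 3  # program header type carrying the interpreter path


def read_interp(data):
    """Return the PT_INTERP path from a little-endian ELF64 image, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("only little-endian ELF64 is handled in this sketch")
    # ELF64 header: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        # ELF64 program header: p_type at 0, p_offset at 8, p_filesz at 32.
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type == PT_INTERP:
            (p_offset,) = struct.unpack_from("<Q", data, base + 8)
            (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

On a typical x86-64 Linux box, `read_interp(open("/usr/bin/cat", "rb").read())` should agree with the readelf line above, and a statically linked binary should give `None`.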
When an "interpreter" is requested, Linux will load it alongside the actual program and will run it instead of the actual program. This "ELF interpreter" then does an absurd amount of work by recursively loading and linking libraries, linking the actual executable and only then jumping into its entry point.
I'm not kidding about the "absurd amount of work" part. These linkers even have to topologically sort dependencies like a package manager so they can be initialized properly.
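As a rough illustration of the ordering problem, here is a sketch using Python's standard `graphlib`. The dependency graph is invented for the example, and a real dynamic linker has to deal with things this ignores, such as dependency cycles.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each object maps to the libraries it needs.
NEEDED = {
    "app": {"libfoo.so", "libc.so"},
    "libfoo.so": {"libbar.so", "libc.so"},
    "libbar.so": {"libc.so"},
    "libc.so": set(),
}


def init_order(needed):
    """Order objects so every dependency is initialized before its dependents."""
    return list(TopologicalSorter(needed).static_order())
```

For this graph the order is forced: `libc.so`, then `libbar.so`, then `libfoo.so`, then `app`.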
I was trying to understand what the lede was here, and it turns out the author assumed that PATH was something understood by the kernel, which is rather an odd assumption, but perhaps one that others make.
I did get one thing out of this though. I had honestly wondered for the longest time why we need to call env to get the same functionality as PATH in a shebang.
Ironically, thanks to either an article I read here (or on the crustacean site) recently, I already knew that the shebang is something which is parsed by the kernel, but had not put two and two together at all.
Much like the author. So goes to show the benefits of exploring and thinking about seemingly "obvious" concepts.
config BINFMT_SCRIPT
tristate "Kernel support for scripts starting with #!"
default y
help
Say Y here if you want to execute interpreted scripts starting with
#! followed by the path to an interpreter.
Using "#!sh" at the top of the file does work, but not predictably. It may execute sh in your current directory, which is what Linux does, but your shell may override that (zsh does if the first attempt fails). So it works, but not the way you want it to.
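Here is a rough Python sketch of how Linux splits a `#!` line, as far as I understand it: the first word is the interpreter pathname, and whatever follows is passed along as one single argument. `parse_shebang` is a made-up name, and other kernels split the line differently. Since the interpreter is opened as a plain pathname with no PATH search, `#!/usr/bin/env python3` is the usual way to get a PATH lookup.

```python
def parse_shebang(first_line):
    """Split a '#!' line into (interpreter, optional_single_argument)."""
    if not first_line.startswith(b"#!"):
        return None
    # Only the first line counts; surrounding spaces and tabs are ignored.
    rest = first_line[2:].split(b"\n", 1)[0].strip(b" \t")
    if not rest:
        return None
    parts = rest.split(None, 1)
    interpreter = parts[0].decode()
    # Everything after the interpreter stays together as ONE argument.
    argument = parts[1].strip(b" \t").decode() if len(parts) > 1 else None
    return interpreter, argument
```

So `#! /bin/sh -e -x` gives `("/bin/sh", "-e -x")`, with both flags glued into one argument, and `#!sh` gives `("sh", None)`, a relative pathname that is resolved against the current directory.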
Accessing environment variables from kernel space isn't even all that easy, because the information lives in userspace, in the process's virtual memory. Here's how it's done for the purpose of showing it in `/proc/[pid]/environ`:
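From the userspace side, reading that file back is simple. This Python sketch is Linux-specific and is not the kernel code: it only decodes the NUL-separated block the file exposes. `parse_environ` and `read_own_environ` are names made up for the example.

```python
import os


def parse_environ(raw):
    """Decode the NUL-separated KEY=VALUE block exposed by /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[os.fsdecode(key)] = os.fsdecode(value)
    return env


def read_own_environ():
    # Linux-specific; the file shows the environment the program was started with,
    # not later changes made inside the process.
    with open("/proc/self/environ", "rb") as f:
        return parse_environ(f.read())
```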
This actually helps explain some behaviors I’ve encountered. It was never a serious issue, since the answer is to use a full path, but it is slightly annoying nonetheless. Understanding helps a lot.
The title is nonsense. PATH is the name of an environment variable (a Real Thing(TM)) which lists a set of directories to search for an executable. It is used by shells (including those running on Linux) to locate an executable when the full path to the executable is not supplied by the user.
This is needed because the exece()/execve() [2] kernel system call is unaware of things like environment variables so it will not have any idea how or where to execute a program 'cat' unless it is given the full path to 'cat', so the shell has to look it up (again if the user doesn't pass the full path). It's the same on every POSIX system and the original UNIXes. It's been this way for at least 50 years. (edit 60 years, it's from Multics [1])
Kids today really need to learn the fundamentals of computer operating systems. Or do that boring old-person thing we did before StackOverflow, and read all the manual pages, which tell you all this [3] [4].
The fact that the Linux kernel does not track environment variables of the processes is not a "fundamental". The setenv/getenv could very well have been syscalls, it's simply a design decision that they are not. One can make a kernel with such tracking, and it'd still be POSIX compliant as long as you supply setenv(3)/getenv(3) wrappers with expected signatures in your system libc.
Nobody talks about vfs path resolution here? There are too many layers in the whole process, even the path from strace can be resolved to another path.
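A tiny Python sketch of that last point, using a symlink as the simplest case (mount points and the like are other layers this ignores): the pathname a program passes in, which is what strace prints, need not be the file that ends up being opened.

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as root:
        real = os.path.join(root, "real-tool")
        link = os.path.join(root, "tool")
        open(real, "w").close()
        # "tool" is what a caller would name; "real-tool" is where it leads.
        os.symlink("real-tool", link)
        resolved = os.path.realpath(link)
        return resolved == os.path.realpath(real), os.path.basename(resolved)
```

`demo()` returns `(True, "real-tool")`: the name that was asked for and the file that was reached differ.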
Uh yeah, duh. But I kept waiting for him to discover hash in the shell. No such luck. Guess it's in the magic somewhere. (Do man hash or something if you have no idea what I'm talking about.)
> The shell, not the Linux kernel, is responsible for searching for executables in PATH!
I mean, no shit, Sherlock? The exec family of system calls requires a path to a file, not a filename with an implicit path from the environment, so of course PATH is handled by the shell.
All members of the exec family of system calls, which consists of only two syscalls, namely, execve(2) and execveat(2), literally have the envp parameter which is supposed to have all the environment variables for the process.
Now, the semantics of this parameter are that the kernel does not use it for path resolution when searching for the executable, but it could.
mzajc|10 months ago
JohnMakin|10 months ago
vlovich123|10 months ago
ninkendo|10 months ago
Edit: wrong shell, zsh has rehash, bash does not.
bobbylarrybobby|10 months ago
OptionOfT|10 months ago
eternauta3k|10 months ago
noman-land|10 months ago
DiggyJohnson|10 months ago
blcknight|10 months ago
The kernel has no idea what the current process' environment $PATH is, and doesn't even parse any process environment variables at all.
thayne|10 months ago
wpollock|10 months ago
V99|10 months ago
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
newfstatat(AT_FDCWD, "/usr/local/sbin/cat", 0x7fffcec2f3b8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/cat", 0x7fffcec2f3b8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/cat", 0x7fffcec2f3b8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/cat", {st_mode=S_IFREG|0755, st_size=68536, ...}, 0) = 0
userbinator|10 months ago
MathMonkeyMan|10 months ago
[1]: https://pubs.opengroup.org/onlinepubs/009695399/functions/ex...
HelloNurse|10 months ago
inlets|10 months ago
MisterTea|10 months ago
quotemstr|10 months ago
LegionMammal978|10 months ago
rafram|10 months ago
klysm|10 months ago
Joker_vD|10 months ago
mynegation|10 months ago
dzaima|10 months ago
unknown|10 months ago
[deleted]
unknown|10 months ago
[deleted]
jakogut|10 months ago
userbinator|10 months ago
matheusmoreira|10 months ago
This is the Linux kernel's ELF loader:
https://github.com/torvalds/linux/blob/master/fs/binfmt_elf....
https://blogs.oracle.com/solaris/post/init-and-fini-processi...
SuperNinKenDo|10 months ago
khrbtxyz|10 months ago
drougge|10 months ago
And I'm sure other kernels do other things too.
teo_zero|10 months ago
cryptonector|10 months ago
klysm|10 months ago
megous|10 months ago
https://elixir.bootlin.com/linux/v6.14.4/source/fs/proc/base...
taraindara|10 months ago
0xbadcafebee|10 months ago
[1] https://en.wikipedia.org/wiki/PATH_(variable)
[2] https://man7.org/linux/man-pages/man2/execve.2.html
[3] https://www.man7.org/linux/man-pages/man1/dash.1.html
[4] https://www.man7.org/linux/man-pages/man1/intro.1.html
    https://www.man7.org/linux/man-pages/man2/intro.2.html
    https://www.man7.org/linux/man-pages/man7/man-pages.7.html
    https://www.man7.org/linux/man-pages/man7/standards.7.html
Joker_vD|10 months ago
saagarjha|10 months ago
ashu1461|10 months ago
So the PATH variable is fundamentally the same as other environment variables like HOME or USER, but how PATH is interpreted changes from context to context?
Tsiklon|10 months ago
In the unix systems of the past was it easier to hold a more complete understanding of the system and its components in your head?
jfax|10 months ago
dfedbeef|10 months ago
dfedbeef|10 months ago
self_awareness|10 months ago
i140i485i765|10 months ago
anacrolix|10 months ago
bawolff|10 months ago
m463|10 months ago
unknown|10 months ago
[deleted]
semiquaver|10 months ago
[deleted]
smcameron|10 months ago
Joker_vD|10 months ago