(no title)
wander_homer | 2 years ago
The two core issues are:
1) How do you quickly get a list of all files and their attributes from the filesystem, without recursively visiting all directories? The kernel has no such functionality and neither do most filesystems (except NTFS with the MFT, which is how Everything solves that).
2) How do you know which files have been modified on a filesystem since it was last mounted on the system or since your monitoring daemon/application was running the last time? This information also needs to be stored persistently on the filesystem (like the USN journal, which Everything is using) if you want to avoid slow recursive traversals.
> I've got a couple of out-there ideas which may or may not pan out, one of which was, indeed, a kernel module.
Well the problem is, my kernel isn't the only kernel who changes the filesystems I'm using. Hence a kernel module only works if your system is the only one whose modifying the data you're working with or most other systems need to be using the same kernel module, which isn't realistic.
> Another idea is to deploy the indexer as a daemon with the applications all using IPC to query and update it. This will give the query applications a significant advantage on startup compared to Everything.
Everything uses a daemon as well and it's not a solution to that issue, because somehow the daemon also has to get the list of files/folders and their attributes out of a filesystem without walking it. How else would the daemon know which files belong to the volume which was just mounted moments ago?
> As for updating the index timeously, I've got a few ideas there as well. Walking the filesystem starting at `/` for each update will result in only performing index updates once a day or so (hence, the reason I expressed the metric as a delta) so I feel that that is no good.
Walking the filesystem shouldn't be done at all, because it's just too slow.
> I'll do an implementation and try to message you (if you want to check it out) because code talks louder than words :-)
Of course, I'd appreciate that.
lelanthran|2 years ago
I wasn't intending to include transient filesystems in the index.
> Of course, I'd appreciate that.
Gimme about a week :-)
wander_homer|2 years ago
There's absolutely no difference between transient and persistent filesystems in regards to that problem. Every time a filesystem gets mounted, you have no idea what you're going to get. The last time it was mounted there could have been 13 million files on it and now when you mount it all of them could be gone or renamed. This is also super common on modern Linux systems, because many of them boot into a minimal boot environment to perform system updates and hence alter the filesystem heavily while such daemons as a file system monitor isn't running.
So the question is: how do you know, whether /some/random/file has been modified while your daemon or application wasn't running or the filesystem wasn't mounted on your system, without performing a stat call on it? If you don't have an answer to that, which also needs to be orders of magnitudes faster, then you'll never match the performance of Everything. And that's not some uncommon situation, because your daemon/app has to figure that out every time it gets launched for every file and folder.