top | item 38677336

wander_homer | 2 years ago

Author here. The app works in two steps:

Step one is building an index of the file system. This is done by simply walking the file system. The resulting index is kept in RAM and also written to a file. On the next app start the index is loaded from that file, which is much quicker than walking the file system again.
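A minimal sketch of that first step (function names and the pickle-based index file are my own illustration, not fsearch's actual format):

```python
import os
import pickle

def build_index(root):
    """Walk the tree once and collect (name, full_path) pairs in RAM."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.append((name, os.path.join(dirpath, name)))
    return entries

def save_index(entries, path):
    # Persisting the index means later starts can skip the expensive walk.
    with open(path, "wb") as f:
        pickle.dump(entries, f)

def load_index(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

On later starts you would call `load_index` instead of `build_index`, which is just one sequential file read.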

Step two is using this in-RAM index for searching. This scales really well with the number of CPU cores, and on modern systems a normal case-insensitive substring search should finish almost instantly, even with a few million files.
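The second step can be sketched like this: split the index into chunks and scan them concurrently. (This is my own illustration; note that Python threads don't actually parallelize CPU-bound work the way fsearch's native threads do, so this only shows the chunking idea.)

```python
from concurrent.futures import ThreadPoolExecutor

def search(entries, query, workers=4):
    """Case-insensitive substring match over an in-RAM index of
    (name, full_path) pairs, with the scan split across workers."""
    q = query.lower()

    def scan(chunk):
        return [path for name, path in chunk if q in name.lower()]

    size = max(1, len(entries) // workers)
    chunks = [entries[i:i + size] for i in range(0, len(entries), size)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(scan, chunks):
            results.extend(part)
    return results
```

Since each chunk is independent, the scan parallelizes cleanly, which is why this kind of search scales with core count.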

The next release will support file system monitoring with inotify and fanotify to keep the index updated, although this has some drawbacks.
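For illustration, here is roughly what watching a directory with inotify looks like on Linux. Python's stdlib has no inotify binding, so this sketch goes through ctypes; the constants come from `<sys/inotify.h>`, and the helper names are mine:

```python
import ctypes
import os
import struct

IN_CREATE = 0x00000100  # value from <sys/inotify.h>

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def watch_dir(path):
    """Start watching a directory for file creation events."""
    fd = libc.inotify_init1(0)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    if libc.inotify_add_watch(fd, path.encode(), IN_CREATE) < 0:
        raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
    return fd

def read_events(fd):
    """Read queued events; each is a 16-byte header followed by the name."""
    buf = os.read(fd, 4096)  # blocks until at least one event is queued
    events, offset = [], 0
    while offset < len(buf):
        wd, mask, cookie, length = struct.unpack_from("iIII", buf, offset)
        name = buf[offset + 16: offset + 16 + length].rstrip(b"\0").decode()
        events.append((mask, name))
        offset += 16 + length
    return events
```

Each watch only covers a single directory, which hints at one of the drawbacks: you need a watch per directory, so the whole tree must be walked before monitoring is complete.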

CyberDildonics|2 years ago

> This is simply done by walking the filesystem.

This is the part I'm wondering about. Everything scans the filesystem very fast and there is no way it is just using 'stat' on every file then diving into the directories.

Are you just using stat from C to walk the filesystem or are you doing something else?

I've used sqlite to cache filesystem results and it is also extremely fast once everything is in there, but I think a lot of approaches should work once the file attributes are cached.
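The sqlite-cache approach described above can be sketched with the stdlib `sqlite3` module (schema and function names are mine, not the commenter's actual code):

```python
import os
import sqlite3

def cache_tree(db_path, root):
    """Walk once and store per-file metadata; later queries hit the DB, not the disk."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)"
    )
    with con:  # one transaction for the whole walk
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                con.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                    (full, name, st.st_size, st.st_mtime),
                )
    return con

def find(con, substring):
    # LIKE is case-insensitive for ASCII in SQLite by default
    return [row[0] for row in
            con.execute("SELECT path FROM files WHERE name LIKE ?",
                        (f"%{substring}%",))]
```

Once the attributes are cached, queries never touch the filesystem, which matches the observation that most approaches are fast after that point.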

soundarana|2 years ago

On NTFS Everything reads the MFT, which is sequential on disk.

Then on subsequent starts it reads the NTFS update journal to see what changed.

lelanthran|2 years ago

> Everything scans the filesystem very fast and there is no way it is just using 'stat' on every file then diving into the directories.

The last time I checked, Everything worked by using the AV calls Microsoft provides; anytime a file is written, the name (and other metadata) can be written to a log that Everything can check once every 5 seconds or so.

If I thought there was any money at all to be made from providing an Everything equivalent[1] on Linux, I'd spend the week or so to write it, but as far as I can tell there's just no market for something like this.

[1] By that I mean "similar in performance and query capabilities"; I would obviously need more time than that to hook into the common file-open dialog widgets (Gnome/KDE/etc) so that users could run their queries straight from existing file dialog widgets.

wander_homer|2 years ago

Yes, it's simply using stat on every file/folder. There's probably some room for improvement there with clever parallelization, but it'll remain a bottleneck.
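One shape that parallelization could take (my sketch, not fsearch's code): `os.scandir` gets the entry type from the directory entry itself where the OS provides it, avoiding a separate stat per file, and independent subtrees can be walked concurrently.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def walk_subtree(root):
    """Collect all entry names below root without recursing via os.walk."""
    found, stack = [], [root]
    while stack:
        d = stack.pop()
        try:
            with os.scandir(d) as it:
                for entry in it:
                    found.append(entry.name)
                    # is_dir() usually answers from the dirent, no extra stat
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except PermissionError:
            pass
    return found

def parallel_walk(root, workers=4):
    """Fan out over top-level directories; each worker walks one subtree."""
    names = [e.name for e in os.scandir(root)]
    tops = [e.path for e in os.scandir(root) if e.is_dir(follow_symlinks=False)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for sub in pool.map(walk_subtree, tops):
            names.extend(sub)
    return names
```

This only helps when the top-level directories are of comparable size; a single huge subtree still serializes on one worker, which is one reason the walk remains a bottleneck.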

Everything parses a file called the MFT to build its index. This is much more efficient, but unfortunately this file is only present on NTFS volumes, which makes it super useful on Windows systems, but not so much anywhere else.

Another benefit you get on Windows is the USN journal, which allows Everything to keep the index updated much more efficiently.

bdzr|2 years ago

I've never used fsearch, but I use a CLI tool that replaces locate (https://plocate.sesse.net/). Do you have an idea of how the performance and index format compares with fsearch?

wander_homer|2 years ago

I'm not familiar with the internals of plocate, but I'll have a brief look at it.

pangey|2 years ago

Is it possible to use eBPF for this task instead of inotify?

wander_homer|2 years ago

Maybe, but I'm not sure there's much benefit to that. The most inefficient part of the inotify or fanotify solution is that you have to walk the file system before monitoring can even start, because you first need to know which folders and files are there to begin with. And unfortunately that initial walk can't be avoided with eBPF either.