(no title)
wander_homer | 2 years ago
But the daemon also has to start at one point (you're just shifting the problem down that stack) and that's where it gets expensive IF you want to be as accurate as Everything. But of course, if you don't care about accuracy, starting up the daemon isn't time consuming. I've already discussed this with my users in the past and we settled for a toggle switch where users can opt-in to that behavior of more speed at the cost of having false results.
> How will it put me further behind? I did say that it will only be done during software installation, right?
Everything also only does this whenever a filesystem is first detected and scanned; still people care about the performance in those cases. Especially when you're often plugging in USB HDDs and such.
> 1. Shared filesystems - I don't care about this because Everything doesn't care about this being performant: In Everything, as far as I understand it, the user manually indexes network shares.
This is not only about network shares, but also about dual boot system, where multiple OSes use the same filesystem and USB HDDs/SSDs.
> 2. Maintenance modes won't give you thousands of false positives; at most you're looking at a diff of maybe dozens of index entries, if that.
Of course it does. Just in the last week ~13,000 files and folders were modified on my system with the system update (which ran in a maintenance boot environment where other daemons don't get started). That's 13,000 files and folders which will either be missing in your indexing solution or show up as false positives (because you're using outdated metadata, like their old size or timestamps).
> Well, it will be a few orders of magnitude faster to start up than checking for filesystem changes on startup, no?
Of course, but again that's not the problem. The problem is doing what Everything does: Start up a few orders of magnitude faster AND at the same time checking for filesystem changes on startup.
> If you don't mind me asking, how do you do it? Because inotify is out of the question if you want to monitor 2.5m files. Even for just the home directory you will run the risk of exhausting file descriptors by using inotify.
I'm using fanotify by default and inotify as a fallback in the case the filesystem or kernel doesn't support fanotify with the feature set I need. Running out of file descriptors is usually not an issue, because you don't need to keep file descriptors open for all files. My system has more than 3 million files and even using just inotify for that does work.
> All the existing utilities create the index only during installation?
Obviously not all, because some don't even create an index to begin, but many do.
And btw. I doubt that your solution, of creating an index only once, even works, because sooner or later you need to rescan larger parts of the filesystem, when the inconsistencies become to frequent (like when you suddenly become filesystem change notifications for files which you didn't even know about).
> Which hard and important problem? That changes made in maintenance mode aren't seen?
Getting the index in a consistent state with the filesystem after boot.
> A major feature of Everything when people wax on about its speed is how quickly new entries in the filesystem show up in the applications query results.
> Even while the results is open, the user can see files that were added since the last keystroke.
> 1. How does FSearch handle this common and obvious use-case?
It detects filesystem events with fanotify, queues some of them for batch processing, then applies them to the index and results.
> 2. What's the newest filesystem change you can expect to see when performing a query in FSearch? Is it "the last change made prior to the application startup"? Is it "The last change made prior to the query"? Is it "The last change made since we walked the filesystem"?
In the development version with monitoring support changes to the filesystem show up in the results almost immediately; it's usually less than a second. Only in the rare case when many thousand files get modified almost simultaneously, it can take a few more seconds. Hence when you sort your results by date modified, you can live monitor all the recent changes that are being made on your system.
> 3. What's the p99 for startup time in FSearch? The p99 for query results of N (where N is a suitably large number)?
This depends on the storage type. But on modern SSDs with a few million files it's usually a second or so to load the index from the database file. You can then search right away and depending on whether you've configured the system to also be accurate or not, a rescan might be triggered in the background, which obviously takes much longer to finish, but then you'll guaranteed to have correct results.
> 4. 4. You mentioned "areas that you and others care about". Can you briefly list the areas, other than complete and 100% accuracy during maintenance mode. All I know about is what Everthing users appear to care about, and they simply aren't caring about USB memory sticks, cameras plugged in, network drives, maintenance mod diffs, etc. They do appear to care that it is responsive.
I'll have to answer that in a few hours if you don't mind, I have to get going now.
No comments yet.