
Arranging Invisible Icons in Quadratic Time

376 points | ingve | 5 years ago | randomascii.wordpress.com

111 comments

[+] baobabKoodaa|5 years ago|reply
> Laying out icons on a grid should be an inherently linear operation, but somehow it was written as quadratic and was executed even when the icons were not being displayed.

I could sort of understand this sloppiness if it was some random app from a hot startup, but this is a Windows issue affecting, presumably, all Windows users... Somehow this gets past code review and QA and into production, where it remains broken and nobody has fixed it. Depressing.

I would argue part of the problem is that the majority of developers are computer science illiterate. They just hack things together, gluing libraries on top of frameworks, with very little understanding about what goes on underneath. Even when you point out a problem like this, people seem to be unwilling or incapable of understanding it.

I have a personal experience where I raised doubts about an unoptimized algorithm (the time complexity was something like O(n^2 log n), where the typical n was supposed to be 10^6). I'm paraphrasing, but the response I got was roughly "don't worry, we can just run it on a powerful VM", and also the classic "developer time is more valuable than CPU time". When people realized the problem couldn't be solved by throwing money at it, the next suggestion was to artificially constrain n to be small enough that the program completes. Again I raised doubts, arguing that this didn't make sense, but everybody else thought it was fine. So I spent time running a series of experiments to demonstrate that the results we were getting were nonsense. It was at that point that I optimized the algorithm so it ran in O(n log n) or something similar. Something we should have done from the start.

[+] adrianN|5 years ago|reply
I would argue that the majority of developers are juggling half a dozen tasks that need to be done ASAP, that a simple quadratic solution was fast enough for the use case they had in mind, and that there definitely was no time to optimize for the case of invisible icons.
[+] fingerlocks|5 years ago|reply
I applied for a job at Microsoft recently. I had to take an online test which consisted of two medium and one hard leetcode problems. In one hour.

It took nearly 15 minutes just to read the problems and make a game plan. That left about 15 minutes per problem to solve and code them up. Have you ever done a hard leetcode problem? In 15 minutes? If that's the bar at Microsoft, I doubt their developers are computer science illiterate. That test was harder than any of the other FAANG tests I've done, online or in person. Clearly the icon code was delegated to some intern, or perhaps a child of one of the developers, because the bar is stupidly high there just to have a conversation with a hiring manager. And they don't even pay that well.

[+] saagarjha|5 years ago|reply
> I would argue part of the problem is that the majority of developers are computer science illiterate.

Sometimes, but it’s certainly only part of the problem. I’ve seen these kinds of bugs from people who would “know better”, who understand what O(n^2) is and are entirely willing and capable of fixing these things when you point them out…it’s just that they don’t see them when they are writing the code, or they mispredict the value of n that the code will be used under. And this seems to be prevalent no matter how “production grade” the software is; I’ve diagnosed these in my web browser, filesystem, text editor…in many of these cases I know who the developers are and they are exceptionally talented, and yet these kinds of bugs still exist. I don’t think we can solve these issues without some sort of concerted effort to uncover them, which I would argue is a strangely rare skill.

[+] bigiain|5 years ago|reply
> I would argue part of the problem is that the majority of developers are computer science illiterate.

While I wouldn’t argue against that...

Sometimes people who do know what they’re doing choose slow algorithms knowingly.

I implemented a way finding algorithm that works in O(n^4) a while back. Because I knew the specific use case we needed it for had a max of around 200 for n, and that I could run a brute force of all possible routes for that in less than a minute in unoptimised Python on my laptop, so that I could then cache all the results and just do a very fast lookup from a ~40,000 row database table in production. I was very very clear to everybody concerned that this solution was not a scalable general solution, but it worked like magic for what our customer needed.

[+] MontyCarloHall|5 years ago|reply
Couldn’t agree more. I’m saving this post to send to the next person who expresses disapproval that I ask computational complexity questions during my interviews.
[+] pezezin|5 years ago|reply
> When people realized this problem can't be solved by throwing money at it, the next suggestion was to artificially constrain n to be small enough that the program completes... It was at this point that I optimized the algorithm so it ran in O(n log n) or something similar. Something that we should have done from the start.

I had the exact same situation at my previous job. Heck, the CEO even sold it as an advantage of our system, highlighting how little data we needed to get good results. Except our results were crap.

I fixed that problem and many others, and left the job three years ago, but they are still selling their product the same way. Sometimes you really do need a large amount of data to get good results.

[+] hoseja|5 years ago|reply
Well, it's probably been there since Windows 3.0 or so.
[+] olau|5 years ago|reply
There used to be a quadratic algorithm in character input in readline, the library that many common shells use to read commands. So if you copy-pasted a somewhat long list into something like Postgres or the Python shell, it would take a lot of time, because readline had to check a couple of things on each input, and that somehow involved checking the existing characters in the input (I've forgotten the details).

I think that's a really classic case of O(n^2). You have a linear algorithm (n), and then run it on each character you insert (n), so n^2.
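That shape can be sketched in Python (an illustrative example, not readline's actual code): recomputing a property of the whole buffer on every inserted character is O(n) per keystroke and O(n^2) for pasting n characters, while maintaining the same state incrementally is O(1) per keystroke.

```python
def paste_quadratic(text):
    """Re-scan the entire buffer after every inserted character: O(n^2)."""
    buf = []
    width = 0
    for ch in text:
        buf.append(ch)
        # The whole buffer is examined on each keystroke.
        width = sum(1 for c in buf if c != "\n")
    return width

def paste_linear(text):
    """Maintain the same count incrementally: O(1) per keystroke, O(n) total."""
    buf = []
    width = 0
    for ch in text:
        buf.append(ch)
        if ch != "\n":
            width += 1
    return width
```

Both produce the same answer; only the second stays fast when you paste a large block of text.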

It bothered me for several years, until I eventually did some profiling and contacted the readline (and bash) maintainer who fixed it.

This is about 5 years ago.

There's also another readline-related problem in several shells: they don't notice when you resize the terminal window. I can't remember if I raised that too. It sometimes amazes me that very few people appear to ever look at or debug our common infrastructure.

[+] btilly|5 years ago|reply
There are a lot of such.

When Larry Wall wrote the map operation for Perl, he just took the grep operation, added the function call, and added a resize if it was needed. The result was that the common "turn a list into a hash with map" operation was quadratic, because it did the resize on every input element. I fixed that. I later noticed that unshift did the same thing for a different reason, and fixed that too. And then I looked at Ruby's source code and saw all the same performance mistakes.

Exactly as the blog says, n^2 is fast enough to make it into production and slow enough to fall over once there. We focus on making it right, and only return to making it fast when there is need. And most of the time the cause is a hidden n^2 that we never thought about.
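The resize-on-every-element mistake can be illustrated in Python (a sketch of the pattern, not Perl's actual source): copying the whole buffer for each append does O(n^2) total work, while doubling the capacity only when full amortizes the copies to O(n).

```python
def grow_by_one(items):
    """Resize on every append: each append copies all existing elements."""
    buf, copies = [], 0
    for x in items:
        new = buf[:]            # "resize": copy everything into a new buffer
        copies += len(buf)
        new.append(x)
        buf = new
    return buf, copies          # copies ~ n*(n-1)/2, i.e. quadratic

def grow_by_doubling(items):
    """Double capacity only when full: copies amortize to O(n)."""
    buf, cap, n, copies = [None], 1, 0, 0
    for x in items:
        if n == cap:            # only copy when the buffer is actually full
            cap *= 2
            buf = buf[:] + [None] * (cap - len(buf))
            copies += n
        buf[n] = x
        n += 1
    return buf[:n], copies      # copies < 2n, i.e. linear overall
```

For 100 elements the first version performs 4950 element copies, the second fewer than 200, and the gap widens quadratically from there.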

[+] jeffbee|5 years ago|reply
`java` used to have quadratic command line parsing, which nobody much noticed as long as unixes had fixed ARG_MAX, but it wasn't very long after Linux got arbitrary-length arguments that the first person discovered that java might take a week to parse a really big command line. I personally reported this defect and I don't know if they ever fixed it.

https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6728845

[+] MagerValp|5 years ago|reply
Is that related to checkwinsize sometimes becoming unset? I’ve never managed to find a pattern to when it happens.
[+] forgotmypw17|5 years ago|reply
>I don’t know how many people this bug affects (anyone with 200-300 icons is hitting a modest version of this, and it gets progressively worse with more) and I have no power to fix it. So, I filed a bug. I am not hopeful that it will be fixed. My last quadratic-in-Windows bug has had zero comments since it was filed a few months ago.

At some point I realized that there is a sweet spot when it comes to software popularity, and it's way below the most popular applications.

Using popular apps like Chrome, Firefox, Windows, Fedora, Debian, etc., I don't have a snowball's chance in a hot place at even getting my issue looked at... by anyone. I'm lucky if my bug report gets a WONTFIX -- after I spend an hour filling out some elaborate bug report form which makes a tax form look easy.

However, once I started using less popular software, browsers and GNU distros which only have hundreds or thousands of users, I get way more attention. I may even get as fortunate as a developer (!) holding my hand through an issue and fixing a bug on the spot, probably from just asking a question on IRC.

It's a luxury which not everyone can afford, requiring both having technical knowledge and being willing to get my hands dirty. But the quality of experience I get in the end is so much better and more stable than anything else I experienced in the past.

[+] orlp|5 years ago|reply
In Firefox, at least, this doesn't appear to be true, or at least not always.

Recently I noticed that on my main PC, if I have uBlock Origin enabled, all network requests are delayed significantly. So I made a bug report on uBlock Origin's Github. They couldn't find the issue and they eventually asked if I could try it with another ad-blocking addon, and sure enough, the same bug. So the issue was with Firefox, not a specific addon.

So I made a performance profile and made a Firefox bug report. It took some time but they eventually figured that it was because I run a 240Hz monitor, which was causing some events not to fire until a timeout expires: https://bugzilla.mozilla.org/show_bug.cgi?id=1684703.

[+] fenomas|5 years ago|reply
> Using popular apps like Chrome, Firefox, Windows, Fedora, Debian, etc., I don't have a snowball's chance in a hot place at even getting my issue looked at..

Is this based on experience? In building a browser game engine, over the past few years I've filed ~3 chrome bugs and they all got fixed. Also the experience was frankly nicer than what I'm used to with github repositories - nobody on the chrome team closed my issue for departing from the bug report template! In fact they didn't even ask for repro files; basically as long as I described the issue clearly, they took it from there.

Granted the process took a looong time (months IIRC). But overall I found it very painless.

[+] userbinator|5 years ago|reply
I wonder which versions of Windows have this bug, since I don't think it was there when I was using 98SE with a nearly-full desktop (~400 icons) and hardware from around 2 decades ago (single core)...

> WPA just looked at the debug information for all of the EXEs and PE files and downloaded symbol files from Microsoft’s symbol servers

I've always found it funny that MS would publicly distribute these, since they get you probably 90% of the way to actual source code from the perspective of understanding how Windows works.

...so I pulled out the shell32.pdb from some version of XP (not sure which one, might be SP1 or SP2) and it has no BatchPositionChangesHelper, CCollectionSink, CListViewHost, nor IconLayoutEngine. It does have CDesktopBrowser and SHDesktopMessageLoop. Also, "windows.storage.dll" definitely isn't present in XP, given that it's a WinRT thing.

In other words, I'm inclined to believe this bug was not there originally, but was introduced during one of their horrid shell rewrites after XP. Someone with easy access to the Vista or newer symbols could see when those functions were introduced.

I also bet it was partially due to the insane amounts of abstraction that seem to be the norm in code today, where algorithms can become accidentally quadratic or worse without being noticed ("it's just a loop so it must be linear"... forgetting that in that loop's body there's another loop, somewhere down the long chain of function calls).
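That trap is easy to reproduce in miniature (an illustrative Python sketch, not Windows code): the caller's loop looks linear, but the operations it calls scan the collection internally, making the whole thing quadratic.

```python
def dedupe_quadratic(items, unwanted):
    """Looks like one loop, but `in` and .remove() each scan the list: O(n^2)."""
    result = list(items)
    for x in unwanted:
        if x in result:         # hidden linear scan
            result.remove(x)    # another hidden linear scan
    return result

def dedupe_linear(items, unwanted):
    """Hash-based membership makes each check O(1), so the whole pass is O(n)."""
    drop = set(unwanted)
    return [x for x in items if x not in drop]
```

Both return the same result; the difference only shows up once the inputs get large, which is exactly why such code survives review and testing.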

...and speaking of source code, someone who isn't bothered by the legal aspects could probably get to the bottom of this bug: https://news.ycombinator.com/item?id=24586708

[+] kumarharsh|5 years ago|reply
It is stunning to see this brought into the open, and in such a clear and seemingly easily reproducible way. Kudos to the OP for digging into this as much as he did. In a world of 8 billion people, I don't think anyone else would have dug so deep into a problem affecting such a large portion of the population. I'd love to see this fixed. I hope Microsoft pays attention and fixes it, rather than ignoring it like they do all the Feedback Hub/UserVoice feedback.
[+] Cthulhu_|5 years ago|reply
You'd think Microsoft would have automated perf / regression tests running with scenarios for large amounts of files.

But then, there's so much variability with Windows that it's impossible to think of all scenarios to test for.

[+] Jyaif|5 years ago|reply
This post is honestly rocking my world.

I've always been telling people that having lots of icons on your desktop will maybe use a little bit more ram, but it won't slow your computer down because computers are super fast.

How many other assumptions of mine are false? Would I have faster browser experience if I deleted some bookmarks, or if I deleted my browsing history? Would my computer run more quickly if I have fewer folders on my disk? Would gmail be faster if I had hundreds of emails instead of tens of thousands of emails?

[+] tinco|5 years ago|reply
It is stunning that the problem was in such a highly visible location, but the idea that your operating system becomes slow after years of use has always been a thing. Reinstalling Windows was something I would do every couple of years; I even made money in high school doing house calls fixing people's computers.

It took about an hour or two, most of which was spent drinking tea with a random family's dad. I'd back up their data, reinstall Windows, and replace IE with Firefox and a decent free antivirus. The computer would go from being a nightmare to being a joy to use.

In that time there were definitely more dangers looming: people would accumulate search bars and other adware, trojans and such. There wasn't much value to be extracted from PCs then, so having a virus usually wouldn't affect their lives much. It was common to get an email from your provider saying they'd disable your connection if you didn't remove the DDoS bot from your machine.

[+] londons_explore|5 years ago|reply
Windows has had problems with desktop icons causing slowness for years. In fact, back in Windows 95 it was notable how moving the entire contents of the desktop into a folder on the desktop really sped stuff up.

I always assumed it was the overhead of loading every icon (there is substantial overhead in loading icons - in many cases requiring many disk seeks per icon). There are also a lot of shell extensions that get involved in the process for thumbnailing documents (very expensive for some document types), and determining status for shortcuts and offline files (which might involve network requests).

Overall, icons on windows are just slow. Avoid them.

[+] tinyhitman|5 years ago|reply
Anecdotally, in a certain Chrome version some years back I noticed something like this. The first 2 characters in my omnibox caused a terrible slowdown. So likely the history used for autocomplete kept growing, or there was no sane index on the data.

Clearing my history made it _instant_ again (even with a history of a few days), and it took a few weeks to become very slow again.

So yes, I'd say it's very likely that some of our assumptions around such matters are false :)

Luckily this was fixed at some point, but unfortunately that seems to coincide with Chrome's autocomplete no longer showing anything at all.

[+] wisty|5 years ago|reply
I've noticed it started taking a long time to download files, they'd "download" to 100%, then stall. Moving all my "Downloads" folder into an "Old Downloads" folder fixed the problem. I suspect there was a quadratic algorithm at play.
[+] MarekKnapek|5 years ago|reply
Yes. At a previous job where we were making a desktop Win32 application we experienced this. Our automated pipeline was: the build server compiles all the .EXEs and .DLLs, creates an .EXE installer and sends it for testing. A test agent grabs the installer, installs the product and runs various test scenarios (pokes buttons, menu items, etc.). One of those scenarios, HTML help (a .CHM file) displayed inside Windows' built-in help viewer, started to time out. The solution was to go to Control Panel, Internet Options, Clear Cookies. The product's check-for-updates-in-background feature on this machine was also very slow because of cookies. Both features use Windows' (or IE's) high-level HTTP library, containing functions such as fetch-this-url-and-figure-out-TLS-and-redirects-automatically.
[+] nikbackm|5 years ago|reply
> Would I have faster browser experience if I deleted some bookmarks, or if I deleted my browsing history?

In some ways, probably.

Using the "Forget about this site" command in Firefox will generally be quite slow and cause high CPU usage for ten seconds or more.

[+] loa_in_|5 years ago|reply
Well there goes my standard tech support line:

"No, don't worry the icons don't lag your computer, just the programs that are currently running."

I do feel silly now.

[+] mijoharas|5 years ago|reply
So, I wonder if I'm suffering from selection bias because of these excellent articles I keep reading on randomascii.

I keep reading these articles about terrible quadratic complexity for things on Windows, and it gives me the impression that performance on Windows can be pretty bad; frankly, I don't have the best opinion of Windows.

Is performance worse in general on Windows? I haven't used it for a fair few years. Is it just generally a worse OS? I used to somewhat resent having to use Windows when I needed to develop on it (developing for games consoles), but there were some nice parts (I have to say, Visual Studio is really nice as an integrated debugger, even if I prefer other editors).

I'm trying to check my biases. Is Windows actually a fairly poor OS nowadays?

> If you are not on Windows – I’m very sorry.

The writer of the article definitely seems to think that it's a nice one, despite the constant issues they run into. Is this just because they've used the system so long and have developed such expertise with it, or are there things to love that I'm missing?

[+] jiggawatts|5 years ago|reply
For most users, most of the time, Windows performance is just fine. This is especially true on modern 64-bit hardware with SSDs when using the latest Windows 10 builds.

In the past? It was a bit of a mixed bag. TCP performance and network latency in general were always worse on Windows than Linux, largely for silly reasons. Similarly, the 32-bit versions had a lot of fixed-size memory regions for things like file caching that were set in stone in the NT4 era and not changed until Windows Vista. Towards the end, when all 32-bit machines had 4GB of memory, this made disk access unnecessarily slow.

There are also design space differences between Linux and Windows. On Linux a process is very lightweight, basically a fat function call. On Windows the equivalent need is typically met by RPC calls to an already running process. Threads are cheap on Windows, and threading has been very well supported since the 1990s; on Linux, not as well, even now. For example, even NT4 had async scatter/gather IO, which SQL Server used back in v7. I believe that Linux is still struggling with this.

Conversely, Windows beats the pants off Linux in some areas, even now. Try using a Linux desktop with a flaky DNS server. Hooo-boy. Timeout city! On Windows this is almost unnoticeable, because it fails over to the secondary DNS server in milliseconds and caches all results as expected. Linux failover and caching seems to require a significant effort to set up by the users, for some mysterious reason.

Similarly, Windows SMB v3 over RDMA is crazy fast. If you have the right NIC, you can flatline a 40 Gbps pipe with a single file copy. In the Windows Explorer GUI! No special tools needed. Copy, paste, and next thing you know you're doing 5 gigabytes per second. That takes real effort to match in Linux.

Etc...

Basically, everyone is used to the warts on their preferred system, and has learned to work around them. But when exposed to something unfamiliar, all the little issues seem to be always in the way.

[+] brucedawson|5 years ago|reply
On a day-to-day basis the only times that Windows feels slow are:

1) Git operations. These run faster on Linux, and I waste some measurable number of seconds/minutes each day due to this slowdown, presumably because of slower file operations.

2) Taskbar menus. I frequently right-click an icon on the task bar, and even after they fixed the problem I reported in 2019 there is still a noticeable delay. If I then have to right-click on the application name to bring up the sub-menu, there is a secondary delay which is even longer. These delays happen even if I repeatedly right-click on the same icon. I find it bizarre that "waiting for a menu" is a thing in 2021. Blog post is here:

https://randomascii.wordpress.com/2019/09/08/taskbar-latency...

[+] jstanley|5 years ago|reply
> If you are on Windows then make sure you are using symbol servers. If you are not on Windows – I’m very sorry.

I thought this was a particularly bizarre take, given that in the very same post the author has lamented not having access to the source code of the program he is trying to debug.

You can keep your symbol servers, I'll take the source code please.

[+] hdjcktnr|5 years ago|reply
File system performance is arguably terrible on Windows, when working with thousands of files.

Stuff like "npm install" and large git operations are many times slower than doing the same operation in a Linux virtual machine running on that same Windows machine. It's hard to tell whether it's Windows itself, or npm/git not using the "proper ways".

Other than that, Windows performance is just fine, and arguably smoother than a Linux desktop (no mouse freezing, for example).

[+] phkahler|5 years ago|reply
It seems that Edge also has O(n^2) complexity when rendering SVG files. I've been using SVGchart to render time-series simulation data to SVG and viewing it in the browser. Putting out 4 traces sampled at 40 kHz over 1 second produces a slow but tolerable rendering. 2 seconds of data kills it, while 0.5 s is fairly snappy.

The issue at this particular threshold of time complexity is, IMHO, that you often need to create a new data structure with multiple fast operations (insert and query) in order to reach O(n log n). Many people will punt at that point, questioning whether it's really worth the trouble.

[+] pletnes|5 years ago|reply
I got the «no desktop icons» advice in the early xp days and had no idea why it was so slow. Now I know, at last. I’m guessing this algorithm is at least from the win2k days.
[+] shreddit|5 years ago|reply
This is not surprising to me at all. At work, we had this weird behavior where opening an "open file" dialogue took minutes! The solution was to delete dead shortcuts (whose target programs had been removed) from the desktop… After that, the dialogue would open in seconds again.

I still can’t see the connection though.

[+] goblin89|5 years ago|reply
I want to develop a habit for looking at my code through the lens of time complexity.

What are some resources (articles, public knowledge bases, books, courses, etc.) that could make one more fluent in this matter? Something that could help build up time complexity intuition and learn to assess it faster.

[+] brucedawson|5 years ago|reply
Watch for nested loops. That's pretty much it. Well, that and be aware that if you have two nested loops, then one of them had better be looping through log(n) of the other's elements, or fewer.

The most common quadratic algorithms have two loops that both loop 'n' times, or the inner loop runs between 0 and n times. Both are quadratic - the second case is twice as fast, but still quadratic. In some cases the inner loop is not on 'n' at all but is on something that is typically proportional to 'n'. That's just as bad. That is, O(n*n/100) is still O(n^2).
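The advice above can be made concrete with a small sketch (illustrative Python, not from the article): two loops over the same n are O(n^2), and swapping the inner loop for a hash lookup brings it down to O(n).

```python
def has_duplicate_quadratic(values):
    """Classic nested loops: the inner loop also runs up to n times -> O(n^2)."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False

def has_duplicate_linear(values):
    """Single loop with O(1) set lookups -> O(n)."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False
```

Note that the quadratic version's inner loop runs "between 0 and n times", exactly the second pattern described: twice as fast as the full double loop, but still quadratic.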

[+] rbirkby|5 years ago|reply
Just be glad no-one thought of running Internet Explorer as the desktop background....oh wait....
[+] fn1|5 years ago|reply
Findings like this are great.

Performance bugs cost thousands of kilowatt-hours.

Computing energy isn't the largest contributor to climate change, but still: Optimizing for energy-efficiency will get more and more important even for Microsoft.

[+] jdright|5 years ago|reply
> If you are not on Windows – I’m very sorry.

You mean OSX, right? The last time I had an issue of this kind on Linux, I just got the source, built it in debug, and actually fixed the issue.

[+] brucedawson|5 years ago|reply
Sure, and that is a wonderful ability to have. But what would you do if you got a perf recording from a user who was running a different version of Linux? I was able to load this other person's ETW trace and analyze it with basically zero effort, and that (common, for me) case would be much harder on Linux.

Once I had a repro the Linux case would be arguably easier, but I couldn't get a repro until I could analyze the OP's trace.

[+] rini17|5 years ago|reply
IIRC there's an "invention" in Windows 95 to draw icons in folders/desktop from the bottom up, because to human perception it feels faster than drawing from top to bottom.

Wouldn't surprise me if that code survived with this paradoxical result.

[+] crististm|5 years ago|reply
Yes. Some hard facts to knock down the theory that today's computers are slow because they do a lot more stuff than 20 years ago.

No, they do a lot more of _nothing_. Something that would have been obvious and unacceptable on a 600MHz single core machine.

[+] jeffbee|5 years ago|reply
Maybe I just can't read it, but it looks to me like ~BatchPositionChangesHelper doesn't have any callees. Is the layout of this inclusive profile just confusing?
[+] brucedawson|5 years ago|reply
The layout can be quite confusing at first. It's well designed, I think, but takes a while to understand automatically.

One decision they made (visible at the top of the call stack also) is to not indent for children if there is only one child. So, ~BatchPositionChangesHelper calls FinishAndSaveBatchPositionChanges (always, in the samples) which then calls _SetRedraw, SetExtendedStyle, and FlushDataModelAndSave.

Not indenting in the single-child-case is good because it reduces the width needed for the call stack.

[+] bzb6|5 years ago|reply
>I don’t know how many people this bug affects (anyone with 200-300 icons is hitting a modest version of this, and it gets progressively worse with more) and I have no power to fix it. So, I filed a bug. I am not hopeful that it will be fixed. My last quadratic-in-Windows bug has had zero comments since it was filed a few months ago.

What’s the way to report a bug in Windows? The closest thing I know is the Feedback Hub, which is an unmoderated public forum full of spam and dunces.