Love such articles where I learn something new. cdb is completely new to me. It's apparently the Microsoft Console Debugger. For others like me who were wondering how `eb win32u!NtUserSetLayeredWindowAttributes c3` neutered the window animation:
"By executing this command, you are effectively replacing the first byte of the `NtUserSetLayeredWindowAttributes` function with a `ret` instruction. This means that any call to `NtUserSetLayeredWindowAttributes` will immediately return without executing any of its original code. This can be used to bypass or disable the functionality of this function"
- eb[0] "enters bytes" into memory at the specified location;
- The RETN[1] instruction is encoded as C3 in x86 opcodes; and
- Debuggers will typically load debug symbols (on Windows that's PDBs rather than ELF) so you can refer to memory locations by name, i.e. a function name refers to its entry point.
Putting those three together, we almost get the author's command. I'm not sure about the "win32u!NtUser" name prefix, though. Is it name-munging performed on the compiler side? Maybe some debugger syntax thrown in to select the dll source of the name?
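(For what it's worth, the `win32u!` prefix appears to be the debugger's own `module!symbol` notation, where `win32u` names the DLL the symbol comes from, rather than any compiler-side name-mangling. A hypothetical cdb session, with `explorer.exe` as an example target only:)

```
C:\> cdb -pn explorer.exe
0:000> x win32u!NtUserSetLayeredWindowAttributes    $$ resolve module!symbol via loaded symbols
0:000> eb win32u!NtUserSetLayeredWindowAttributes c3    $$ write 0xC3 (ret) at the entry point
0:000> qd                                           $$ quit and detach
```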
I thought so, too. I'm not interested enough to benchmark it, but for all practical purposes it's instantaneous on my machine. As fast to open a new terminal as it is to switch to the existing one.
Interesting side note: our brain compensates for delay, and it can do so up to around 250 ms. So if anything lags up to that amount, our brain will compensate and make it feel instantaneous.
There was an interesting experiment that I reproduced at university: create an app that slowly builds up a delay on clicks, allowing the brain to adapt, and then remove the delay completely.
The result is that the app feels like it reacts just before you actually click, until the brain adapts again to the new timing.
> I’ve been using this configuration for a few days, so far it’s working great. I haven’t noticed any issues running it this way.
The journey was very useful, even if the destination may be pretty specific to your needs. The process of how to go about debugging minor annoyances like this is really hard to learn.
Just for fun I did film some video footage from my 60Hz monitor to see how quickly my terminal starts up. Seems like 2-3 frames to show up the terminal window, and 1-2 frames to show shell prompt. So 50 ms - 83 ms. This is with foot terminal on Sway.
My very unscientific methodology was to run
$ echo hello && foot
in a terminal and measure the time between the hello text appearing and the new window appearing. Looking at my video, the time from physical key press to "hello" text appearing might be 20ish ms but that is less clear, so about 100 ms total from key press to shell prompt.
This is a pretty much completely untuned setup; I haven't done any tweaks to improve the figures. Enabling foot server might shave some milliseconds, but tbh I don't feel that's necessary.
It'd be fun to do this with a better camera, and also with better monitors. Idk how difficult it would be to mod an LED into the keyboard to capture the exact moment the key is activated; just trying to eyeball the key movement is not very precise.
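For the frame counts above: at 60 Hz each frame lasts 1000/60 ≈ 16.7 ms, so 3-5 frames works out to roughly the 50-83 ms quoted. A quick sketch of that conversion:

```shell
# Frame-to-latency math for a 60 Hz recording:
# one frame = 1000/60 ms, so N frames ~ N * 16.7 ms.
for frames in 3 5; do
  awk -v f="$frames" 'BEGIN { printf "%d frames ~ %.0f ms\n", f, f * 1000 / 60 }'
done
```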
I'm at the tail end of my career, so working on efficiency gains like this doesn't usually add up for me.
However I was interested in knowing whether it does for the author.
Assuming they do suffer this 1300 ms delay "hundreds" of times a day (let's say 200), and for the sake of argument they use their computer 300 days a year and have 20 years of such work ahead of them with this config, then this inefficiency will total 1300 x 200 x 300 x 20 / 1000 / 60 / 60 hours wasted over the author's remaining career - some 430 hours.
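Checking that arithmetic (shell integer math, so the result floors):

```shell
# 1300 ms x 200 times/day x 300 days/year x 20 years, converted to hours
total_ms=$((1300 * 200 * 300 * 20))
echo "$((total_ms / 1000 / 3600)) hours"    # prints "433 hours"
```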
I had a printout of [1] at my office.
Of course, at its base it is only a simple multiplication table, but nevertheless it reminded me several times that an issue is worth fixing.
I'm so distracted by latency that I run my macOS with vsync disabled 24/7 (through Quartz Debug).
When I used to use Windows 10+ years ago, I had decent luck using xming + cygwin + Cygwin/X + bblean to run xterm in a minimal latency/lag environment.
I also launch Chrome/Spotify/Slack desktop using:
$ open -a Google\ Chrome --args --disable-gpu-vsync --disable-smooth-scrolling
One way to have your cake and eat it too is to upgrade to a high-refresh-rate display: no tearing + less latency + smoother display. Although there are diminishing returns, even 60Hz -> 144Hz+ will make a lot of difference. On a 240Hz display, the vsync penalty is just ~4ms.
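The "penalty" here is on the order of one refresh interval, i.e. 1000/Hz milliseconds:

```shell
# Worst-case wait for the next vsync is roughly one refresh interval
for hz in 60 144 240; do
  awk -v hz="$hz" 'BEGIN { printf "%3d Hz -> %4.1f ms per frame\n", hz, 1000 / hz }'
done
```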
Also if you are using a miniLED M-class MBP, its pixel response is abysmal.
Very nice article, I love such debugging. I sometimes do it myself too.
Anyway, this also made me think about the general bloat we have in new OSes and programs. I'm still on an old OS running spinning rust, and bash here starts instantly when the cache is hot. I think GUI designers lost the engineer's touch...
We need a community of those obsessed with responsive applications. UI latency irks me on every device. Not only computers and smart phones, but now TVs, refrigerators, cars all have atrocious UI latency.
If we're handing out tips, then as noted in a few examples from the article, hyperfine is even more useful when called with multiple commands directly. It presents a concise epilogue with the information you're probably trying to glean from a run such as yours:
    $ hyperfine -L arg '1,2,3' 'sleep {arg}'
    …
    Summary
      sleep 1 ran
        2.00 ± 0.00 times faster than sleep 2
        3.00 ± 0.00 times faster than sleep 3
If your commands don't share enough in common for that approach then you can declare them individually, as in "hyperfine 'blib 1' 'blob x y' 'blub --arg'", and still get the summary.
Besides learning about `hyperfine`, the combination of `xargs` to keep N warm processes ready, `LD_PRELOAD` to trick them into waiting to map their windows, and `pkill --oldest ...` to get one of those to go is quite neat.
But I have a very different solution to this problem: have just one terminal window and use and abuse `tmux`. I only use new windows (or tabs, if the terminal app has those) to run `ssh` to targets where I use `tmux`. I even nest `tmux` sessions, so essentially I've two levels of `tmux` sessions, and I title each window in the top-level session to match the name of the session running in that window -- this helps me find things very quickly. I also title windows running `vi` after the `basename` of the file being edited. Add in a simple PID-to-tmux window resolver script, scripts for utilities like `cscope` to open new windows, and this gets very comfortable, and it's fast. I even have a script that launches this whole setup should I need to reboot. Opening a new `tmux` window is very snappy!
Even 80ms seems unnecessarily slow to me. 300ms would drive me nuts ...
I'm using a tiling window manager (dwm), and interestingly the spawning time varies depending on the position the terminal window has to be rendered at.
I get the fastest startup time in the fullscreen tiling mode.
    $ hyperfine 'st -e true'
    Benchmark 1: st -e true
      Time (mean ± σ):   35.7 ms ± 10.0 ms  [User: 15.4 ms, System: 4.8 ms]
      Range (min … max): 17.2 ms … 78.7 ms  123 runs
The non-fullscreen one ends up at about 60ms which still seems reasonable.
vijucat|1 year ago
(Thanks to GitHub Copilot for the explanation quoted above.)
Also see https://learn.microsoft.com/en-us/windows-hardware/drivers/d...
xelxebar|1 year ago
[0]:https://learn.microsoft.com/en-us/windows-hardware/drivers/d...
[1]:http://ref.x86asm.net/geek64.html#xC3
imp0cat|1 year ago
In fact, keeping something preloaded and ready to go is quite common; these two examples are off the top of my head:
- The Emacs server way - https://ungleich.ch/u/blog/emacs-server-the-smart-way/
- SSH connection reuse.
beachy|1 year ago
So well worth the effort to fix!
sllabres|1 year ago
[1] https://xkcd.com/1205/
tonymet|1 year ago
Great debugging work to come up with a solution!
pimlottc|1 year ago
https://github.com/sharkdp/hyperfine