foxylad | 8 months ago

You seem knowledgeable in the domain, so I hope you don't mind me picking your brains about the two things stopping me switching to Wayland.

1. Autokey. I use this to expand abbreviations for long words and phrases I use often. This relies on being able to insert itself between the keyboard and all userland apps, and this is apparently impossible under Wayland.

2. SimpleScreenRecorder. This relies on being able to access the window contents for all userland apps, and again this is apparently impossible.
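For context, the core of an Autokey-style expander is small. Under Wayland, the hard part is only where its two halves plug in: reading keystrokes meant for other apps, and injecting edits back into them. Below is a minimal sketch of just the matching half, assuming a hypothetical `Expander` class that is fed one typed character at a time. This is not Autokey's actual code:

```python
from __future__ import annotations


class Expander:
    """Hypothetical abbreviation matcher, independent of any display server.

    feed() is called once per typed character; when the recent characters end
    with a trigger, it returns (backspaces_to_send, text_to_type).
    """

    def __init__(self, abbreviations: dict[str, str]) -> None:
        self.abbreviations = abbreviations
        # Only the last len(longest trigger) characters can ever matter.
        self.window = max((len(t) for t in abbreviations), default=0)
        self.buffer = ""

    def feed(self, ch: str) -> tuple[int, str] | None:
        if self.window == 0:
            return None
        self.buffer = (self.buffer + ch)[-self.window:]
        for trigger, expansion in self.abbreviations.items():
            if self.buffer.endswith(trigger):
                # Reset so the expansion itself cannot retrigger a match.
                self.buffer = ""
                return len(trigger), expansion
        return None


if __name__ == "__main__":
    exp = Expander({";sig": "Best regards"})
    for ch in "hi ;sig":
        action = exp.feed(ch)
        if action:
            print(action)  # (4, 'Best regards')
```

Feeding characters in requires seeing input destined for other apps. Acting on the returned tuple requires injecting keystrokes into them. Those two hooks are what the Wayland question is really about.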

Would I be right in thinking that both trip over because Wayland enforces application privacy, preventing applications from accessing other applications' resources? And if so, why isn't there a "user root" level for apps that do need to interact with other apps?

mappu | 8 months ago

The first Google result for 'autokey wayland' is someone recommending https://espanso.org/, which looks like it has good Wayland support. And you need only look at OBS to see that screen video capture is perfectly possible on Wayland.

Who is saying those are impossible use cases? I think your two apps just haven't been updated; that happens often with software.

rcxdude | 8 months ago

It's not impossible, but it requires extensions to the protocol. Historically, the main headache has been that support across compositors was inconsistent, and they might not even use the same extension. That means a developer looking to make such a tool would in practice need to implement many different interfaces to make it work, which tended to mean it didn't happen. (I think most devs kinda get as far as looking at the interfaces, seeing a bunch of drama between different compositors about what the correct way to do it is and whether it should even be done at all (GNOME), and then decide to go for a nice walk instead.) I think screen recording is now reasonably well supported and standardized; I don't know about input interception and simulation.

jcgl | 8 months ago

There are multiple screen recorders that work under Wayland. I use Spectacle, which comes with KDE Plasma. Works well for me.

jchw | 8 months ago

The quick answer is that Wayland, while it has certain design provisions for privacy and security, doesn't really enforce anything on its own; it's just a set of protocols that applications can use to talk to a display server. The display server is free to do whatever it wants. Unfortunately, this is poorly understood, largely because it is generally poorly explained.

I'll start with screen capture because it is easier. This one can be done on basically any Wayland compositor by using desktop portals + Pipewire, with the downside that applications must ask permission before they can capture the screen or a window. On KDE, XWayland apps can also capture the screen or a window, but it will also require a permission prompt. On some wlroots-based compositors, there are special protocols that allow Wayland clients to see other Wayland top-level windows and capture their contents directly without any permission prompts; for example, with OBS you can use the wlrobs plugin.
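Which mechanism is available depends on the compositor, so a capture tool typically ends up probing for it. Below is a rough sketch of that decision. It assumes the client has already collected the interface names the compositor advertised through `wl_registry`, and has separately checked whether an xdg-desktop-portal is running. `pick_capture_backend` is a hypothetical helper, and `zwlr_screencopy_manager_v1` is the interface from the wlroots screencopy protocol:

```python
from __future__ import annotations

from enum import Enum


class CaptureBackend(Enum):
    # Direct capture via a wlroots-style protocol: no permission prompt.
    WLR_SCREENCOPY = "wlr-screencopy"
    # Portal ScreenCast session + PipeWire stream: the user is asked first.
    PORTAL_PIPEWIRE = "portal+pipewire"


def pick_capture_backend(
    advertised_globals: set[str], portal_available: bool
) -> CaptureBackend | None:
    """Hypothetical helper: choose a capture path from what the session offers."""
    if "zwlr_screencopy_manager_v1" in advertised_globals:
        return CaptureBackend.WLR_SCREENCOPY
    if portal_available:
        return CaptureBackend.PORTAL_PIPEWIRE
    # Neither mechanism: capture is not possible in this session.
    return None


if __name__ == "__main__":
    print(pick_capture_backend({"wl_compositor", "wl_shm"}, portal_available=True))
    # prints CaptureBackend.PORTAL_PIPEWIRE
```

Preferring the direct protocol is just one choice. A tool that values portability, or that wants the user to be asked, could just as reasonably try the portal first.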

In fact, screen capture in OBS will be more efficient than it was in X11, as it will work by passing window buffers through dma-bufs, allowing zero-copy, just like gpu-screen-recorder on X11. OBS is a bit overkill for screen recording, but I still recommend it as it's a very versatile tool that I think most people wouldn't regret having around.

Now for Autokey. This one is possible to do, but I'm not 100% sure what the options are yet. Programmatically typing is certainly possible in a variety of ways: wlroots provides virtual input protocols, and there are other provisions for accessibility tools, etc. However, it seems that right now the main approach people have taken is to use something like ydotool, which uses uinput to inject input events at the kernel level. It's a bit overkill, but it will definitely bypass any security in your Wayland compositor :)
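Here is what "inject input events at the kernel level" looks like concretely. A uinput client writes `struct input_event` records (from `linux/input.h`) to a virtual device. Every consumer of input devices, the compositor included, then sees those records as ordinary keyboard input. Below is a minimal sketch of encoding such records in Python, assuming the 64-bit Linux `input_event` layout (timestamp, type, code, value). Creating and configuring the uinput device itself (the `UI_SET_EVBIT`/`UI_SET_KEYBIT`/`UI_DEV_SETUP`/`UI_DEV_CREATE` ioctls) is deliberately omitted, and `tap`/`backspaces` are hypothetical helpers:

```python
import struct

# Constants from linux/input-event-codes.h.
EV_SYN = 0x00
EV_KEY = 0x01
SYN_REPORT = 0
KEY_BACKSPACE = 14

# struct input_event: struct timeval (two C longs), __u16 type, __u16 code,
# __s32 value. On 64-bit Linux, native "l" is 8 bytes, so each record is 24 bytes.
EVENT_FMT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FMT)


def input_event(ev_type: int, code: int, value: int) -> bytes:
    # Timestamp left at zero in this sketch (assumption: the kernel stamps
    # events that arrive through uinput).
    return struct.pack(EVENT_FMT, 0, 0, ev_type, code, value)


def tap(keycode: int) -> bytes:
    """Key press + sync report, then key release + sync report."""
    return (
        input_event(EV_KEY, keycode, 1)
        + input_event(EV_SYN, SYN_REPORT, 0)
        + input_event(EV_KEY, keycode, 0)
        + input_event(EV_SYN, SYN_REPORT, 0)
    )


def backspaces(n: int) -> bytes:
    """Bytes that would erase n characters once written to a configured uinput fd."""
    return b"".join(tap(KEY_BACKSPACE) for _ in range(n))


if __name__ == "__main__":
    payload = backspaces(3)
    print(len(payload) // EVENT_SIZE)  # 12
```

Because these events enter below the display server, the compositor has no way to tell them apart from a physical keyboard. That is why this route sidesteps its security model entirely.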

The more proper way to support this sort of use case would actually be to insert yourself into the input method somewhere. I don't know if anyone has done this, but someone DID try another route, which is to implement it on top of accessibility technology. I haven't tried it, so YMMV, but I believe it is relatively close to what you are looking for.

https://snippetpixie.com/

It does have the caveat that it only works with applications that properly support accessibility technology (I would hope most of them do...).

> why isn't there a "user root" level for apps that do need to interact with other apps?

Truth be told, since Wayland is inherently capability-based, all of that could be implemented. Wlroots implements all of the protocols you'd imagine would be in that group, but they're just passively available to all applications by default. (I think they may support lower-privilege applications now, too; there are protocols that convey security context, etc. for sandboxed apps.) The wlroots protocols are very useful, so they're also being implemented elsewhere.

Listing top-level windows: https://wayland.app/protocols/ext-foreign-toplevel-list-v1

Implementing docks and other desktop UI widgets: https://wayland.app/protocols/wlr-layer-shell-unstable-v1

If you wanted, you could grant just as many capabilities to a Wayland app as before, and give apps the ability to interpose themselves between everything else, get and set absolute window positions, etc.; it's all up to the compositors. Personally, I think over time we'll see more provisions for this, since it is useful even if it's not needed 99% of the time. Just don't expect too much stuff to work well on GNOME; they can be... challenging.
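In these terms, the "user root" tier the question asks about would just be a compositor policy. A client can only bind protocols that the compositor advertises to it as globals through `wl_registry`. Below is a toy model of such a policy, assuming three hypothetical trust tiers. The tier names, the `permissive_default` switch (standing in for "passively available to all applications by default"), and the choice of which protocols count as privileged are illustrative, not any compositor's actual configuration:

```python
from __future__ import annotations

from enum import Enum, auto


class Trust(Enum):
    SANDBOXED = auto()   # e.g. a client tagged via a security-context protocol
    NORMAL = auto()
    PRIVILEGED = auto()  # the hypothetical "user root" tier


# Globals every client gets in this toy model.
BASE_GLOBALS = {"wl_compositor", "wl_shm", "wl_seat", "xdg_wm_base"}

# Protocols that let a client see or drive other clients (illustrative set).
PRIVILEGED_GLOBALS = {
    "ext_foreign_toplevel_list_v1",
    "zwlr_layer_shell_v1",
    "zwlr_screencopy_manager_v1",
    "zwp_virtual_keyboard_manager_v1",
}


def advertised_globals(trust: Trust, permissive_default: bool) -> set[str]:
    """Return the interface names the compositor would advertise to a client."""
    if trust is Trust.PRIVILEGED:
        return BASE_GLOBALS | PRIVILEGED_GLOBALS
    if trust is Trust.NORMAL and permissive_default:
        # "Passively available to all applications by default."
        return BASE_GLOBALS | PRIVILEGED_GLOBALS
    # Sandboxed clients (and normal ones under a strict policy) get the basics.
    return set(BASE_GLOBALS)


if __name__ == "__main__":
    for tier in Trust:
        print(tier.name, sorted(advertised_globals(tier, permissive_default=False)))
```

Nothing in the core protocol stops a compositor from drawing this line differently; a global that is never advertised to a client simply cannot be bound by it.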

DrillShopper | 8 months ago

> The quick answer is that Wayland, while it has certain design provisions for privacy and security, doesn't really enforce anything on its own, it's just a set of protocols that applications can use to talk to a display server. The display server is free to do whatever it wants.

I love having to detect at runtime the compositor I'm using (and its version) and have bespoke code paths to work around their various bugs and omissions.

Definitely a recipe for reliable, usable, maintainable software.

> Unfortunately this is poorly understood due to it being generally poorly explained.

This reads like a "missing missing reason"[1]. People do understand it, and they explain why it's a dealbreaker. Wayland has had a decade and a half to grow some consensus and make these very basic things that work under X11 work. Instead of doing that, they're now relying on the main distributions just giving up on (the hardware) X11 servers instead of fixing this. I don't care if only one or two compositors that I don't run support the one thing that I need. That doesn't help me because those compositors don't implement other functionality I need. Having a stable, agreed upon, universally consistently implemented base of functionality that application developers and toolkits can rely on is a good thing.

This is a complete clusterfuck, and that's why there's user feedback. Trying to frame it as "people just don't understand" isn't productive. They do, and their criticisms have some validity. It's up to the Wayland devs to see if they care, and historically, they haven't.

[1] https://www.issendai.com/psychology/estrangement/missing-mis...