top | item 44632571

(no title)

ibotty | 7 months ago

One problem is that you can't filter its "syscalls" as you can regular syscalls. This removes a security boundary that e.g. container runtimes regularly use. So you cannot use it in your regular kubernetes cluster without weakening its security for these pods.

discuss

holowoodman|7 months ago

This just reinforces the (maybe unfounded) impression that security is a secondary consideration, and performance is primary.

I'd use io_uring in a heartbeat on a dedicated system where the job is only I/O and security isolation isn't a concern. But multiuser/multiapplication/networked? Not a chance.

weitendorf|7 months ago

I think there is a very large amount of overlap between the people who

1. know what io_uring is

2. are interested in performance enough to look at improvements based on new linux kernel system calls and talk about it in public

3. care about security in multitenant environments or the syscalls used by third party libraries

I think io_uring right now probably makes a lot of sense for HPC and highly technical, performance-sensitive financial stuff, but they can be kind of insular. I don't think most linux hobbyists really need the performance benefits enough t care about it, and most businesses are using a major cloud vendor/don't have the scale or expertise to be thinking about this kind of stuff. Which leaves major cloud providers and really big businesses like Meta with their own internal clouds as the ones that stand to benefit enough to care about performance while really caring about security

Asmod4n|7 months ago

There should be no issue with disabling it altogether by banning its setup and usage syscalls.

skissane|7 months ago

Isn't the issue here just that io_uring needs to be enhanced such that, when a seccomp-bpf filter is installed, the filter gets called to approve each SQE, before it gets executed?

Someone|7 months ago

That can be done, but reading https://lwn.net/Articles/902466/, writers of security tools are unhappy that:

- io_uring initially was conceived without considering security or auditing tools

- io_uring later was changed to allow ioctl calls, even though security people do not like ioctl because what its arguments mean depends on the device being called (possibly even on the version of the driver), not on the type of device, and often is poorly documented, making it hard for a security filter to decide what to do with a command.

That also made them fear that similar security-breaking changes might be made in the future.

tsimionescu|7 months ago

I don't think this is an appropriate use of "just". If io_uring doesn't work with seccomp-bpf filters today, there are many situations where you just can't use it, period.

That someone with kernel IO dev experience may be able to relatively easily add such a fetaure in the future (though I would doubt that, given that it hasn't yet been implemented apparently) doesn't make it a small problem.

coppsilgold|7 months ago

I believe you can deny io_uring altogether with the syscalls io_uring_enter, io_uring_register, io_uring_setup?

This would be useful if you want to boot with io_uring but deny it for some sensitive workloads.

spwa4|7 months ago

What regular filter for syscalls do you use?

holowoodman|7 months ago

seccomp BPF, eBPF, in a way SELinux/AppArmor/Tomoyo/..., maybe you can even call namespaces some kind of syscall filter. And then there is the auditing framework, where you can at least record which critical syscalls were performed.

Nowadays its mostly a combination of eBPF, SELinux and auditd plus namespaces in case of containers. Usually in the combination that some distro ships, so nothing really fancy.

lima|7 months ago

seccomp-bpf, for instance.

altairprime|7 months ago

Is that a true limitation that cannot be overcome? Are solutions possible and/or available, but require further work to be shipped?