aray | 8 years ago
Some issues off the top of my head (that I ran into): VDSO censoring is a lot harder than just symbol overriding. It has to actually be removed from the aux vector (the third thing on the process stack when the process launches, after the arguments and environment variables). The EHDR entry (AT_SYSINFO_EHDR) is what you need to remove.
Gist for censoring EHDR: https://gist.github.com/machinaut/a08b581c921775263cf0e20ccc...
Some libcs (notably glibc) are really good at finding and using EHDR even if you do the symbol overriding, so dropping EHDR from the aux vector is the surest way of making sure the vDSO is gone.
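To make the aux vector point concrete, here is a rough sketch of what "censoring EHDR" can look like on a raw auxv buffer. It assumes a 64-bit target (each entry is two 8-byte words) and retags the entry as AT_IGNORE instead of deleting it. That is one possible approach and not necessarily what the gist does. The names `find_ehdr`, `censor_ehdr` and `sample` are made up for illustration, and actually writing the patched bytes back into a freshly exec'd process (e.g. via ptrace) is not shown.

```python
import struct

# Constants from <elf.h>
AT_NULL = 0            # terminates the aux vector
AT_IGNORE = 1          # entry should be skipped by whoever reads the vector
AT_PAGESZ = 6          # used only to build the sample below
AT_SYSINFO_EHDR = 33   # address of the vDSO's ELF header

# Assumption: 64-bit process, so each entry is (a_type, a_val) as two u64s.
ENTRY = struct.Struct("=QQ")


def _entries(auxv):
    """Yield (byte_offset, a_type, a_val) up to, but excluding, AT_NULL."""
    for off in range(0, len(auxv) - ENTRY.size + 1, ENTRY.size):
        a_type, a_val = ENTRY.unpack_from(auxv, off)
        if a_type == AT_NULL:
            return
        yield off, a_type, a_val


def find_ehdr(auxv):
    """Return the vDSO base address recorded in auxv, or None if absent."""
    for _, a_type, a_val in _entries(auxv):
        if a_type == AT_SYSINFO_EHDR:
            return a_val
    return None


def censor_ehdr(auxv):
    """Return a copy of auxv with AT_SYSINFO_EHDR retagged as AT_IGNORE."""
    out = bytearray(auxv)
    for off, a_type, _ in _entries(auxv):
        if a_type == AT_SYSINFO_EHDR:
            ENTRY.pack_into(out, off, AT_IGNORE, 0)
    return bytes(out)


# Synthetic vector: page size, vDSO header address, terminator.
sample = (
    ENTRY.pack(AT_PAGESZ, 4096)
    + ENTRY.pack(AT_SYSINFO_EHDR, 0x7FFFF7FC1000)
    + ENTRY.pack(AT_NULL, 0)
)
```

On Linux you can feed `find_ehdr` the contents of `/proc/self/auxv` to see the entry your own process was given. Retagging keeps the vector the same length, so nothing else on the stack has to move.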
ptrace overhead is HUGE. Because you're debugging a userspace program with another program, every time call now results in 4 context switches (to/from your debugging program at every time call entry/exit). Even with both pinned to the same CPU, this is not fast.
This is where my least favorite part of the Linux kernel comes in handy: seccomp-bpf. Instead of stopping on _every_ syscall, you can write a packet-filter-style list of rules that only matches certain time-based syscalls with certain arguments. This greatly improves performance (but for me, it was still not fast enough to play video games live).
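For anyone who hasn't written one of these filters, here is a minimal sketch of the rule list idea. It builds a classic-BPF program that returns SECCOMP_RET_TRACE for a few time syscalls and SECCOMP_RET_ALLOW for everything else, then runs it through a tiny pure-Python interpreter so you can see which calls would reach the tracer. Assumptions: x86-64 syscall numbers, matching on syscall number only (no argument checks), and the helper names `build_filter`, `encode` and `simulate` are invented for this example. Installing the filter for real (prctl/seccomp plus PTRACE_O_TRACESECCOMP in the tracer) is not shown.

```python
import struct

# Classic BPF opcodes (linux/bpf_common.h), pre-combined
BPF_LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS: A = 32-bit word at offset k
BPF_JEQ_K = 0x15      # BPF_JMP | BPF_JEQ | BPF_K: jump jt if A == k else jf
BPF_RET_K = 0x06      # BPF_RET | BPF_K: return constant k

SECCOMP_RET_ALLOW = 0x7FFF0000   # run the syscall normally
SECCOMP_RET_TRACE = 0x7FF00000   # stop and notify the ptrace tracer
AUDIT_ARCH_X86_64 = 0xC000003E

# Assumption: x86-64 syscall numbers.
TIME_SYSCALLS = {"gettimeofday": 96, "time": 201, "clock_gettime": 228}


def build_filter(syscall_numbers):
    """Return a list of (code, jt, jf, k) tuples tracing only the given syscalls."""
    nrs = sorted(syscall_numbers)
    n = len(nrs)
    prog = [
        (BPF_LD_W_ABS, 0, 0, 4),                   # A = seccomp_data.arch
        # Foreign arch: skip to RET ALLOW (a stricter filter would kill here).
        (BPF_JEQ_K, 0, n + 1, AUDIT_ARCH_X86_64),
        (BPF_LD_W_ABS, 0, 0, 0),                   # A = seccomp_data.nr
    ]
    for i, nr in enumerate(nrs):
        # On a match, jump over the remaining checks and RET ALLOW to RET TRACE.
        prog.append((BPF_JEQ_K, n - i, 0, nr))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_TRACE))
    return prog


def encode(prog):
    """Pack the program as an array of struct sock_filter (8 bytes each)."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in prog)


def simulate(prog, arch, nr):
    """Interpret the three opcodes used above against a fake seccomp_data."""
    data = struct.pack("=iI", nr, arch)   # first two fields: nr, arch
    pc, acc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
            pc += 1
        elif code == BPF_JEQ_K:
            pc += 1 + (jt if acc == k else jf)
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")
```

With `prog = build_filter(TIME_SYSCALLS.values())`, `simulate(prog, AUDIT_ARCH_X86_64, 228)` comes back as SECCOMP_RET_TRACE while a write (syscall 1) is allowed straight through. Keep in mind that calls served from the vDSO never enter the kernel, so the filter only sees them once EHDR has been censored.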
At the end of the day I ended up reviving a >10 year old patch someone sent to the Linux kernel to add these parameters (time offset and time warp) to thread structs and do the warping in the kernel (much faster, since you don't pay the context-switch overhead, etc). Sadly even this didn't work, because our end application needed to run on multiple clouds in Docker, and we'd need access to the host kernel to do these operations.
I'd like to have an affine time warp as part of the cgroups, and then maybe extend it through runc so anyone can run time-warped docker containers, but maybe that's wishful thinking.
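By "affine time warp" I just mean two numbers per process (or per cgroup): a rate and an offset. A minimal sketch of the arithmetic, with made-up names (`AffineWarp`, `warp`, `unwarp`, `sleep_duration`) and no claim that the old kernel patch was laid out this way:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineWarp:
    """warped = rate * real + offset (hypothetical model, times in seconds)."""

    rate: float = 1.0     # 2.0 means the process sees time run twice as fast
    offset: float = 0.0   # constant shift added after scaling

    def warp(self, real):
        """Timestamp the warped process should observe for a real timestamp."""
        return self.rate * real + self.offset

    def unwarp(self, warped):
        """Real timestamp corresponding to a warped one (e.g. a timer deadline)."""
        return (warped - self.offset) / self.rate

    def sleep_duration(self, warped_seconds):
        """Real seconds to block when the process asks to sleep warped_seconds."""
        return warped_seconds / self.rate
```

For example, `AffineWarp(rate=2.0, offset=100.0).warp(10.0)` gives 120.0, and a requested 1-second sleep under that warp should only block for 0.5 real seconds. Reads go through `warp`, while deadlines and sleeps need the inverse direction, which is why both are there.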
Overall I think this is great work, and super happy you posted it. I'd love to chat about it sometime.
(P.S. most ironic to me was my version of this was called 'timelord' :)
AstralStorm|8 years ago
Having a clock cgroup would be easier and more useful than you'd think. Also, you can play tricks like ntpd does in a container. (e.g. adjtime)
aray|8 years ago
Ironically, because the folks working on containers/VMs are _really_ good at what they do, time access calls in particular have been really optimized (they get called a lot). This makes it very hard to intercept time calls at this layer! e.g. KVM and LXC both essentially hand time calls straight to the host.
This means time intercepts at the VM/container layer need fundamental support (I mentioned affine time transformation in the Linux kernel in another comment), which doesn't work for people who need to deploy on current hosted container platforms.
MayeulC|8 years ago
A proposal should be made, if that's not part of the planned features.