top | item 43033064

(no title)

cchianel | 1 year ago

I had to deal with a lot of FFI to enable a Java Constraint Solver (Timefold) to call functions defined in CPython. In my experience, most of the performance problems from FFI come from using proxies to communicate between the host and foreign language.

A direct FFI call using JNI or the new foreign interface is fast, and has roughly the same speed as calling a Java method directly. Alas, the CPython and Java garbage collectors do not play nice, and require black magic in order to keep them in sync.

On the other hand, using proxies (such as in JPype or GraalPy) cause a significant performance overhead, since the parameters and return values need to be converted, and might cause additional FFI calls (in the other direction). The fun thing is if you pass a CPython object to Java, Java has a proxy to the CPython object. And if you pass that proxy back to CPython, a proxy to that proxy is created instead of unwrapping it. The result: JPype proxies are 1402% slower than calling CPython directly using FFI, and GraalPy proxies are 453% slower than calling CPython directly using FFI.

What I ultimately end up doing is translating CPython bytecode into Java bytecode, and generating Java data structures corresponding to the CPython classes used. As a result, I got a 100x speedup compared to using proxies. (Side note: if you are thinking about translating/reading CPython bytecode, don't; it is highly unstable, poorly documented, and its VM has several quirks that make it hard to map directly to other bytecodes).

For more details, you can see my blog post on the subject: https://timefold.ai/blog/java-vs-python-speed

discuss

LinXitoW|1 year ago

Speaking from zero experience, the FFI stories of both Python and Java to C seems much better. Wouldn't going connecting them via a little C bridge a general solution?

cchianel|1 year ago

JNI/the new Foreign FFI communicate with CPython via CPython's C API. The primary issue is getting the garbage collectors to work with each other. The Java solver works by repeatedly calling user defined functions when calculating the score. As a result:

- The Java side needs to store opaque Python pointers which may have no references on the CPython side.

- The CPython side need to store generated proxies for some Java objects (the result of constraint collectors, which are basically aggregations of a solution's data).

Solving runs a long time, typically at least a hour (although you can modify how long it runs for). If we don't free memory (by releasing the opaque Python Pointer return values), we will quickly run out of memory after a couple of minutes. The only way to free memory on the Java side is to close the arena holding the opaque Python pointer. However, when that arena is closed, its memory is zeroed out to prevent use-after-free. As a result, if CPython haven't garbage collected that pointer yet, it will cause a segmentation fault on the next CPython garbage collection cycle.

JPype (a CPython -> Java bridge) does dark magic to link the JVM's and CPython's garbage collector, but has performance issues when calling a CPython function inside a Java function, since its proxies have to do a lot of work. Even GraalPy, where Python is ran inside a JVM, has performance issues when Python calls Java code which calls Python code.

ignoramous|1 year ago

Also see: cgo is not Go

  Go code and C code have to agree on how resources like address space, signal handlers, and thread TLS slots are to be shared — and when I say agree, I actually mean Go has to work around the C code's assumption. C code that can assume it always runs on one thread, or blithely be unprepared to work in a multi threaded environment at all.

  C doesn't know anything about Go's calling convention or growable stacks, so a call down to C code must record all the details of the goroutine stack, switch to the C stack, and run C code which has no knowledge of how it was invoked, or the larger Go runtime in charge of the program.

  It doesn't matter which language you’re writing bindings or wrapping C code with; Python, Java with JNI, some language using libFFI, or Go via cgo; it is C's world, you're just living in it.

https://dave.cheney.net/2016/01/18/cgo-is-not-go / https://archive.vn/GZoMK

high_na_euv|1 year ago

How IPC methods would fit such cases?

Like, talk over some queue, file, http, etc

cchianel|1 year ago

IPC methods were actually used when constructing the foreign API prototype, since if you do not use JPype, the JVM must be launched in its own process. The IPC methods were used on the API level, with the JVM starting its own CPython interpreter, with CPython and Java using `cloudpickle` to send each other functions/objects.

Using IPC for all internal calls would probably take significant overhead; the user functions are typically small (think `lambda shift: shift.date in employee.unavailable_dates` or `lambda lesson: lesson.teacher`). Depending on how many constraints you have and how complicated your domain model is, there could be potentially hundreds of context switches for a single score calculation. It might be worth prototyping though.