(no title)
cchianel | 1 year ago
A direct FFI call using JNI or the new foreign interface is fast, and has roughly the same speed as calling a Java method directly. Alas, the CPython and Java garbage collectors do not play nice, and require black magic in order to keep them in sync.
On the other hand, using proxies (such as in JPype or GraalPy) cause a significant performance overhead, since the parameters and return values need to be converted, and might cause additional FFI calls (in the other direction). The fun thing is if you pass a CPython object to Java, Java has a proxy to the CPython object. And if you pass that proxy back to CPython, a proxy to that proxy is created instead of unwrapping it. The result: JPype proxies are 1402% slower than calling CPython directly using FFI, and GraalPy proxies are 453% slower than calling CPython directly using FFI.
What I ultimately end up doing is translating CPython bytecode into Java bytecode, and generating Java data structures corresponding to the CPython classes used. As a result, I got a 100x speedup compared to using proxies. (Side note: if you are thinking about translating/reading CPython bytecode, don't; it is highly unstable, poorly documented, and its VM has several quirks that make it hard to map directly to other bytecodes).
For more details, you can see my blog post on the subject: https://timefold.ai/blog/java-vs-python-speed
LinXitoW|1 year ago
cchianel|1 year ago
- The Java side needs to store opaque Python pointers which may have no references on the CPython side.
- The CPython side need to store generated proxies for some Java objects (the result of constraint collectors, which are basically aggregations of a solution's data).
Solving runs a long time, typically at least a hour (although you can modify how long it runs for). If we don't free memory (by releasing the opaque Python Pointer return values), we will quickly run out of memory after a couple of minutes. The only way to free memory on the Java side is to close the arena holding the opaque Python pointer. However, when that arena is closed, its memory is zeroed out to prevent use-after-free. As a result, if CPython haven't garbage collected that pointer yet, it will cause a segmentation fault on the next CPython garbage collection cycle.
JPype (a CPython -> Java bridge) does dark magic to link the JVM's and CPython's garbage collector, but has performance issues when calling a CPython function inside a Java function, since its proxies have to do a lot of work. Even GraalPy, where Python is ran inside a JVM, has performance issues when Python calls Java code which calls Python code.
ignoramous|1 year ago
high_na_euv|1 year ago
Like, talk over some queue, file, http, etc
cchianel|1 year ago
Using IPC for all internal calls would probably take significant overhead; the user functions are typically small (think `lambda shift: shift.date in employee.unavailable_dates` or `lambda lesson: lesson.teacher`). Depending on how many constraints you have and how complicated your domain model is, there could be potentially hundreds of context switches for a single score calculation. It might be worth prototyping though.