Fun fact: in Java versions 5 and 6, it was actually possible to write valid Java source code with overloaded return types!
The trick is that generic types in Java are subject to type erasure at runtime. Due to an oversight, it was possible to declare a class with methods like:
String foo(List<X> l);
double foo(List<Y> l);
which would be erased to:
String foo(List l);
double foo(List l);
Even though the type information for your List was no longer available at runtime, the JVM could still locate the correct method using the method descriptor (which includes the return type) that the compiler stored in the caller's bytecode, giving the appearance of return-type dispatch. Technically this violates the Java language spec, and javac 7 was updated to be stricter and prevent this sort of code: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6182950
I guess it's ultimately not that mysterious, but I first encountered it "in the wild", and ended up scratching my head for a while before I figured out why our code suddenly stopped compiling when we upgraded the JDK.
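A small sketch of the erasure described above, assuming a modern JDK (the class name `ErasureDemo` is illustrative). Reflection reports the erased parameter type `java.util.List` at runtime, while the generic `List<String>` survives only as signature metadata. On javac 7 and later, adding the second overload from the comment is rejected because both methods have the same erasure.

```java
import java.lang.reflect.Method;
import java.util.List;

public class ErasureDemo {
    static String foo(List<String> l) {
        return "string list";
    }

    // Adding this overload is rejected by javac 7 and later with
    // "name clash: ... have the same erasure":
    // static double foo(List<Integer> l) { return 0.0; }

    // Returns {erased parameter type, generic parameter type} for foo.
    static String[] parameterTypes() {
        try {
            // The lookup must use the erased type: there is no List<String>.class.
            Method m = ErasureDemo.class.getDeclaredMethod("foo", List.class);
            return new String[] {
                m.getParameterTypes()[0].toString(),
                m.getGenericParameterTypes()[0].getTypeName()
            };
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] types = parameterTypes();
        System.out.println(types[0]); // interface java.util.List
        System.out.println(types[1]); // java.util.List<java.lang.String>
    }
}
```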
tl;dr Java needs to be able to decide which overloaded method implementation to use at compile time, meaning you can't differentiate between method implementations based on return type alone. However, when compiled to bytecode, each method and method call includes the return type as part of the method descriptor that identifies it, so Java bytecode is actually capable of differentiating between implementations with the same name based on return type alone. This fact is used by bytecode obfuscators, and can lead to bugs in decompiled code if the decompiler doesn't account for it.
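To make the descriptor point concrete: the JVM identifies a method by its name plus a descriptor, and the descriptor encodes the return type. This sketch (class name `DescriptorDemo` is illustrative) uses `java.lang.invoke.MethodType.toMethodDescriptorString()` to print the descriptors of the two erased `foo` signatures; they differ only after the closing parenthesis.

```java
import java.lang.invoke.MethodType;
import java.util.List;

public class DescriptorDemo {
    // Returns the JVM method descriptor for the given return and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Same name, same erased parameter list, different return types:
        System.out.println(descriptor(String.class, List.class)); // (Ljava/util/List;)Ljava/lang/String;
        System.out.println(descriptor(double.class, List.class)); // (Ljava/util/List;)D
    }
}
```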
> Java needs to be able to decide which overloaded method implementation to use at compile time, meaning you can't differentiate between method implementations based on return type alone.
That's not correct. Rust has no issue statically dispatching based on the return type.
Java-the-language does not allow it so it does not have to deal with calls which don't use the return value e.g.
int getSomething();
String getSomething();
If the return value is not used, you have to explicitly disambiguate this call somehow, and Java provides no way to do so.
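One common workaround in Java-the-language, offered as a sketch rather than anything proposed in the comment: pass the desired result type as an explicit `Class<T>` argument, so the call site is unambiguous even when the return value is discarded. The class and the returned values are hypothetical.

```java
public class TypeTokenDemo {
    // Hypothetical stand-in for the two getSomething() overloads: the caller names
    // the result type explicitly instead of relying on the (possibly unused) return value.
    static <T> T getSomething(Class<T> type) {
        if (type == Integer.class) {
            return type.cast(42);
        }
        if (type == String.class) {
            return type.cast("forty-two");
        }
        throw new IllegalArgumentException("unsupported type: " + type.getName());
    }

    public static void main(String[] args) {
        int i = getSomething(Integer.class);   // Integer, auto-unboxed
        String s = getSomething(String.class);
        getSomething(String.class);            // result discarded, still unambiguous
        System.out.println(i + " " + s);       // 42 forty-two
    }
}
```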
I was surprised to learn that in Kotlin, it is possible to disambiguate overloaded functions based only on their return type. I had no idea the JVM even supports such semantics. [1]
Not only is this possible, as stated by the article, but this level of overloading is incredibly useful in a number of areas and has been used for a long time.
One use case is API backwards compatibility. If your API wants to change the return type of a function, say from int to double, but also wants to maintain binary backwards compatibility, you can do that. See OverMapped [ https://github.com/Wolvereness/OverMapped ].
Obfuscation is another area; ProGuard employs this to make decompiling more difficult, iirc.
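javac itself relies on the same JVM capability: when a subclass overrides a method with a covariant (narrower) return type, the compiler emits a synthetic bridge method with the original return type, so callers compiled against the supertype's descriptor still link. That is a similar binary-compatibility idea. A sketch with illustrative class names:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BridgeDemo {
    static class Base {
        Object value() { return "base"; }
    }

    static class Derived extends Base {
        @Override
        String value() { return "derived"; } // covariant return type
    }

    // Lists Derived's declared methods as "ReturnType name()", marking bridges.
    static List<String> declaredMethods() {
        List<String> out = new ArrayList<>();
        for (Method m : Derived.class.getDeclaredMethods()) {
            out.add(m.getReturnType().getSimpleName() + " " + m.getName() + "()"
                    + (m.isBridge() ? " [bridge]" : ""));
        }
        Collections.sort(out); // reflection order is unspecified
        return out;
    }

    public static void main(String[] args) {
        // Two no-argument methods named value(), differing only by return type.
        declaredMethods().forEach(System.out::println);
    }
}
```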
This is about the Dalvik bytecode format, but the same applies to standard Java bytecode files. Practically any obfuscated Java code will have this, which makes reverse engineering much more difficult without the tools to handle it.
The same thing exists in .NET IL where you can overload methods based only on return values (among other interesting things like modopt/modreq [0] etc.).
That was my first thought: this is being presented as a fundamental tenet of compilers, when really it's a design choice that a whole family of languages, some of which people even use, didn't make.
A complaint against Ada is that it is hard for the compiler to figure out the right overloading in a complex statement with many nested function calls. My compilers prof took the time to show how to do it, and why it doesn't take much time.
Too bad C's automatic conversions prevent this from being used in C++.
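The textbook two-pass scheme for Ada-style overload resolution can be sketched as follows; it is not necessarily what the professor presented, the overload table is made up, and records assume Java 16+. The bottom-up pass computes the set of types each subexpression could have; the top-down pass uses the type the context expects (so the return type counts) to pick exactly one overload per call. With the bottom-up sets cached, each node is handled once per pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OverloadResolution {
    // One overload: function name, parameter types, return type (types are plain names).
    record Sig(String name, List<String> params, String ret) {}

    // A call expression with argument subexpressions; a leaf has no arguments.
    record Call(String name, List<Call> args) {}

    // Hypothetical overload table: get() is overloaded on its return type alone.
    static final List<Sig> SIGS = List.of(
            new Sig("get", List.of(), "int"),
            new Sig("get", List.of(), "String"),
            new Sig("print", List.of("String"), "void"),
            new Sig("twice", List.of("int"), "int"));

    // Cache for the bottom-up pass so each subtree's type set is computed once.
    static final Map<Call, Set<String>> possible = new HashMap<>();

    // Overloads of c.name() whose parameter types are all possible for c's arguments.
    static List<Sig> candidates(Call c) {
        List<Sig> out = new ArrayList<>();
        for (Sig s : SIGS) {
            if (!s.name().equals(c.name()) || s.params().size() != c.args().size()) continue;
            boolean ok = true;
            for (int i = 0; i < c.args().size() && ok; i++) {
                ok = possibleTypes(c.args().get(i)).contains(s.params().get(i));
            }
            if (ok) out.add(s);
        }
        return out;
    }

    // Bottom-up pass: every type this expression could have.
    static Set<String> possibleTypes(Call c) {
        Set<String> cached = possible.get(c);
        if (cached != null) return cached;
        Set<String> result = new HashSet<>();
        for (Sig s : candidates(c)) result.add(s.ret());
        possible.put(c, result);
        return result;
    }

    // Top-down pass: choose the unique overload yielding the expected type, then recurse.
    static String resolve(Call c, String expected) {
        List<Sig> matches = new ArrayList<>();
        for (Sig s : candidates(c)) {
            if (s.ret().equals(expected)) matches.add(s);
        }
        if (matches.size() != 1) {
            throw new IllegalStateException(matches.size() + " overloads of " + c.name() + " yield " + expected);
        }
        Sig chosen = matches.get(0);
        List<String> resolvedArgs = new ArrayList<>();
        for (int i = 0; i < c.args().size(); i++) {
            resolvedArgs.add(resolve(c.args().get(i), chosen.params().get(i)));
        }
        return c.name() + ":" + chosen.ret()
                + (resolvedArgs.isEmpty() ? "" : "(" + String.join(", ", resolvedArgs) + ")");
    }

    public static void main(String[] args) {
        Call get = new Call("get", List.of());
        System.out.println(resolve(new Call("print", List.of(get)), "void")); // print:void(get:String)
        System.out.println(resolve(new Call("twice", List.of(get)), "int"));  // twice:int(get:int)
    }
}
```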
It seems like this is simply a disassembler error (albeit an understandable one). Am I missing something?
Edit: Based on the responses below, I guess the point is that the disassembler can't generate Java code that will "naively" (wrong word, but I can't think of a better one) generate the same output. Notable (I assume) in that name munging would be problematic outside the current compilation unit.
No, Java class files usually include the names of variables and functions, so this isn't the disassembler's fault. The class file actually had two functions with the same name. You could certainly implement an anti-obfuscation layer to detect stuff like this, but I wouldn't call it an "error" as is.
What did the disassembler do wrong? It just happens that there is no valid Java code which could produce that (valid) bytecode. What should it have outputted instead?
In basic blocks (no conditionals or loops), a disassembler mostly does a mechanical job translating opcodes to the appropriate Java code and aggregating expressions.
But it doesn't rename functions or classes. They are left as they were in the bytecode.
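A rough illustration of that mechanical translation: within a basic block, a decompiler can simulate the operand stack symbolically, pushing expression strings instead of values. The instruction format here is a simplified, hypothetical text form (real bytecode uses encodings such as `iload_1` and `iconst_2`, and `invokestatic` references a full descriptor), and only a few opcodes are handled. Note that the method name is copied through verbatim while the return type is not represented at all, which is exactly how two methods differing only by return type end up printed identically.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class StackToExpr {
    // Decompiles one straight-line block (no branches) of simplified instructions.
    static String decompile(List<String> code) {
        Deque<String> stack = new ArrayDeque<>(); // operand stack of expression strings
        for (String insn : code) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "iload" -> stack.push("v" + parts[1]); // local slot n becomes name vn
                case "iconst" -> stack.push(parts[1]);
                case "iadd", "isub", "imul" -> {
                    String right = stack.pop(); // top of stack is the right operand
                    String left = stack.pop();
                    String op = parts[0].equals("iadd") ? "+" : parts[0].equals("isub") ? "-" : "*";
                    stack.push("(" + left + " " + op + " " + right + ")");
                }
                case "invokestatic" -> {
                    // "invokestatic <name> <argCount>": the name is copied verbatim;
                    // the descriptor (and its return type) is dropped.
                    int argc = Integer.parseInt(parts[2]);
                    String[] callArgs = new String[argc];
                    for (int i = argc - 1; i >= 0; i--) {
                        callArgs[i] = stack.pop();
                    }
                    stack.push(parts[1] + "(" + String.join(", ", callArgs) + ")");
                }
                case "ireturn" -> {
                    return "return " + stack.pop() + ";";
                }
                default -> throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        throw new IllegalStateException("block ended without ireturn");
    }

    public static void main(String[] args) {
        List<String> block = List.of("iload 1", "invokestatic foo 1", "iconst 2", "imul", "ireturn");
        System.out.println(decompile(block)); // return (foo(v1) * 2);
    }
}
```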
No, it's a mismatch between what the compiler accepts and what the JVM can execute.
The interesting part to me is that it seems that Java would be perfectly capable of differentiating between methods by return type if the compiler was tweaked slightly. Is there a reason why this isn't a formal language feature?
It should be noted that the JVM allows for things that javac does not.
This is one of those things, and the article describes it accurately; although it is set in the Android Java environment, it applies to the standard JVM too.
One of the more fun things I've done is use the ASM and BCEL libraries to make (and unmake) these kinds of manipulations (manipulations that javac won't let you do.)
I believe the technical term for the correspondence between what it is possible to compile and what is valid bytecode/machine code is fully abstract compilation. It's an interesting concept with many interesting implications (e.g. for security). In the past at least there were various examples of Java programs that were illegal in the language but nonetheless could be created directly as bytecode and would be loaded by the JVM. This obviously becomes a security problem if your program loads bytecode dynamically and makes assumptions about its capabilities at the language level as opposed to the bytecode level.
If you load untrusted code dynamically, it strikes me as wrong to assume anything about its capabilities. Even more so "at the language level". Untrusted code can do anything unless you sandbox it.
This is a well-known problem in the field of decompilers and disassemblers. The pseudo-code they output for code produced by ordinary compilers is pretty good, but when confronted with hand-crafted assembly or bytecode, they go off the rails.
Another oddity to think about in java source code: In regular Java, all objects extend java.lang.Object, including java.lang.Class. So how do you bootstrap building java.lang.Object from source?
I don't know that it was only an appearance; my understanding is that it is genuine return-type dispatch, which the JVM supports (but Java does not).
[1]: http://stackoverflow.com/q/42916801/1772342
[0]: http://stackoverflow.com/a/5294456
GHC would beg to differ.
This is part of the bootstrapping process of a programming language.
Usually such special types are built manually in the compiler's data structures, or make use of special primitives, like native methods in Java's case.
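A quick way to see the special status of `java.lang.Object` from running code, assuming an OpenJDK-based runtime (which methods are `native` is an implementation detail, not something the language guarantees). `Object.class.getSuperclass()` returns `null`, while `java.lang.Class` extends `Object` like everything else. The class name `ObjectRootDemo` is illustrative.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class ObjectRootDemo {
    // True if the named no-argument method declared by java.lang.Object is native.
    static boolean isNative(String name) {
        try {
            Method m = Object.class.getDeclaredMethod(name);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Object.class.getSuperclass()); // null: Object is the root
        System.out.println(Class.class.getSuperclass());  // class java.lang.Object
        // On OpenJDK these are implemented inside the VM rather than in Java source.
        System.out.println(isNative("getClass") + " " + isNative("hashCode"));
    }
}
```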
The author admits this is not valid Java. It is not even compilable.
If I read correctly, these are just artifacts from a partially successful decompile.
The article is interesting, and so is this discussion, but this is not, and never has been, valid Java.