
Impossible Java

294 points | barahilia | 9 years ago | barahilia.github.io

100 comments

[+] teraflop|9 years ago|reply
Fun fact: in Java versions 5 and 6, it was actually possible to write valid Java source code with overloaded return types!

The trick is that generic types in Java are subject to type erasure at runtime. Due to an oversight, it was possible to declare a class with methods like:

    String foo(List<X> l);
    double foo(List<Y> l);
which would be erased to:

    String foo(List l);
    double foo(List l);
At runtime, even though the type information for your List was no longer available, the JVM would be able to locate the correct method using the method signature stored in the caller's bytecode, giving the appearance of return-type dispatch. Technically this violates the Java language spec, and javac 7 was updated to be stricter and prevent this sort of code: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6182950

I guess it's ultimately not that mysterious, but I first encountered it "in the wild", and ended up scratching my head for a while before I figured out why our code suddenly stopped compiling when we upgraded the JDK.
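
A minimal sketch of the erasure step described above, using reflection. The class and method names (`ErasureDemo`, `fooStrings`, `fooNumbers`, `describe`) are made up for illustration, and the two methods need distinct names here precisely because javac 7+ rejects overloads whose parameters erase to the same type:

```java
import java.lang.reflect.Method;
import java.util.List;

class ErasureDemo {
    // Hypothetical methods standing in for foo(List<X>) and foo(List<Y>).
    static String fooStrings(List<String> l) { return "strings"; }
    static double fooNumbers(List<Integer> l) { return 1.0; }

    // Returns "<erased parameter type> / <generic parameter type>" for the named method.
    static String describe(String name) {
        try {
            // The lookup itself uses the erased type: List.class matches both methods.
            Method m = ErasureDemo.class.getDeclaredMethod(name, List.class);
            return m.getParameterTypes()[0].getName()
                + " / " + m.getGenericParameterTypes()[0].getTypeName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("fooStrings"));
        System.out.println(describe("fooNumbers"));
    }
}
```

For both methods the erased parameter type comes out as `java.util.List`; the generic argument survives only as metadata, so it plays no part in telling the two methods apart at the bytecode level.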

[+] masklinn|9 years ago|reply
> giving the appearance of return-type dispatch

I don't know that it was just an appearance; my understanding is that it is return-type dispatch, which is supported by the JVM (but not Java).

[+] jxi|9 years ago|reply
Interesting. Then why isn't the bytecode always stored with the caller, so that you can have return-type dispatch?
[+] nebulous1|9 years ago|reply
tl;dr Java needs to be able to decide which overloaded method implementation to use at compile time, meaning you can't differentiate between method implementations based on return type alone. However, when compiled to bytecode, each method and method call includes the return type as part of the method identification, so Java bytecode is actually capable of differentiating between implementations with the same name based on return type alone. This fact is used by bytecode obfuscators, and can lead to bugs in decompiled code if the decompiler doesn't account for it.
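
To make the "return type as part of the method identification" point concrete, here is a small sketch that prints JVM method descriptors with `java.lang.invoke.MethodType`. The class name `DescriptorDemo` and the helper `descriptor` are assumptions for illustration; the single `List` parameter mirrors the erased `foo(List)` methods from the top comment:

```java
import java.lang.invoke.MethodType;
import java.util.List;

class DescriptorDemo {
    // Builds the JVM descriptor of a method that takes one List and returns returnType.
    static String descriptor(Class<?> returnType) {
        return MethodType.methodType(returnType, List.class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor(String.class)); // (Ljava/util/List;)Ljava/lang/String;
        System.out.println(descriptor(double.class)); // (Ljava/util/List;)D
    }
}
```

The parameter part is identical and only the suffix after `)` differs, which is enough for the JVM to treat the two as distinct methods with the same name.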
[+] masklinn|9 years ago|reply
> Java needs to be able to decide which overloaded method implementation to use at compile time, meaning you can't differentiate between method implementations based on return type alone.

That's not correct. Rust has no issue statically dispatching based on the return type.

Java-the-language does not allow it, so it does not have to deal with calls which don't use the return value, e.g.:

    int getSomething()
    String getSomething()
If the return value is not used, you have to explicitly disambiguate this call somehow, and Java provides no way to do so.
[+] bmc7505|9 years ago|reply
I was surprised to learn that in Kotlin, it is possible to disambiguate overloaded functions based only on their return type. I had no idea the JVM even supports such semantics. [1]

[1]: http://stackoverflow.com/q/42916801/1772342

[+] mbel|9 years ago|reply
According to your link, the JVM does not support this feature. It's implemented by Kotlin itself. I guess it's probably some kind of name-mangling scheme.
[+] mooman219|9 years ago|reply
Not only is this possible, as stated by the article, but this level of overloading is incredibly useful in a number of areas and has been used for a long time.

One use case is API backwards compatibility. If your API wants to change the return type of a function, say from int to double, but also wants to maintain binary backwards compatibility, you can do that. See OverMapped [ https://github.com/Wolvereness/OverMapped ].

Obfuscation is another area, and ProGuard employs this to make decompiling more difficult, iirc.

[+] matharmin|9 years ago|reply
This is about the Dalvik bytecode format, but the same applies to standard Java bytecode files. Practically any obfuscated Java code will have this, which makes reverse engineering much more difficult without the tools to handle it.
[+] HighlandSpring|9 years ago|reply
Why is it not as simple as mapping the method call instructions to returntype_methodname format? Am I missing something?
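
A rough sketch of what such a returntype_methodname mapping could look like, assuming the renamer has each method's name and descriptor at hand. `ManglingSketch` and `mangle` are hypothetical names, not part of any existing tool:

```java
import java.lang.invoke.MethodType;

class ManglingSketch {
    // Hypothetical scheme: prefix the method name with the simple name of its return type.
    static String mangle(String name, String descriptor) {
        MethodType type = MethodType.fromMethodDescriptorString(
            descriptor, ManglingSketch.class.getClassLoader());
        return type.returnType().getSimpleName() + "_" + name;
    }

    public static void main(String[] args) {
        System.out.println(mangle("getSomething", "()I"));                  // int_getSomething
        System.out.println(mangle("getSomething", "()Ljava/lang/String;")); // String_getSomething
    }
}
```

The renaming itself is easy; the part this sketch ignores is applying it consistently to every caller and override, including ones in classes outside the unit being decompiled.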
[+] Animux|9 years ago|reply
Do you have some examples for tools to handle the deobfuscation stuff?
[+] Gaelan|9 years ago|reply
> Any compiler of sound mind and memory will issue an error

GHC would beg to differ.

[+] mattnewton|9 years ago|reply
That was my first thought: this is being presented as a fundamental tenet of compilers, when really it's a design choice that a whole family of languages, some of them languages people even use, didn't make.
[+] cjensen|9 years ago|reply
Ada also allows return type overloading.

A complaint against Ada is that it is hard for the compiler to figure out the right overloading in a complex statement with many nested function calls. My compilers prof took the time to show how to do it, and why it doesn't take much time.

Too bad C's automatic conversions prevent this from being used in C++.

[+] mnarayan01|9 years ago|reply
It seems like this is simply a disassembler error (albeit an understandable one). Am I missing something?

Edit: Based on the responses below, I guess the point is that the disassembler can't generate Java code that will "naively" (wrong word, but I can't think of a better one) generate the same output. Notable (I assume) in that name munging would be problematic outside the current compilation unit.

[+] burkaman|9 years ago|reply
No, Java class files usually include the names of variables and functions, so this isn't the disassembler's fault. The class file actually had two functions with the same name. You could certainly implement an anti-obfuscation layer to detect stuff like this, but I wouldn't call it an "error" as is.
[+] shawnz|9 years ago|reply
What did the disassembler do wrong? It just happens that there is no valid Java code which could produce that (valid) bytecode. What should it have output instead?
[+] barahilia|9 years ago|reply
In basic blocks (no conditionals or loops), a disassembler mostly does a mechanical job of translating opcodes to the appropriate Java code and aggregating expressions. But it doesn't rename functions or classes. They are left as they were in the bytecode.
[+] jandrese|9 years ago|reply
No, it's a mismatch between what the compiler accepts and what the JVM can execute.

The interesting part to me is that it seems that Java would be perfectly capable of differentiating between methods by return type if the compiler was tweaked slightly. Is there a reason why this isn't a formal language feature?

[+] cremp|9 years ago|reply
It should be noted that the JVM allows things that javac does not. This is one of those things, and the article describes it accurately; although it is set in the Android Java environment, it applies to the standard JVM too.

One of the more fun things I've done is use the ASM and BCEL libraries to make (and unmake) these kinds of manipulations (manipulations that javac won't let you do.)

[+] anonymousDan|9 years ago|reply
I believe the technical term for the correspondence between what it is possible to compile and what is valid bytecode/machine code is fully abstract compilation. It's an interesting concept with many interesting implications (e.g. for security). In the past at least there were various examples of Java programs that were illegal in the language but nonetheless could be created directly as bytecode and would be loaded by the JVM. This obviously becomes a security problem if your program loads bytecode dynamically and makes assumptions about its capabilities at the language level as opposed to the bytecode level.
[+] drdrey|9 years ago|reply
If you load untrusted code dynamically, it strikes me as wrong to assume anything about its capabilities. Even more so "at the language level". Untrusted code can do anything unless you sandbox it.
[+] tenkeyless|9 years ago|reply
This is a well-known problem in the field of decompilers and disassemblers. The pseudo-code they output for code produced by ordinary compilers is pretty good, but when they encounter hand-written assembly or bytecode, they go places.
[+] _old_dude_|9 years ago|reply
Even the Java compiler uses that trick, for example with a bridge method.

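  import java.util.Arrays;
  import java.util.function.Supplier;

  class BridgeDemo {
    public static void main(String[] args) {
      // For this class javac also emits a synthetic bridge method: Object get().
      class Fun implements Supplier<String> {
        public String get() { return null; }
      }

      // Print only the methods declared by Fun itself: two get() methods show up.
      Arrays.stream(Fun.class.getMethods())
        .filter(m -> m.getDeclaringClass() == Fun.class)
        .forEach(System.out::println);
    }
  }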
[+] kuschku|9 years ago|reply
why not use .getDeclaredMethods()?
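
For what it's worth, `getDeclaredMethods()` lists the bridge method as well. A small sketch, with `BridgeSketch` and `declaredGets` as made-up names and `Fun` mirroring the example above:

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class BridgeSketch {
    // Hypothetical class mirroring the Fun example above.
    static class Fun implements Supplier<String> {
        public String get() { return null; }
    }

    // Describes every method named "get" that Fun declares, sorted by return type name.
    static List<String> declaredGets() {
        return Arrays.stream(Fun.class.getDeclaredMethods())
            .filter(m -> m.getName().equals("get"))
            .map(m -> m.getReturnType().getSimpleName() + " bridge=" + m.isBridge())
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints "Object bridge=true" and then "String bridge=false".
        declaredGets().forEach(System.out::println);
    }
}
```

So the class file really does hold two `get()` methods that differ only in return type, one of them generated by javac.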
[+] 0x0|9 years ago|reply
Another oddity to think about in java source code: In regular Java, all objects extend java.lang.Object, including java.lang.Class. So how do you bootstrap building java.lang.Object from source?
[+] pjmlp|9 years ago|reply
You don't.

This is part of the bootstrapping process of a programming language.

Usually such special types are built manually in the compiler data structures, or make use of special primitives, like native methods in Java's case.

[+] seanmcdirmid|9 years ago|reply
java.lang.Object is just part of the VM. Same thing with native methods.
[+] reitanqild|9 years ago|reply
I call clickbait:

Author admits this is not valid Java. It is not even compilable.

If I read correctly, it is just artifacts from a partially successful decompile.

Interesting, and this discussion is interesting, but this is not and has never been valid Java.