stringCache is only ever assigned to with an initialised string (the output of stringCache()), so given Strings are immutable and assuming Java's assignment is atomic, how can you get a race condition that causes a problem? There shouldn't be a TOCTTOU issue either given stringCache is copied to sc before being null-checked.
What is the statement reordering doing here to break this? How can it be prevented? (Does it need write barriers or something?)
Using the Java terminology, there is no "happens-before" edge. So every action must appear as if it occurred in the specified order in that thread, but other threads are free to see those actions in a different order.
On some machines/CPUs/architectures, this is very easy to organise, since the writes can and will be re-ordered. When this happens you can get the pointer to the string being set, before the contents of the string are actually set. Within the thread, you won't see this, since reorders within the thread of context are done "safely", or at least are visible, but to a different thread/core/CPU those writes may arrive out of order.
The issue may be to do with the guarantees that the JVM gives you about orders of operations and write visibility across threads.
The author hints at a possible solution when they say "As we see a non-volatile field stringCache", the key being 'volatile' which is a Java keyword with a specific meaning that constrains the order of operations and write visibility across threads.
You can see a partially constructed String is the problem. Assignment is atomic, but it's not guaranteed to not be reordered with operations done in the constructor.
Very similar bug was in Digitalk Smalltalk 25 years ago: conversion from integer to string used a global cache... I notices this while running a batch with progress indicator on UI. Sometimes integer to string conversion in batch didn't work and I couldn't find the reason. After loooooooong period of extensive debugging the issue was pinned down to global cache in integer to string conversion, which was not thread safe. At first Digitalk didn't want to fix it "because of performance issues", but in later versions it was fixed... This bug can cause very serious side effects...
BigDecimal has nothing to do with the issue.
There is no race conditional. The code is textbook example how to use (cheap) unsafe publishing with no locks/CAS -- assign to local variable and 'return' the variable value (not a class member one). There is no global cache, it's per BigDecimal instance.
As for 'this bug', yes JVM concurrency bugs are serious by definition.
How does this assign an uninitialized String to stringCache? I thought String is immutable and the String object is fully calculated at assignment? Where is the StringBuilder used when stringCache and sc are objects of type "String?"
Shouldn't the "problem" be that two threads might call layoutChars at the same time, leading to some extra CPU cycles wasted and some extra garbage?
I suspect that the real source code is different, or the real problem that leads to the NRE is different.
You are correct that this is allowed by the Java spec (1). Final fields are guaranteed to happen before reference assignment, and strings are explicitly mentioned. It appears to be an ARM JVM bug.
Not sure if late for the party... But there is no bug int the impl. BigDecimal.toString(). It's a JVM bug. It blatantly violates the spec of the original JMM (as of Java 1.5 and backported to 1.4)
Not only that but it lacks optimization/intrinsics for String. String.length() should never/ever yield a NPE. I'd expect a process crash than adding metadata for trapping read access faults.
---
Edit: final fields have become ubiquitous even for objects that are safely published [via volatile, synchronizeed, CAS, before thread.start()]. All wrapper classes Integer, Long, etc. have them. Final fields are very much advised to be used and they do help reliability and readability by making it easy to reason about object state (i.e. it doesn't change once seen).
On x86 field fields require a compiler barrier at best as the writes are not reordered. In other words they are extremely cheap. Stuff like AtomicReference.lazySet is next to free and a welcome way to build fast concurrency primitives.
There is Doug Lea's parer[0] for the improved jmm.
There are different versions of ARM architecture with different memory models, overall ARM is considered weak.
v7 has dmb[1] only, and to my knowledge it's not cheap. Skipping dmb requires rather deep analysis, so I wonder if that was the case experienced by the poster.
ARMv8 has store-release fence but I don't know how efficient would be spamming it.
In java lang specs: "A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields."
This makes it a JVM bug because no thread should be able to see uninitialized final fields of an object.
Agree with others that String.length() should never ever throw NPE from within String, regardless of it being called from whatever thread.
Now, in the comments there's another idea suggesting the unsafe publication of String (partially constructed). This would be the case if String.value wasn't final (essentially the case of double-checked locking). But that's not the case here because String.value is final[0]
While it's a JVM bug, making stringCache volatile would be one way to fix it - unless the JVM is also broken around volatile...
is there even a good reason for this to be cached. this almost like a memory leak. if someone wants to cache the result of toString they should be doing it in client code. this shouldn't be something that is forced on everyone.
And yet, it's an interface, as a user you don't notice this performance improvement at all - it returns a string version of an (immutable) data structure, how it does that and whether it caches it is not something that the client can control, and thus not something they should care about too much.
What they should and will care about is the performance of BigDecimal.toString(), and if it's cached then the 2nd and onward calls will be very fast.
Mind you, I'm sure there's a lot of similar optimizations in a lot of toString() implementations; a generic implementation that is threadsafe would be preferable to a roll-your-own-caching.
"BigDecimal is an immutable data type. So every method of this class should be thread safe."
I'm not familiar with the ins and outs of Java's API and memory model and this is unexpected, but should it? If so, where does the documentation make that promise?
Also (nitpick) one _can_ subclass BigDecimal and make it mutable (a design error that won't be fixed because of backward compatibility)
I am a complete java noob, but it seems like in the test program the same variable testBigDecimal is being shared by different threads without any lock or mutex or any sort of concurrency control! Won't any function working on testBigDecimal be thread unsafe if it was not specifically written assuming it was working on a shared object? Why is this news?
Disclaimer: I am so java illiterate that I am applying my C understanding to the code snippet.
It's immutable, and immutable objects are thread safe. This is also true in C: you can safely pass references to structs between threads with no locks, as long as you never mutate them. Java just enforces that immutability if you write the class the right way.
...the bug is that it's not really immutable, insofar as it's using an unsafe mutation under the hood to implement its toString method. So it either needs to document the fact that it's not immutable despite the appearance otherwise (crazy), or fix the problem.
BigDecimal does not contain any mutable user visible state, so it should be usable by different threads without synchronisation of any kind (except the implicit synchronisation by means of "can I see pointer to that instance")
It is news because either there is bug in the BigDecimal or JVM in question that causes the above assumption to not hold.
BigDecimal claims to be an immutable class, with only non-mutating public methods. In C, the equivalent would be that every exposed function takes `const BigDecimal *`.
Even in C, it is fine for multiple threads to access the same data as long as they all read and don't write it.
[+] [-] TazeTSchnitzel|8 years ago|reply
stringCache is only ever assigned to with an initialised string (the output of stringCache()), so given Strings are immutable and assuming Java's assignment is atomic, how can you get a race condition that causes a problem? There shouldn't be a TOCTTOU issue either given stringCache is copied to sc before being null-checked.
What is the statement reordering doing here to break this? How can it be prevented? (Does it need write barriers or something?)
[+] [-] pwagland|8 years ago|reply
On some machines/CPUs/architectures, this is very easy to organise, since the writes can and will be re-ordered. When this happens you can get the pointer to the string being set, before the contents of the string are actually set. Within the thread, you won't see this, since reorders within the thread of context are done "safely", or at least are visible, but to a different thread/core/CPU those writes may arrive out of order.
[+] [-] readittwice|8 years ago|reply
[+] [-] juancn|8 years ago|reply
String's value is final and assigned a copy of the array, so it's impossible to see a string that's not completely initialized as per the JMM
[+] [-] joncrocks|8 years ago|reply
The author hints at a possible solution when they say "As we see a non-volatile field stringCache", the key being 'volatile' which is a Java keyword with a specific meaning that constrains the order of operations and write visibility across threads.
See http://tutorials.jenkov.com/java-concurrency/java-memory-mod... for a reasonable explanation.
edit: suggestion elsewhere that it could be a wonky JVM implementation, so may not be a BigDecimal problem after all
[+] [-] hyperpape|8 years ago|reply
The double checked locking is broken declaration describes this: https://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedL...
Edit: nevermind, I believe others' points about Strings having a final value make this irrelevant.
[+] [-] unknown|8 years ago|reply
[deleted]
[+] [-] pulse7|8 years ago|reply
[+] [-] xxs|8 years ago|reply
[+] [-] gwbas1c|8 years ago|reply
> stringCache = sc = layoutChars(true);
How does this assign an uninitialized String to stringCache? I thought String is immutable and the String object is fully calculated at assignment? Where is the StringBuilder used when stringCache and sc are objects of type "String?"
Shouldn't the "problem" be that two threads might call layoutChars at the same time, leading to some extra CPU cycles wasted and some extra garbage?
I suspect that the real source code is different, or the real problem that leads to the NRE is different.
[+] [-] mmastrac|8 years ago|reply
1. https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.htm...
[+] [-] xxs|8 years ago|reply
Not only that but it lacks optimization/intrinsics for String. String.length() should never/ever yield a NPE. I'd expect a process crash than adding metadata for trapping read access faults.
---
Edit: final fields have become ubiquitous even for objects that are safely published [via volatile, synchronizeed, CAS, before thread.start()]. All wrapper classes Integer, Long, etc. have them. Final fields are very much advised to be used and they do help reliability and readability by making it easy to reason about object state (i.e. it doesn't change once seen). On x86 field fields require a compiler barrier at best as the writes are not reordered. In other words they are extremely cheap. Stuff like AtomicReference.lazySet is next to free and a welcome way to build fast concurrency primitives.
There is Doug Lea's parer[0] for the improved jmm. There are different versions of ARM architecture with different memory models, overall ARM is considered weak. v7 has dmb[1] only, and to my knowledge it's not cheap. Skipping dmb requires rather deep analysis, so I wonder if that was the case experienced by the poster. ARMv8 has store-release fence but I don't know how efficient would be spamming it.
0: http://gee.cs.oswego.edu/dl/html/j9mm.html
1: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....
[+] [-] GreenToad|8 years ago|reply
[+] [-] vivaamerica1|8 years ago|reply
Now, in the comments there's another idea suggesting the unsafe publication of String (partially constructed). This would be the case if String.value wasn't final (essentially the case of double-checked locking). But that's not the case here because String.value is final[0]
While it's a JVM bug, making stringCache volatile would be one way to fix it - unless the JVM is also broken around volatile...
[0] https://stackoverflow.com/questions/11306032/please-explain-...
[+] [-] gus_massa|8 years ago|reply
[+] [-] benmmurphy|8 years ago|reply
[+] [-] Cthulhu_|8 years ago|reply
What they should and will care about is the performance of BigDecimal.toString(), and if it's cached then the 2nd and onward calls will be very fast.
Mind you, I'm sure there's a lot of similar optimizations in a lot of toString() implementations; a generic implementation that is threadsafe would be preferable to a roll-your-own-caching.
[+] [-] osi|8 years ago|reply
[+] [-] chvid|8 years ago|reply
[+] [-] ht85|8 years ago|reply
[+] [-] Someone|8 years ago|reply
I'm not familiar with the ins and outs of Java's API and memory model and this is unexpected, but should it? If so, where does the documentation make that promise?
Also (nitpick) one _can_ subclass BigDecimal and make it mutable (a design error that won't be fixed because of backward compatibility)
[+] [-] juancn|8 years ago|reply
[+] [-] lozzo|8 years ago|reply
[+] [-] dudul|8 years ago|reply
[+] [-] british_india|8 years ago|reply
[+] [-] craptocurrency|8 years ago|reply
[deleted]
[+] [-] dingo_bat|8 years ago|reply
Disclaimer: I am so java illiterate that I am applying my C understanding to the code snippet.
[+] [-] hyperpape|8 years ago|reply
...the bug is that it's not really immutable, insofar as it's using an unsafe mutation under the hood to implement its toString method. So it either needs to document the fact that it's not immutable despite the appearance otherwise (crazy), or fix the problem.
[+] [-] dfox|8 years ago|reply
It is news because either there is bug in the BigDecimal or JVM in question that causes the above assumption to not hold.
[+] [-] scatters|8 years ago|reply
Even in C, it is fine for multiple threads to access the same data as long as they all read and don't write it.
[+] [-] nerdherfer|8 years ago|reply
[+] [-] merb|8 years ago|reply
Edit: This could be easily fixed with double-checked locking