(no title)
saltysugar | 12 years ago
Not sure on what basis you claim the choice of language to be "questionable", but keep in mind that Scala's type-safety and many other features are much more difficult to achieve in C/C++. Cleaner code is sometimes more important than some tiny gain in performance.
Also in terms of scaling, Kafka cleverly takes advantage of many aspects in their design to ensure low-latency high-throughput.
* Little random I/O
* Relying heavily on the OS pagecache for data storage
Performance-wise, Kafka can outshine some of the in-memory message storing message queues.
Source: http://kafka.apache.org/documentation.html#design
http://research.microsoft.com/en-us/um/people/srikanth/netdb...
shmerl|12 years ago
> Java garbage collection becomes increasingly fiddly and slow as the in-heap data increases.
So they had to work around that. In my view it's not a tiny issue. I'd say, instead of working around such inherent limitations, it's better not to have them to begin with when making high performance systems. That was my main point above. Time spent dancing around such problems defeats the purpose of supposed easiness of development.
saltysugar|12 years ago
The next line reads: "As a result of these factors using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure—we at least double the available cache..."
And if you don't know about pagecache, it's an in-memory cache managed by the OS and has nothing to do with JVM's memory at all.
And you forgot that C++ isn't the easiest language when it comes to designing a distributed system. Scala, as I mentioned, offers many other features that suit the needs of the team. Of course if you're a good engineer you'll know that there are trade-offs such as compiling time, but that's the same for every engineering decision.
noelwelsh|12 years ago
BTW, I've talked to finance people who have JVM applications that haven't had a major GC in months. Really, it's not a big deal.
kasey_junk|12 years ago
Is this difficult in Java? Yes. It's also difficult in C++. Just because it is difficult doesn't mean it is impossible.
So if I am resorting to managing my own memory anyway, why would I use the JVM? Because typically the code that is on the critical path is a small percentage of the entire code base, and the other advantages of the JVM (tooling, language features, libraries etc.) out way the downsides.
That's not always the case, and I don't have any specific knowledge of Kafka, but just because something needs to be low/consistent latency doesn't mean it can't (or shouldn't) be written on the JVM.
fauigerzigerk|12 years ago
This is a persistent message queue for log messages. Messages are coming in sequentially and subscribers read them sequentially. It makes zero sense to keep tons of messages in memory inside complex data structures as they are not indexed or searched or analyzed.
So in this particular case it's not actually a workaround. It's just sensible design and I wouldn't do it any differently in C++ either.