vitalyd's comments

vitalyd | 9 years ago | on: A JVM Does That? (2011) [pdf]

You should've made it explicit then that you were referring to slow HFT -- the post I was replying to drew no such distinction apart from saying the "extreme end" uses FPGAs. Obviously, if young-gen GC pauses aren't an issue, then there's nothing to talk about here -- but then I'd argue that's not really HFT (although I know the term is quite vague) and is no different from other types of systems. There are other issues with GC and garbage allocation, such as d-cache pollution, but I suppose there's no real need to discuss them given the type of system you're describing.

I know you were throwing 250us out there as a pseudo-example, but that's actually a very long time even outside of UHFT/MM.

Also, don't forget that your trading daemons will be drinking from a fire hose of market data, so beyond being able to tick-to-trade quickly, you need to be able to consume that stream without building up a substantial backlog (or worse, OOMing or entering permanent gapping).

vitalyd | 9 years ago | on: A JVM Does That? (2011) [pdf]

Yes, Azul has a similar feature (ReadyNow).

This is nontrivial because lots of optimizations depend on class load ordering and runtime profile information.

vitalyd | 9 years ago | on: A JVM Does That? (2011) [pdf]

Tiered JITs are meant to allow slower, more aggressive optimizations to be reserved for truly hot code. However, you're right that they still can't spend as much time or as many resources as an AOT compiler.

vitalyd | 9 years ago | on: A JVM Does That? (2011) [pdf]

Devirtualization is mostly an issue for Java, since everything is virtual by default and the language has no support for compile-time monomorphization.

While C++ code does use virtuals, they're nowhere near as pervasive as in Java -- there are language constructs to avoid them and move dispatch selection to compile time.
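A minimal Java sketch of the point (the classes here are hypothetical, purely for illustration): any non-final, non-private, non-static instance method is dispatched virtually, so the call site can't be bound statically the way a C++ template or non-virtual call would be -- the JIT has to prove the receiver's concrete type (via profiling or class-hierarchy analysis) before it can devirtualize and inline.

```java
// Illustrative classes only: every ordinary Java instance method is virtual.
class Shape {
    double area() { return 0.0; }  // virtual by default, no keyword needed
}

class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    @Override double area() { return Math.PI * r * r; }  // dispatched via vtable
}

public class Devirt {
    // The s.area() call is virtual: the static type is Shape, so the actual
    // target is unknown until runtime unless the JIT can devirtualize it.
    static double totalArea(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1.0), new Circle(2.0) };
        System.out.println(totalArea(shapes)); // ~15.708 (5 * PI)
    }
}
```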

vitalyd | 9 years ago | on: A JVM Does That? (2011) [pdf]

The fantastic standard library mostly goes away because it allocates. It's possible to write Java code that doesn't allocate in steady state, but the coding style becomes terrible (e.g. overuse of primitives, mutable wrappers, manually flattened data structures).
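To make the "manually flattened" style concrete, here's a hedged sketch (all names are illustrative, not from any real codebase): instead of an array of heap objects, each field lives in its own primitive array, so the steady-state hot path never allocates and never boxes.

```java
// Hypothetical allocation-free book: parallel primitive arrays instead of
// an Order[] of objects. Ugly, but zero allocation after construction.
public class OrderBook {
    private final long[] ids;
    private final double[] prices;
    private final int[] quantities;
    private int size;

    public OrderBook(int capacity) {
        // All allocation happens up front.
        ids = new long[capacity];
        prices = new double[capacity];
        quantities = new int[capacity];
    }

    // Hot path: writes into preallocated arrays, no object creation.
    public void add(long id, double price, int quantity) {
        ids[size] = id;
        prices[size] = price;
        quantities[size] = quantity;
        size++;
    }

    // Pure primitive arithmetic: no boxing, no iterator allocation.
    public double notional() {
        double total = 0.0;
        for (int i = 0; i < size; i++) total += prices[i] * quantities[i];
        return total;
    }
}
```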

There's also the issue that even when the GC isn't running, you pay the cost of card marking (GC store barriers) on every reference write. And there's unpredictability from deoptimizations triggered by type/branch profile changes, safepoints taken for housekeeping, etc.

It's unclear whether that style of Java coding is actually a net win over using a language with a better performance model.

vitalyd | 10 years ago | on: The C++ community is polarized about C++11's random-number facility

Sometimes the right defaults depend on context unavailable to the low level library author. But irrespective of who provides the higher level API, the bottom line is that there needs to be an API that's most flexible when the flexibility is warranted. This is really in reference to Colin's "nothing more to add" remark.

vitalyd | 10 years ago | on: The C++ community is polarized about C++11's random-number facility

I think this is a natural consequence of low-level APIs. Anything deemed worthy of being configured by the caller is made into an extension point, whether through flag arguments or types. Unless a lib is very opinionated about how things should be done, the extension/customization hooks have to exist in one form or another.

To satisfy both camps, though (low-level control vs "give me a sane default or package up the common combinations"), someone just writes the higher-level API on top.
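A hypothetical Java sketch of that layering (the names are mine, not from the thread): the low-level entry point exposes every knob -- the caller picks the generator and the bounds -- while the high-level wrapper packages a sane default combination.

```java
import java.util.Random;

// Two-layer API sketch: flexible low level, convenient high level on top.
public class Rng {
    // Low level: caller controls the random source and the half-open range.
    static int uniformInt(Random source, int lo, int hi) {
        return lo + source.nextInt(hi - lo);
    }

    // High level: a packaged common combination with sane defaults
    // (default generator, six-sided-die range).
    private static final Random DEFAULT = new Random();
    static int roll() { return uniformInt(DEFAULT, 1, 7); }
}
```

Callers who need reproducibility or a custom distribution drop down to `uniformInt`; everyone else just calls `roll()`.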

vitalyd | 10 years ago | on: Random Acts of Optimization

Optimizers are mostly about stripping away the abstraction costs of the language. If anyone is asking for more, they're deluding themselves :).

vitalyd | 10 years ago | on: Random Acts of Optimization

Yes, this is a side benefit of exceptions (or of another language's way of highlighting errors, and thus likely-cold paths).

vitalyd | 10 years ago | on: Random Acts of Optimization

I agree - profile and cost model are orthogonal and complementary. The former tells you what to optimize; the latter dictates how to optimize it.

vitalyd | 10 years ago | on: Random Acts of Optimization

>For example, error paths are generally very cold, but how is a compiler supposed to know that a path is an error path? They just look like regular conditionals.

At least for .NET, that path is likely throwing an exception. If you don't have a profile, statically assuming those paths are cold is likely to be correct.

vitalyd | 10 years ago | on: Random Acts of Optimization

Profiling comes with its own baggage; nothing's perfect. A lot of the time one actually knows the likely/unlikely paths at development time; that's certainly true for error conditions. It makes sense to inform the compiler of those cases using the builtins.

vitalyd | 10 years ago | on: Performance in Big Data Land: Every CPU cycle matters

Rust is exciting, no doubt, and I have high hopes for its adoption, but I've personally not seen or heard of any visible OSS big-data-style projects using it. I see Frank McSherry's work has been mentioned, but I think that's still his pet project (hopefully I'm not putting words in his mouth).

But really, I was using C++ as an example of something more fit for these types of projects than Java; it doesn't have to be only C++, of course.

vitalyd | 10 years ago | on: Performance in Big Data Land: Every CPU cycle matters

Memory indirection is the biggest issue, indeed. However, I'd also add that Java, as a language, has a terrible performance model. Unless you stick to primitives only, the abstraction costs start to add up (beyond pointer chasing). It shoves the entire optimization burden onto the JVM, which, by the time it runs, has in some cases lost a bunch of semantic and type information. There are also codegen deficiencies in the current HotSpot C2 compiler (i.e. generated code is subpar compared to roughly equivalent gcc output).

vitalyd | 10 years ago | on: Functional Reactive Programming in Java

You're getting at the performance left on the table. Unfortunately, Java and the existing JVM JIT compilers will not fold away all the abstractions (never mind the allocations if you have capturing lambdas).
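A small Java illustration of the capturing-lambda point. Caveat: the caching behavior shown is what HotSpot's default LambdaMetafactory strategy does in practice, not a language guarantee -- a non-capturing lambda can be instantiated once and reused, while a capturing one typically allocates a fresh object every time it's evaluated.

```java
import java.util.function.IntSupplier;

public class LambdaAlloc {
    // Captures nothing: the runtime is free to hand back a cached singleton.
    static IntSupplier nonCapturing() {
        return () -> 42;
    }

    // Captures x: each evaluation typically creates a new object holding x.
    static IntSupplier capturing(int x) {
        return () -> x;
    }

    public static void main(String[] args) {
        // Typically true on HotSpot: same cached instance both times.
        System.out.println(nonCapturing() == nonCapturing());
        // Typically false: a fresh allocation per call.
        System.out.println(capturing(1) == capturing(1));
    }
}
```

In a hot loop, that per-evaluation allocation is exactly the kind of garbage that undermines the FRP style for latency-sensitive code.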

Whether this matters is circumstantial, but for serious game dev it most definitely will.

vitalyd | 10 years ago | on: High-Speed Trading Firm Deleted Some Code by Accident

I'd be surprised if they didn't make postmortem changes to prevent such incidents in the future. Your posts in this thread imply that no mistake should ever happen, but let's be real -- mistakes will always happen; the key is to do a proper postmortem analysis, learn from it, and obviously prevent recurrences.

vitalyd | 10 years ago | on: Kudu – Fast Analytics on Fast Data

+1 on writing up a blog post on C++ vs Java. I suspect, given your background in Java, people may heed your words a bit more than usual. There's definitely a lot of outdated thinking in Java land with regard to (modern) C++ and its toolchain. Many big data projects could benefit from being written in C++ rather than Java (or another JVM language).

vitalyd | 10 years ago | on: Clarifications about Redis and Memcached

I think once you implement threaded i/o, requests for hot keys will hit in the CPU cache and you'll become NIC-limited. At that point, read replicas are a better solution than shared memory, since contention will have moved to the NIC and adding more CPUs won't help.

Edit: Salvatore, you should also look at the Seastar/ScyllaDB design (if you haven't yet) - that architecture would work well for redis as well. And if user has access to DPDK (or other kernel bypass enabled NICs, like Solarflare), their performance will go up even further.

vitalyd | 10 years ago | on: ScyllaDB: Drop-in replacement for Cassandra that claims to be 10x faster

>However, a lot of the IO overhead with disk, for example.

That's why they benchmarked this workload on a 4x SSD RAID configuration :). Given that i/o bandwidth and throughput continue to increase while processor frequencies don't, and core counts keep going up, it's prudent to design a system that can take advantage of this.
