nfa_backward's comments

nfa_backward | 8 years ago | on: Eclipse OpenJ9 – Open-source JVM

http://www.eclipse.org/openj9/

"Shared classes and Ahead-of-Time (AOT) technologies typically provide a 20-40% reduction in start-up time while improving the overall ramp-up time of applications. This capability is crucial for short-running Java applications or for horizontal scalability solutions that rely on the frequent provisioning and deprovisioning of JVM instances to manage workloads."

nfa_backward | 9 years ago | on: Ask HN: What are some examples of good code?

Facebook Presto - See my comment here: https://news.ycombinator.com/item?id=13626109

nfa_backward | 9 years ago | on: Ask HN: What are some well written/engineered open source software?

Facebook Presto, an MPP SQL engine written in Java.

https://github.com/prestodb/presto

I have learned a lot from reading the source code and watching it develop. It is written in modern Java 8. The authors are obviously experts in the language, the JVM, and the ecosystem. Since it is an MPP SQL engine, performance is very important. The authors have been able to strike a good balance between performance and clean abstractions. I have also learned a lot about how to evolve a product. Large features are added iteratively. In my own code I often found myself going from Feature 1.0 -> Feature 2.0. Following Presto PRs, I have seen how for large features they go from Feature 1.0 -> Feature 1.1 -> Feature 1.2 -> ... Feature 2.0 very quickly. This is much more difficult than it sounds. How can I implement 10% of a feature, still have it provide benefits and still be able to ship it? I have seen how this technique allows code to make it into production quickly, where it is validated and hardened. In some ways it reminds me of this: https://storify.com/jrauser/on-the-big-rewrite-and-bezos-as-.... You shouldn't be asking for a rewrite. Know where you want to go and carefully plan small steps from here to there.

nfa_backward | 9 years ago | on: ClickHouse – high-performance open-source distributed column-oriented DBMS

Looks really interesting, and not just another SQL-on-Hadoop solution. The benchmarks look impressive, but all of the queries were aggregations of a single table. I did not see any joins. I wonder how mature the optimizer is.

nfa_backward | 10 years ago | on: Facebook's iOS Bug Led ComScore to Overestimate Time Spent

http://www.comscore.com/applicationsdk

nfa_backward | 10 years ago | on: Kudu as a More Flexible and Available Kafka-Style Queue

Glad to hear this is at least being considered. The optimizations for data warehousing you mentioned are my use case. I understand that it is a very active project with a lot on the road map. It's a very cool project and I follow you guys on http://gerrit.cloudera.org/#/q/status:open

nfa_backward | 10 years ago | on: Kudu as a More Flexible and Available Kafka-Style Queue

Does Kudu colocate data from different tables with equal keys? If not, is this or a similar feature on the road map?

nfa_backward | 10 years ago | on: IBM's SystemML Machine Learning – Now Apache SystemML

This looks interesting and is something I will definitely watch, but at this point I think I will still stick with http://h2o.ai/ (another JVM-based open source ML project that integrates well with 'Hadoop'). I have been really impressed with the quality of the product and even more so with the quality of the people behind it.

nfa_backward | 10 years ago | on: Kudu – Fast Analytics on Fast Data

Does Kudu colocate data sets with identical keys? If so, are there plans to have Impala take advantage of this?

nfa_backward | 10 years ago | on: Kudu – Fast Analytics on Fast Data

Impala has an in-memory columnar format on its road map for 2016. Is that format being designed with Kudu in mind?

Edit: I understand that the formats, while both columnar, serve different purposes. I am more curious about overlap if any between the two.

nfa_backward | 10 years ago | on: Kudu – Fast Analytics on Fast Data

From my experience and the experience of others ( https://www.eecs.berkeley.edu/~keo/publications/nsdi15-final... ), current big data solutions are more often CPU-bound than I/O-bound. I think that we will be seeing more and more big data architecture moving to C++. For example: http://www.scylladb.com/

nfa_backward | 10 years ago | on: Kudu – Fast Analytics on Fast Data

Kudu is being positioned as filling the gap between HDFS and HBase. After reading the overview I see this more as bringing features from HDFS+Parquet+HBase. Does that sound reasonable?

Super excited about this and even more so since it is open source. Thank you!

nfa_backward | 11 years ago | on: Java Garbage Collection Distilled (2013)

It looks like their attention has turned to LLVM.

http://www.philipreames.com/Blog/2014/06/04/code-for-late-sa...

http://www.azulsystems.com/about_us/careers/llvm-compiler-en...

nfa_backward | 12 years ago | on: Don't use Hadoop when your data isn't that big

The author is missing a big gap between 5TB and 1PB. For most workloads, I would not look to Hadoop at the 5TB+ scale of data. I would first look at Impala or Redshift.

http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-t...

nfa_backward | 12 years ago | on: Yahoo tops Google in US traffic

Your links are looking at just google.com vs yahoo.com. The CNET article is referring to web properties owned by Google vs properties owned by Yahoo.

nfa_backward | 12 years ago | on: Yahoo tops Google in US traffic

That explains why debaserab2 was able to find b.scorecardresearch for Yahoo, but not Google.

nfa_backward | 12 years ago | on: Yahoo tops Google in US traffic

The incentive is ad dollars. Ad companies want third party verification.

nfa_backward | 12 years ago | on: Yahoo tops Google in US traffic

From - comscore.com/Insights/Press_Releases/2009/5/comScore_Announced_Media_Metrix_360

"The new approach combines person-level measurement from comScore's proprietary 2 million person global panel with Web site server metrics in order to account for 100 percent of a Web site's audience."

It's possible that Yahoo and Google are providing server metrics to comScore via JavaScript tagging. That would give comScore direct access to the traffic data. I believe that Quantcast and maybe Nielsen both offer something similar as well.