pkj's comments | WingNews

pkj | 12 years ago | on: Presto: Interacting with petabytes of data at Facebook

From the website:

"Here from HackerNews? This was originally posted several months ago. Check back in two weeks for an updated benchmark including newer versions of Hive, Impala, and Shark."

pkj | 12 years ago | on: The Writings of Leslie Lamport

A really good intro to Raft --> http://www.youtube.com/watch?v=YbZ3zDzDnrw

Very useful if you are into distributed systems and would like to see a practical implementation with clear examples.

pkj | 12 years ago | on: Optimizing AngularJS

I would argue there is more than sufficient interest :) All you have to do is scan the angularjs google groups for ngRepeat issues.

In fact, one of the solutions proposed for the ngRepeat issue (with a lot of caveats) has 300+ stars on GitHub:

https://github.com/Pasvaz/bindonce

Including me, there are a lot of folks who would find your solution very useful. You are addressing a fundamental O(n) scaling problem in AngularJS.

pkj | 12 years ago | on: An unofficial alternative to the HN interface

The top comment by moskie points out one such chrome extension. Just installed it and works great for collapsing threads.

https://chrome.google.com/webstore/detail/hackernew/lgoghlnd...

pkj | 12 years ago | on: Weary of ‘Fruit Fly’ Consumer Startups, Andreessen Horowitz Raises Series A Bar

>"That plays well to our market development program where 1,200 big company management teams are coming through our office every year -- we ask them what they think about new ideas and they tell us."

How about asking BigCos their top pain points once in a while? That would be a 1200x treasure for current/future entrepreneurs.

pkj | 12 years ago | on: Facebook Operations Chief Reveals Open Networking Plan

>Running an all layer2 network requires devices that have large enough CAM tables to support all connected devices, many vendors newer full line rate cards were coming out with smaller CAM tables, as such layer2 simply wasn't an option in some cases.

Does this imply that MAC forwarding tables (on switches) and ARP caches (on hosts) need to have entries only for their immediate neighbours?

Curious to know how heavily modified the host network stack is. Also, how do you provision a new server with the right IP? Is this mechanism in L2?

>There are many other reasons that layer2 wasn't a good choice for us, and that layer3 makes a lot of sense. I'd be happy to discuss more of these as well.

I am sure people would find that very useful. Thanks for the excellent writeup !

pkj | 12 years ago | on: Facebook Operations Chief Reveals Open Networking Plan

Sure, we are trying through global warming :)

On a serious note, take a look at Cumulus Networks [1], who claim to solve the hw/sw disaggregation problem. They have a Linux distro which can run on hardware from multiple vendors (not the popular ones like cisco/jnpr/hp/brcd etc., since those are closed platforms).

[1] http://cumulusnetworks.com/product/overview/

pkj | 12 years ago | on: The Stanford Academic Who Wrote Google Its First Check (2012)

I took a course designed by him at Stanford: object oriented programming from a modeling and simulation perspective. I also have the lecture notes for the follow-up course. It was mentioned that the course was based on his real-world experience doing C++ programming in academia and at the startup that he sold to Cisco. It was very clear from the beginning that the course was radically different from any programming course I had seen. Highly opinionated on almost every page. He rarely repeated anything from the standard C++ books. I enjoyed the course, though only about 5% still remains with me. I have worked in teams using C++ extensively to manage $Billion+ products, and can attest that the course definitely had some value. Also, I interviewed with a C++ team in Cisco's Catalyst 4K group and they mentioned they were still building on the C++ platform laid by Cheriton's group (a decade later).

As for disadvantages, I can say that you need someone of Cheriton's ability to really guide you during the initial build-up of the project. His ideas of having the compiler do a lot of checking to avoid bugs in the code later are really cool. But the learning curve is pretty steep. Newer paradigms (since the course was originally designed) like STL, Boost, and the tips in Effective C++/STL can replace some of the concepts he espouses. But he is pretty clear that some of them are really inferior. That might be true for some cases, but I have usually used them in production just fine.

pkj | 12 years ago | on: The STEM Crisis is a Myth

Duplicate (mobile version) of an earlier submission. See the earlier discussion at

https://news.ycombinator.com/item?id=6305671

pkj | 12 years ago | on: CoreOS: Boot on Bare Metal with PXE

If you are ok with CentOS/RHEL, there is a scalable way to boot multiple machines simultaneously using ... BitTorrent!

http://www.rocksclusters.org/rocks-doc/papers/two-pager/pape...

Rocks is used at production cluster installations with thousands of servers.

http://www.rocksclusters.org/rocks-register/index.php?sortby...

pkj | 12 years ago | on: Efficient String Concatenation in Python

Similar relative results, except that method 1 is always the fastest and even better than method 6. Ran the loop with large counts (10 million & 30 million) to reduce measurement noise. Profiling was with cProfile on a Core i3 2.53 GHz, 6 GB RAM, Python 2.7.3 on Ubuntu 12.04.

for 10M loop count, method 1 -> 1.599 s, method 6 -> 1.91 s

for 30M loop count, method 1 -> 4.967 s, method 6 -> 5.871 s

Summary: The KISS s1 += s2 always wins
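For anyone who wants to rerun the comparison, here is a minimal timing sketch. It assumes that "method 1" is repeated `s1 += s2` and that "method 6" is a single `join` over a list comprehension; the comment itself does not spell out the methods, so treat both function bodies as assumptions. The function names and the loop count are hypothetical, and the sketch uses `timeit` on Python 3 rather than cProfile on Python 2.7, so absolute numbers will not match the ones above.

```python
import timeit


def concat_plus_equals(loop_count):
    """Assumed 'method 1': build the string with repeated +=."""
    out = ""
    for num in range(loop_count):
        out += str(num)
    return out


def concat_join(loop_count):
    """Assumed 'method 6': join a list comprehension in one call."""
    return "".join([str(num) for num in range(loop_count)])


if __name__ == "__main__":
    # Smaller than the 10M/30M runs quoted above, to keep the run short.
    loop_count = 1_000_000
    for fn in (concat_plus_equals, concat_join):
        seconds = timeit.timeit(lambda: fn(loop_count), number=1)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

Both functions produce the same string, so any difference in the printed times comes from the concatenation strategy alone. Results are likely to vary by interpreter version and machine.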

pkj | 12 years ago | on: The New AWS Command-Line Interface

Just tried it out and it is excellent ! The integration of various aws services is really neat. The help and correction-suggestion are nice too.

Great to hear that this is built on the awesome boto library. Will serve as a useful reference for boto developers.

pkj | 12 years ago | on: Easy Steps to a Complete Understanding of SQL

Absolutely. Very interesting talk. Definitely deserves a submission.

BTW, the person asking the last couple of questions is Ed Bugnion, one of the co-founders of VMware. He is now on the faculty at EPFL.

pkj | 12 years ago | on: Russia detects two missile launches in Mediterranean

US Navy did not fire it. From Reuters:

http://www.reuters.com/article/2013/09/03/us-syria-crisis-us...

pkj | 12 years ago | on: Ask HN: Who is hiring? (September 2013)

Nothing wrong. This is standard industry practice. Check out glassdoor salaries for companies in diverse locations.

pkj | 12 years ago | on: How the Dropbox Datastore API Handles Conflicts – Part Two

Thanks for the comprehensive clarification. Yes you are right that the use of revision-id would allow "first change to reach the server wins" for case (i). It would result in a simple and fair outcome. I did not see the mention of revision usage in the parent link. But it makes absolute sense.
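A rough sketch of how a revision check gives "first change to reach the server wins" for case (i). This is only an illustration of the idea, not the Dropbox Datastore API: the `Server` class, its `submit` method, and the integer revision counter are all hypothetical.

```python
class Server:
    """Hypothetical server holding one value plus a revision counter."""

    def __init__(self, value):
        self.value = value
        self.revision = 0

    def submit(self, base_revision, new_value):
        """Accept a write only if it was made against the current revision."""
        if base_revision != self.revision:
            # A change from another device got here first; reject this one.
            return False
        self.value = new_value
        self.revision += 1
        return True


if __name__ == "__main__":
    server = Server(20)
    # Two devices both edited "a" offline, starting from revision 0.
    print(server.submit(0, 21))  # first to reach the server is accepted
    print(server.submit(0, 22))  # second is rejected as stale
    print(server.value, server.revision)
```

The rejected device would then have to fetch the newer value and decide what to do with its own edit, which is where the "remote" and "local" rules come in.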

Think we agree on case ii). On case iii) I still think that having 1 local and N-1 remote might be useful when we want to prioritize a particular writer over others. Borrowing the sales example from commenter jchrisa, consider a new user sales-head (local rule). He syncs the data uploaded by his sales folks (remote rule) and then goes offline. When he is done editing and comes online, he wants to make sure his delta takes precedence irrespective of any previous changes by sales folks during his being offline. Since he has local rule, his update will just win. Further, he does not want sales guys who were offline and come online after him to overwrite his last update immediately. I am assuming that the sales folks with "remote" rule will see the data with newer server version and accept it.

pkj | 12 years ago | on: How the Dropbox Datastore API Handles Conflicts – Part Two

Trying to wrap my head around this. Seems difficult without clear usecases.

Let's say I have 10 devices d1, d2, ... d10 that made updates to "a" on the server and then went offline. a==20 and the last update was by d5 before everyone went offline.

When the devices come back up, the fate of "a" depends on the rulesets. Following are 3 possible high-level combinations.

i) All devices have "remote" rule. On reconnection, everyone rolls back "a" to 20. They are essentially back to the time before going offline. Even the device which did the last update (d5) before going offline is rolled back too, which seems a bit odd. Still simple to reason about.

ii) All devices have "local" rule. On reconnection, the last device to reconnect updates "a". It is then broadcast to all other devices. Note that it is not the last device to update "a". Rather it is the last to reconnect (now, even if all of them reconnect at the same time, depending on the queueing at the server, the one at the tail wins). Not really simple.

iii) Mix of "remote" and "local". Let's say d1 had "local" rule and all others had "remote". On reconnection, d1's "a" will be propagated to everyone. This is irrespective of the order of reconnection (I am assuming that between reconnections "a" is not modified). This is pretty simple and perfectly predictable. Now, if we have more than one "local", we start getting non-deterministic, and at the extreme move to case ii).
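The three cases can be checked with a small simulation. This is a sketch of the simplified model described in this comment, not of the actual Dropbox Datastore behaviour: a "local" device overwrites the server copy when it reconnects, a "remote" device drops its offline edit and takes the server copy. The function names, the offline values, and the use of four devices instead of ten are all made up for illustration.

```python
import itertools


def reconnect(server_value, devices, order):
    """Replay reconnections in `order` and return the final server value.

    devices maps a device name to (rule, offline_value).
    """
    for name in order:
        rule, offline_value = devices[name]
        if rule == "local":
            # The device's offline edit overwrites the server copy.
            server_value = offline_value
        # Under "remote" the device discards its edit and takes the
        # server copy, so the server value is unchanged.
    return server_value


def outcomes(server_value, devices):
    """Set of final values over every possible reconnection order."""
    return {
        reconnect(server_value, devices, order)
        for order in itertools.permutations(devices)
    }


if __name__ == "__main__":
    # Four devices instead of ten keeps the permutation count small.
    names = ["d1", "d2", "d3", "d4"]
    offline = {"d1": 21, "d2": 22, "d3": 23, "d4": 24}

    all_remote = {n: ("remote", offline[n]) for n in names}
    all_local = {n: ("local", offline[n]) for n in names}
    mixed = {n: ("local" if n == "d1" else "remote", offline[n]) for n in names}

    print("case i  :", outcomes(20, all_remote))
    print("case ii :", outcomes(20, all_local))
    print("case iii:", outcomes(20, mixed))
```

Under these assumptions, case i can only end at 20, case iii can only end at d1's value, and case ii can end at any device's value depending on who reconnects last.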

pkj | 12 years ago | on: How the Dropbox Datastore API Handles Conflicts – Part Two

If a client app is processing the cust record, what would cust.email look like? Typically I would assume a single value, but in case of unresolved conflicts will it be an iterator/array?

pkj | 12 years ago | on: The STEM Crisis is a Myth

What you describe is very close to how hundreds of thousands of grads are hired into the Indian software services industry. Aptitude test followed by training and plenty of time to come up to speed and be evaluated. It has worked out fine for them. Layoffs are very rare.

Startups are a different story as time to market is crucial and you have almost no slack.

pkj | 12 years ago | on: The STEM Crisis is a Myth

"plenty of talented developers.."

        ^^^^^^^^

From my experience in the Bay Area & India, I never saw a shortage of resumes for open positions. The numbers were large enough even if you assume people were applying to multiple companies. The shortage, though, was of talented folks meeting the expectations of the manager and co-workers.