angrybits|10 years ago
You are either being dishonest or just patently out of your mind if you think that you can query 100 MB and 100 PB in the same way. That's not even reasonable by HN standards of hyperbole. Do you have any idea how many orders of magnitude that is?

parasubvert|10 years ago
While it can handle 100 MB easily, there are probably faster ways to handle that small an amount of data. But yes, Spark can handle many PB and doesn't require a ton of changes in the code as you scale up from, say, 10 TB to 100 PB. The underlying cluster would change, and the performance profile would change a lot (10 TB can be done in-memory... many PB, not so much).

parasubvert|10 years ago
Which really isn't intended for 100 MB (I bet I could write a unix pipe & filter script that's faster than Spark), but is intended for 10 TB through several PB.

dang|10 years ago
We're lucky that you didn't spark a horrible flamewar, but instead got patient, factual replies.

MichaelGG|10 years ago
I know. And I'm sorry, bitterly sorry, but I know that... no apologies I can make can alter the fact that in our thread you have been given a dirty, filthy, smelly piece of technical argument.
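parasubvert's "unix pipe & filter" point can be sketched in plain Python (a hypothetical word count over made-up sample data, not anyone's actual benchmark): at 100 MB, a single-process generator pipeline has no cluster, JVM startup, or shuffle overhead to amortize, which is why it can beat Spark on small inputs.

```python
# A "pipe & filter" style word count in pure Python -- a plausible stand-in
# for the unix one-liner parasubvert describes. Each stage is a lazy filter,
# so a ~100 MB file streams through one process with no framework overhead.
import io
from collections import Counter

def word_count(lines):
    """Pipe-and-filter: lines -> words -> lowercased words -> counts."""
    words = (w for line in lines for w in line.split())  # filter 1: tokenize
    normalized = (w.lower() for w in words)              # filter 2: normalize
    return Counter(normalized)                           # sink: aggregate

# Usage: any iterable of lines works -- an open file, sys.stdin, etc.
# The sample text here is invented for illustration.
sample = io.StringIO("Spark can scale\nspark can also be overkill\n")
counts = word_count(sample)

# For contrast, the analogous Spark RDD pipeline reads almost identically,
# which is the scaling point above: the driver code barely changes from
# 10 TB to 100 PB, only the cluster underneath does (assuming pyspark and
# an existing SparkContext `sc`):
#   from operator import add
#   sc.textFile(path).flatMap(str.split).map(str.lower) \
#     .map(lambda w: (w, 1)).reduceByKey(add).collect()
```

The design point of the thread in miniature: the code shape is the same, but the small-data version pays none of the distributed machinery's fixed costs.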