bmcfeeley | 5 years ago

I'm no expert, but my understanding is that Pig is a combination of

- a language for specifying data transformations, and

- an engine that compiles programs written in that language into MapReduce jobs to execute on a Hadoop cluster

It was designed to make it easy to map some common functional and SQL idioms (e.g. filter, group by with aggregation functions) onto parallel execution for processing huge amounts of data.
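To make the "filter, then group by with an aggregate" idea concrete, here is a rough single-process sketch in Python of how such a pipeline splits into map, shuffle, and reduce steps. This is not Pig's actual compiler output or API; the records, the field names, the threshold of 50, and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical input rows: (user, country, bytes_sent). Not real data.
records = [
    ("alice", "US", 120),
    ("bob", "DE", 300),
    ("carol", "US", 80),
    ("dave", "DE", 50),
    ("erin", "FR", 10),
]

# In Pig Latin the same pipeline would look roughly like:
#   big     = FILTER records BY bytes_sent >= 50;
#   grouped = GROUP big BY country;
#   totals  = FOREACH grouped GENERATE group, SUM(big.bytes_sent);


def map_phase(record):
    """Filter and key extraction: emit (country, bytes) for rows that pass."""
    _user, country, nbytes = record
    if nbytes >= 50:  # assumed filter threshold, purely illustrative
        yield country, nbytes


def shuffle(pairs):
    """Collect values by key, the step a cluster does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregation: one SUM per group."""
    return key, sum(values)


def run_job(rows):
    """Run map, shuffle, and reduce in order on a single machine."""
    mapped = (pair for row in rows for pair in map_phase(row))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


totals = run_job(records)
print(totals)  # {'US': 200, 'DE': 350}
```

On a real cluster the map calls run in parallel across the nodes holding the data and the reduce calls run in parallel per key, which is why idioms of this shape scale to large inputs.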

Impala is another big data project: an engine for planning and executing SQL queries on data stored in a Hadoop cluster.

ZooKeeper is... black magic??
