Wednesday, March 22, 2017

How Hadoop works


Hadoop divides the given input file into small parts (input splits) to increase parallel processing. It uses its own distributed file system called HDFS. Each split is assigned to a mapper, which Hadoop tries to run on the same physical machine that stores that chunk of the data.

Mappers process these small file chunks (by default, each chunk {piece of the main file} has the size of one HDFS block) line by line: the map function is called once per line, and the mapper passes its intermediate results to the context.
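As an illustration (not code from this post), a minimal word-count-style mapper could look like the sketch below; the class name TokenMapper is a placeholder. The key it receives is the byte offset of the line, the value is the line itself, and every output pair is written to the context:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key = byte offset of the line, value = the line itself
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // emit (word, 1) to the context
            }
        }
    }
}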

Hadoop supports different programming languages and moves data across the network, so it uses its own serialization/deserialization mechanism. That's why you see IntWritable, LongWritable, etc. types in the examples. You can write your own Writable classes by implementing the Writable interface according to your requirements.
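For instance, a hypothetical PointWritable (just a sketch, not part of this post) only needs a no-argument constructor plus the write and readFields methods:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PointWritable implements Writable {

    private int x;
    private int y;

    // A no-argument constructor is required so Hadoop can create instances via reflection.
    public PointWritable() { }

    public PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);    // serialize the fields in a fixed order
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();   // deserialize in exactly the same order
        y = in.readInt();
    }
}

If such a type is used as a key, it must also be sortable, so in that case you would implement WritableComparable instead.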

Hadoop then collects the outputs of all the mappers, sorts them by KEY (the shuffle and sort phase), and forwards the grouped results to the reducers.


"Book says all values with same key will go to same reducer"



Conceptually, the two functions look like this (output pairs are written through the Context object rather than returned):

map(KEYIN inputKey, VALUEIN inputValue, Context context)                          // emits (outputKey, outputValue) via context.write

reduce(KEYIN keyFromMapper, Iterable<VALUEIN> valuesFromMapper, Context context)  // emits (outputKey, outputValue) via context.write



Hadoop calls the reduce function once for each unique key, passing all of the values collected for that key.
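A minimal reducer matching the mapper sketch above (again only an illustration, with SumReducer as a placeholder name) sums the counts it receives for each word:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per unique key; values holds every count emitted for this key.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        total.set(sum);
        context.write(key, total);   // final (word, count) pair
    }
}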

And finally, Hadoop writes the output of the reducers back to the HDFS file system.
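Wiring it all together happens in a driver class. A sketch using the standard Job API, assuming the TokenMapper and SumReducer placeholders from above and input/output paths passed on the command line, could look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenMapper.class);      // mapper sketch from above
        job.setReducerClass(SumReducer.class);      // reducer sketch from above
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // input file(s) on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory on HDFS

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}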

See the WordCount example for a better understanding: hadoop-wordcount-example

