I recently pushed a very simple implementation of the MapReduce concept to my GitHub account (click). My idea was to focus on the concept and mock everything else. You can follow the code to see how it works, but here are the main implementation details:
  • Text files are represented as Strings and stored in memory
  • The Master is responsible for splitting the data and scheduling the map/reduce tasks on workers
  • Workers are represented as threads and are run using the CompletableFuture API
  • When a map task finishes, the combine function is run on its result
  • When the Master notices that all map tasks have finished, it collects all the distinct resulting keys and passes them as arguments to the reduce tasks
  • When the reduce tasks finish, the results are printed and the executor is shut down
It is easy to define new user programs; see the example package.
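
The flow described in the list above can be sketched roughly as follows. This is not the code from the repository, only a minimal illustration under my own assumptions: the names MiniMapReduce, Job and run are hypothetical, values are plain Strings, and the combine step is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class MiniMapReduce {

    /** Hypothetical user program: maps one (file, content) pair and reduces all values of one key. */
    interface Job {
        List<Map.Entry<String, String>> map(String fileName, String content);
        String reduce(String key, List<String> values);
    }

    /** Plays the Master: schedules map tasks, waits for them, then schedules one reduce task per key. */
    static Map<String, String> run(Map<String, String> files, Job job, int workers) {
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            // One map task per in-memory "file", each executed on a worker thread.
            List<CompletableFuture<List<Map.Entry<String, String>>>> mapTasks = new ArrayList<>();
            for (Map.Entry<String, String> file : files.entrySet()) {
                mapTasks.add(CompletableFuture.supplyAsync(
                        () -> job.map(file.getKey(), file.getValue()), executor));
            }

            // Wait for every map task and group the intermediate pairs by key.
            Map<String, List<String>> grouped = new TreeMap<>();
            for (CompletableFuture<List<Map.Entry<String, String>>> task : mapTasks) {
                for (Map.Entry<String, String> pair : task.join()) {
                    grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
                }
            }

            // One reduce task per distinct key.
            Map<String, CompletableFuture<String>> reduceTasks = new TreeMap<>();
            grouped.forEach((key, values) -> reduceTasks.put(key,
                    CompletableFuture.supplyAsync(() -> job.reduce(key, values), executor)));

            // Collect the reduce results once they are all done.
            Map<String, String> results = new TreeMap<>();
            reduceTasks.forEach((key, task) -> results.put(key, task.join()));
            return results;
        } finally {
            // Mirrors the last bullet: the executor is shut down after the reduce phase.
            executor.shutdown();
        }
    }
}
```

A user program in this sketch is just an implementation of Job passed to run together with the in-memory files and the number of worker threads.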

In memory data:

"file1.txt" -> "This is the first file \ncontent. "
"file2.txt" -> "And this is the second file content. "
"file3.txt" -> "More text in \nthird file"
"file4.txt" -> "And some random text here"
"file5.txt" -> "Why not \none more"
"file6.txt" -> "Lululu tengo manzana"


Distributed Grep (search for "And"):

New map task for key: file5.txt
New map task for key: file4.txt
New map task for key: file3.txt
New map task for key: file6.txt
New map task for key: file1.txt
New map task for key: file2.txt
Combine task for: file2.txt
Combine task for: file4.txt
Writing: IntermediateResult{key=file2.txt, value=And this is the second file content. }
Writing: IntermediateResult{key=file4.txt, value=And some random text here}
New reduce task for: file4.txt
New reduce task for: file2.txt

Results: 
file4.txt - And some random text here
file2.txt - And this is the second file content. 
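
To make the grep run above easier to follow, here is a small sketch of what such a map function might look like. The class and method names (GrepSketch, map, grep) are my own assumptions and not the repository's API. It matches whole file contents, as in the log above, and runs sequentially instead of on worker threads.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class GrepSketch {

    // Map step: emit (fileName, content) only when the content contains the pattern.
    static Optional<Map.Entry<String, String>> map(String fileName, String content, String pattern) {
        if (content.contains(pattern)) {
            Map.Entry<String, String> match = new AbstractMap.SimpleEntry<>(fileName, content);
            return Optional.of(match);
        }
        return Optional.empty();
    }

    // Runs the map step over every file and keeps the matches (the reduce step is the identity here).
    static Map<String, String> grep(Map<String, String> files, String pattern) {
        Map<String, String> matches = new LinkedHashMap<>();
        files.forEach((name, content) ->
                map(name, content, pattern).ifPresent(e -> matches.put(e.getKey(), e.getValue())));
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file2.txt", "And this is the second file content. ");
        files.put("file3.txt", "More text in \nthird file");
        files.put("file4.txt", "And some random text here");
        grep(files, "And").forEach((name, content) -> System.out.println(name + " - " + content));
    }
}
```

The match is case-sensitive, which is why "random" in file4.txt does not count on its own.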

Word count:

New map task for key: file5.txt
New map task for key: file4.txt
New map task for key: file3.txt
New map task for key: file6.txt
New map task for key: file1.txt
New map task for key: file2.txt
Combine task for: not
Combine task for: here
Combine task for: the
Combine task for: manzana
Combine task for: the
Combine task for: More
Writing: IntermediateResult{key=More, value=1}
Writing: IntermediateResult{key=the, value=1}
Writing: IntermediateResult{key=here, value=1}
Writing: IntermediateResult{key=the, value=1}
Writing: IntermediateResult{key=not, value=1}
Writing: IntermediateResult{key=manzana, value=1}
Combine task for: file
Combine task for: random
Writing: IntermediateResult{key=file, value=1}
Writing: IntermediateResult{key=random, value=1}
Combine task for: file
Combine task for: tengo
Combine task for: file
Combine task for: more
Writing: IntermediateResult{key=tengo, value=1}
Writing: IntermediateResult{key=file, value=1}
Combine task for: some
Combine task for: third
Writing: IntermediateResult{key=some, value=1}
Combine task for: This
Combine task for: Lululu
Writing: IntermediateResult{key=Lululu, value=1}
Writing: IntermediateResult{key=more, value=1}
Writing: IntermediateResult{key=file, value=1}
Combine task for: one
Writing: IntermediateResult{key=one, value=1}
Writing: IntermediateResult{key=This, value=1}
Combine task for: And
Writing: IntermediateResult{key=And, value=1}
Writing: IntermediateResult{key=third, value=1}
Combine task for: in
Combine task for: text
Writing: IntermediateResult{key=in, value=1}
Combine task for: content.
Combine task for: Why
Writing: IntermediateResult{key=content., value=1}
Writing: IntermediateResult{key=Why, value=1}
Combine task for: And
Writing: IntermediateResult{key=And, value=1}
Combine task for: this
Combine task for: is
Combine task for: text
Writing: IntermediateResult{key=text, value=1}
Writing: IntermediateResult{key=text, value=1}
Writing: IntermediateResult{key=is, value=1}
Writing: IntermediateResult{key=this, value=1}
Combine task for: first
Combine task for: content.
Writing: IntermediateResult{key=first, value=1}
Writing: IntermediateResult{key=content., value=1}
Combine task for: is
Writing: IntermediateResult{key=is, value=1}
Combine task for: second
Writing: IntermediateResult{key=second, value=1}
New reduce task for: not
New reduce task for: more
New reduce task for: one
New reduce task for: the
New reduce task for: file
New reduce task for: This
New reduce task for: content.
New reduce task for: first
New reduce task for: Lululu
New reduce task for: More
New reduce task for: in
New reduce task for: text
New reduce task for: Why
New reduce task for: random
New reduce task for: here
New reduce task for: third
New reduce task for: tengo
New reduce task for: manzana
New reduce task for: is
New reduce task for: second
New reduce task for: this
New reduce task for: And
New reduce task for: some

Results: 
not - 1
more - 1
one - 1
Why - 1
the - 2
file - 3
This - 1
content. - 2
is - 2
first - 1
manzana - 1
tengo - 1
Lululu - 1
More - 1
third - 1
in - 1
text - 2
here - 1
random - 1
some - 1
And - 2
this - 1
second - 1
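
The word count above boils down to a map function that emits (word, 1) for every token and a reduce function that sums the ones. Below is a sequential sketch of that idea. The names (WordCountSketch, map, reduce, count) are hypothetical, and splitting on whitespace is my assumption, chosen because it keeps tokens such as "content." intact as in the results above. Threads and the combine step are left out.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {

    // Map step: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String content) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : content.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Groups the intermediate pairs by word, then reduces each group.
    static Map<String, Integer> count(Collection<String> contents) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String content : contents) {
            for (Map.Entry<String, Integer> pair : map(content)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> contents = Arrays.asList(
                "This is the first file \ncontent. ",
                "And this is the second file content. ");
        count(contents).forEach((word, total) -> System.out.println(word + " - " + total));
    }
}
```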
