- Text files are represented as Strings and held in memory.
- The master splits the data and schedules the map and reduce tasks on the workers.
- Workers are threads, run through the CompletableFuture API.
- When a map task finishes, the combine function is run on its result.
- Once the master sees that all map tasks have finished, it collects the distinct keys they produced and passes them as arguments to the reduce tasks.
- When the reduce tasks finish, the results are printed and the executor is shut down.
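The flow described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code from this repository: the class `MiniMapReduce`, the methods `run`, `combine` and `wordCountMap`, and the choice of integer values are all assumptions made for the example. Only `CompletableFuture` and `ExecutorService` are the real JDK APIs.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.*;
import java.util.stream.*;

// Hypothetical names throughout; a sketch of the map -> combine -> reduce flow.
final class MiniMapReduce {

    static Map<String, Integer> run(Map<String, String> files,
                                    BiFunction<String, String, List<Map.Entry<String, Integer>>> mapper,
                                    BinaryOperator<Integer> reducer,
                                    int workers) {
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            // Map phase: one task per file; combine runs on each task's own output.
            List<CompletableFuture<Map<String, Integer>>> mapTasks = files.entrySet().stream()
                .map(e -> CompletableFuture
                    .supplyAsync(() -> mapper.apply(e.getKey(), e.getValue()), executor)
                    .thenApply(pairs -> combine(pairs, reducer)))
                .collect(Collectors.toList());

            // The master waits until every map task has finished.
            CompletableFuture.allOf(mapTasks.toArray(new CompletableFuture[0])).join();
            List<Map<String, Integer>> intermediate = mapTasks.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());

            // Reduce phase: one task per distinct intermediate key.
            Set<String> keys = intermediate.stream()
                .flatMap(m -> m.keySet().stream())
                .collect(Collectors.toCollection(TreeSet::new));
            Map<String, CompletableFuture<Integer>> reduceTasks = new TreeMap<>();
            for (String key : keys) {
                reduceTasks.put(key, CompletableFuture.supplyAsync(() ->
                    intermediate.stream()
                        .map(m -> m.get(key))
                        .filter(Objects::nonNull)
                        .reduce(reducer)
                        .get(), executor));
            }
            Map<String, Integer> results = new TreeMap<>();
            reduceTasks.forEach((key, task) -> results.put(key, task.join()));
            return results;
        } finally {
            executor.shutdown(); // close the executor once all tasks are done
        }
    }

    // Combine: merge values that share a key within a single map task's output.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs,
                                        BinaryOperator<Integer> reducer) {
        Map<String, Integer> combined = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            combined.merge(p.getKey(), p.getValue(), reducer);
        }
        return combined;
    }

    // Example user program (assumed shape): emit (word, 1) for every whitespace-separated word.
    static List<Map.Entry<String, Integer>> wordCountMap(String file, String content) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : content.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of(
            "file2.txt", "And this is the second file content. ",
            "file4.txt", "And some random text here");
        System.out.println(run(files, MiniMapReduce::wordCountMap, Integer::sum, 4));
    }
}
```

In this sketch the same function serves as both combiner and reducer, which works for associative operations such as summing counts; a program whose reduce step is not associative would need a separate combiner.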
New user programs are easy to define; see the example package for reference.
In-memory data:

    "file1.txt" -> "This is the first file \ncontent. "
    "file2.txt" -> "And this is the second file content. "
    "file3.txt" -> "More text in \nthird file"
    "file4.txt" -> "And some random text here"
    "file5.txt" -> "Why not \none more"
    "file6.txt" -> "Lululu tengo manzana"

Distributed Grep (search for "And"):

    New map task for key: file5.txt
    New map task for key: file4.txt
    New map task for key: file3.txt
    New map task for key: file6.txt
    New map task for key: file1.txt
    New map task for key: file2.txt
    Combine task for: file2.txt
    Combine task for: file4.txt
    Writing: IntermediateResult{key=file2.txt, value=And this is the second file content. }
    Writing: IntermediateResult{key=file4.txt, value=And some random text here}
    New reduce task for: file4.txt
    New reduce task for: file2.txt
    Results:
    file4.txt - And some random text here
    file2.txt - And this is the second file content.

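A grep-style user program along these lines could look like the sketch below. It is an assumption-laden illustration, not the repository's example: `GrepSketch`, `grepMap` and `grep` are hypothetical names, and the case-sensitive match against the whole file content is inferred from the log above, where each result is a full file rather than a single line.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

// Hypothetical sketch of a distributed-grep user program.
final class GrepSketch {

    // Map: emit (file name, content) only when the content contains the pattern.
    static Optional<Map.Entry<String, String>> grepMap(String file, String content, String pattern) {
        return content.contains(pattern)
            ? Optional.of(Map.entry(file, content))
            : Optional.empty();
    }

    // Run one map task per file on the common pool; reduce is the identity here.
    static Map<String, String> grep(Map<String, String> files, String pattern) {
        List<CompletableFuture<Optional<Map.Entry<String, String>>>> tasks = files.entrySet().stream()
            .map(e -> CompletableFuture.supplyAsync(() -> grepMap(e.getKey(), e.getValue(), pattern)))
            .collect(Collectors.toList());

        Map<String, String> results = new TreeMap<>();
        for (CompletableFuture<Optional<Map.Entry<String, String>>> task : tasks) {
            task.join().ifPresent(match -> results.put(match.getKey(), match.getValue()));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of(
            "file1.txt", "This is the first file \ncontent. ",
            "file2.txt", "And this is the second file content. ",
            "file4.txt", "And some random text here");
        grep(files, "And").forEach((file, content) -> System.out.println(file + " - " + content));
    }
}
```

Because the match is case-sensitive, "random" does not count as a hit for "And"; only files that contain the capitalised word are returned.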
Word count:

    New map task for key: file5.txt
    New map task for key: file4.txt
    New map task for key: file3.txt
    New map task for key: file6.txt
    New map task for key: file1.txt
    New map task for key: file2.txt
    Combine task for: not
    Combine task for: here
    Combine task for: the
    Combine task for: manzana
    Combine task for: the
    Combine task for: More
    Writing: IntermediateResult{key=More, value=1}
    Writing: IntermediateResult{key=the, value=1}
    Writing: IntermediateResult{key=here, value=1}
    Writing: IntermediateResult{key=the, value=1}
    Writing: IntermediateResult{key=not, value=1}
    Writing: IntermediateResult{key=manzana, value=1}
    Combine task for: file
    Combine task for: random
    Writing: IntermediateResult{key=file, value=1}
    Writing: IntermediateResult{key=random, value=1}
    Combine task for: file
    Combine task for: tengo
    Combine task for: file
    Combine task for: more
    Writing: IntermediateResult{key=tengo, value=1}
    Writing: IntermediateResult{key=file, value=1}
    Combine task for: some
    Combine task for: third
    Writing: IntermediateResult{key=some, value=1}
    Combine task for: This
    Combine task for: Lululu
    Writing: IntermediateResult{key=Lululu, value=1}
    Writing: IntermediateResult{key=more, value=1}
    Writing: IntermediateResult{key=file, value=1}
    Combine task for: one
    Writing: IntermediateResult{key=one, value=1}
    Writing: IntermediateResult{key=This, value=1}
    Combine task for: And
    Writing: IntermediateResult{key=And, value=1}
    Writing: IntermediateResult{key=third, value=1}
    Combine task for: in
    Combine task for: text
    Writing: IntermediateResult{key=in, value=1}
    Combine task for: content.
    Combine task for: Why
    Writing: IntermediateResult{key=content., value=1}
    Writing: IntermediateResult{key=Why, value=1}
    Combine task for: And
    Writing: IntermediateResult{key=And, value=1}
    Combine task for: this
    Combine task for: is
    Combine task for: text
    Writing: IntermediateResult{key=text, value=1}
    Writing: IntermediateResult{key=text, value=1}
    Writing: IntermediateResult{key=is, value=1}
    Writing: IntermediateResult{key=this, value=1}
    Combine task for: first
    Combine task for: content.
    Writing: IntermediateResult{key=first, value=1}
    Writing: IntermediateResult{key=content., value=1}
    Combine task for: is
    Writing: IntermediateResult{key=is, value=1}
    Combine task for: second
    Writing: IntermediateResult{key=second, value=1}
    New reduce task for: not
    New reduce task for: more
    New reduce task for: one
    New reduce task for: the
    New reduce task for: file
    New reduce task for: This
    New reduce task for: content.
    New reduce task for: first
    New reduce task for: Lululu
    New reduce task for: More
    New reduce task for: in
    New reduce task for: text
    New reduce task for: Why
    New reduce task for: random
    New reduce task for: here
    New reduce task for: third
    New reduce task for: tengo
    New reduce task for: manzana
    New reduce task for: is
    New reduce task for: second
    New reduce task for: this
    New reduce task for: And
    New reduce task for: some
    Results:
    not - 1
    more - 1
    one - 1
    Why - 1
    the - 2
    file - 3
    This - 1
    content. - 2
    is - 2
    first - 1
    manzana - 1
    tengo - 1
    Lululu - 1
    More - 1
    third - 1
    in - 1
    text - 2
    here - 1
    random - 1
    some - 1
    And - 2
    this - 1
    second - 1
