I recently pushed a very simple implementation of the MapReduce concept to my GitHub account (click). My idea was to focus on the concept and mock everything else. You can follow the code to see how it works, but here are the main implementation details:
  • Text files are represented as Strings and stored in memory
  • The Master is responsible for splitting the data and scheduling the map/reduce tasks on workers
  • Workers are represented as threads and are run using the CompletableFuture API
  • When a map task finishes, the combine function is run on its result
  • When the Master notices that all map tasks have finished, it takes all the distinct resulting keys and passes them as arguments to reduce tasks
  • When the reduce tasks finish, the results are printed and the executor is shut down
It is easy to define new user programs; see the example package.
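To make the flow above concrete, here is a minimal sketch of a word count built on CompletableFuture. It is not the code from the repository: the class name MiniMapReduce, the methods mapWords, combine, reduce, run and sampleFiles, and the pool size of 4 are all assumptions made for illustration. It also combines per map task rather than per key, which is a simplification.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; names and structure are assumptions, not the repository's API.
class MiniMapReduce {

    // In-memory "files": file name -> content, as in the post.
    static Map<String, String> sampleFiles() {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "This is the first file \ncontent. ");
        files.put("file2.txt", "And this is the second file content. ");
        files.put("file3.txt", "More text in \nthird file");
        files.put("file4.txt", "And some random text here");
        files.put("file5.txt", "Why not \none more");
        files.put("file6.txt", "Lululu tengo manzana");
        return files;
    }

    // Map: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> mapWords(String content) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : content.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Combine: pre-aggregate the pairs produced by a single map task.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> combined = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    // Reduce: sum all partial counts collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // The "master": schedules map tasks, waits for them, then schedules reduce tasks.
    static Map<String, Integer> run(Map<String, String> files) {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            // One map task per file; combine is chained onto each map result.
            List<CompletableFuture<Map<String, Integer>>> mapTasks = new ArrayList<>();
            for (Map.Entry<String, String> file : files.entrySet()) {
                mapTasks.add(CompletableFuture
                        .supplyAsync(() -> mapWords(file.getValue()), workers)
                        .thenApply(MiniMapReduce::combine));
            }
            // Block until every map task (and its combine step) has finished.
            CompletableFuture.allOf(mapTasks.toArray(new CompletableFuture[0])).join();

            // Group the intermediate values by their distinct keys.
            Map<String, List<Integer>> intermediate = new HashMap<>();
            for (CompletableFuture<Map<String, Integer>> task : mapTasks) {
                task.join().forEach((key, value) ->
                        intermediate.computeIfAbsent(key, k -> new ArrayList<>()).add(value));
            }

            // One reduce task per distinct key.
            Map<String, CompletableFuture<Integer>> reduceTasks = new HashMap<>();
            intermediate.forEach((key, values) -> reduceTasks.put(key,
                    CompletableFuture.supplyAsync(() -> reduce(key, values), workers)));

            Map<String, Integer> results = new TreeMap<>();
            reduceTasks.forEach((key, task) -> results.put(key, task.join()));
            return results;
        } finally {
            workers.shutdown(); // close the executor once all tasks are done
        }
    }

    public static void main(String[] args) {
        run(sampleFiles()).forEach((word, count) -> System.out.println(word + " - " + count));
    }
}
```

Chaining combine with thenApply keeps it on the worker side, so the master only ever sees already-combined intermediate results.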

In-memory data:

"file1.txt" -> "This is the first file \ncontent. "
"file2.txt" -> "And this is the second file content. "
"file3.txt" -> "More text in \nthird file"
"file4.txt" -> "And some random text here"
"file5.txt" -> "Why not \none more"
"file6.txt" -> "Lululu tengo manzana"


Distributed Grep (search for "And"):

New map task for key: file5.txt
New map task for key: file4.txt
New map task for key: file3.txt
New map task for key: file6.txt
New map task for key: file1.txt
New map task for key: file2.txt
Combine task for: file2.txt
Combine task for: file4.txt
Writing: IntermediateResult{key=file2.txt, value=And this is the second file content. }
Writing: IntermediateResult{key=file4.txt, value=And some random text here}
New reduce task for: file4.txt
New reduce task for: file2.txt

Results: 
file4.txt - And some random text here
file2.txt - And this is the second file content. 
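For illustration, a grep user program could look roughly like the sketch below. This is a guess based on the log above, where the whole file content is emitted under the file name. The GrepSketch class and its map, reduce and run methods are hypothetical names, and the tasks run sequentially here to keep the focus on the two user-defined functions.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a grep "user program"; not the repository's actual API.
class GrepSketch {

    // Map: emit the file content only when it contains the pattern (case-sensitive).
    static List<String> map(String fileName, String content, String pattern) {
        if (content.contains(pattern)) {
            return Collections.singletonList(content);
        }
        return Collections.emptyList();
    }

    // Reduce: identity-like; just joins whatever was emitted for the key.
    static String reduce(String key, List<String> values) {
        return String.join("\n", values);
    }

    // Sequential driver (an assumption for brevity; the post runs tasks on worker threads).
    static Map<String, String> run(Map<String, String> files, String pattern) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            List<String> emitted = map(file.getKey(), file.getValue(), pattern);
            if (!emitted.isEmpty()) {
                results.put(file.getKey(), reduce(file.getKey(), emitted));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "This is the first file \ncontent. ");
        files.put("file2.txt", "And this is the second file content. ");
        files.put("file4.txt", "And some random text here");
        run(files, "And").forEach((name, text) -> System.out.println(name + " - " + text));
    }
}
```

Because the match is case-sensitive, the "and" inside "random" does not count as a hit for "And".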

Word count:

New map task for key: file5.txt
New map task for key: file4.txt
New map task for key: file3.txt
New map task for key: file6.txt
New map task for key: file1.txt
New map task for key: file2.txt
Combine task for: not
Combine task for: here
Combine task for: the
Combine task for: manzana
Combine task for: the
Combine task for: More
Writing: IntermediateResult{key=More, value=1}
Writing: IntermediateResult{key=the, value=1}
Writing: IntermediateResult{key=here, value=1}
Writing: IntermediateResult{key=the, value=1}
Writing: IntermediateResult{key=not, value=1}
Writing: IntermediateResult{key=manzana, value=1}
Combine task for: file
Combine task for: random
Writing: IntermediateResult{key=file, value=1}
Writing: IntermediateResult{key=random, value=1}
Combine task for: file
Combine task for: tengo
Combine task for: file
Combine task for: more
Writing: IntermediateResult{key=tengo, value=1}
Writing: IntermediateResult{key=file, value=1}
Combine task for: some
Combine task for: third
Writing: IntermediateResult{key=some, value=1}
Combine task for: This
Combine task for: Lululu
Writing: IntermediateResult{key=Lululu, value=1}
Writing: IntermediateResult{key=more, value=1}
Writing: IntermediateResult{key=file, value=1}
Combine task for: one
Writing: IntermediateResult{key=one, value=1}
Writing: IntermediateResult{key=This, value=1}
Combine task for: And
Writing: IntermediateResult{key=And, value=1}
Writing: IntermediateResult{key=third, value=1}
Combine task for: in
Combine task for: text
Writing: IntermediateResult{key=in, value=1}
Combine task for: content.
Combine task for: Why
Writing: IntermediateResult{key=content., value=1}
Writing: IntermediateResult{key=Why, value=1}
Combine task for: And
Writing: IntermediateResult{key=And, value=1}
Combine task for: this
Combine task for: is
Combine task for: text
Writing: IntermediateResult{key=text, value=1}
Writing: IntermediateResult{key=text, value=1}
Writing: IntermediateResult{key=is, value=1}
Writing: IntermediateResult{key=this, value=1}
Combine task for: first
Combine task for: content.
Writing: IntermediateResult{key=first, value=1}
Writing: IntermediateResult{key=content., value=1}
Combine task for: is
Writing: IntermediateResult{key=is, value=1}
Combine task for: second
Writing: IntermediateResult{key=second, value=1}
New reduce task for: not
New reduce task for: more
New reduce task for: one
New reduce task for: the
New reduce task for: file
New reduce task for: This
New reduce task for: content.
New reduce task for: first
New reduce task for: Lululu
New reduce task for: More
New reduce task for: in
New reduce task for: text
New reduce task for: Why
New reduce task for: random
New reduce task for: here
New reduce task for: third
New reduce task for: tengo
New reduce task for: manzana
New reduce task for: is
New reduce task for: second
New reduce task for: this
New reduce task for: And
New reduce task for: some

Results: 
not - 1
more - 1
one - 1
Why - 1
the - 2
file - 3
This - 1
content. - 2
is - 2
first - 1
manzana - 1
tengo - 1
Lululu - 1
More - 1
third - 1
in - 1
text - 2
here - 1
random - 1
some - 1
And - 2
this - 1
second - 1
