I recently pushed a very simple implementation of the MapReduce concept to my GitHub account (click). My idea was to focus on the concept and mock the rest. You can follow the code to understand how it works, but here are the main implementation details:
  • Text files are represented as Strings and stored in memory
  • The Master is responsible for splitting the data and scheduling the map/reduce tasks on workers
  • Workers are represented as threads and are run using the CompletableFuture API
  • When a map task finishes, the combine function is run on its result
  • When the Master notices that all map tasks have finished, it collects all distinct resulting keys and passes them as arguments to the reduce tasks
  • When the reduce tasks finish, the result is printed and the executor is shut down
It is easy to define new user programs; see the example package.
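
The flow described above can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the class name MapReduceSketch, the method names, and the fixed pool of 4 threads are all assumptions made for this example. It runs one map task (with a local combine step) per file, waits for all of them, collects the distinct keys, and then runs one reduce task per key.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical sketch: names and structure are assumptions, not the repository's API.
class MapReduceSketch {

    // Map + combine for one file: emit every word and pre-sum duplicates locally.
    static Map<String, Integer> mapAndCombine(String content) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : content.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce for one key: sum the partial counts produced by all map tasks.
    static int reduce(String key, List<Map<String, Integer>> intermediate) {
        return intermediate.stream().mapToInt(m -> m.getOrDefault(key, 0)).sum();
    }

    // Plays the role of the Master: schedules map tasks, then reduce tasks.
    static Map<String, Integer> wordCount(Map<String, String> files) {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            // One map task per file, each running on a worker thread.
            List<CompletableFuture<Map<String, Integer>>> mapTasks = files.values().stream()
                    .map(content -> CompletableFuture.supplyAsync(() -> mapAndCombine(content), workers))
                    .collect(Collectors.toList());

            // Wait until every map task has finished.
            List<Map<String, Integer>> intermediate = mapTasks.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());

            // Collect the distinct keys and schedule one reduce task per key.
            Set<String> keys = intermediate.stream()
                    .flatMap(m -> m.keySet().stream())
                    .collect(Collectors.toCollection(TreeSet::new));
            Map<String, CompletableFuture<Integer>> reduceTasks = new TreeMap<>();
            for (String key : keys) {
                reduceTasks.put(key, CompletableFuture.supplyAsync(() -> reduce(key, intermediate), workers));
            }

            // Gather the reduce results.
            Map<String, Integer> result = new TreeMap<>();
            reduceTasks.forEach((key, task) -> result.put(key, task.join()));
            return result;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of(
                "file1.txt", "This is the first file \ncontent. ",
                "file2.txt", "And this is the second file content. ",
                "file3.txt", "More text in \nthird file");
        wordCount(files).forEach((word, count) -> System.out.println(word + " - " + count));
    }
}
```

Joining on all map futures before starting any reduce task mirrors the "all map tasks have finished" barrier from the list above; a real system would also have to handle worker failures, which this sketch ignores.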

In-memory data:

"file1.txt" -> "This is the first file \ncontent. "
"file2.txt" -> "And this is the second file content. "
"file3.txt" -> "More text in \nthird file"
"file4.txt" -> "And some random text here"
"file5.txt" -> "Why not \none more"
"file6.txt" -> "Lululu tengo manzana"


Distributed Grep (search for "And"):

New map task for key: file5.txt
New map task for key: file4.txt
New map task for key: file3.txt
New map task for key: file6.txt
New map task for key: file1.txt
New map task for key: file2.txt
Combine task for: file2.txt
Combine task for: file4.txt
Writing: IntermediateResult{key=file2.txt, value=And this is the second file content. }
Writing: IntermediateResult{key=file4.txt, value=And some random text here}
New reduce task for: file4.txt
New reduce task for: file2.txt

Results: 
file4.txt - And some random text here
file2.txt - And this is the second file content. 
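
The grep run above boils down to a map function that emits a file only when its content contains the search pattern, with a reduce step that passes the value through unchanged. A minimal single-threaded sketch, assuming a case-sensitive substring match (GrepSketch and grep are names invented for this example, not taken from the repository):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the grep user program; not the repository's code.
class GrepSketch {

    // Keep a file only if its content contains the pattern (case-sensitive).
    static Map<String, String> grep(Map<String, String> files, String pattern) {
        Map<String, String> matches = new TreeMap<>();
        files.forEach((name, content) -> {
            if (content.contains(pattern)) {
                matches.put(name, content);
            }
        });
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of(
                "file2.txt", "And this is the second file content. ",
                "file4.txt", "And some random text here",
                "file5.txt", "Why not \none more");
        grep(files, "And").forEach((name, content) -> System.out.println(name + " - " + content));
    }
}
```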

Word count:

New map task for key: file5.txt
New map task for key: file4.txt
New map task for key: file3.txt
New map task for key: file6.txt
New map task for key: file1.txt
New map task for key: file2.txt
Combine task for: not
Combine task for: here
Combine task for: the
Combine task for: manzana
Combine task for: the
Combine task for: More
Writing: IntermediateResult{key=More, value=1}
Writing: IntermediateResult{key=the, value=1}
Writing: IntermediateResult{key=here, value=1}
Writing: IntermediateResult{key=the, value=1}
Writing: IntermediateResult{key=not, value=1}
Writing: IntermediateResult{key=manzana, value=1}
Combine task for: file
Combine task for: random
Writing: IntermediateResult{key=file, value=1}
Writing: IntermediateResult{key=random, value=1}
Combine task for: file
Combine task for: tengo
Combine task for: file
Combine task for: more
Writing: IntermediateResult{key=tengo, value=1}
Writing: IntermediateResult{key=file, value=1}
Combine task for: some
Combine task for: third
Writing: IntermediateResult{key=some, value=1}
Combine task for: This
Combine task for: Lululu
Writing: IntermediateResult{key=Lululu, value=1}
Writing: IntermediateResult{key=more, value=1}
Writing: IntermediateResult{key=file, value=1}
Combine task for: one
Writing: IntermediateResult{key=one, value=1}
Writing: IntermediateResult{key=This, value=1}
Combine task for: And
Writing: IntermediateResult{key=And, value=1}
Writing: IntermediateResult{key=third, value=1}
Combine task for: in
Combine task for: text
Writing: IntermediateResult{key=in, value=1}
Combine task for: content.
Combine task for: Why
Writing: IntermediateResult{key=content., value=1}
Writing: IntermediateResult{key=Why, value=1}
Combine task for: And
Writing: IntermediateResult{key=And, value=1}
Combine task for: this
Combine task for: is
Combine task for: text
Writing: IntermediateResult{key=text, value=1}
Writing: IntermediateResult{key=text, value=1}
Writing: IntermediateResult{key=is, value=1}
Writing: IntermediateResult{key=this, value=1}
Combine task for: first
Combine task for: content.
Writing: IntermediateResult{key=first, value=1}
Writing: IntermediateResult{key=content., value=1}
Combine task for: is
Writing: IntermediateResult{key=is, value=1}
Combine task for: second
Writing: IntermediateResult{key=second, value=1}
New reduce task for: not
New reduce task for: more
New reduce task for: one
New reduce task for: the
New reduce task for: file
New reduce task for: This
New reduce task for: content.
New reduce task for: first
New reduce task for: Lululu
New reduce task for: More
New reduce task for: in
New reduce task for: text
New reduce task for: Why
New reduce task for: random
New reduce task for: here
New reduce task for: third
New reduce task for: tengo
New reduce task for: manzana
New reduce task for: is
New reduce task for: second
New reduce task for: this
New reduce task for: And
New reduce task for: some

Results: 
not - 1
more - 1
one - 1
Why - 1
the - 2
file - 3
This - 1
content. - 2
is - 2
first - 1
manzana - 1
tengo - 1
Lululu - 1
More - 1
third - 1
in - 1
text - 2
here - 1
random - 1
some - 1
And - 2
this - 1
second - 1

I have recently started implementing different distributed system protocols to get a better understanding of how they work. I think that using Akka Actors to simulate hosts is a good choice because they are easy to set up. What is more, you can kill actors on demand to test some failure scenarios.

The first protocol I implemented is Paxos - probably the most important consensus protocol, which guarantees strong consistency, full transactions, read/write failover and prevents data loss. This comes at the cost of higher latency and lower throughput. In terms of the CAP theorem, Paxos is used to make your system CP. I used this sweet article as a reference.

The implementation can be found in my GitHub repository. Go to Test.scala and run some simulations.

When I started working as a Java Developer, my teammate and I got our first task: to repair all the broken tests (great task for new starters!) in some old project. Replacing some old configuration and upgrading a few libraries helped turn the test status green, but there was another problem. The end-to-end test suite took a few hours to pass, making it impossible to get quick feedback on a new change.

In this article I am going to share some cool features I stumbled upon while coding with IntelliJ. These are not the most popular or the most productivity-improving ones - for those you should watch this video.

1. Set debugger breakpoint by pattern

Problem: You want to check whether the debugger steps into some code during the run. Usually you have a rough idea where the flow will end up, but sometimes finding the exact line can be hard.

In this article I will try to map the methods of Java’s Optional to Kotlin’s similar, scattered language features and built-in functions. The code in the examples is written in Kotlin, because the language has all the JDK classes available.

Representation

Let’s start with the representation.

Have you ever scrolled someone’s code and bumped into this weird method called flatMap, not knowing what it actually does from the context? Or maybe you compared it with method map but didn’t really see much difference? If that is the case then this article is for you. 

flatMap is extremely useful when you try to do proper functional programming. In Java it usually means using Streams and Optionals - concepts introduced in version 8.
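
As a rough illustration of the difference from map, here is a small sketch (the class and method names are made up for this example). With a Stream, flatMap turns each element into a stream of its own and flattens the results into one stream; with an Optional, it chains a step that itself returns an Optional without nesting the result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical example class; not taken from the article.
class FlatMapSketch {

    // Stream.flatMap: each line becomes a stream of words, flattened into one list.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // A step that may fail, so it returns an Optional.
    static Optional<Integer> parse(String text) {
        try {
            return Optional.of(Integer.parseInt(text));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Optional.flatMap: using map here would give Optional<Optional<Integer>>.
    static Optional<Integer> parseTrimmed(Optional<String> raw) {
        return raw.map(String::trim).flatMap(FlatMapSketch::parse);
    }

    public static void main(String[] args) {
        System.out.println(words(List.of("a b", "c")));
        System.out.println(parseTrimmed(Optional.of(" 42 ")));
        System.out.println(parseTrimmed(Optional.of("not a number")));
    }
}
```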

Fact - end-to-end tests are critical if you want to make sure your software works as it should. To be 100% sure that you have covered every (or almost every) possible branch in your business code, it is worth checking which code has been invoked after your E2E suite finishes successfully. The solution I am going to present may be much easier if you are able to stand up your application locally. Sometimes, though, that is not the case.

Functional Programming in Java

The Stream and Optional classes - added in Java 8 - allow you to have some fun with functional programming. The problem is that Java still lacks quite a lot before it can be taken seriously as an FP language. Lambda notation and two monads (Optional and Stream) are just the tip of the iceberg. This has led to the rise of libraries like vavr or functionaljava, both drawing on the purely functional language Haskell.

In this article, I am going to show you a simple trick that will make using java.util.function.Function.andThen() more useful.

As an example I will use the ExternalSystemGateway class, whose job is to call an external system along with serializing/mapping the messages:

You can see that every line of the invoke method does some kind of action, which transforms some input type to another output type.
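
To make the idea of chaining such transformations concrete, here is a minimal sketch of plain Function.andThen() composition. It is not the ExternalSystemGateway class from the article: the class AndThenSketch and the three step functions are placeholders invented for this example, and the "external call" is faked with a simple string operation.

```java
import java.util.function.Function;

// Hypothetical stand-in; the steps only imitate serialize -> call -> map.
class AndThenSketch {

    // Step 1: wrap the request in a message envelope (fake serialization).
    static final Function<String, String> serialize = request -> "<msg>" + request + "</msg>";

    // Step 2: pretend to call the external system; here it just upper-cases the message.
    static final Function<String, String> callExternalSystem = message -> message.toUpperCase();

    // Step 3: map the raw response to another type.
    static final Function<String, Integer> toLength = String::length;

    // Each step's output type is the next step's input type, so they compose with andThen.
    static final Function<String, Integer> invoke =
            serialize.andThen(callExternalSystem).andThen(toLength);

    public static void main(String[] args) {
        System.out.println(invoke.apply("ab"));
    }
}
```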

Checked exceptions & Java 8

Defining custom exceptions (both checked and unchecked) is a common approach to handling errors in Java applications. It usually leads to creating a new class for every different type of error, marking methods with the throws keyword or wrapping code in try-catch blocks. This can lead to code that is hard to read, since every block adds another level of complexity.

Lambdas in Java 8 started the boom for a functional approach to writing code.