Almost 72% of the computer applications in the world are web applications. Of these, around 81% have batch processes integrated into them. Of those batch processes, nearly 86% interact indirectly with the users of the application, by email or by some other means of communication. I know that 89% of the readers will get bored and start cursing me if I keep providing such crappy statistics, so let me come to the point. By the way, the statistics above are complete nonsense; no reference was consulted for any of them.
For batch processes that interact indirectly with the members, the general behavior is: pick up the members one by one from the database, process each member's data, send an email to the member once processing is complete, then pick up the next member, and so on. Take, as an example, a batch process that generates bills for mobile connection owners. I have provided the flow diagram of such a batch process.
Digging deeper into the ‘Send email to Member’ step: in a large application, more than one batch process may be executing at any point in time, each of them sending emails to the members. If the total number of members is high, tons and tons of emails will be waiting to go out through the email server at any given moment. Under such load the email server may slow down, and the batch processes then have to wait until each email is sent. Imagine this scenario: you want to post a letter to someone, so you go to the post office. The bloke at the post office says, “Sir, our postman is a bit busy right now delivering the letters posted by the customers who came before you, so you will have to wait for him to come and take the letter from you.” How annoying would that be! You would grab his collar, maybe slap him in the face, and shout, “I have other tasks to complete! Why should I wait if the problem is at your end?” Unfortunately, batch processes can’t do either of these; they just have to wait.
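To make the waiting concrete, here is a rough sketch of the coupled flow described above. The names `MailSender` and `BlockingBillingBatch` are made up for illustration, and the 200 ms sleep only simulates a loaded mail server; nothing here comes from a real mail library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a mail client; a real one would talk to an SMTP server.
interface MailSender {
    void send(String address, String body);
}

class BlockingBillingBatch {
    // Processes members one at a time; the loop cannot move on until send() returns.
    static int run(List<String> memberEmails, MailSender mailSender) {
        int processed = 0;
        for (String email : memberEmails) {
            String bill = "Your bill for this month is ready."; // placeholder for bill generation
            mailSender.send(email, bill); // blocks for as long as the mail server takes
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        // Simulated slow mail server (assumption): every send costs 200 ms.
        MailSender slowSender = (address, body) -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long start = System.nanoTime();
        int count = run(Arrays.asList("a@example.com", "b@example.com", "c@example.com"), slowSender);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count + " members billed in about " + elapsedMs + " ms");
    }
}
```

The total run time grows with every slow send, even though generating the bill itself costs next to nothing.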
The approach I am talking about is to completely separate the email sending mechanism from the batch processes, which lets the batch processes run at the speed of light (provided that the other optimization aspects are also taken care of). A separate background process sends the emails to the members. All the other batch processes dump the email data into a temp table. The background process picks up rows from that table, sends the emails, and deletes each row once its email has been sent. The flow diagrams of both processes are shown below.
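The billing side of this split might look something like the sketch below. It is only an illustration: `PendingEmail`, `EmailOutbox` and `DecoupledBillingBatch` are invented names, the columns are guesses, and an in-memory queue stands in for the temp table, which in a real system would be a database table receiving INSERTs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// One row of the temp table (hypothetical columns): who to email and what to say.
final class PendingEmail {
    final String address;
    final String subject;
    final String body;

    PendingEmail(String address, String subject, String body) {
        this.address = address;
        this.subject = subject;
        this.body = body;
    }
}

// In-memory stand-in for the temp table (assumption); a real system would write to a database.
class EmailOutbox {
    private final Queue<PendingEmail> rows = new ConcurrentLinkedQueue<>();

    void add(PendingEmail email) { rows.add(email); }
    int size() { return rows.size(); }
}

class DecoupledBillingBatch {
    // Generates a bill per member and records the email instead of sending it.
    static void run(List<String> memberEmails, EmailOutbox outbox) {
        for (String email : memberEmails) {
            String bill = "Your bill for this month is ready."; // placeholder for bill generation
            outbox.add(new PendingEmail(email, "Your mobile bill", bill)); // no mail server involved
        }
    }

    public static void main(String[] args) {
        EmailOutbox outbox = new EmailOutbox();
        run(Arrays.asList("a@example.com", "b@example.com"), outbox);
        System.out.println(outbox.size() + " emails queued for the background sender");
    }
}
```

The batch now finishes as soon as its rows are written, however busy the mail server happens to be.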
Look how simple the billing batch process becomes just by changing one step! Apart from being simpler, it is also more cohesive, an essential aspect of any application design. Nor does this apply only to batch processes; real-time processes (such as website registration, password changes, account deactivation, etc.) can also dump their data into the temp table to avoid waiting for the email to be sent.
Now, looking more closely at how the Send email batch process should work functionally: it should be a background process scheduled to run at a predefined interval (every 5 minutes or so). All the other batch processes dump their data into the temp table; this process picks up the records from that table and sends the emails. After sending an email, it deletes the corresponding record. It may store the sent-email information somewhere else for audit purposes, depending on the functionality required. If sending fails due to any error, the record should not be deleted, so that it is fetched again in the next iteration.
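The rules above (send, delete on success, keep on failure, optionally audit) can be sketched roughly as follows. `SendEmailJob` and its methods are hypothetical names, the temp table is again replaced by an in-memory list, and the mailer is assumed to signal failure by throwing a `RuntimeException`. The scheduling uses the standard `ScheduledExecutorService`; a cron job or a scheduler library would do equally well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class SendEmailJob {
    // In-memory stand-ins (assumption) for the temp table and the audit store.
    private final List<String> pendingRows = new ArrayList<>();
    private final List<String> auditLog = new ArrayList<>();
    private final Consumer<String> mailer; // hypothetical: throws RuntimeException on failure

    SendEmailJob(Consumer<String> mailer) { this.mailer = mailer; }

    synchronized void enqueue(String address) { pendingRows.add(address); }
    synchronized int pendingCount() { return pendingRows.size(); }
    synchronized List<String> audit() { return new ArrayList<>(auditLog); }

    // One iteration: try every pending row, delete only the ones that were sent.
    void runOnce() {
        List<String> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(pendingRows); // copy so producers are not blocked while sending
        }
        for (String address : snapshot) {
            try {
                mailer.accept(address);
            } catch (RuntimeException e) {
                continue; // keep the row so the next iteration retries it
            }
            synchronized (this) {
                pendingRows.remove(address); // delete the row only after a successful send
                auditLog.add(address);       // optional audit trail
            }
        }
    }

    // Runs runOnce() repeatedly, waiting `delay` between the end of one run and the start of the next.
    ScheduledExecutorService schedule(long delay, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::runOnce, 0, delay, unit);
        return scheduler;
    }
}
```

With this sketch, `job.schedule(5, TimeUnit.MINUTES)` would give the five-minute cycle mentioned above; the returned scheduler should be shut down when the application stops.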
Going into the technical details, Java provides an excellent framework, the ‘Executor framework’, for implementing this kind of functionality. It is a multi-threaded way to achieve the desired behavior. All it requires is a queue (or, we can say, a list), a number of threads, and the logic to be executed for each element in the queue. And voila, your job is done! You don’t need to bother with thread plumbing such as starting, stopping and sleeping; the executor takes care of that for you, although any data your logic shares between threads still has to be thread-safe. All you need to focus on is the business logic. In the context of our send email process, you fill the queue with records from the temp table, define the number of threads, and write the email sending logic in the task handed to the executor (there are multiple types of executors, and hence of methods; pick according to your requirement). Each running thread pops a record from the queue and processes it using the logic specified; once it is done, it pops the next record. E.g., if we have 1000 records to process and have defined 5 threads, five records are popped from the queue and processed simultaneously at the start; as soon as any thread finishes its record it pops the next one, so up to five records are in progress at any time. This yields higher throughput and a shorter execution time. More information on the Executor framework can be found here.
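As a rough illustration of the 1000-records, 5-threads example, the sketch below uses a fixed thread pool from `java.util.concurrent`. `ParallelEmailSender` and `sendAll` are made-up names, the records are plain strings rather than real table rows, and the sending logic is passed in as a `Consumer` that is assumed to throw a `RuntimeException` on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ParallelEmailSender {
    // Sends every record using a fixed pool of worker threads; returns how many succeeded.
    static int sendAll(List<String> records, int threadCount, Consumer<String> sendLogic) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        AtomicInteger sent = new AtomicInteger();
        for (String record : records) {
            // Each task waits in the pool's internal queue until one of the threads is free.
            pool.submit(() -> {
                try {
                    sendLogic.accept(record);
                    sent.incrementAndGet();
                } catch (RuntimeException e) {
                    // A failed record is not counted; its row would stay in the temp table.
                }
            });
        }
        pool.shutdown(); // accept no new tasks, let the queued ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow(); // give up on anything still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("member" + i + "@example.com");
        }
        int sent = sendAll(records, 5, address -> { /* email sending logic goes here */ });
        System.out.println(sent + " emails sent");
    }
}
```

A fixed pool is only one of the executor types on offer; the one-minute timeout is an arbitrary choice for the sketch and would need tuning for a real mail server.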
Around 62% of the software professionals in the world are Java professionals. Of these, 37% work on multi-threading in Java. Of those, only 4% know about the Executor framework, which is strange. By the way, can you guess where these statistics came from? Yes, you guessed it right!



 
Nice article!!
BTW I have a question: how much overhead and delay is introduced by the database entry, and is this approach suitable for real-time applications?
There is a trade-off between the frequency of the email notification batch process and the delay caused by database operations. The more often it runs, the less data it has to deal with on each run. In general, any interval between 5 and 10 minutes is ideal and causes negligible database overhead.
As far as real-time applications are concerned, the interval should be set as short as possible. Again, it depends on the urgency of the email communication.
In short, the frequency plays a pivotal role in the performance of this process.