Wednesday, 20 March 2013

Separating the email sending mechanism: A new approach

Almost 72% of the computer-programmed applications in the world are web applications. Of these, around 81% have batch processes integrated into them. Of those batch processes, nearly 86% interact indirectly with the users of the application by email or some other means of communication. I know that 89% of readers will get bored and start to abuse me if I keep providing such crappy statistics, so let me come to the point. By the way, the statistics above are complete nonsense; no reference was used for any of them.

The batch processes that interact indirectly with members generally behave like this: pick up the members one by one from the database, process each member's data, send that member an email once the processing is complete, then pick up the next member, and so on. Take, for example, a batch process that generates bills for mobile connection owners. I have provided the flow diagram of such a batch process.




Digging deeper into the ‘Send email to Member’ step: in a large application, more than one batch process may be executing at any point in time, and each of them is sending emails to members. If the total number of members is high, there will be tons and tons of emails waiting to go out through the email server at any given moment. Under such load the email server may slow down, and the batch processes then have to wait until each email is sent. Imagine this scenario: you want to post a letter to someone, so you go to the post office. The bloke at the post office says, “Sir, our postman is a bit busy right now delivering the letters posted by the customers who came before you. So, you will have to wait for him to come and take the letter from you.” How annoying would that be! You would grab his collar, maybe slap him in the face and shout, “Bitch! I have other tasks to complete. Why should I wait if the problem is at your end?” Unfortunately, batch processes can’t do either of these; they just have to wait.

The approach I am talking about is to completely separate the email sending mechanism from the batch processes, which lets the batch processes execute at the speed of light (provided that the other optimization aspects are also taken care of). A separate background process sends the emails to the members. All the other batch processes simply dump the email data into a temporary table. The background process picks up the rows from that table, sends the emails and, after sending each email, deletes the corresponding row from the table. The flow diagrams of both processes are shown below.




Look how simple the billing batch process becomes just by changing one step! Apart from being simpler, the design also increases cohesion, an essential aspect of any application design. This does not apply only to batch processes; real-time processes (like registration on the website, password changes, deactivation, etc.) can also dump their data into the temporary table to avoid waiting for the email to be sent.
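To make the changed step concrete, here is a minimal sketch of a billing batch that queues its emails instead of sending them. Everything named here is an assumption made up for illustration: `PendingEmail`, `EmailOutbox` and `BillingBatch` are hypothetical, the bill amount is a placeholder, and the in-memory list merely stands in for the temporary table (a real application would do a database INSERT instead).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical row of the temporary table: one email waiting to be sent.
class PendingEmail {
    final String to;
    final String subject;
    final String body;

    PendingEmail(String to, String subject, String body) {
        this.to = to;
        this.subject = subject;
        this.body = body;
    }
}

// In-memory stand-in for the temporary table (assumption: the real thing is a DB table).
class EmailOutbox {
    private final List<PendingEmail> rows = new ArrayList<>();

    // Equivalent of inserting one row into the table.
    synchronized void enqueue(PendingEmail email) {
        rows.add(email);
    }

    // Returns a copy of the rows that are still waiting to be sent.
    synchronized List<PendingEmail> pending() {
        return new ArrayList<>(rows);
    }
}

class BillingBatch {
    // Generates a bill per member and queues the email; it never talks to the mail server.
    static void run(List<String> memberEmails, EmailOutbox outbox) {
        for (String member : memberEmails) {
            String bill = "Amount due: 42.00"; // placeholder for the real bill calculation
            outbox.enqueue(new PendingEmail(member, "Your mobile bill", bill));
        }
    }

    public static void main(String[] args) {
        EmailOutbox outbox = new EmailOutbox();
        run(Arrays.asList("a@example.com", "b@example.com"), outbox);
        System.out.println(outbox.pending().size() + " emails queued");
    }
}
```

The point of the sketch is that `BillingBatch.run` returns as soon as the rows are written, no matter how busy the email server is. A registration or password-change handler could call `enqueue` in exactly the same way.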

Now, looking more closely at how the send email batch process works functionally: it should be a background process scheduled to run at a predefined frequency (every 5 minutes or so). All the other batch processes dump their data into the temporary table. This process picks up the records from that table and sends the emails. After sending an email, it deletes that record. It may store information about the sent email somewhere else for audit purposes; that depends on the functionality required. If sending an email fails for any reason, the record should not be deleted, so that it is fetched again in the next iteration.
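One iteration of that process might look roughly like the sketch below. The names are assumptions, not part of any real library: `Mailer` is a hypothetical interface around whatever actually sends the mail, `EmailSenderPass` is made up for this example, and a plain list of `{recipient, body}` pairs stands in for the temporary table.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical mail gateway; a real one might wrap an SMTP client.
interface Mailer {
    void send(String to, String body) throws Exception;
}

class EmailSenderPass {
    // Runs one iteration over the table. Each row is {recipient, body}.
    // Returns how many emails were sent (and therefore deleted) in this pass.
    static int runOnce(List<String[]> table, Mailer mailer) {
        int sent = 0;
        Iterator<String[]> rows = table.iterator();
        while (rows.hasNext()) {
            String[] row = rows.next();
            try {
                mailer.send(row[0], row[1]);
                rows.remove(); // delete the row only after a successful send
                sent++;
            } catch (Exception e) {
                // Sending failed: keep the row so the next iteration retries it.
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        List<String[]> table = new ArrayList<>();
        table.add(new String[] {"a@example.com", "Your bill is ready"});
        table.add(new String[] {"bad-address", "Your bill is ready"});

        // Fake mailer for the demo: rejects anything that does not look like an address.
        Mailer mailer = (to, body) -> {
            if (!to.contains("@")) {
                throw new IllegalArgumentException("invalid address: " + to);
            }
        };

        int sent = runOnce(table, mailer);
        System.out.println(sent + " sent, " + table.size() + " left for the next run");
    }
}
```

To run such a pass every 5 minutes from inside a Java process, one option is `Executors.newSingleThreadScheduledExecutor().scheduleWithFixedDelay(task, 0, 5, TimeUnit.MINUTES)`; an external scheduler such as cron would do the job as well. A real version would probably also cap the number of retries so that a permanently bad address does not stay in the table forever.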

Going into the technical details, Java provides an excellent framework called the ‘Executor framework’ (part of java.util.concurrent) to implement such functionality. It is a multi-threaded way to achieve the desired behavior. All it requires is a queue (or, we can say, a list) of work items, a number of threads, and the logic which needs to be executed for each element in the queue. And voila, your job is done! You don’t need to be arsed about threading aspects like starting, stopping and sleeping threads; the executor takes care of the thread management for you, although your own logic must still be safe to run concurrently. All you need to focus on is the business logic. In the context of our send email process, one should fill the queue with records from the temporary table, define the number of threads and write the email sending logic in the task handed to the executor (there are multiple types of executors, and hence multiple methods; pick the one that suits your requirement). Each running thread takes a record from the queue and processes it using the logic specified. Once it is done, it takes the next record. E.g. if we have 1000 records to process and have defined 5 threads, then five records are taken from the queue and processed simultaneously at the start, and each thread picks up another record as soon as it finishes its current one, until the queue is empty. This yields higher throughput and a shorter execution time. More information on executor framework can be found here.
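The 1000 records and 5 threads example can be sketched with a fixed thread pool as follows. This is only one way of using the framework, under stated assumptions: `ParallelEmailSender` and `sendEmail` are hypothetical names, and `sendEmail` is an empty placeholder where the real mail call would go. The `ExecutorService`, `Executors.newFixedThreadPool`, `execute`, `shutdown` and `awaitTermination` calls are the standard java.util.concurrent API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelEmailSender {
    // Processes every recipient on a fixed pool of worker threads.
    // Returns how many records were processed.
    static int sendAll(List<String> recipients, int threads) {
        AtomicInteger sent = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        // The pool keeps an internal queue; each free thread takes the next task from it.
        for (String recipient : recipients) {
            pool.execute(() -> {
                sendEmail(recipient);
                sent.incrementAndGet();
            });
        }

        pool.shutdown(); // accept no new tasks; already queued ones still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow(); // give up on whatever is still pending
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return sent.get();
    }

    // Placeholder (assumption): the real email sending logic would go here.
    static void sendEmail(String recipient) {
        // intentionally empty in this sketch
    }

    public static void main(String[] args) {
        List<String> recipients = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            recipients.add("member" + i + "@example.com");
        }
        System.out.println(sendAll(recipients, 5) + " emails processed");
    }
}
```

An `AtomicInteger` is used for the counter because five threads update it at the same time; a plain `int` would not be safe here. The one-minute timeout is an arbitrary choice for the sketch and would need tuning for a real mail server.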

Around 62% of software professionals in the world are Java professionals. Of these, 37% work on multi-threading in Java. Of those, only 4% know about the executor framework, which is strange. By the way, can you guess where these statistics came from? Yes, you guessed it right!