Tuesday, 31 December 2013

The dreaded Production Outage


Outage. The word that sends shivers down our spines whenever my teammates or I hear it. It is one of the most feared words in the software industry. It can make all of your joy and confidence vanish in a heartbeat. Just when you think you are doing well and gaining the confidence and trust of your client, suddenly, out of nowhere, on a pleasant morning, you will see an email from your client (with a little red exclamation mark) saying that they are facing an outage in the production environment. Generally, it will be followed by a couple of escalation emails from your manager as well as your leader. Bang! All your good deeds count for nothing now. The misery won’t stop here; you will immediately be called into an emergency meeting. In it, you will get your ass whipped and will be asked to find the root cause of the outage and provide a lengthy, Shakespeare-style email (with all the decorations and formatting) to the client. And all you will have for analysis (i.e. data and information) will be, well, nothing! This is the subject of this blog. Let’s start with it.

There are multiple definitions of an outage. Assume that yours is a web application hosted on some server. One instance may refer to a server crash (due to some hardware/software failure), which would make all requests get rejected with an HTTP 4xx (or similar) error. One instance may refer to a crash of the application hosted on the server, resulting in an HTTP 5xx error. One instance may refer to a database crash, which will make all operations fail with a ‘technical error’ (or similar) message. One instance (which we will emphasize in this blog) may refer to the application becoming unresponsive. This is a very mysterious instance. The server seems to be working fine, the application seems to be working fine, the database seems to be working fine, but still, when someone sends a request to the server, the browser will display the ‘processing’ symbol and then eventually time out. See? I told you it is much deeper than it seems.

We faced similar situations multiple times last month. The client kept saying that the services became unresponsive and they had to restart the server to make them available again. For our analysis, we asked for server logs, error logs, database logs, application logs, call logs, message logs, WhatsApp logs and any other type of logs which may exist on this planet. We got all the logs in some time and found nothing. Absolutely nothing! We were certain that there was no hardware/server/database fault. After some googling and analysis, we concluded that there must be something wrong with the JVM. Hence, we asked them to provide a heap dump of the JVM of the production environment.

Let me explain the heap dump first. Well, not exactly the heap dump; let me start with the heap. The heap (or I should say, heap memory) is an area utilized by the JVM to allocate memory for newly created objects. Say I have written a line in a Java program like Date d = new Date();. This will create a new Date object. That object will be stored in heap memory. It will stay alive as long as we have at least one reference to it (in our case, d) in our program. When all the references are gone, the object will be garbage collected by the garbage collector (more on that later). The heap mainly has the following parts:

  • Eden Space: This is the part of the heap where newly created objects are stored. The memory for all new objects is initially allocated from this part. In our life, Eden space refers to school.
  • Survivor Space: When the JVM ‘thinks’ that the young gen space is almost full, it invokes the garbage collector on it. GC cleans up all the objects which do not have any valid reference in the executing program(s). The objects which survive this garbage collection are moved to the Survivor space (a reward for their so-called achievement!). In our life, Survivor space refers to college. Ironically, our days to survive start from college too!
              Both Eden and Survivor space are parts of Young Gen space.
  • Tenured Generation: This area contains the objects which have been alive for quite a long time. Examples of such objects are static objects, instance variables of a servlet class etc. It is also known as the Old Gen space. In our life, this refers to the place where you are probably sitting and reading this blog, i.e. the office.
That was all about heap. Before we start with heap dump, let me explain one more term:
  • Permanent Generation: When we create an instance of a class or, let’s say, when we call any static method/member of a class, the respective class gets loaded into memory. In our previous example, the Date class will be loaded into memory before the Date object is created. Permanent Generation space is where all the loaded classes are stored. Permanent Generation is not a part of heap memory. I couldn't find any mapping for this space in our life. Suggestions are welcome.
Now, let’s start with the heap dump. A heap dump is a binary or textual representation of the Java heap which is written to a file. It is like the black box of an aircraft. A heap dump has all the information about the objects present in the heap, their object tree and the memory occupied by them. There are many tools available which can read such heap dumps and create object charts with all the statistics. The MAT plugin for Eclipse is a popular, free tool. Another, licensed tool is YourKit. JVisualVM (bundled with the JDK since Java 6) is also a very useful tool for heap dump analysis.
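For reference, a heap dump can be captured from the command line (e.g. jmap -dump:live,format=b,file=heap.hprof <pid>) or programmatically. Below is a minimal sketch of the programmatic route using the HotSpotDiagnosticMXBean; it assumes a HotSpot/OpenJDK JVM, and the file path is just an example:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {

    // Writes a heap dump to the given .hprof file (the file must not already
    // exist). live = true dumps only reachable objects, which forces a GC
    // first and keeps the file smaller.
    public static void dumpHeap(String filePath, boolean live) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(filePath, live);
    }

    public static void main(String[] args) throws Exception {
        // Example path only; in production you would dump to a disk with space.
        String path = System.getProperty("java.io.tmpdir")
                + "/heapdump-" + System.currentTimeMillis() + ".hprof";
        dumpHeap(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

The resulting .hprof file is exactly what tools like MAT and JVisualVM open for analysis.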

Back to our story: when we got the heap dumps, we started analyzing them. During our analysis, we saw a couple of large StringBuffer objects in them (around 3 GB each). They were occupying 6 GB. Also, there were many other heavy objects which ate up another 2 GB. In the production environment, the JVM heap size was configured to 8 GB. All these large objects ate up all the memory, leaving no room for the creation of any new object. Hence, the whole system went into a coma and stopped responding to requests. The mystery was finally solved!

This was not the end though; identifying the problem was just a small step. We had to find the root cause of the generation of that object and, eventually, provide a patch which would prevent such a situation from occurring. We analyzed it, and it turned out to be one specific case of looping which appended characters to a StringBuffer object (more on this later). But the presence of large objects in the heap dump spawned a very interesting question: Java has its much-applauded garbage collection algorithm. What the heck is it, and why was it not able to clear these large objects?

The main task of the garbage collector is to clean up the objects which are not needed by the program. The criteria to decide which objects are ‘not needed’ are very complex. There are plenty of algorithms for garbage collection. A detailed explanation of them is provided here. Instead of going into more detail on those algorithms, we will focus on how the garbage collector works with the JVM. There are 2 types of garbage collection:

  1. Concurrent GC: The concurrent garbage collector uses a single thread which runs concurrently with the Java application. The main motive of the concurrent collector is to prevent the tenured generation space from filling up. When the concurrent GC is unable to reclaim enough memory, a full GC is triggered.
  2. Parallel GC or Full GC: The parallel garbage collector belongs to the family of stop-the-world collectors. It won’t start until the old generation space runs out of memory. But as soon as that happens, full GC starts; it causes all the application threads to pause, resulting in unresponsiveness of the application.
In the production environment, full GC is configured to kick in when the old gen space is 70% used. When that happens, full GC will kick in and the CPU spike will touch the 100% mark, causing all the requests to halt.
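Such a 70% threshold is typically set via JVM startup flags. A hedged sketch of what that configuration might look like with the CMS (concurrent) collector — the values here are illustrative, not our actual production settings:

```
java -Xmx8g \
     -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar myapp.jar
```

CMSInitiatingOccupancyFraction tells the concurrent collector to start a cycle once the old generation is 70% occupied, before a stop-the-world full GC becomes the only option.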

Back to our StringBuffer problem: we were storing a StringBuffer object in the user’s session. We were performing some operations on it which would increase/decrease the size of that object. However, in one specific scenario, such objects (in users’ sessions) would grow indefinitely. As they were in the session, they were live, and GC was not able to reclaim them as garbage (obviously it shouldn’t!). This growth led to the triggering of full GC and, eventually, brought down the whole application. What a misery!
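Our actual production code obviously can’t be reproduced here, but a minimal, hypothetical sketch of the pattern that bit us, along with the kind of defensive cap that fixed it, looks like this (all class, method and constant names are invented for illustration):

```java
public class SessionBufferDemo {
    // Defensive cap: well below anything that could threaten an 8 GB heap.
    private static final int MAX_BUFFER_CHARS = 1_000_000;

    // The buggy pattern appended to a session-held StringBuffer inside a
    // loop; in one rare scenario the loop condition never turned false, so
    // the buffer grew without bound. A cap makes it fail fast instead of
    // silently eating the whole heap.
    public static StringBuffer appendCapped(StringBuffer sessionBuffer, String chunk) {
        if (sessionBuffer.length() + chunk.length() > MAX_BUFFER_CHARS) {
            throw new IllegalStateException("session buffer exceeded cap");
        }
        return sessionBuffer.append(chunk);
    }

    public static void main(String[] args) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < 10; i++) {
            appendCapped(buf, "some chunk of data;");
        }
        System.out.println("buffer length = " + buf.length());
    }
}
```

An IllegalStateException in one request is ugly, but it surfaces in logs immediately; the unbounded growth surfaced only as a full-blown outage.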

  As it is very difficult to trace and fix such problems, here are some guidelines one should consider while coding to avoid such disasters.
  • Never store any heavy objects in the session; store them in the request instead.
  • While writing any recursive or looping method/logic, be extra careful. Test all the scenarios. In most cases, the method will work in 99 out of 100 scenarios. But the remaining 1 scenario will come back to bite you in the backside when your application gets deployed to production.
  • Be very careful while using third party jars. Use object caching/object pooling concepts in a thrifty manner. Gather enough information about the class loading/memory consumption of a third party library before using it in the application.
  • Execute performance testing of each and every smallest possible feature of the application. There are plenty of tools available for performance testing (some free, like Badboy and JMeter; some licensed, like LoadRunner).
  • It is always better to have 5 NullPointerExceptions than 1 OutOfMemoryError in your code.
  • There are plenty of other such points. You can search on Google; it seems tedious to me to list them all here.
Well, back to the original story (for the 135762432nd time): we managed to trace and eventually fix that problem. The system then started working like a charm! The joy was back. The confidence was back. The trust was back. Things started to work very smoothly. However, the god of outages, sitting in a certain corner of the world, didn't like that. Hence, he sent another one from his troops to say hi to us a month later.

And here we are, battling another outage. This time, even the heap dump has refused to provide enough information. Preliminary analysis says that the permanent generation space (does it ring any bells?) gets filled up with classes which get loaded multiple times. This seems to be a whole new world. However, we are ready for the battle, with our weapons, to fire in the dark :)

~ Au revoir!

Friday, 4 October 2013

Tortoise SVN Pre-commit hooks: Hook yourself to safety

Writing something after a long time. Well, not that long, but relatively long, I should say. This has been a fantastic week for me, both footballistically and technically. Footballistically, Arsenal won back to back matches this week. In the match played midweek, they showed some scintillating football. It contained crisp passing, swift movements, darting runs, insane finishing and gorgeous understanding between the players. It was like watching porn on a football pitch. The reaction of Arsène Wenger (the manager) after the first goal was priceless! Let me tell you, he deserves each and every bit of it. The effort he put in to create such a team was immense. He created the whole empire right from scratch and now he is enjoying the fruits. What a guy he is! The talent he has is unparalleled. (*deviating-from-topic alarm*) Let me come out of this fantasy and start discussing what I am here to discuss. A change of topic is good sometimes, by the way.

I said this has been a good week for me technically as well. I developed a solution for a very specific and rare problem, but it was very useful. Let me provide a preface first.

We are using Tortoise SVN as our version control tool. The repository resides at the client side. Hence, whenever you commit something to the branch, it directly goes overseas and the clients can see the changes. We are using the JUnit framework for unit testing and hence, test classes also go overseas. In test classes, we generally have many crap things termed ‘test data’. Those things include database URLs, passwords, file paths, encryption keys etc. Before sending these classes overseas, we have to clear out all this data.

In English, there is an idiom which says, ‘To err is human’. All the members working in my team are humans (FYI!). Well, not sure about one or two (including myself), but all the rest are humans. Hence, we often forget to delete that test data before committing the files. Last week, for the umpteenth time, we received an email from the client saying that our classes contained sensitive data and all. It was a typical client escalation. They said that in spite of telling us the same thing 12672 times, they were still seeing sensitive data being committed. This spurred a flurry of meetings and discussions on our side, and we decided to introduce something which would prevent a developer from committing a file if it contains any sensitive information.

I googled a lot about this and found an approach to prevent it. Tortoise SVN provides the facility to run our own scripts on various events like Pre Commit, Post Commit, Pre Update, Post Update etc. The most suitable event for me was the pre commit event. What I did was: I created a Java program which would search for particular text in each and every file being committed. It would throw an exception if a match was found, which would make the commit process stop. These scripts are called hook scripts. There are 2 types of hooks: server side hooks and client side hooks. Server side hooks are installed on the server and work for all commits, whereas client side hooks need to be installed on each and every client machine. We did not have an SVN server, hence server side hooks were of no use to us. We then decided to go with client side hooks.

Let me explain how SVN works here. When we check the checkboxes against the file names and click on commit, SVN creates a .tmp file in the user's Temp folder (on a Windows machine). This file contains multiple lines. Each line of this .tmp file is the full path of a file which is being committed (i.e. if 4 files are being committed then the .tmp file will have 4 lines). SVN also supplies the path of the .tmp file to the hook script (other arguments are passed as well, but we don’t require those). So, the script I wrote reads this file line by line and then reads the file present at each path. If it encounters any keyword (like password etc.) then it throws an exception, resulting in failure of the commit process. I created a batch script (.bat file) which calls a jar file (containing my code). Following are the steps to attach a hook script:

  • Open Settings window of tortoise SVN and go to hook scripts as shown in the image below:
  • Click on ‘Hook Scripts’. It will open hook scripts menu as shown below:

  • In Hook Type, select pre commit.
  • In the Working copy path, select the path where you have created the branch.
  • In the Command Line to Execute, select the script which you want to execute (.bat file or .exe file).
  • Check those two checkboxes. Click on ‘OK’ and ‘Apply’ and Voila! Your hook script is ready.
The only situation in which I think this script will create a problem is when you are committing a large number of files, each of large size. It will take some time to check all the files. But I guess the delay is any day better than receiving vitriol from the client, resulting in a kick up the backside.
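To make the idea concrete, here is a minimal, hypothetical sketch of such a scanner — not the exact program from the attached jar. The keyword list is illustrative, and the real script also filtered by file extension:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CommitScanner {
    // Keywords whose presence should block the commit (illustrative list).
    private static final String[] KEYWORDS = {"password", "secretkey"};

    // Returns the first offending file path, or null if everything is clean.
    public static String scan(Path tmpFile) throws IOException {
        // Each line of the SVN-supplied .tmp file is the path of one
        // file that is about to be committed.
        for (String line : Files.readAllLines(tmpFile)) {
            if (line.trim().isEmpty()) {
                continue;
            }
            Path committedFile = Paths.get(line.trim());
            String content = new String(Files.readAllBytes(committedFile)).toLowerCase();
            for (String keyword : KEYWORDS) {
                if (content.contains(keyword)) {
                    return committedFile.toString();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java CommitScanner <svn-tmp-file>");
            return;
        }
        String offender = scan(Paths.get(args[0]));
        if (offender != null) {
            // A non-zero exit code makes TortoiseSVN abort the commit.
            System.err.println("Sensitive data found in: " + offender);
            System.exit(1);
        }
    }
}
```

The .bat file registered as the hook simply invokes this program and lets the non-zero exit code (or uncaught exception) stop the commit.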

I have attached a sample batch file and jar file which searches for the string “password” in all files having extensions in (.java, .jsp, .js, .xml, .config, .properties and .txt). You just need to change the path in the .bat file which executes the jar file. You can download it from here.

A special thanks to my colleague Niket Patel for helping me out on this. Cheers mate.

That's yer lot for this time. See ya.

Saturday, 6 July 2013

Importance of Design in SDLC

During software development, have you ever been involved in a situation where your client suggests change(s) one day prior to the go live date, you analyze the change(s) and their impact, and get back to the client saying, “It will require a database change, hence it will take a week (or so) to accommodate”? Imagine what the client will think about you. He will think, “What a bunch of absolute cretins we have over there! I asked them to move a field from one page to another and they are saying they will need a week to do the job. Even my dog can do this in 3 hours.”

Well, at that time, we can’t do much apart from convincing them why we will need a week. But what we can do to avoid such situations is to design our system so strongly/flexibly that we can accommodate any change without much impact. This is what this blog post is all about. We will discuss the importance of the design phase in the software development life cycle. And yes, unlike the previous blogs (which were implementation based), this is an approach based blog.

Of all the constituents of the software development life cycle, the design phase is the most important one. An ideal design phase should take around 60% of the time of an SDLC. The more sound the design of your system, the more flexible/maintainable your system will be. What developers generally do during the design phase is, well, nothing. They will fill in a document or two as per the company’s process and that’s it. They won’t even look at those documents during the coding/implementation phase. And, at the time of implementation, they will decide the name of the table, which columns it will have, and all that. This is a pathetic way of implementing a system.

Let’s discuss how we can design the system properly. We will discuss the different phases/parts of design phase which are important.

Database Design:

This is the most important and critical part of design phase. The whole future of your system depends on this. People do Ph. D’s on database design. Let’s discuss what aspects we need to consider while designing the database.

Before designing any table, one must have the mindset that the client is going to populate gazillions of records into it. Even then, your system should be able to perform all DML operations, including joins, without any hitch. Normalization and the number of tables are also important aspects. If you are designing a transactional system (OLTP) then the normal form and number of tables can be high. But if you are designing an analytical system (OLAP) then the number of tables and the normal form can be lower. The types of columns, as well as the number and types of indexes, are also important while designing a database.

The outcome of the design phase should be the database design document along with the table creation scripts. Also, there should be the DML queries which will be required during development, so that developers can straight away integrate those queries into their classes.

Low Level Design/Detailed design:

Once the database design is completed, we can go ahead with the low level design. This phase consists of many sub phases. We will discuss each sub phase one by one.

Class Diagrams:

Once we have the table structures ready, we can easily identify which classes need to be created, i.e. bean/DTOs, service/business layer classes, database layer classes etc. We can also identify which methods will be required in these classes. Hence, we can create class diagrams of all the classes. A sample class diagram is shown here. It consists of the attributes and methods of a class and also the relationships among the classes. The use of class diagrams is to guide the developer during implementation. Ideally, we should go ahead and implement those classes; that will make the developer’s task easy.

Class Details: 

A class diagram should be followed by a table containing information about the classes and their methods, i.e. which method performs which action, what the significance of each argument of a method is, etc., hence making the implementation of each method a cakewalk.

Sequence Diagram:

A sequence diagram indicates a round trip from sending the request from one page to receiving the response on the same/another page. It will contain all the layers of the system. The use of sequence diagrams is to make sure that the control flow follows the architecture, i.e. from a JSP page, the request should not go directly to the database interaction layer. A sample sequence diagram is shown here.

Algorithms:

Once we are done with the structures of the classes and all, we should focus on their inner workings. This is the most important sub phase of low level design. An algorithm contains the steps of how a particular workflow should take place inside a method. If a workflow involves more than one method/existing method(s) then the corresponding class name and method name should also be present against the respective step. The algorithm will make the task of the developer a lot easier. It will serve as a guide on how to implement a particular use case.

Test Plans:

Testing is also one of the important aspects of software development. If the system is not tested properly then all the good work of development may go out of the window. To test the system thoroughly, proper documentation of test plans is required. There can be many types of test plans, i.e. unit test plan, integration test plan, system test plan. If you have a separate QA team for testing then they can also assist you in the creation of test plans. In fact, apart from the unit test plan, it is their responsibility to create the test plans. Test plans are used to check the quality of the system.

Traceability Matrix:

The final artifact of the design phase, and also of this blog, is the traceability matrix. It provides the traceability of a requirement/use case to its relevant documents. It contains a table structure like the one shown below:

Use Case # | Use Case name | Requirement document | Prototype | Test Plans | JSP | Properties | Java
It maps each use case to the corresponding document names, hence making it easier to trace the whole use case. If more than one instance of a document exists for a particular use case then all the names will be present in the corresponding cell.

Voila! We are done with the discussion of the design phase. Please note that we have only discussed the important phases here; there are many others as well (like High Level Design, etc.).

Consider the scenario of the first paragraph once again, but now you have designed the system properly. One day before the go live date, the client asks you to move the field from one page to another and also asks for estimates. In this case, your proud answer will be, “We will be able to deliver the change in the next 2 hours!” At least now we are faster than his dog!

~ Ciao

Tuesday, 18 June 2013

Less known features of Java

Every time I start (or think of starting) a blog post, a question arises in my mind: ‘How to start? What to write in the first paragraph?’ Almost every time, an idea pops up in my mind, but today, my mind blanked out. I didn’t get any idea despite thinking for more than 15 minutes. I even fell asleep a couple of times during that thought process. Then I thought of googling it. I opened Google and typed ‘How to start a blog post’. I opened the first 2-3 links, went through the first one (patience: 100%), found nothing related to this, then went through the second one (patience: 50%) with the same result, then went through the third one hoping that it would really help me, but alas, nothing useful again (patience: 0%). Then I thought, ‘screw this’ and clicked on the red button in the upper right corner. And here we are, done with the first paragraph! Bingo!

Coming to the main aspect of this blog: it discusses features of Java which are less known but really useful. They help us a lot in our day to day programming. Even I was not aware of them. But when I came across such requirements, instead of implementing them with the traditional approach, I thought, ‘there must be some other way of doing this.’ Then I googled and found the alternate solutions which are described in this blog. I have covered a very small number of features here. If I come across more such features, I will definitely add them in subsequent blog posts. We will look into these features one by one.

Double brace initialization:

Have you ever come across a requirement where you need to create a collection (say a List) and fill it with the values available to you? Maybe you have to invoke a method which takes a collection filled with some predefined values as an argument. In this case, what will you do? You will probably go with the following traditional approach:


List<String> list = new ArrayList<String>();
list.add("A");
list.add("B");
list.add("C");
//pass this list to the method

Or, if your list is static, you will probably initialize it like this:

public static final List<String> list = new ArrayList<String>();
      static{
            list.add("A");
            list.add("B");
            list.add("C");
      }

Java provides a way to do this in a single go, via double brace initialization. All we need to do is make all the calls to the add method while creating the collection itself, as shown in the code below:

List<String> list = new ArrayList<String>(){{
     add("A");
     add("B");
     add("C");
}};
//pass this list to the method

Voila! Your list is created in a single go. Isn’t this wonderful? You can do the same for a static list as well. Here are some technical details of double brace initialization.

The first brace creates a new anonymous inner class, the second declares an instance initializer block that is run when the anonymous inner class is instantiated. This type of initializer block is formally called an "instance initializer", because it is declared within the instance scope of the class.

This is just an excerpt; you can find the full article here.
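One caveat worth knowing: because double brace initialization creates an anonymous inner class, every use produces an extra class file, and when used inside an instance context, the created collection holds a reference to the enclosing object. Where that matters, a plain alternative with Arrays.asList gives similar brevity (a sketch; the field names are just examples):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListInit {
    // Fixed-size list backed by the array; add/remove will throw
    // UnsupportedOperationException.
    public static final List<String> FIXED = Arrays.asList("A", "B", "C");

    // Wrap in an ArrayList when you need a fully mutable list.
    public static final List<String> MUTABLE =
            new ArrayList<String>(Arrays.asList("A", "B", "C"));

    public static void main(String[] args) {
        System.out.println(FIXED);   // prints [A, B, C]
        System.out.println(MUTABLE); // prints [A, B, C]
    }
}
```

For small constant lists, this avoids the anonymous class entirely while staying a one-liner.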

Add shutdown hook:

A shutdown hook contains code which is executed when an application goes down. If you want to do some activities before the application shuts down (just like the destroy() method of a servlet), e.g. cleaning up resources, closing connections and streams, or notifying the administrator/user about the shutdown, then you can use a shutdown hook to do so. The following code shows how:

public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
                 @Override
                 public void run() {
                     System.out.println("Inside Add Shutdown Hook...");
                 }
        });
        System.out.println("Last instruction of Program....");
        System.exit(0);
}

Output of this program will be:

Last instruction of Program....
Inside Add Shutdown Hook...

We can attach any number of shutdown hooks to the application. But but but, all shutdown hooks will run in parallel, as they run in separate threads. Hence, in the case of multiple shutdown hooks, we need to consider all the aspects of concurrency, deadlock, thread interaction etc.

We can also remove a shutdown hook by calling the Runtime.removeShutdownHook(Thread hook) method.

The case in which these shutdown hooks will not be executed is when the JVM crashes unexpectedly or is killed by the user. Because, after all, it’s the JVM which executes all these instructions. Hence, no JVM means no execution. More on shutdown hooks is described here and here.

Atomic classes (part of the java.util.concurrent.atomic package):

Atomic classes (e.g. AtomicInteger, AtomicLong etc.) can be very useful when you want to share variables across multiple threads. They provide “lock free thread safe programming on single variables”, i.e. you can leave aside many concurrency problems if you use these classes. They have an incrementAndGet() method which will provide you the next value atomically.

Say you have a requirement like counting how many times threads have used a particular method, or you want some kind of counter which needs to be shared across threads; these classes fit the bill. More on the atomic package is described here. A practical example of AtomicInteger is also shown here.
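As a quick sketch of that counter idea (the class and method names are made up for illustration): several threads bump a shared AtomicInteger, with no synchronized block anywhere, and no increment is ever lost:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterDemo {
    private static final AtomicInteger callCount = new AtomicInteger(0);

    // incrementAndGet atomically adds 1 and returns the new value, so
    // concurrent callers never see a lost update.
    public static int recordCall() {
        return callCount.incrementAndGet();
    }

    public static int total() {
        return callCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++) {
                        recordCall();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // 4 threads x 1000 increments, every one counted.
        System.out.println("total calls = " + total());
    }
}
```

With a plain int and ++, the same program could report fewer than 4000 calls, because ++ is a read-modify-write that two threads can interleave.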

I am done with the features here. Hope you have liked them and will use them in future if required. Now starts the problem of what to write in the ending paragraph. My mind has again blanked out and I am not in the mood to google again. Hence, I am leaving it here.

A special thanks to my colleague Harshal Choksi who inspired me to write the blogs. Thanks for the inspiration mate. Keep inspiring people like this.

Happy coding!