I would like to start this blog by wishing ‘Happy Janmashtami’
to all the readers (if there are any). I had hoped this blog would be Part
Deux of what we discussed in the previous post.
But, in the meantime, I got the chance to get my hands on something new and exciting
and hence thought of sharing it. Don’t worry, we will do Part Deux some other
time.
This blog briefly discusses a newer kind of database, the ‘graph database’. Before we dig into it, let’s revisit the history of databases.
It all started with storing data in punched cards and flat files, which offered several access methods, each with its limitations. We then gradually moved to mainframe systems such as IBM’s Information Management System (IMS). Sometime in the 1970s, Larry Ellison read IBM research papers on relational databases and their query language, and that led to Oracle, one of the first commercial relational database management systems (RDBMS). Fast forward to the 2000s, and people started experiencing difficulties in maintaining large sets of data as well as in designing schemas to support modern web application structures, resulting in new database technologies such as MongoDB and graph databases.
Have you ever come across a requirement where you had to store hierarchical structures (e.g. application menus, organization structures or parent-child relationships)? I bet most of you have. In such situations (because our brains are trained to think only in SQL schemas), we generally go with one of these approaches (let’s say we want to persist a tree structure with parent-child relationships):
- Create a self-referencing foreign key on the same table, called parent id, and fetch the records by parent id.
- Create a one-to-many relationship between 2 tables, with the second table holding the mappings of parent id to child ids.
To be fair, either of these would work fine. Oracle
even provides hierarchical query support to retrieve the whole structure in a single query, and I am
sure we can write loops/queries for other databases as well.
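To get a feel for the "loops/queries" part, here is a minimal sketch of the first approach in plain Java. It is not taken from the uploaded code: the `Row` class, the in-memory list standing in for a table, and the sample names are all assumptions made for illustration. The point is only that the sub-tree has to be stitched together by hand, one parent id lookup per node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for a table with a self-referencing parent id column.
class ParentIdTree {

    // One "row": id, parentId (null for the root) and a name.
    static final class Row {
        final int id;
        final Integer parentId;
        final String name;

        Row(int id, Integer parentId, String name) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
        }
    }

    // Plays the role of "fetch the records by parent id".
    static List<Row> findByParentId(List<Row> table, int parentId) {
        List<Row> children = new ArrayList<>();
        for (Row row : table) {
            if (row.parentId != null && row.parentId == parentId) {
                children.add(row);
            }
        }
        return children;
    }

    // Collects the names below rootId (depth first), one lookup per visited node.
    static List<String> subTreeNames(List<Row> table, int rootId) {
        List<String> names = new ArrayList<>();
        for (Row child : findByParentId(table, rootId)) {
            names.add(child.name);
            names.addAll(subTreeNames(table, child.id));
        }
        return names;
    }

    // Made-up sample data: one director, two managers, one leader, one member.
    static List<Row> sampleTable() {
        return Arrays.asList(
                new Row(1, null, "Director"),
                new Row(2, 1, "Manager-1"),
                new Row(3, 1, "Manager-2"),
                new Row(4, 2, "Leader-1"),
                new Row(5, 4, "Member-1"));
    }

    public static void main(String[] args) {
        // Everything under Manager-1 (id 2).
        System.out.println(subTreeNames(sampleTable(), 2));
    }
}
```

With a real database, every `findByParentId` call would be a query (or the whole thing would be one hierarchical query), and you would still have to set the references on your objects yourself.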
However, how amazing would it be if you didn’t
need to worry about any of this at all! You would be like, “Here’s my list/set/map
of objects, each having another set of children, and another set of grandchildren and so on... Go and persist the whole list.” ... “Done? Now, fetch me the sub-tree of child-2 of parent-3.” And boom! Your sub-tree
is ready. It’s cool, isn’t it?
A graph database does exactly that. It first
persists the graph and then lets you query by node name, relation type,
distance from root and whatever else you can think of. There are plenty of
graph database projects (here
is the full list), each having its own set of features. Among these, I have
chosen Neo4j to discuss, as it is commercial, easy to implement and has Spring Data
support. Let’s walk through an example.
I have used an organisational structure as the
example. Let’s say an organisation has a structure like this: Director ->
Manager -> Leader -> Member. The structure has 4 levels (all being
one-to-many, i.e. a director has more than one manager under him, a manager has
more than one leader under him and so on). Have a look at the model classes
(code is uploaded here). Each has a list of objects called teamMembers. You will
see some annotations on class members; we are not going to discuss those
(documentation here),
as most of them are self-explanatory.
I have put 2 unit tests under the test folder, one
to populate the graph and the other to retrieve the data. Have a look at ‘PopulateEmployeeStructureIntegrationTest.java’:
it creates the structure and calls save with the director object. The point to note
here is that calling save on the director object alone creates all the sub-nodes and builds the
graph automatically (magic!). This creates 2 directors, with 2 managers, 2
leaders and 2 members each.
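To picture what a single save has to walk through, here is a small plain-Java sketch of the object structure. It is not the uploaded code and it does not use Neo4j or Spring Data at all: the `Employee` class below is a simplified, hypothetical stand-in for the model classes (only the `teamMembers` name comes from them), and I am assuming a fan-out of 2 at every level.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the model classes; no Neo4j or Spring Data involved.
class OrgGraphSketch {

    static final class Employee {
        final String name;
        final List<Employee> teamMembers = new ArrayList<>();

        Employee(String name) {
            this.name = name;
        }
    }

    // Builds the employee at `level` and, recursively, `fanOut` reports per level below it.
    static Employee build(String[] titles, int level, String name, int fanOut) {
        Employee employee = new Employee(name);
        if (level + 1 < titles.length) {
            for (int i = 1; i <= fanOut; i++) {
                String childName = name + "/" + titles[level + 1] + "-" + i;
                employee.teamMembers.add(build(titles, level + 1, childName, fanOut));
            }
        }
        return employee;
    }

    // Counts every node reachable from root, i.e. what one save on root must persist.
    static int countNodes(Employee root) {
        int total = 1;
        for (Employee member : root.teamMembers) {
            total += countNodes(member);
        }
        return total;
    }

    // Assumed shape: Director -> 2 Managers -> 2 Leaders each -> 2 Members each.
    static Employee sampleDirector(String name) {
        String[] titles = {"Director", "Manager", "Leader", "Member"};
        return build(titles, 0, name, 2);
    }

    public static void main(String[] args) {
        Employee director = sampleDirector("Director-1");
        System.out.println(countNodes(director) + " nodes hang off " + director.name);
    }
}
```

Under that assumption, each director is the root of 1 + 2 + 4 + 8 = 15 nodes, and handing over just the root is enough for the whole lot to be reachable.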
Now, it’s time to query the data. Have a look at
‘EmployeeStructureQueryTest.java’: it calls findByName to find an employee
(findByXxx calls are a feature of Spring Data; if you are not familiar with Spring
Data, then read this).
We pass the director’s name as the employee name and voila, it brings up the whole
graph! Imagine achieving the same with a series of SQL queries and setting
references from the results. Nightmare..
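Conceptually, the result of such a lookup is the matching node with everything below it still attached. The sketch below imitates that on a plain in-memory tree. To be clear, this is not how Spring Data or Neo4j implement findByName; the `Employee` class, the hand-written `findByName` method and the sample names are all hypothetical and only meant to show the shape of the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// In-memory imitation of "find a node by name and get its sub-tree with it".
class FindByNameSketch {

    static final class Employee {
        final String name;
        final List<Employee> teamMembers = new ArrayList<>();

        Employee(String name) {
            this.name = name;
        }

        // Adds a report and returns this, so small trees can be built inline.
        Employee add(Employee child) {
            teamMembers.add(child);
            return this;
        }
    }

    // Depth-first search; the returned node still carries its whole sub-tree.
    static Optional<Employee> findByName(Employee root, String name) {
        if (root.name.equals(name)) {
            return Optional.of(root);
        }
        for (Employee child : root.teamMembers) {
            Optional<Employee> hit = findByName(child, name);
            if (hit.isPresent()) {
                return hit;
            }
        }
        return Optional.empty();
    }

    // Made-up sample: Director-1 -> Manager-1 -> Leader-1 -> Member-1.
    static Employee sample() {
        Employee leader = new Employee("Leader-1").add(new Employee("Member-1"));
        Employee manager = new Employee("Manager-1").add(leader);
        return new Employee("Director-1").add(manager);
    }

    public static void main(String[] args) {
        findByName(sample(), "Manager-1")
                .ifPresent(m -> System.out.println(m.name + " leads " + m.teamMembers.get(0).name));
    }
}
```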
Before running this example (or, to make this
example run successfully), go through the steps below:
Once Neo4j is installed and the population script has been
run, the Neo4j UI will show a tabular view as well as a semi-interactive graphical representation of all the nodes. We can even query the graph using Neo4j’s Cypher query
language (examples here).
That’s it for today. For those wondering about
commercial usage of graph databases, here is the bonus
reading. Hope you enjoy playing around with graph databases as much as I did. If
something doesn’t work, let me know in the comments and I will get back to you. Till then..