Alfresco and Neo4j

06. 11. 2013

Overview

As one of the strategic development platforms at CROZ, Alfresco is the subject of many R&D initiatives. In the latest one, we tried integrating it with the Neo4j graph database.

At its core, Alfresco is similar to a graph database. Alfresco’s internal concepts of nodes, properties and associations map directly to the graph database concepts of nodes, properties and relationships, respectively. Taking advantage of this similarity, we can build an independent graph database that represents Alfresco nodes and associations, and keep it synchronized with the Alfresco database in real time. Such a replicated and synced graph store enables native graph operations over Alfresco data, while the original Alfresco data is kept intact.
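The mapping described above can be sketched in a few lines. This is only an illustration of the idea, not the code from our integration module: the class names, the `to_graph` function, the `ins:` types and the node references are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AlfrescoNode:
    """Simplified stand-in for an Alfresco node (hypothetical shape)."""
    node_ref: str                     # e.g. "workspace://SpacesStore/claim-1"
    type_name: str                    # content model type, e.g. "ins:damageClaim"
    properties: dict = field(default_factory=dict)


@dataclass
class AlfrescoAssociation:
    """Simplified stand-in for an association between two Alfresco nodes."""
    source_ref: str
    target_ref: str
    type_name: str


def to_graph(nodes, associations):
    """Map nodes -> graph nodes, properties -> properties,
    associations -> relationships."""
    graph_nodes = {
        n.node_ref: {"label": n.type_name, "properties": dict(n.properties)}
        for n in nodes
    }
    relationships = [
        {"start": a.source_ref, "end": a.target_ref, "type": a.type_name}
        for a in associations
        # Skip associations whose endpoints were not part of the input.
        if a.source_ref in graph_nodes and a.target_ref in graph_nodes
    ]
    return {"nodes": graph_nodes, "relationships": relationships}
```

Because the result is a separate structure, extra relationships can be added on the graph side without touching anything in Alfresco.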

When you look at the internal Alfresco node structure, it’s clear that it resembles the node structure of a graph database. This similarity between the two products made us believe that some sort of natural integration is possible. We absolutely love the features that a graph database brings and would love to have them when managing data stored in Alfresco: for example, defining new ad-hoc relationships between Alfresco data and in this way enhancing existing Alfresco data in a non-invasive manner.

So we gave it a try in our R&D department. After initial validation of the technical concepts, we focused on finding an appropriate business use case and decided on one from the insurance industry. We took an insurance company with multiple branches whose business goal is to store and manage the documentation produced after a client reports car damage. We’re talking about pictures of the car taken after the accident and all sorts of documentation, either in Office format or scanned. It would be useful for insurance officers to analyze which branch has the most damages reported, how these damages are connected, and so on.

Pilot project

The insurance company stores all documents in Alfresco, and there is a custom metadata model for each document type. We have developed a front end application for managing these documents and their metadata. The front end application is based on Grails and the ExtJS library. We’ve actually built a framework which significantly simplifies the creation of ExtJS/Grails applications by providing many features, such as a customizable initial application generator, custom-built and enhanced ExtJS components, an integrated JavaScript IoC container, automatic XSS and CSRF protection, user notifications, client and server side validations, advanced form handling, simplified and enhanced client/server AJAX communication, simplified error handling, ExtJS/Grails scaffolding, scanning, database registries maintenance, etc.

Among these features is also a Grails component which simplifies secured communication with the Alfresco side. On the Alfresco side we developed extensions which enable us to handle requests with native Spring components (Controllers, Services etc.) instead of using the classic Web Scripts approach.

Our pilot is based on the following architecture. At the center of the system is Alfresco, which stores all documents. The front end application, which provides the user interface for managing documents, is built on top of the Grails and ExtJS frameworks.

At the core of this integration is our Integration module, which handles data transfer from Alfresco to the Neo4j graph database. In the initial phase, when the graph database is empty, the “initial import” kicks in. The initial import reads data (nodes) from Alfresco directly using NodeDAO. The data is passed to the Neo4j connector, which currently uses the REST interface to post data to Neo4j. On the Neo side we have an extension that receives the data and creates nodes. This initial phase is triggered during bootstrap. The initial import is done in such a way that, once triggered, it runs in the background. This ensures that the system can be used during the initial import, which is very handy when importing high data volumes. Once the initial import is done, all changes to Alfresco data are picked up by our extension built on top of the Alfresco behavior mechanism. This mechanism allows us to keep data in sync between Alfresco and Neo after the initial import.
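The two phases described above, a background initial import followed by change-driven synchronization, can be sketched roughly as follows. This is a simplified illustration and not our actual module: `InMemoryGraph` stands in for the Neo4j side (the pilot posts data over REST), the list of `(node_id, properties)` pairs stands in for what is read through NodeDAO, and `SyncListener` only plays the role of the behavior-based extension. All of these names are hypothetical.

```python
import threading


class InMemoryGraph:
    """Stand-in for the graph database; guarded by a lock for thread safety."""

    def __init__(self):
        self._lock = threading.Lock()
        self.nodes = {}

    def upsert(self, node_id, properties):
        with self._lock:
            self.nodes[node_id] = dict(properties)

    def delete(self, node_id):
        with self._lock:
            self.nodes.pop(node_id, None)


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def start_initial_import(source_nodes, graph, batch_size=100):
    """Copy all source nodes into the graph on a background thread,
    so the caller (the rest of the system) is not blocked."""
    def run():
        for batch in batches(source_nodes, batch_size):
            for node_id, properties in batch:
                graph.upsert(node_id, properties)

    worker = threading.Thread(target=run, name="initial-import", daemon=True)
    worker.start()
    return worker


class SyncListener:
    """Applies individual changes to the graph after the initial import."""

    def __init__(self, graph):
        self._graph = graph

    def on_create_or_update(self, node_id, properties):
        self._graph.upsert(node_id, properties)

    def on_delete(self, node_id):
        self._graph.delete(node_id)
```

The sketch deliberately ignores ordering between the import and changes that arrive while it is still running; a real implementation has to make sure a late import batch cannot overwrite a newer update.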

The user interface is our custom-made visualization tool based on the D3 JavaScript library. Although we spent only around two months in development, we came up with some nice features: support for various layouts (force-directed, hierarchical and radial), interactive node expanding and collapsing, and visualization of multiple graph analysis algorithms (in-degree and out-degree, closeness, betweenness). Other features include hiding and filtering nodes, multiple node selection, criteria-based node expansion, partial expansion of node groups, node properties browsing and so on. It is also possible to execute predefined queries that filter nodes and automatically apply a certain layout. You can see the visualization tool in action in this short movie.
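To give an idea of the simplest of the graph analysis measures mentioned above, here is a minimal sketch of in-degree and out-degree counting over a list of directed edges. It is not the code of our visualization tool (which runs in the browser on top of D3); the function name and the sample node names are invented for the example.

```python
from collections import Counter


def degree_in_out(edges):
    """Return {node: {"in": n, "out": m}} for a list of (source, target) edges."""
    out_degree = Counter()
    in_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    # A node may appear only as a source or only as a target.
    nodes = set(out_degree) | set(in_degree)
    return {n: {"in": in_degree[n], "out": out_degree[n]} for n in nodes}


# Hypothetical sample: branches pointing at the damage claims they handle.
edges = [
    ("branch-zagreb", "claim-1"),
    ("branch-zagreb", "claim-2"),
    ("branch-split", "claim-3"),
]
degrees = degree_in_out(edges)
```

In this sample, the branch with the highest out-degree is the one with the most reported damages, which is exactly the kind of question raised in the insurance use case.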

As is often the case with R&D initiatives: no pain, no gain. Along the way we stumbled upon several challenges. The first one was how to initially import all data into Neo. In the end, we opted for reading Alfresco data using NodeDAO.

Another challenge was how to keep the data synchronized. The behavior mechanism hit the nail on the head: it was just what we needed to intercept all data changes and update Neo accordingly.

There is also a challenge that we’re currently working on: data synchronization after one of the components temporarily fails and recovers. The best solution we currently see here is an event bus with a persistent queue, or even event sourcing.
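Since this part is still work in progress, the following is only a sketch of the persistent queue idea, not something we have in place. It assumes that change events are stored in a local SQLite table before delivery and removed only after the consumer has handled them; the class name, table layout and event format are all hypothetical.

```python
import json
import sqlite3


class PersistentQueue:
    """Stores change events durably until a consumer has handled them."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._db.commit()

    def publish(self, event):
        """Persist the event before anyone tries to deliver it."""
        self._db.execute(
            "INSERT INTO events (payload) VALUES (?)", (json.dumps(event),)
        )
        self._db.commit()

    def drain(self, handler):
        """Deliver events in insertion order and return how many succeeded.
        Stops at the first failure so the remaining events stay queued."""
        delivered = 0
        rows = self._db.execute(
            "SELECT id, payload FROM events ORDER BY id"
        ).fetchall()
        for event_id, payload in rows:
            try:
                handler(json.loads(payload))
            except Exception:
                break  # consumer is unavailable; retry on the next drain
            self._db.execute("DELETE FROM events WHERE id = ?", (event_id,))
            self._db.commit()
            delivered += 1
        return delivered

    def pending(self):
        return self._db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

With this approach an event can be delivered more than once (for example, if the process stops between handling and deleting it), so the handler on the graph side would need to be idempotent.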

We believe that all this is as useful as it is cool! In its current shape, this implementation can be used as an alternative Alfresco node browser. It can be used wherever you need visual data analysis. A very strong point is the enhancement of Alfresco data: situations where you need to create new relationships between nodes and it’s hard to do so directly in Alfresco, or you don’t have permissions. The possibilities are endless… If you have any other ideas, feel free to contact us!