days
-4
-3
hours
-1
0
minutes
-4
-1
seconds
-1
-5
search
Proprietary vs Open Source

Neo4J and the power of Open Source

MaxDeMarzi

Neo4J’s Max de Marzi shows us why working with open-source software can be way better than the proprietary equivalent.

The following article is from a Neo4j series that first appeared on maxdemarzi.com, a blog by Neo advocate Max De Marzi.

One of the benefits of Open Source Software is that if you want to change how something is done, you can.

At Neo Technology, we have a small team of “Field Engineers” who don’t really work on the product but rather with the product. We help our customers with issues of all kinds, answer questions, give suggestions and whatever we need to do to make people’s project successful. A little while back I had a support ticket for a traversal that was taking longer than they hoped it would.

Think about a social network, one of the things you may want to do is tell the user how big their friends’ network is. But why stop there? How about their friends of friends’ or even friends of friends of friends’ network? These are the kind of questions graph databases excel at compared to relational databases. Let’s take a look at what they were doing:

@GET
@Path("/network/{id}")
public Response getNetworkSize(@PathParam("id") Long id, @Context GraphDatabaseService db) throws IOException {
    Node user = db.getNodeById(id);
    final Evaluator depth = Evaluators.toDepth(4);
    final TraversalDescription traversalDescription = new TraversalDescriptionImpl().
             breadthFirst().evaluator(depth).relationships(FRIENDS, Direction.OUTGOING);
     
    final Integer[] sizes = new Integer[] { 0, 0, 0, 0, 0 };
 
    for (final org.neo4j.graphdb.Path path : traversalDescription.traverse(user))
         sizes[path.length()]++;
    }
 
    return Response.ok().entity(objectMapper.writeValueAsString(sizes)).build();
}

What you are looking at is an unmanaged extension that takes a node id parameter and uses that node as the starting point of a traversal. The Traversal is going to traverse breadth first along the FRIENDS relationship up to a depth 4 hops and at each step in the path it is going to increment the counters inside the sizes array. It’s not obvious here, but the default Uniqueness in a Neo4j Traversal is Global Uniqueness, so a node won’t be visited twice if we’ve already seen it.

To test this out, I created a graph with a million nodes and randomly created 14.5 million unique relationships between them. I can query it via REST and it looks like this:

http> get /example/service/network/1
==> 200 OK
==> [1,20,230,3446,48444]
So how fast is it? I turn to the tried and true Gatling Tool and fead it a list of node ids for 20 seconds, repeating the test a few times:
import com.excilys.ebi.gatling.core.Predef._
import com.excilys.ebi.gatling.http.Predef._
import akka.util.duration._
import bootstrap._
 
class NetworkSize extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:7474")
    .acceptHeader("application/json")
 
  val testfile = csv("test-data.txt").circular
 
  val scn = scenario("Network Size")
    .during(20) {
    feed(testfile)
    .exec(
      http("Test Network Size")
        .get("/example/service/network/${id}")
        .check(status.is(200)))
      .pause(0 milliseconds, 1 milliseconds)
  }
 
  setUp(
    scn.users(8).protocolConfig(httpConf)
  )
}

…and the best result:

About 60 requests per second with a mean latency of 135ms and a max of 640ms. It’s not bad, but can it go faster? I did a little digging to find out what is going on under the covers and found something interesting in how the Global Uniqueness of our nodes is maintained.

class GloballyUnique extends AbstractUniquenessFilter
{
    private final Set<Long> visited = new HashSet<Long>();
     
    GloballyUnique( PrimitiveTypeFetcher type )
    {
        super( type );
    }
 
    public boolean check( TraversalBranch branch )
    {
        return visited.add( type.getId( branch ) );
    }
...

In the code, we have a Set of Longs that helps us tell if we have seen a node yet. It makes sense, it works, but what if we went a different way. Instead of storing Longs, we can use a BitSet to store true or false whether or not we have seen a node id. Ran into a small problem though… the standard Java BitSet is limited to 2,147,483,647 (an int) and we can have tens of billions of nodes. Not to fear however, OpenBitSet to the rescue. This little class from Lucene can handle up to 64 * 2**32-1 bits, I’ll give you a second to do the math… that’s 274,877,906,943 bits. Perfect.

So I pulled the 1.9 branch from the Neo4j repository from github and made an edit:

class GloballyUnique extends AbstractUniquenessFilter
{
    private final OpenBitSet visited = new OpenBitSet();
  
    GloballyUnique( PrimitiveTypeFetcher type )
    {
        super( type );
    }
  
    public boolean check( TraversalBranch branch )
    {
        if ( visited.get( type.getId( branch ) ) ) {
            return false;
        } else {
            visited.set( type.getId( branch ) );
            return true;
        }
    }
...

I also added “Lucene Core” to the LICENSES.txt and NOTICES.txt files, and modified the pom.xml file:

<properties>
...
  <lucene.version>3.6.2</lucene.version>
</properties>
...
<dependency>
  <groupId>org.apache.lucene</groupId>
   <artifactId>lucene-core</artifactId>
   <version>${lucene.version}</version>
</dependency>

and then in the /neo4j/community/kernel directory I ran:

mvn clean install -DminimalBuild -Dlicense.skip=true

and copied the target/neo4j-kernel-1.9.5.jar into my neo4j/lib directory overwriting the existing kernel and reran my tests:

Now we’re up to almost 100 requests per second with a mean of 81ms and a maximum of 420ms, almost doubling our previous performance numbers. Not bad for a few lines of code. So please, take a look under the hood of Neo4j and if you see something you don’t like, change it… and send us your pull requests!

By the way… you can go faster if you needed to, but it’s not pretty. I call it the OpenBitSet Dance:

@GET
@Path("/network/{id}")
public Response getNetworkSize(@PathParam("id") Long id, @Context GraphDatabaseService db) throws IOException {
    Node user = db.getNodeById(id);
 
    final Long[] sizes = new Long[] { 1L, 0L, 0L, 0L, 0L };
    OpenBitSet[] seen = new OpenBitSet[]{new OpenBitSet(),new OpenBitSet(),new OpenBitSet(),new OpenBitSet(),new OpenBitSet() };
    seen[0].set(user.getId());
 
    for(Relationship level1 : user.getRelationships(Direction.OUTGOING, FRIENDS )){
        Node friend = level1.getEndNode();
        seen[1].set(friend.getId());
    }
 
    for (int i = seen[1].nextSetBit(0); i >= 0; i = seen[1].nextSetBit(i+1)) {
        Node friend = db.getNodeById((long)i);
        for(Relationship level2 : friend.getRelationships(Direction.OUTGOING, FRIENDS )){
            Node fof = level2.getEndNode();
            seen[2].set(fof.getId());
        }
    }
 
    for (int i = seen[2].nextSetBit(0); i >= 0; i = seen[2].nextSetBit(i+1)) {
        Node friend = db.getNodeById((long)i);
        for(Relationship level2 : friend.getRelationships(Direction.OUTGOING, FRIENDS )){
            Node fof = level2.getEndNode();
            seen[3].set(fof.getId());
        }
    }
 
    for (int i = seen[3].nextSetBit(0); i >= 0; i = seen[3].nextSetBit(i+1)) {
        Node friend = db.getNodeById((long)i);
        for(Relationship level2 : friend.getRelationships(Direction.OUTGOING, FRIENDS )){
            Node fof = level2.getEndNode();
            seen[4].set(fof.getId());
        }
 
    }
    seen[1].andNot(seen[0]);
    seen[2].andNot(seen[1]);
    seen[2].andNot(seen[0]);
    seen[3].andNot(seen[2]);
    seen[3].andNot(seen[1]);
    seen[3].andNot(seen[0]);
    seen[4].andNot(seen[3]);
    seen[4].andNot(seen[2]);
    seen[4].andNot(seen[1]);
    seen[4].andNot(seen[0]);
    sizes[0] = seen[0].cardinality();
    sizes[1] = seen[1].cardinality();
    sizes[2] = seen[2].cardinality();
    sizes[3] = seen[3].cardinality();
    sizes[4] = seen[4].cardinality();
 
    return Response.ok().entity(objectMapper.writeValueAsString(sizes)).build();
}

If you find a way to go faster still, please let me know!

Open sign photo via Shutterstock

Author
MaxDeMarzi
Max De Marzi, is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby Gem, a rest api wrapper to the Neo4j Graph Database. He is addicted to learning new things, loves a challenge and finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock.

Leave a Reply

Be the First to Comment!

avatar
400
  Subscribe  
Notify of