-->

Tuesday, 17 January 2012

Why I Dropped Mysql For Neo4j

If you would like to run the queries in this post the SQL is available here.

One segment of the Horse Racing Information System I worked on in college involved finding horses from a given race that had raced against each other.

This is the more general example where I have a given horse and wish to find all the horses they have run against:

SELECT h2.horsename
FROM horse h1
JOIN raceentrants r1 ON h1.horseid = r1.horseID
JOIN raceentrants r2 ON r1.raceID = r2.raceID
JOIN horse h2 ON h2.horseid = r2.horseID
WHERE r1.horseID =1
AND h2.horsename NOT
IN (
'Horse A'
);

My representation of the steps involved are:

  1. Select the correct horse from the Horse table
  2. Join this to the RaceEntrants table on horseId to find all the horse's races
  3. Join this to the RaceEntrants table on raceId to find all other entrants in these races
  4. Join this to the Horse table to find the other horses
  5. Make sure we do not return the original horse
If I really wantd just the names I could have used 'distinct' but it is more likely that in a ful database you would be returning on all horse-race combinations.

This is quite cumbersome but it gets worse ...


The following query is the similar to the query above but for all the horses in a given race. I preconfigured 'arace' as a view of all the horses to reduce the clunkiness.

SELECT  h1.horsename AS Horse, h3.horsename AS Raced_Against, race.*
FROM horse h1
JOIN arace a1 ON h1.horseid = a1.horseID
JOIN raceentrants r1 ON a1.horseid = r1.horseID
JOIN raceentrants r2 ON r1.raceID = r2.raceID
JOIN raceentrants r3 ON r2.horseID = r3.horseID
JOIN raceentrants r4 ON r4.raceID = r3.raceID
JOIN arace a2 ON r4.horseID = a2.horseID
JOIN horse h2 ON h2.horseid = a2.horseID
JOIN horse h3 ON h3.horseid = r2.horseid
JOIN race on race.raceID = r2.raceID
WHERE h2.horsename <> h1.horsename
AND r2.horseID NOT IN ('1','2')


That got scary quick and I thought I had noticed a mistake in line 9 but transitivity saved the day. I actually relish working on stuff this complex but, unfortunately, my processor does not.

Joins are functions that produce Cartesian Products. This means that growth is exponential. In contrast, with Neo4j, I can set up the database with relationships from horses to the races. Now I can add all of each horse's races to a set and use some variant of the following:

public Set compare(Horse otherHorse){

  TreeSet bothRaced = new TreeSet();
  for (Race race : thisHorsesRaces){
   if (otherHorse.hasRacedIn(race)){
    bothRaced.add(race);
   }
  }
  
  return bothRaced;
}

Enhanced by Zemanta

Arrow Key Nav