Monday, 7 May 2012

Teaching myself object orientation and NoSQL

Python and NoSQL

After listening to an associate's interviewing techniques and insistence on understanding the abstract concepts of object orientation (encapsulation, inheritance, polymorphism, yada-yada), I thought I might teach myself a little programming using some object-oriented concepts, combined with trying out some introductory NoSQL concepts and seeing how MongoDB's performance compared to MSSQL, all whilst pushing myself with a little logic.

The challenge I thought worthwhile was something a friend (Dave Barker at Aberystwyth University - can't find any trace of him on the interwebs now) had set himself whilst at university but had failed at, which at the time I believed was mostly due to bad technology choices. He was attempting to collate all the possible solutions to peg solitaire.

I found it interesting because I remember having the game as a child. Back then I had a solution (technically 4, although the other 3 were just symmetrical to the original) that was derived by persistence and guided chance, but I knew there had to be many, many more solutions. I remember Dave saying at the time that he had used some heuristics, but I got the impression it was a brute force attack that failed for two reasons. Firstly, the machines at the time weren't adequate to run Java at any pace. Secondly, the lack of RAM meant that, as he was using a vector, every time it expanded it needed enough room to keep both the original vector and the new one until the garbage collector cleared up the first. With hindsight a linked list would have been smarter, as would collating results along the way, abandoning failed attempts and only keeping a record of successful games.

Anyway, all that is irrelevant as I thought my laptop could crack it and keep every result, bad and good, in a database where eventually I could extract the results once all the games were played.

I considered a few ways of attempting this, but I thought the easiest was to create a board node object that contained all the possible moves from that board. Each move linked to another node object with the resulting board state and its possible moves, all the way down to a final leaf node that terminated with either a failed game or a successful game, so the whole thing could be abstracted into a tree.
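To make the idea concrete, here is a minimal sketch of such a node. It is not the code from the project: the class name `BoardNode`, its fields, the helper `count_leaves` and the cell encoding (None for off the board, 1 for a peg, 0 for an empty hole) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed cell encoding: None = off the board, 1 = peg, 0 = empty hole.


@dataclass
class BoardNode:
    board: list                                # list of rows, each a list of cells
    moves: dict = field(default_factory=dict)  # move -> child BoardNode
    expanded: bool = False                     # True once the moves were computed

    def peg_count(self):
        return sum(cell == 1 for row in self.board for cell in row)

    def add_child(self, move, child):
        self.moves[move] = child

    def is_leaf(self):
        # A leaf is a node that has been expanded and has nowhere left to go.
        return self.expanded and not self.moves

    def is_successful(self):
        # A game is won when it ends with a single peg left.
        return self.is_leaf() and self.peg_count() == 1


def count_leaves(node):
    """Return (successful, failed) leaf counts in the tree under node."""
    if node.is_leaf():
        return (1, 0) if node.is_successful() else (0, 1)
    wins = losses = 0
    for child in node.moves.values():
        child_wins, child_losses = count_leaves(child)
        wins += child_wins
        losses += child_losses
    return wins, losses
```

A move is stored here as a pair of (row, column) tuples, from and to, so it can be used as a dictionary key.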

For example, the root node:
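pos 2,4 to pos 4,4 (jumping the peg at 3,4)
pos 4,2 to pos 4,4 (jumping the peg at 4,3)
pos 6,4 to pos 4,4 (jumping the peg at 5,4)
pos 4,6 to pos 4,4 (jumping the peg at 4,5)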

So the root node contained 4 moves, which lead to 4 nodes, each containing its own set of moves and thus its own set of child nodes. The tree would spread exponentially until a node had no more possible moves and would be considered a leaf.

The possible moves of a board were deduced by looping through each position until a peg was found that had, horizontally or vertically, another peg next to it and then an empty slot, all in a row. I was going to represent the board as a 2D array, but Python has no such built-in concept, so a list of lists had to do.

All this was made really simple by how the Python library for MongoDB, PyMongo, natively allowed lists of lists to be inserted into the database in a document, along with a few other necessary details like board-state (successful leaf, failed leaf, processed node, unprocessed node). From a programmer's point of view the database simply allowed a person to store something as close to their mental abstraction/paradigm as possible. This is a big, big plus when working with complicated ideas (this task isn't all that complicated, I know). I didn't need it to work in any particular order, just to keep on picking up unprocessed nodes at random and computing all their child boards, over and over, until there were no more to process. That would mean that all the possible games had been computed.
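The shape of that loop can be sketched roughly as below. This is not the original code: the field names, the state strings and the functions `node_document` and `process_one` are hypothetical. The `nodes` argument is assumed to be a PyMongo collection (only its `find_one`, `insert_many` and `update_one` methods are used), and `expand` is assumed to be any function that turns a board into a list of child boards.

```python
# Hypothetical state labels, mirroring the four board-states described above.
UNPROCESSED = "unprocessed"
PROCESSED = "processed"
FAILED_LEAF = "failed_leaf"
SUCCESS_LEAF = "successful_leaf"


def node_document(board, parent_id=None):
    """Build the document for one board; the list of lists is stored as-is."""
    return {"board": board, "parent": parent_id, "state": UNPROCESSED}


def process_one(nodes, expand):
    """Process a single unprocessed node; return False when none are left.

    nodes  -- assumed to be a PyMongo collection
    expand -- callable taking a board and returning a list of child boards
    """
    doc = nodes.find_one({"state": UNPROCESSED})
    if doc is None:
        return False
    children = expand(doc["board"])
    if children:
        nodes.insert_many([node_document(b, doc["_id"]) for b in children])
        state = PROCESSED
    else:
        # No moves left: one remaining peg is a win, anything else a failure.
        pegs = sum(cell == 1 for row in doc["board"] for cell in row)
        state = SUCCESS_LEAF if pegs == 1 else FAILED_LEAF
    nodes.update_one({"_id": doc["_id"]}, {"$set": {"state": state}})
    return True
```

Running `while process_one(nodes, expand): pass` after inserting the root document would keep going until nothing is left in the unprocessed state. Note that this sketch takes whichever node `find_one` happens to return rather than a truly random one.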

Anyway, to cut a long story short, my attempt eventually failed because my mathematical naivety hid the fact that a brute force attack would result in far too many hours of computation and a database that was simply too vast. I gave up after running it for three hours, at which point it had computed 24 million board states, still had 18 million un-computed child boards to investigate and had a 23 GB database. I think it is still possible to do this almost completely with brute force if I remove symmetrical board states (apparently, if done right, there are only 23 million possible board states when symmetry is considered), but that is way beyond just investigating the technology and object orientation.
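One common way to remove symmetrical board states, sketched here as an illustration rather than something the project did, is to reduce every board to a canonical key: generate its 4 rotations and their mirror images, encode each as a string and keep the smallest. Boards that are rotations or reflections of each other then share a key, so a board only needs storing if its key has not been seen before. The function names below are made up for the example, and it assumes a square board.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def mirror(board):
    """Flip a board left to right."""
    return [row[::-1] for row in board]


def variants(board):
    """Return the 8 rotations/reflections of a board (duplicates possible)."""
    out = []
    current = board
    for _ in range(4):
        out.append(current)
        out.append(mirror(current))
        current = rotate(current)
    return out


def board_key(board):
    """Encode a board as a string; '#' marks cells that are off the board."""
    return "".join(
        "#" if cell is None else str(cell) for row in board for cell in row
    )


def canonical_key(board):
    """Smallest key among all symmetric variants of the board."""
    return min(board_key(v) for v in variants(board))
```

Encoding to a string first avoids comparing None with integers, which Python 3 refuses to do, and gives a value that could be stored and indexed directly.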

Technology

MongoDB/NoSQL
I can now understand why people want to program against NoSQL databases: it allows much more elegant programming without having to dynamically create SQL just to do the simplest of tasks. They have their flaws, in that Mongo really doesn't allow multiple documents to be altered in a transactionally secure way, meaning that the data can easily be left in a broken state. That danger can be reduced by holding as much related data as possible in the same document, and by some other tricks, but for many, many circumstances this just isn't as safe as a traditional database if you value the integrity of your data (ephemeral data is fine though). It is worth pointing out that during this little exercise Mongo performed flawlessly and, as far as I know, no data was lost or damaged. The most surprising thing was performance: I had added 23 GB of unstructured data in 3 hours and the performance had not suffered at all. If anything, the bottleneck was how fast Python could compute the new board states, as it had one of my CPUs at 98% usage while Mongo was at 12% usage on the other CPU.

(On a side note, I originally did this with a 32-bit installation of Kubuntu, but there Mongo would only allow a 2 GB database. Interestingly, Mongo didn't complain: it kept on receiving attempts to add data but silently didn't store them. After a restart it didn't allow access at all, so it failed in the worst possible way. I installed the 64-bit version and everything went well from that point.)

I have since discovered that there are no simple ways to extract min and max information in map/reduce notation at the moment (it will be added in future versions). There are some horrible hacky ways using indexes etc., but it goes to show that this technology has its place and it's not a drop-in substitute for an old-fashioned structured database.

Object Orientation
I'm still none the wiser as to how object orientation is conceptually any superior to anything I could have done in a procedural fashion, but maybe this just isn't the right project to highlight the benefits.

Have a go
The project can be found on Google Drive and only requires:
  • A Python installation
  • The PyMongo library installed
  • An installation of MongoDB
  • I used K/Ubuntu but it should work on any OS without modification

Leaving another project half done
I'm going to have to have another crack at this eventually, once I figure out a nice way to avoid repeating the same games when board symmetry is considered, but for the time being I'm stumped and have many challenges trying to get MSSQL to do things it was never completely intended to do.