Combining noSQL and ORM in an MVC framework for a real-case application
I've been trying for some time to put some 'cool' things I've been reading about noSQL (couchDB, mongoDB, Redis...) in the past years into practical use.
I'm quite used to writing apps with Django, and I started using Play! for cases where Java is the only acceptable deployment option (and I'm enjoying it too). Both have modules to work with, for instance, MongoDB, and Django also has nonrel. But I never felt a need for noSQL.
Until I finally found what I thought was a good use case for document-oriented storage, such as MongoDB.
Let's say we have to manage ordering and follow-up (or whatever) of some complex items. The items might have a lot of different properties; e.g., oversimplifying, we could have:
- a Fridge that can have:
  - one or two doors
  - be of class A, B or C
  - a surface color
  - standalone or built-in
- an Oven that can have:
  - gas or electricity or both
  - self cleaning or not
  - standalone or built-in
As you can see, each object can have several properties that can be constrained by type.
In my usual RDBMS through an ORM, I would define a "Product" model, then inherit two models from it, a Fridge and an Oven. If a Fridge gets one more property some time later, I modify the model (and thus the schema), run a migration, and add a column.
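To make the relational route concrete, here is a minimal sketch of what such an ORM setup roughly boils down to at the SQL level, using Python's built-in sqlite3 module instead of a real ORM. The table layout, the column names and the new `has_freezer` property are assumptions made up for illustration; an actual ORM and migration tool would generate their own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Parent "product" table plus one child table per subtype
# (roughly what multi-table model inheritance produces).
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE fridge ("
    " product_id INTEGER PRIMARY KEY REFERENCES product(id),"
    " doors INTEGER NOT NULL CHECK (doors IN (1, 2)),"
    " energy_class TEXT NOT NULL CHECK (energy_class IN ('A', 'B', 'C')),"
    " built_in INTEGER NOT NULL)"
)
conn.execute("INSERT INTO product (id, name) VALUES (1, 'Basic fridge')")
conn.execute(
    "INSERT INTO fridge (product_id, doors, energy_class, built_in)"
    " VALUES (1, 2, 'A', 0)"
)

# The "migration": a new (hypothetical) property becomes a new column,
# and existing rows pick up the default value.
conn.execute("ALTER TABLE fridge ADD COLUMN has_freezer INTEGER NOT NULL DEFAULT 0")

row = conn.execute(
    "SELECT p.name, f.doors, f.has_freezer"
    " FROM product p JOIN fridge f ON f.product_id = p.id"
).fetchone()
print(row)
```

The constraints on each property live in the schema here, which is the part a migration has to touch whenever a model changes.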
noSQL solutions I can think of are:
- using RDF (with something like Virtuoso or building my own simplified triples storage)
- using a document-oriented db such as MongoDB
But I fail to understand how different (easier) development would pragmatically be after switching to a noSQL solution while still using the framework ORM with the right adapter (especially with a document-oriented database, DODB).
Let's say I'm using Django with MongoDB through mongodb-engine.
I'm still using the same ORM, so I still describe those objects as models, listing all properties. The ORM is thus doing exactly the same job! The cost of producing a migration if the model changes is very limited with an ORM (in particular with something like South), and it doesn't require learning a new technology.
There might be /other/ advantages to a DODB, and some specific to MongoDB (scalability, data-processing, maybe performance) but... what about the exact use case and problem I'm describing?
I am most likely missing a point, so here come the real questions:
For this specific use case:
- is this example a good or bad one for DODB (do you have a good one)?
- would it make sense to combine an ORM for basic stuff (Users, Orders) and usage of noSQL without ORM for complex objects, is there a compelling reason to switch to noSQL completely, or should I stay with the stock ORM/SQL?
I understand answering these questions could be partially subjective, so you can assume perfect knowledge of both noSQL and SQL theory, and stock ORM; existence of good bridges from stock ORMs to noSQL DB. Let's assume we are talking about this use case with MongoDB as noSQL alternative.
But there is a more general question - which is the core question of this SO post:
- Isn't a good ORM (such as JPA, ActiveRecord, or Django's one) making noSQL and in particular document-oriented databases of little use?
- ...and is it worth using noSQL with a 'classic' ORM?
("little use" from a programming and maintenance standpoint, performance and similar criteria are a different matter and would require a precise product-to-product comparison)
What I am also trying to understand is whether it wouldn't be better to drop the ORM when switching to noSQL. It would be nice to have more "dynamic" models; e.g., I could have a table describing what the Fridge and Oven models are (fields), and the Fridge and Oven models in the code would be able to construct their views dynamically (forms for editing and listings for displaying).
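A rough sketch of what I mean by "dynamic" models, in plain Python with no ORM and no database driver. Everything here (`FieldSpec`, `PRODUCT_SPECS`, `validate`, `form_fields`) is a hypothetical name invented for illustration; in practice the field descriptions would be loaded from a table or collection rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str
    # Allowed values; None means any value is accepted (free text).
    choices: Optional[Tuple] = None


# Field descriptions that would normally be stored as data, not code.
PRODUCT_SPECS = {
    "fridge": [
        FieldSpec("doors", (1, 2)),
        FieldSpec("energy_class", ("A", "B", "C")),
        FieldSpec("surface_color"),
        FieldSpec("built_in", (True, False)),
    ],
    "oven": [
        FieldSpec("energy", ("gas", "electricity", "both")),
        FieldSpec("self_cleaning", (True, False)),
        FieldSpec("built_in", (True, False)),
    ],
}


def validate(kind, doc):
    """Return a list of error strings for a document of the given kind."""
    specs = {spec.name: spec for spec in PRODUCT_SPECS[kind]}
    errors = []
    for name, spec in specs.items():
        if name not in doc:
            errors.append(f"{name}: missing")
        elif spec.choices is not None and doc[name] not in spec.choices:
            errors.append(f"{name}: {doc[name]!r} not allowed")
    for name in doc:
        if name not in specs:
            errors.append(f"{name}: unknown field")
    return errors


def form_fields(kind):
    """Describe the inputs an edit form for this kind would need."""
    return [
        (spec.name, "select" if spec.choices is not None else "text")
        for spec in PRODUCT_SPECS[kind]
    ]
```

With this shape, adding a property to a Fridge means adding one `FieldSpec` entry, and both validation and form construction follow from the data.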
Related questions (these are here to show my research, but also to clarify that what I am asking is not the generic noSQL vs. SQL question):
- Is an ORM redundant with a NoSQL API? : similar! but I am trying to see why most frameworks (see above) are providing noSQL access through their ORMs. Is that a good idea or not?
- why the use of an ORM with NoSql (like MongoDB) : more specific than the previous one, but it still goes the other way around. I'm arguing that ORMs make noSQL of little use, rather than the other way around!
- When to replace RDBMS/ORM with NoSQL
- When to use MongoDB or other document oriented database systems?
EDIT: And links:
- Siena: a persistence API for Java inspired by the Google App Engine Python Datastore trying to draw a bridge between SQL and NoSQL worlds.
- minimongo: lightweight, schemaless, Pythonic Object-Oriented interface to MongoDB
This is what I get for trolling stackoverflow. Once in a great while an outstanding question gets asked and I am compelled to offer my 2 cents (at the risk of my own project timelines).
I just finished up a project where I had to de-couple an ORM from the model so I could implement a NoSQL solution, and found it not that difficult, although it was rough at times trying to figure out the best approach. So without getting too specific about my implementation, I will touch on what I had to do to make it work, as it may offer some enlightenment when you travel down the same path.
The setup:
- Framework - Symfony 1.4
- ORM - Doctrine 1.4
- NoSQL - My own proprietary solution
What I wanted to do:
- Store image paths within XML files vs the database
- Store HTML description paths within XML files vs the database
I didn't want to store images as blobs within the persistent store (database), and I didn't want to store image paths in the database, as I didn't want to pay the overhead of creating a database connection and querying for the path. So I decided to store the path information within a NoSQL persistent store (filesystem).
Ditto for HTML descriptions: I didn't want to create a text column on my table and store what could potentially be hundreds of lines of HTML within the database, for the same reasons as above.
All my NoSQL files relate to an object (refrigerator for example). These files contain paths to their related assets (html description and images), in what I call pointers, which point to the assets on the filesystem. I opted to use XML format for storing the data so it looks something like this:
    // Path to pointer file
    /home/files/app/needle/myApp/refrigerator/1/1.xml

    // Example pointer
    <pointer>/home/files/app/file/myApp/refrigerator/1.png</pointer>
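For illustration only, reading such a pointer file could look like the sketch below. My own implementation was in PHP; this is a Python stand-in, and the `read_pointers` helper and the `<pointers>` wrapper element used for files with more than one pointer are assumptions, not the actual format I used.

```python
import xml.etree.ElementTree as ET


def read_pointers(xml_text):
    """Return the asset paths held in every <pointer> element."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself, so a file that consists of a
    # single <pointer> element works the same way as a wrapped list.
    return [
        node.text.strip()
        for node in root.iter("pointer")
        if node.text and node.text.strip()
    ]


single = "<pointer>/home/files/app/file/myApp/refrigerator/1.png</pointer>"
wrapped = (
    "<pointers>"
    "<pointer>/home/files/app/file/myApp/refrigerator/1.png</pointer>"
    "<pointer>/home/files/app/file/myApp/refrigerator/1.html</pointer>"
    "</pointers>"
)
print(read_pointers(single))
print(read_pointers(wrapped))
```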
Now, within the framework I had to override the save() methods so I could save the aforementioned assets using the NoSQL API. It was pretty easy: I just checked the parent calls and passed along the values coming into the methods unchanged, so they wouldn't break any chain logic (methods calling other methods with the same arguments) that I wasn't aware of. I also made my custom NoSQL API calls throw exceptions, as the main save() call was wrapped in a try/catch block. The only thing you have to be careful of here is deciding whether a failure in your NoSQL assets is worth stopping the entire transaction. In my example, I had to figure out whether a failed image upload should prevent saving the rest of the form fields in the database (I opted to break the transaction).
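The shape of that override, sketched in Python rather than Symfony/Doctrine PHP. `Record` is a hypothetical stand-in for the ORM base class (its "table" is just a dict), and `PointerStore`, `AssetStoreError` and `Refrigerator` are likewise made-up names; the point is only the ordering and the exception, not the real API.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape


class AssetStoreError(Exception):
    """Raised when the file-based store cannot persist an asset pointer."""


class PointerStore:
    """Writes pointer files under <root>/<kind>/<pk>/<pk>.xml."""

    def __init__(self, root):
        self.root = Path(root)

    def write_pointer(self, kind, pk, asset_path):
        target = self.root / kind / str(pk) / f"{pk}.xml"
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(
                f"<pointer>{escape(asset_path)}</pointer>\n", encoding="utf-8"
            )
        except OSError as exc:
            raise AssetStoreError(f"cannot write {target}") from exc
        return target


class Record:
    """Stand-in for an ORM base model; rows live in a shared dict."""

    rows = {}

    def __init__(self, pk, **fields):
        self.pk = pk
        self.fields = fields

    def save(self, *args, **kwargs):
        Record.rows[self.pk] = dict(self.fields)
        return self


class Refrigerator(Record):
    def __init__(self, pk, store, image_path=None, **fields):
        super().__init__(pk, **fields)
        self.store = store
        self.image_path = image_path

    def save(self, *args, **kwargs):
        # Write the asset pointer first. If the store raises, the parent
        # save() is never reached, so the row is not written either.
        if self.image_path is not None:
            self.store.write_pointer("refrigerator", self.pk, self.image_path)
        # Pass the incoming arguments through untouched.
        return super().save(*args, **kwargs)


store = PointerStore(tempfile.mkdtemp())
fridge = Refrigerator(
    1, store, image_path="/home/files/app/file/myApp/refrigerator/1.png", doors=2
)
fridge.save()
```

In a real ORM the same effect would come from raising inside the surrounding database transaction so that it rolls back; writing the asset before the row is just the simplest way to show it here.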
I also had to alter the load() methods to retrieve the assets using the NoSQL API vs the standard model logic. As with the save methods, this wasn't too hard to do either. I just had to see what the parent classes were doing and not muck with any argument values.
When all was said and done, I was able to store images and html descriptions on the filesystem, with an xml file made up of pointers pointing to their location. So now I don't incur a database call every time I need an asset.
Some considerations (these may already be handled by other NoSQL solutions; I had to write my own):
- You will not be able to query for refrigerators that have images from your persistent store. You will have to write some logic in your application to pull in the assets from the NoSQL store.
- Backups: As you back up your persistent store data, you also need to back up your NoSQL data.
- Orphans: Now that your schema is unaware of any assets you may have, deleting a row from your persistent store will orphan an asset on the filesystem. So be sure your application has the logic to clean the NoSQL store when a row has been deleted.
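As a sketch of the orphan cleanup, assuming the pointer files are laid out as `<root>/<kind>/<id>/` like the example path above. The `remove_orphans` helper and its arguments are hypothetical, and it only removes the pointer directories; deleting the assets they point to would need an extra step that reads each pointer first.

```python
import shutil
import tempfile
from pathlib import Path


def remove_orphans(store_root, kind, live_ids):
    """Delete pointer directories whose id no longer exists in the database.

    Returns the names of the directories that were removed.
    """
    kind_dir = Path(store_root) / kind
    removed = []
    if not kind_dir.is_dir():
        return removed
    live = {str(pk) for pk in live_ids}
    for entry in sorted(kind_dir.iterdir()):
        if entry.is_dir() and entry.name not in live:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


# Demo: ids 1, 2 and 3 have pointer directories, but row 2 was deleted.
demo_root = Path(tempfile.mkdtemp())
for pk in (1, 2, 3):
    (demo_root / "refrigerator" / str(pk)).mkdir(parents=True)
removed_ids = remove_orphans(demo_root, "refrigerator", live_ids=[1, 3])
print(removed_ids)
```

Running something like this from the delete path of the model, or as a periodic job, keeps the two stores from drifting apart.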
I think I hit all the major hurdles I faced when implementing a NoSQL solution with an ORM. If you have any other questions, feel free to hit me up.
-- Edit --
Responses to comments:
As I mentioned, I didn't want to create a database connection and query just to get a path to an asset. I feel it's better to use a NoSQL solution for this type of information, as there is really no reason to run queries against it (images or HTML descriptions).
Developing my own NoSQL solution was more of an ego challenge. At work there was a project to implement a custom NoSQL solution (we had bad experiences with MogileFS), and, to be frank, it was poorly designed and poorly implemented. But rather than just point out the bad, I challenged myself to offer up a better solution, but for a side project. And because of the challenge aspect, I didn't research any already available NoSQL solutions, but in hindsight I probably should have.
I still think you can implement MongoDB or any NoSQL solution relatively easily by overriding the CRUD functions in the model layer of your ORM. In fact, not only did I implement my NoSQL solution, I also added the ability to index data into Solr (for full-text searching) during the CRUD functions as well, so anything is possible.