Need to store LOTS of data on Android device, thinking of going OODB
I'm currently working on a project that's based on Android. Without getting into many details, the software will run on a custom built device. The hardware will never change and will always be the same. That's a definite plus :)
With that said, this project requires us to store loads and loads of data on the device, upwards of 3 million rows in some tables. SQLite handles scanning this many rows just fine for us. The problem comes when we start doing complex joins to bring back all the related data we need. We've thought about denormalizing the database but are afraid that will push the database outside the realm of usable.
We are looking into using an Object Oriented database, something like db4o or NeoDatis. Our hope is that by storing objects we can get rid of our relations on a row level and store them on the object (just like OOP). The problem is we have not been able to find any performance related benchmarks (at least not recent ones) of these ODBs running and being used on Android.
Does anyone have any experience with OODBs on Android and/or with storing and accessing this large amount of data? If so any advice you could provide would be greatly appreciated.
Here's an example of the problem we're facing. It's not related to our app (my NDA says I can't post anything specific) but this example represents the problem well.
Imagine we're building an application to monitor every vehicle that's driving on the New Jersey Turnpike at any given time. For any given car we need to track the car Make and Model, how many people are in the car and what's the demographic of the people in the car. So basically you end up with data that looks something like -
cars: id | color | make_id | in_toll_lane | model_id
makes: id | name
models: id | name | make_id
occupants: id | age | sex | is_driver | car_id
toll lanes: id | cars_in_line | ideal_cars_in_line | ideal_occupants
This data is going to be changing frequently. It's also going to get rather huge, as there are no doubt A LOT of people driving down the NJ Pike at any given time.
With this data we need to be able to take a snapshot, on demand, of anyone who's driving on the pike. We also need to be able to take a snapshot of all the males who are driving, or all the females on the turnpike. We also need to be able to search by Age, Sex, Make, Model, etc.
Now imagine we need to figure out what toll lane each car should go into based on the number of people in the car, the ideal number of occupants, the number of cars already in line, and the ideal number of cars that should be in line.
This is a very simple example, though pretty representative of our problem.
-- End Edit
Thanks in advance!
Here are some observations, though I suspect they will not help you directly.
I think the main question is: are you going to discover your complex relations via application runtime logic as events generate or change data, or are you going to have to just dump data into a store and then discover unanticipated relations via query?
If your business logic will populate the model, then you can easily create model-based views of your different slices of the data model, e.g. collections which know all cars having male/female drivers. In this case, basically, your relations are semi-static and rarely change (while the data values on the other end of those relations are probably changing a lot). If this is the case, then why try to store the data in a database technology which forces you to constantly recalculate the relations (JOIN)? It is just a waste of CPU and is why you will see poor performance as the model gets complex. So, once you answer these questions, it will be very clear whether an ODB or an RDB is the best choice.
Now the question becomes: what will run on Android and handle huge data? This is where I think I cannot help. I work at Versant, which has two ODBs (db4o and Versant). db4o will run on Android, but is it the right choice for huge data? No, not unless you have very isolated data which can live in separate databases and be accessed only in isolation, and it does not sound to me like that is your situation. Our other database, Versant, is meant to handle huge data in near real time, but only the client is 100% Java; the server is written in C, so it will not run on Android.
I think you will need to do some research to see who has an ODB which can handle huge data on Android.
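To make the "model-based view" idea concrete, here is a minimal sketch in plain Java of a collection that the business logic keeps up to date as events arrive, so that "all cars with a male/female driver" is a lookup rather than a join. The class name `DriverIndex`, its methods, and the use of a string for the driver's sex are all hypothetical, chosen only to match the turnpike example; it is not tied to any particular ODB.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical view: which cars currently have a driver of a given sex. */
final class DriverIndex {
    // sex -> ids of cars whose driver has that sex
    private final Map<String, Set<Long>> carIdsByDriverSex = new HashMap<>();
    // car id -> sex of its current driver, so an update can undo the old entry
    private final Map<Long, String> driverSexByCarId = new HashMap<>();

    /** Called by the business logic whenever a car's driver is set or changes. */
    void setDriver(long carId, String sex) {
        removeCar(carId); // drop any previous entry for this car
        driverSexByCarId.put(carId, sex);
        Set<Long> ids = carIdsByDriverSex.get(sex);
        if (ids == null) {
            ids = new HashSet<>();
            carIdsByDriverSex.put(sex, ids);
        }
        ids.add(carId);
    }

    /** Called when a car leaves the turnpike. */
    void removeCar(long carId) {
        String oldSex = driverSexByCarId.remove(carId);
        if (oldSex != null) {
            carIdsByDriverSex.get(oldSex).remove(carId);
        }
    }

    /** The "snapshot" query: no join, just a read-only view of a set kept current. */
    Set<Long> carsWithDriverSex(String sex) {
        Set<Long> ids = carIdsByDriverSex.get(sex);
        return ids == null
                ? Collections.<Long>emptySet()
                : Collections.unmodifiableSet(ids);
    }
}
```

The cost moves from query time to update time: every change to a driver touches the index once, and every snapshot afterwards is a hash lookup. That trade only pays off if the relations really are known in advance, which is the question posed above.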
You don't say much about your data access needs or the data loadout really.
If you've got 3M main rows, and then a bunch of smaller leaf tables, then you may just do well by caching all of the leaf tables in RAM, and "joining" to them by hand. Many systems have very small leaf tables (particularly compared to the main data), so loading them up in RAM and then simply looking them up when you load the row can be a big win.
Obviously, you don't do this with major parent->child relations, but if you can eliminate the leaf joins, then a read becomes a single join between the parent and child rather than half a dozen joins across parent, child, and leaf tables.
Even if this doesn't work for all of the leaf tables, if it works for a large majority, it may well be enough to get you over the hump.
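As a rough illustration of "joining by hand", here is a sketch in plain Java. It assumes the leaf tables are the makes and models from the question's example and that each is small enough to hold in a `HashMap`. The names `LeafCache`, `CarRow`, and `CarDescriber` are made up for this example, and the code that reads rows out of SQLite to fill the caches is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory copy of one small leaf table (e.g. makes or models), keyed by primary key. */
final class LeafCache {
    private final Map<Long, String> namesById = new HashMap<>();

    /** Called once per leaf row at startup, while reading the table from the database. */
    void put(long id, String name) {
        namesById.put(id, name);
    }

    /** Stands in for the SQL JOIN: resolve a foreign key with a hash lookup. */
    String nameOf(long id) {
        return namesById.get(id); // null if the id is unknown
    }
}

/** One row of the large main table; it carries only foreign keys to the leaf tables. */
final class CarRow {
    final long id;
    final String color;
    final long makeId;
    final long modelId;

    CarRow(long id, String color, long makeId, long modelId) {
        this.id = id;
        this.color = color;
        this.makeId = makeId;
        this.modelId = modelId;
    }
}

/** Resolves the leaf references of a main-table row after it has been loaded. */
final class CarDescriber {
    private final LeafCache makes;
    private final LeafCache models;

    CarDescriber(LeafCache makes, LeafCache models) {
        this.makes = makes;
        this.models = models;
    }

    String describe(CarRow row) {
        return row.color + " " + makes.nameOf(row.makeId) + " " + models.nameOf(row.modelId);
    }
}
```

With this in place, the query against the main table selects only the car columns, with no joins to the make or model tables, and the lookups happen as each row is read. If a leaf table can change while the app is running, the cache has to be refreshed or updated alongside the write.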
Speaking for db4o: We run all our regression tests on Android because we think it will become a very important platform for db4o.
db4o works very well at the order of magnitude of 3 million objects.
We are doing benchmark testing against other databases at http://www.polepos.org/ and we will soon release a new version of the benchmark in which we run a complex setup, including against SQLite. Porting the benchmark to Android is also under consideration.
If joins are killing your performance and you have very heterogeneous data, db4o could work better than a relational database.
Your app sounds interesting. If you need help evaluating db4o, just give me a shout.