Database version control plan: hot or not?
Based on reading around the web, mainly Stack Overflow and the articles about db versioning linked from Coding Horror, I've made a stab at writing a plan for versioning the database of an 8-year-old PHP/MySQL website.
Database Version Control plan

- Create a db as the "Master Database"
- Create a table db_version (id, script_name, version_number, author, comment, date_ran)
- Create a baseline script for the schema plus core data that creates this db from scratch; run this on the Master Db
- Create a "test data" script to load any db with working data
- Modifications to the master db are ONLY to be made through the db versioning process
- Ensure everyone developing against the Master Db has a local db created by the baseline script
- Procedures for committing to and updating from the Master Db:
  - Master Db commit:
    - Perform a schema diff between your local db and the master db
    - Perform a data diff on core data between your local db and the master db
    - If there are changes in either or both cases, combine them into a single update script
    - Collect the data for a new row in the db_version table, and add an insert for it to the script:
      - new version number = latest master db version number + 1
      - author
      - comment
    - Name the script changeScript_V.sql, where V is the latest master db version + 1
    - Run the script against the master db
    - If the script executed successfully, add it to the svn repository
    - Add the new db_version record to your local db_version table
  - Update from Master Db:
    - Update your local svn checkout so all the latest change scripts are available
    - Compare your local db_version table to the master db_version table to determine which change scripts to run
    - Run the required change scripts, in order, against your local db; this also updates your local db_version table
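The "Update from Master Db" comparison step above is simple to automate. A minimal sketch, assuming the version_number columns of the two db_version tables have already been fetched into plain lists (pending_change_scripts is an illustrative helper name, not part of the plan itself):

```python
def pending_change_scripts(local_versions, master_versions):
    """Return the change-script filenames a local db still needs to run,
    in order. The inputs are the version_number values from the local and
    master db_version tables (in real code, fetched via a query such as
    SELECT version_number FROM db_version)."""
    missing = sorted(set(master_versions) - set(local_versions))
    return ["changeScript_%d.sql" % v for v in missing]

# Example: local db is at version 3, master has advanced to 5.
print(pending_change_scripts([1, 2, 3], [1, 2, 3, 4, 5]))
# -> ['changeScript_4.sql', 'changeScript_5.sql']
```

Wrapping this in a small script that also executes the files it returns would make the daily update a one-command operation.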
My first question is: does this sound correct? My second question is: the commit process seems a bit too complicated to do more than once a day. Is there a way to reliably automate it? Or should I not be committing database changes often enough for it to matter?
Looking at your proposal, it doesn't seem feasible or practical to me. I worked at a company where we used more than 1k tables per database (a very complex system), and it all worked fine like this:
- Have one person in charge of the DB (let's call him DBPerson) - every script/db change has to pass through him. This avoids unnecessary changes and oversights (for example, if someone moves an index to make his own query perform better, he might destroy another person's work; someone might create a table that is completely redundant and unnecessary; etc.). This keeps the db clean and efficient. Even if it seems like too much work for one guy (or his deputy), in fact it isn't - the db usually changes rarely.
- Each script has to pass validation through DBPerson
- When a script is approved, DBPerson assigns it a number and puts it in the 'update' folder in svn, with appropriate numbering (as you suggested, incremental numbers for example).
- Next, if you have some continuous integration in place, the script gets picked up and updates the db (if you don't have continuous integration, do it manually).
- Do not store the entire database as one script with all the data in it. Store the actual database instead. If you have branches of the solution, have a separate database for each branch, or at least keep the update scripts divided per branch so you can roll back/forward between branches. But I really recommend having a separate db for each branch.
- Keep one database with default data, always intact, for unit tests, regression tests, etc. Whenever you run the tests, run them on a copy of this database. You could even schedule a nightly reset of the test databases from the main one (if appropriate, of course).
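One subtle trap in the continuous-integration pickup step above: sorting script filenames lexically would run changeScript_10.sql before changeScript_2.sql. A minimal sketch of numeric ordering, assuming the changeScript_V.sql naming from the question (ordered_scripts is an illustrative helper):

```python
import re

def ordered_scripts(filenames):
    """Sort change scripts by their numeric suffix rather than lexically,
    so changeScript_10.sql runs after changeScript_2.sql."""
    def number(name):
        m = re.match(r"changeScript_(\d+)\.sql$", name)
        if m is None:
            raise ValueError("unexpected file in update folder: %s" % name)
        return int(m.group(1))
    return sorted(filenames, key=number)

print(ordered_scripts(["changeScript_10.sql", "changeScript_2.sql", "changeScript_1.sql"]))
# -> ['changeScript_1.sql', 'changeScript_2.sql', 'changeScript_10.sql']
```

Raising on unrecognized filenames is deliberate: a stray file in the update folder should fail the build loudly rather than be silently skipped.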
In an environment like this you'll have multiple versions of database:
- Developers database (local) - the one that the dev guy is using to test his work. He can always copy from Master or Test Master.
- Master database - the one with all the default values, maybe semi-empty if you're doing redeploys to new clients.
- Test Master database - the Master database filled with test data. Any script you have run on Master is run here as well.
- Test in progress database - copied from Test Master and used for testing - gets overwritten prior to any new test.
- If you have branches (a similar database with slight differences for each client), then you'll have the same set as above for each branch...
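The "copied from Test Master and overwritten prior to any new test" step can be scripted with a standard dump-and-restore pipeline. A minimal sketch that only builds the shell command, assuming passwordless local access, that the target db already exists, and illustrative database names:

```python
def refresh_command(source_db, target_db, user="root"):
    """Build the shell pipeline that overwrites target_db with a fresh
    copy of source_db. Illustrative only: assumes a local MySQL server,
    passwordless access for `user`, and that target_db already exists."""
    return ("mysqldump -u %s %s | mysql -u %s %s"
            % (user, source_db, user, target_db))

# Run before each test pass so tests always start from known data:
print(refresh_command("test_master", "test_in_progress"))
# -> mysqldump -u root test_master | mysql -u root test_in_progress
```

In practice you would execute this from a pre-test hook or cron job; keeping the copy step automatic is what makes "gets overwritten prior to any new test" cheap enough to actually enforce.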
You will almost certainly have to adapt this to your situation, but in any case I think that keeping a textual create script for the entire database is wrong in terms of maintainability, merging, updating, etc.