Backup data different from production?
(If you hate long explanations, read down to my question below)
We're having some issues with software sold by an uncooperative vendor (unfortunately we are the only customer of this product). The platform for the software is Oracle. At night the app runs a series of nightly processes. Occasionally some of these processes crash, causing us to scramble to restore the database and carry the work over to the following night. Of course this causes users heartache, which trickles up to management and on down to IT. We're blaming the vendor and they're blaming us. At minimum it would be nice to have some additional logging to help us locate the problem.
We've narrowed it down to a couple of possible problems. Oddly enough, when we restore the backup from right before the failure and re-run the nightly process on a test box, it never fails. This leads us to believe that it is one of the following:
- Something different between the test and production box that we haven't detected yet
- Something weird with the software
- Something different between the production data before the export and the restored data
Which leads me to my long-awaited question (pertaining to the last possibility above):
Is there anything that would cause the production data to be different from the backup data (derived from the command line using "exp")?
I guess I am doubtful, but just wanted to eliminate one of the options above.
This is not likely to be caused by anything that exp/imp does (although it might be that imp loads your data neatly in order and builds fresh indexes, so the test database performs better and avoids, for example, an application memory leak building up). It is more likely that the test database and application start out clean and uncontended, whereas the production system has been running all day and is therefore in a different state (memory loaded with various things). Production may even have other processes running that block some of your processes and cause the application to fail.
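If you still want to rule out the data itself, one rough check is to compare per-table row counts taken on production at export time with the counts on the restored test database. The sketch below assumes you have already collected those counts into two dictionaries (for instance with a SELECT COUNT(*) per table); the table names and numbers are hypothetical, and matching row counts do not prove the contents are identical.

```python
def diff_row_counts(prod_counts, test_counts):
    """Return {table: (prod_count, test_count)} for tables whose counts differ.

    A table that exists on only one side is reported with None for the
    side where it is missing.
    """
    diffs = {}
    for table in sorted(set(prod_counts) | set(test_counts)):
        prod = prod_counts.get(table)
        test = test_counts.get(table)
        if prod != test:
            diffs[table] = (prod, test)
    return diffs


# Hypothetical counts, for illustration only.
prod = {"ORDERS": 120450, "CUSTOMERS": 8311, "AUDIT_LOG": 990112}
test = {"ORDERS": 120450, "CUSTOMERS": 8311, "AUDIT_LOG": 990100}

for table, (p, t) in diff_row_counts(prod, test).items():
    print(f"{table}: production={p} test={t}")
```

An empty result only tells you the row counts agree; it is a cheap first filter, not a full comparison.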
You might try shutting down and restarting the database and application before running your process. This gives you the same starting point as you have in test. You might also rebuild indexes and regather statistics and the like on production to eliminate issues there. You also need to eliminate any other work that might conflict with the process, whether it comes through the application or from anything else running in the same database.
If Oracle doesn't crash and there are no issues in the alert log, then the application has the issue and needs investigation. Monitor the application server for memory or process issues and get the vendor to take a log of what is going on (tell them that you need this to look for Oracle errors if they are prickly about it).
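To make the alert log check less of a chore after each failed night, something like the following could tally the ORA- error codes that appear in it. This is only a sketch: it assumes the log is plain text with error codes written as ORA- followed by five digits, and the sample text (including the trace file path) is made up for illustration.

```python
import re
from collections import Counter

# Oracle error codes appear as "ORA-" followed by five digits.
ORA_ERROR = re.compile(r"\bORA-\d{5}\b")


def count_ora_errors(log_text):
    """Count how many times each ORA-nnnnn code occurs in the given text."""
    return Counter(ORA_ERROR.findall(log_text))


# Hypothetical alert log excerpt, for illustration only.
sample_log = """\
Tue Mar 03 02:14:07 2009
Errors in file /hypothetical/path/trace_1234.trc:
ORA-00600: internal error code, arguments: [hypothetical]
Tue Mar 03 02:14:09 2009
ORA-01555: snapshot too old
ORA-00600: internal error code, arguments: [hypothetical]
"""

for code, count in sorted(count_ora_errors(sample_log).items()):
    print(code, count)
```

If this comes back empty for the window in which the nightly process failed, that supports pointing the investigation at the application rather than the database.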