import script with many queries causes a slow website
We build a link for our offline program to our website. In our offline program we have 50.000 records we want to push to our website. What we do now is the following:
In the offline program we build an xml file with 1500 records and post it to a php file on our webserver. On the webserver we read the xml and push it to the mysql database, before we do that we first check if the record already exist and then we update the record or insert it as a new record.
When thats done, we give back a message to our offline program that the batch is completed . The offline program builds a new xml file with the next 1500 records. This process repeats till it reached the last 1500 records.
The problems is that the webserver become very slow while pushing the records to the database. Probably thats because we first check the records that already exist (that's one query) and then write it into the database (that's the second query). So for each batch we have to run 3000 queries.
I hope you guys have some tips to speed up this process.
Thanks in advance!
- Before starting the import, read all the data ids you have, do not make checking queries on every item insert, but check it in existed php array.
- Fix keys on your database tables.
- Make all inserts on one request, or use Transactions.
there is no problems to import a lot of data such way, i had a lot of experience with it.
A good thing to do is write a single query composed of the concatenation of all of the insert statements separated by a semicolon:
INSERT INTO table_name (a,b,c) VALUES (1,2,3) ON DUPLICATE KEY UPDATE a = 1, b = 2, c = 3; INSERT INTO table_name ...
You could do concatenate 100-500 insert statements and wrap them in a transaction.
Wrapping many statements into a transaction can help by the fact that it doesn't immediately commit the data to disk after each row inserted, it keeps the whole 100-500 batch in memory and when they are all finished it writes them all to disk - which means less intermittent disk-IO.
You need to find a good batch size, I exemplified 100-500 but depending on your server configurations, on the amount of data per statement and on the number of inserts vs. updates you'll have to fine tune it.
Read some information about Mysql Unique Index Constraints. This should help:
I had the same problem 4 months ago and I got more performance coding in java rather than php and avoiding xml documents. My tip: you can read the whole table (if you do it once is faster than make many queries 1 by 1) and keep this table in memory (in a HashMap for example). And before inserting a record, you can check if it exists in your structure localy (you do not bother the DB). You can improve your performance this way.