import script with many queries causes a slow website

We are building a link between our offline program and our website. In the offline program we have 50,000 records that we want to push to our website. What we do now is the following:

In the offline program we build an XML file with 1,500 records and post it to a PHP file on our webserver. On the webserver we read the XML and push the records to the MySQL database. Before writing each record we first check whether it already exists, and then either update it or insert it as a new record.

When that's done, we send a message back to the offline program that the batch is complete. The offline program then builds a new XML file with the next 1,500 records. This process repeats until the last 1,500 records have been sent.

The problem is that the webserver becomes very slow while pushing the records to the database. That is probably because we first check whether each record already exists (one query) and then write it to the database (a second query). So for each batch we have to run 3,000 queries.
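To make it concrete, the PHP on the server currently does roughly the following per record (the table, column and connection details below are simplified placeholders, not our real names):

<?php
// Simplified sketch of the current import loop: two queries per record,
// one SELECT to check existence, then either an UPDATE or an INSERT.
$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'pass');

$check  = $pdo->prepare('SELECT id FROM records WHERE external_id = ?');
$update = $pdo->prepare('UPDATE records SET name = ?, price = ? WHERE external_id = ?');
$insert = $pdo->prepare('INSERT INTO records (external_id, name, price) VALUES (?, ?, ?)');

foreach ($batch as $row) {   // $batch = the 1,500 records parsed from the XML
    $check->execute([$row['external_id']]);
    if ($check->fetchColumn() !== false) {
        $update->execute([$row['name'], $row['price'], $row['external_id']]);
    } else {
        $insert->execute([$row['external_id'], $row['name'], $row['price']]);
    }
}
// 1,500 records x 2 queries = roughly 3,000 queries per batch.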

I hope you guys have some tips to speed up this process.

Thanks in advance!

Answers


  1. Before starting the import, read all the record IDs you already have. Do not run a checking query for every insert; check against the existing PHP array instead.
  2. Fix the keys (indexes) on your database tables.
  3. Do all inserts in one request, or use transactions (see the sketch after this answer).

There is no problem importing a lot of data this way; I have a lot of experience with it.
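A rough sketch of point 3 with PHP/PDO, assuming a records table with a unique key on an external_id column (placeholder names, $pdo being the PDO connection): the whole batch goes to MySQL in a single multi-row INSERT ... ON DUPLICATE KEY UPDATE, so the per-record existence check disappears.

<?php
// Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE for the whole batch.
// Requires a UNIQUE (or primary) key on records.external_id.
$placeholders = [];
$params = [];
foreach ($batch as $row) {          // $batch = rows parsed from the posted XML
    $placeholders[] = '(?, ?, ?)';
    $params[] = $row['external_id'];
    $params[] = $row['name'];
    $params[] = $row['price'];
}

$sql = 'INSERT INTO records (external_id, name, price) VALUES '
     . implode(', ', $placeholders)
     . ' ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)';

$pdo->prepare($sql)->execute($params);   // one request instead of ~3,000 queries

With 1,500 rows that is 4,500 bound parameters, which is well within MySQL's limits, but keep an eye on max_allowed_packet if the rows are large.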


A good thing to do is to send a single request composed of all of the insert statements concatenated together, separated by semicolons:

INSERT INTO table_name
        (a,b,c)
    VALUES
        (1,2,3)
    ON DUPLICATE KEY
        UPDATE a = 1, b = 2, c = 3;
INSERT INTO table_name
    ...

You could concatenate 100-500 insert statements and wrap them in a transaction.

Wrapping many statements in a transaction helps because the data is not committed to disk after every inserted row; the whole 100-500 row batch is kept in memory and written to disk only once the batch is finished, which means less intermittent disk I/O.

You need to find a good batch size. I used 100-500 as an example, but depending on your server configuration, the amount of data per statement, and the ratio of inserts to updates, you will have to fine-tune it.
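As a sketch of the transaction batching with PHP/PDO (placeholder table/column names, $pdo being the PDO connection): instead of literally concatenating the SQL text, this variant executes one prepared statement per row and commits once per batch, which gives the same "one write to disk per batch" effect described above.

<?php
// Batch the upserts inside transactions: one commit per batch instead of
// one per row. Assumes a UNIQUE key on records.external_id.
$batchSize = 250;                        // tune this: 100-500 is a reasonable start
$stmt = $pdo->prepare(
    'INSERT INTO records (external_id, name, price) VALUES (?, ?, ?)
     ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)'
);

foreach (array_chunk($rows, $batchSize) as $chunk) {   // $rows = all records to import
    $pdo->beginTransaction();
    foreach ($chunk as $row) {
        $stmt->execute([$row['external_id'], $row['name'], $row['price']]);
    }
    $pdo->commit();                      // one flush to disk per batch
}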


Read some information about MySQL unique index constraints. This should help:

Mysql Index Tutorial
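In short: ON DUPLICATE KEY UPDATE (used in the answer above) only works if MySQL has a UNIQUE or primary key on which to detect the duplicate. With the placeholder records/external_id names used elsewhere in this thread, that would be a one-time change along the lines of:

<?php
// One-time schema change (placeholder names): add a unique key so that
// duplicate records can be detected and ON DUPLICATE KEY UPDATE applies.
$pdo->exec('ALTER TABLE records ADD UNIQUE KEY uq_external_id (external_id)');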


I had the same problem 4 months ago, and I got better performance by coding in Java rather than PHP and by avoiding XML documents. My tip: read the whole table once (doing it once is faster than making many queries one by one) and keep it in memory (in a HashMap, for example). Then, before inserting a record, check whether it exists in your local structure, so you do not bother the DB at all. You can improve your performance this way.
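Since the question is about a PHP script, here is roughly the same idea in PHP rather than Java (placeholder names again, $pdo being the PDO connection): read the existing IDs once, keep them in an associative array, and decide insert vs. update without querying per record.

<?php
// Load every existing ID once and keep it in memory (PHP's associative
// array plays the role of the Java HashMap).
$existing = array_flip(
    $pdo->query('SELECT external_id FROM records')->fetchAll(PDO::FETCH_COLUMN)
);

$toInsert = [];
$toUpdate = [];
foreach ($rows as $row) {                          // $rows = records from the XML
    if (isset($existing[$row['external_id']])) {
        $toUpdate[] = $row;                        // already on the website
    } else {
        $toInsert[] = $row;                        // new record
        $existing[$row['external_id']] = true;     // remember it for later batches
    }
}
// $toInsert and $toUpdate can then be written with the batched
// insert/transaction approaches from the other answers.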

