Optimizing mysql database - List of tasks eats up my server
I run a service that takes user requests and adds them to a table named 'queue' that has 3 fields: userid, queuenumber, and processed.
I store the tasks that need to be processed there (I can process fewer tasks than I receive requests).
Unfortunately, I don't know much about MySQL optimization, and my processing script (which I run in 50 parallel instances) uses queries like this:
SELECT * FROM `queue` WHERE `processed` = 0 ORDER BY `queue`.`queuenumber`
When any instance of the code takes a row to process, it marks it with processed = 1 so it won't be picked up by the other instances.
So when I run 50 instances of this code, it becomes heavy on the MySQL database (and I have only 4 GB of RAM).
Because of that, I get stats like these:
Traffic: Received 15.8 MiB, Sent 42.9 GiB; per hour: Received 3 MiB, Sent 8.1 GiB
Some stats from phpmyadmin:
How do I optimize this, so that I can run the code in parallel without listing the whole table every time?
The result of SHOW CREATE TABLE is:
CREATE TABLE `queue` (
  `userid` int(11) NOT NULL,
  `queuenumber` int(11) NOT NULL AUTO_INCREMENT,
  `processed` tinyint(1) NOT NULL,
  PRIMARY KEY (`queuenumber`),
  UNIQUE KEY `queuenumber` (`queuenumber`),
  UNIQUE KEY `userid` (`userid`)
) ENGINE=InnoDB AUTO_INCREMENT=121617 DEFAULT CHARSET=latin1
Implementing queues with an RDBMS is an anti-pattern: http://blog.engineyard.com/2011/5-subtle-ways-youre-using-mysql-as-a-queue-and-why-itll-bite-you
However, if you insist, there are ways to do it consistently and without race conditions.
First, implement a notification mechanism (outside of MySQL) that tells consumers when a new row is enqueued, so you don't have to poll for new data in a tight loop; this polling may be why your traffic is so high. I have seen UDP multicast used for this, as well as UNIX signals.
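As a rough illustration of the notification idea, here is a minimal sketch using plain UDP on loopback (the address and port are arbitrary; for multiple consumers on several machines you would use a multicast group instead):

```python
import socket

# Hypothetical address; producers and consumers just have to agree on it.
NOTIFY_ADDR = ("127.0.0.1", 50007)

def notify_new_job(queuenumber):
    """Producer side: fire a small UDP datagram after the INSERT commits."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(str(queuenumber).encode(), NOTIFY_ADDR)

def wait_for_job(timeout=None):
    """Consumer side: block until a producer signals, instead of polling MySQL."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(NOTIFY_ADDR)
        s.settimeout(timeout)
        data, _ = s.recvfrom(64)
        return int(data)
```

The datagram is only a wake-up call; the consumer still claims the job through the database as described below, so a lost or duplicated packet is harmless.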
Now check that the table is using InnoDB:
SHOW CREATE TABLE `queue`;
If you need to change it to InnoDB:
ALTER TABLE `queue` ENGINE=InnoDB;
If you have multiple consumers processing jobs, make sure your locking is race-free: every time a consumer encounters a new job, the first thing it should do is update the row with its unique process ID, so that each row/job is assigned to exactly one consumer.
You will need a process column on your table:
ALTER TABLE `queue` ADD `process` int unsigned default NULL;
Add an index to help you select the rows efficiently:
ALTER TABLE `queue` ADD KEY (`processed`,`process`);
Now your consumers can run the following in their code:
UPDATE `queue` SET `process` = ? WHERE `process` IS NULL AND `processed` = 0 ORDER BY `queuenumber` LIMIT 1;
Set ? to the pid or thread ID of the consumer. You can tell whether the UPDATE claimed a row either by checking the affected-rows count in the query response (fastest), or by trying to fetch the row details anyway: if the following SELECT returns 0 rows, the UPDATE did nothing and a different consumer got the job.
SELECT * FROM `queue` WHERE process = ? AND processed = 0;
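Putting the claim-and-fetch steps together, a consumer loop looks roughly like this. The sketch uses SQLite so it is self-contained; stock SQLite lacks `UPDATE ... LIMIT`, so it picks the target row with a subquery, whereas on MySQL you can use the `UPDATE ... ORDER BY ... LIMIT 1` form above directly:

```python
import sqlite3

def claim_job(conn, pid):
    """Atomically claim the oldest unprocessed, unassigned job for this consumer.

    Returns (userid, queuenumber) for the claimed job, or None if the
    queue is empty (the UPDATE matched no rows).
    """
    cur = conn.execute(
        """UPDATE queue SET process = ?
           WHERE queuenumber = (SELECT queuenumber FROM queue
                                WHERE process IS NULL AND processed = 0
                                ORDER BY queuenumber LIMIT 1)""",
        (pid,),
    )
    conn.commit()
    if cur.rowcount == 0:  # another consumer got there first, or queue is empty
        return None
    # Fetch the details of the row we just claimed.
    return conn.execute(
        "SELECT userid, queuenumber FROM queue WHERE process = ? AND processed = 0",
        (pid,),
    ).fetchone()

def finish_job(conn, queuenumber):
    """Mark a claimed job as done."""
    conn.execute("UPDATE queue SET processed = 1 WHERE queuenumber = ?",
                 (queuenumber,))
    conn.commit()
```

Checking `rowcount` on the UPDATE is the cheap path: a consumer that claimed nothing can go straight back to waiting without issuing the SELECT at all.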
When your consumer is finished with the row/job, it can just set processed to 1 on that row. To address a single row accurately you want an auto-increment primary key; the queue table in the question already has one (queuenumber), but if yours does not, add one:
ALTER TABLE `queue` ADD `id` bigint unsigned NOT NULL auto_increment FIRST, ADD primary key (`id`);
If your consumers crash for whatever reason, you can tell what was going on by looking at the row: if processed = 0 but process has been set, then something crashed while processing that row.
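Building on that, a small reaper sketch (again with SQLite for self-containment; the same SQL runs on MySQL) that releases claimed-but-unfinished jobs whose consumer is no longer alive, so another consumer can pick them up. How you determine the set of live pids is up to you and is assumed here:

```python
import sqlite3

def release_stale_jobs(conn, live_pids):
    """Release jobs that were claimed but never finished by a dead consumer.

    A row with processed = 0 and process set belongs to a consumer that is
    either still working or crashed; only release it if its pid is dead.
    Returns the number of jobs released.
    """
    rows = conn.execute(
        "SELECT queuenumber, process FROM queue "
        "WHERE processed = 0 AND process IS NOT NULL"
    ).fetchall()
    released = 0
    for queuenumber, pid in rows:
        if pid not in live_pids:
            conn.execute(
                "UPDATE queue SET process = NULL WHERE queuenumber = ?",
                (queuenumber,),
            )
            released += 1
    conn.commit()
    return released
```

Run this periodically (or at consumer startup) and crashed jobs re-enter the queue instead of being stuck forever.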