Multiple “agents” handling a single array

Apologies if this has been covered before - I did search, but I may not know the correct terms to use.

This process is handled with PHP.

Here's the situation:

I have a large array of file names. The script I have opens these files and enters their content into a database. Processing these files one at a time takes over 24 hours, and these files are updated on a daily basis.

Breaking the single large array into four smaller arrays and running concurrent processes finishes the job before the 24-hour window elapses, but sometimes one or two processes will finish hours before the others because file sizes vary from day to day.

Much like people who stock retail shelves (who else has worked that nightmare before?) pitch in to help out with what's left after finishing their own tasks, I'd like to have a script in place where these "agents" do the same.

Here are some basics of what I have figured out - it could be wrong, and I'm not too proud to be told so :-)

$files = array('file1','file2','file3','file4','file5'); 
//etc... on to over 4k elements

while($file = array_pop($files)){

    //Something in here...  I have no idea what.

}

Ideas? Something like four function calls or four loops within that overarching 'while' has crossed my mind, but I'm pretty sure it's going to wait on executing subsequent calls until the previous one(s) finish.

Any help is appreciated. I'm seriously stuck on this one!

Thanks!

Answers


A database-backed message queue seems the obvious solution but I think that's overkill in this case. I would simply put the files to be processed into a single dedicated queue directory, then use the DirectoryIterator class to scan it. Something like this:

while (true) {
    look in the queue directory for a file
    if you don't find one, exit the script, all processing is done
    if you find one, rename it or move it to a work directory
    if the rename/move command succeeded, process the file
    if the rename/move command failed, one of the other workers got it first
}
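The loop above could be sketched in PHP roughly as follows. This is only an illustration: the function names claimNextFile() and runWorker(), the directory arguments, and the $process callback are hypothetical, and it assumes the queue and work directories sit on the same filesystem so that rename() acts as an atomic "claim".

```php
<?php
// Hypothetical helper: try to claim one file from $queueDir by renaming it
// into $workDir. Returns the claimed path, or null when nothing is left.
function claimNextFile(string $queueDir, string $workDir): ?string
{
    foreach (new DirectoryIterator($queueDir) as $entry) {
        if (!$entry->isFile()) {
            continue; // skip ".", ".." and subdirectories
        }
        $target = $workDir . '/' . $entry->getFilename();
        // Assumption: both directories are on the same filesystem, so only
        // one worker's rename() can succeed for a given file.
        if (@rename($entry->getPathname(), $target)) {
            return $target;
        }
        // rename failed: another worker got this file first, try the next one
    }
    return null;
}

// Hypothetical worker loop: $process is whatever loads one file into the
// database. Returns the number of files this worker handled.
function runWorker(string $queueDir, string $workDir, callable $process): int
{
    $count = 0;
    while (($path = claimNextFile($queueDir, $workDir)) !== null) {
        $process($path);
        $count++;
    }
    return $count;
}
```

Each worker process would call runWorker() with the same two directories. A worker that draws small files simply comes back for more, which is the "pitch in" behaviour the question asks for.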

Edit:

Regarding launching the workers, you could use a simple shell script to spawn the PHP processes in the background:

#!/bin/sh
NUM_WORKERS=5
for WORKER in $(seq 1 ${NUM_WORKERS})
do
    echo "starting worker ${WORKER}"
    php -f /path/to/my/process.php &
done

Then, create a cron entry to run this launcher, for example, at midnight:

0 0 * * * /path/to/launcher.sh

You want what's called a "message queue", something like beanstalkd.

You'll basically create a list of messages that include your individual filenames. You'll then create a set of processors to process them. Each processor will handle one file then go back to the queue to see if there are more messages/files waiting to be processed.

EDIT: Here's an analogy to help explain message queues. Your first idea is like a human manager taking a stack of files, dividing them into four piles and then handing each of his four employees a pile to process. A message queue is more like this: the manager puts all the files on a table and tells each employee to take a single file from the table and process it. He tells them that whenever they finish a file, they should take another one, until there are no more files on the table. When all the files are done, the employees can go home.

One employee might end up with really large files and only handle a few, while another employee might get smaller files and handle many. It doesn't matter how many each employee handles, they'll all keep working until the table is empty.
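To make the "table" idea concrete without installing a queue server, here is a minimal sketch that is not beanstalkd: it uses a plain text file (one filename per line) guarded by flock() as a stand-in for the queue. The names takeNext() and workUntilEmpty(), the queue file, and the $process callback are all hypothetical.

```php
<?php
// Hypothetical stand-in for a queue server: a text file with one filename
// per line. takeNext() removes and returns the first line under an
// exclusive lock, or returns null when the "table" is empty.
function takeNext(string $queueFile): ?string
{
    $fh = fopen($queueFile, 'c+'); // read/write, create if missing, no truncation
    if ($fh === false) {
        throw new RuntimeException("cannot open $queueFile");
    }
    if (!flock($fh, LOCK_EX)) {    // wait until no other worker holds the lock
        fclose($fh);
        throw new RuntimeException("cannot lock $queueFile");
    }
    $lines = [];
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    $next = array_shift($lines);   // null when nothing is left
    // Write the remaining entries back before releasing the lock.
    ftruncate($fh, 0);
    rewind($fh);
    if ($lines) {
        fwrite($fh, implode("\n", $lines) . "\n");
    }
    fflush($fh);
    flock($fh, LOCK_UN);
    fclose($fh);
    return $next;
}

// Every employee (worker process) runs the same loop until the table is empty.
function workUntilEmpty(string $queueFile, callable $process): int
{
    $handled = 0;
    while (($file = takeNext($queueFile)) !== null) {
        $process($file);
        $handled++;
    }
    return $handled;
}
```

The manager's job is then just to write the full list of filenames into the queue file once, before the workers start. A real message queue plays the same role as this file, but as a separate service that the workers connect to.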

