Performance of event-source

I'm currently working on a large project that requires a server-sent events implementation. I've decided to use the EventSource transport for it, and started with a simple chat. Currently the client side listens only for a new chat message event, but the project will have many more events in the future. First off, I'm really concerned about the server-side script and the loop in it; second, I'm not sure that using a MySQL database as storage (in this case, for chat messages) is actually good practice. The current loop sends out new messages as they appear in the database:

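// Prepare once, outside the loop; ORDER BY keeps events in id order.
$statement = $connect->prepare("SELECT id, event, user, message FROM chat WHERE id > :last_event_id ORDER BY id");
while(TRUE) {
    try {
        $statement->execute(array(':last_event_id' => $lastEventId));
        $result = $statement->fetchAll();
        foreach($result as $row) {
            echo "id: " . $row['id'] . "\n";
            echo "event: " . $row['event'] . "\n";
            echo "data: |" . $row['user'] . "| >>> \n";
            echo "data: " . $row['message'] . "\n\n";
            // Track the real id; a plain increment would resend rows if ids have gaps.
            $lastEventId = $row['id'];
        }
    } catch(PDOException $PDOEX) {
        // Send the error as an SSE comment line so it is not parsed as an event field.
        echo ": " . $PDOEX->getMessage() . "\n\n";
    }
    ob_flush();
    flush();
    usleep(10000); // 10,000 microseconds = 0.01 s between polls
}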

From what I've read, such a loop is inevitable, and my task is to optimize its performance. Currently I'm using a prepared statement outside of the while() and a reasonable(?) usleep().

So, the questions for those who have experience with server-sent events:

  • is such a technique reasonable to use on moderately loaded web sites (1000-5000 users online)?
  • if yes, is there any way to boost performance?
  • could the MySQL database be a bottleneck in this case?

I'd appreciate any help, as the question is quite complex and searching hasn't given me any tips or ways to test it.

Answers


Will all 1000+ users be connected simultaneously? And are you using Apache with PHP? If so, I think the thing you should really be concerned about is memory: each user is holding open a socket, an Apache process, and a PHP instance. You'll need to measure this yourself, for your own setup, but if we say 20MB each, that is 20GB of memory for 1000 users. If you tighten things so each process is 12MB, that is still 12GB per 1000 users. (An m2.xlarge EC2 instance has 17GB of memory, so if you budget one of those per 500-1000 users I think you will be okay.)
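To make that arithmetic easy to redo with your own measurements, here is a minimal sketch. The function name is made up for illustration, and the 20MB and 12MB inputs are just the example figures from this answer, not measurements of any particular setup.

```php
<?php
// Rough RAM estimate, assuming one Apache + PHP process per connected SSE client.
// estimateMemoryGb() is a hypothetical helper, not part of any library.
function estimateMemoryGb(int $users, float $mbPerProcess): float
{
    // Decimal GB (1 GB = 1000 MB), matching the rounding used in the text.
    return $users * $mbPerProcess / 1000;
}

// The per-process sizes are the illustrative figures from the answer.
foreach ([20, 12] as $mb) {
    foreach ([1000, 5000] as $users) {
        printf("%d users x %d MB = %.0f GB\n", $users, $mb, estimateMemoryGb($users, $mb));
    }
}
```

Swap in the per-process size you actually observe (for example from top or ps) to see how many users fit on one machine.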

In contrast, with your 10 second poll time, CPU usage is very low. For the same reason, I would not imagine polling the MySQL DB will be the bottleneck, but at that level of use I would consider having each DB write also do a write to memcached. Basically, if you don't mind throwing a bit of hardware at it, your approach looks doable. It is not the most efficient use of memory, but if you are familiar with PHP it will probably be the most efficient use of programmer time.


UPDATE: Just saw OP's comment and realized that usleep(10000) is 0.01s, not 10s. Oops! That changes everything:

  • Your CPU usage is now high!
  • You need a set_time_limit(0) at the top of your script: with that tight loop you are going to hit the default 30-second execution time limit very quickly.
  • Instead of polling a DB you should use a notification queue service.

I'd use the queue service instead of memcached, and you could either find something off the shelf, or write something custom in PHP fairly easily. You can still keep MySQL as the main DB and have your queue service poll MySQL; the difference here is that you only have one process polling it intensively, not one thousand. The queue service is a simple socket server that accepts a connection from each of your front-facing PHP scripts. Each time its polling finds a new message, it broadcasts that message to all the clients that have connected to it. (There are different ways to architect it, but I hope that gives you the general idea.)
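As a rough illustration of that idea, here is a minimal sketch of such a service using PHP's sockets extension. Everything specific in it is an assumption made for the example: the function names, port 9000, the 10 ms poll interval, and the wire format (one JSON object per line). It also starts from id 0 for simplicity, so a real service would more likely start from the current highest id.

```php
<?php
// Sketch of a broadcast "queue service" (requires ext-sockets).
// All names, the port and the poll interval are illustrative assumptions.

// Send $payload to every client; return only the clients that accepted the write.
function broadcast(array $clients, string $payload): array
{
    $alive = [];
    foreach ($clients as $client) {
        if (@socket_write($client, $payload) !== false) {
            $alive[] = $client;
        } else {
            socket_close($client); // drop clients whose connection has failed
        }
    }
    return $alive;
}

// Wait up to $timeoutUsec for a new connection and accept at most one.
function acceptPending($listener, array $clients, int $timeoutUsec): array
{
    $read = [$listener];
    $write = $except = null;
    if (socket_select($read, $write, $except, 0, $timeoutUsec) > 0) {
        $conn = socket_accept($listener);
        if ($conn !== false) {
            $clients[] = $conn;
        }
    }
    return $clients;
}

// Main loop: the only process that polls MySQL intensively.
function runQueueService(PDO $db, int $port = 9000): void
{
    $listener = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_option($listener, SOL_SOCKET, SO_REUSEADDR, 1);
    socket_bind($listener, '127.0.0.1', $port);
    socket_listen($listener);

    $stmt = $db->prepare(
        'SELECT id, event, user, message FROM chat WHERE id > :last ORDER BY id'
    );
    $lastId = 0; // simplification: replays existing rows on startup
    $clients = [];
    while (true) {
        // The select timeout doubles as the poll delay (10 ms here).
        $clients = acceptPending($listener, $clients, 10000);
        $stmt->execute([':last' => $lastId]);
        foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $lastId = (int) $row['id'];
            $clients = broadcast($clients, json_encode($row) . "\n");
        }
    }
}
```

You would start it from the command line with something like runQueueService(new PDO($dsn, $user, $password)), where the connection details are your own. Each front-facing script then connects to the port and reads lines as they arrive.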

Over in the front-facing PHP script, you use a socket_select() call with a 15-second timeout. It only wakes up when there is data to read or when the timeout expires, so it uses practically zero CPU the rest of the time. (The 15-second timeout is so you can send SSE keep-alives.)
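A minimal sketch of that wait loop follows, using the sockets extension. It assumes the queue service sends one JSON object per line with id, event, user and message keys; that protocol and all the function names are invented for this example, so adapt them to whatever your service really sends.

```php
<?php
// Sketch of the front-facing SSE script's wait loop (requires ext-sockets).
// Function names and the line-delimited JSON protocol are assumptions.

// Block until $sock is readable or $timeoutSec elapses.
// Returns the bytes read, '' if the peer closed or the read failed, null on timeout.
function waitForData($sock, int $timeoutSec): ?string
{
    $read = [$sock];
    $write = $except = null;
    $ready = socket_select($read, $write, $except, $timeoutSec);
    if ($ready === false) {
        throw new RuntimeException(socket_strerror(socket_last_error()));
    }
    if ($ready === 0) {
        return null; // timed out, nothing to read
    }
    $data = socket_read($sock, 8192);
    return $data === false ? '' : $data;
}

// Format one SSE frame; each line of $data gets its own "data:" field.
function sseFrame(int $id, string $event, string $data): string
{
    $frame = "id: $id\nevent: $event\n";
    foreach (explode("\n", $data) as $line) {
        $frame .= "data: $line\n";
    }
    return $frame . "\n";
}

// $sock is an already-connected socket to the queue service.
function serveEvents($sock): void
{
    set_time_limit(0); // the script is meant to run for as long as the client stays
    header('Content-Type: text/event-stream');
    header('Cache-Control: no-cache');
    $buffer = '';
    while (!connection_aborted()) {
        $chunk = waitForData($sock, 15);
        if ($chunk === '') {
            break; // queue service went away
        }
        if ($chunk === null) {
            echo ": keep-alive\n\n"; // SSE comment line, ignored by EventSource
        } else {
            $buffer .= $chunk;
            // Process every complete line; keep any partial line for the next read.
            while (($pos = strpos($buffer, "\n")) !== false) {
                $row = json_decode(substr($buffer, 0, $pos), true);
                $buffer = substr($buffer, $pos + 1);
                if (is_array($row)) {
                    $text = "|{$row['user']}| >>> \n{$row['message']}";
                    echo sseFrame((int) $row['id'], $row['event'], $text);
                }
            }
        }
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
    }
}
```

The script sleeps inside socket_select() instead of spinning on usleep(), so it only does work when a message arrives or a keep-alive is due.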


(Source for the 20MB and 12MB figures)


  • is such a technique reasonable to use on moderately loaded web sites (1000-5000 users online)?

It's pretty much the only way of doing it, unless you put the refresh timer on the client side and use the server side as web services only. Load will be high with that number of users, but you're limited by doing a pure PHP solution. I'd rather look at a C/C++ daemon on the server and raw sockets.

  • if yes, is there any way to boost performance?

Use memcached as temporary storage, then have a back-end process commit the archive to the MySQL DB hourly, every minute, or at whatever interval suits you.

  • could the MySQL database be a bottleneck in this case?

Yes, but it depends on how much hardware you're willing to throw at the solution, or how confident you are at setting up something such as master-slave replication with one read DB and one write DB.

Hope that helps

