Is file_exist() in PHP a very expensive operation?
I'm adding avatars to a forum engine I'm designing, and I'm debating whether to do something simple (forum image is named .png) and use PHP to check if the file exists before displaying it, or to do something a bit more complicated (but not much) and use a database field to contain the name of the image to show.
I'd much rather go with the file_exists() method personally, as that gives me an easy way to fall back to a "default" avatar if the current one doesn't exist (yet), and its simple to implement code wise. However, I'm worried about performance, since this will be run once per user shown per pageload on the forum read pages. So I'd like to know, does the file_exists() function in PHP cause any major slowdowns that would cause significant performance hits in high traffic conditions?
If not, great. If it does, what is your opinion on alternatives for keeping track of a user-uploaded image? Thanks!
PS: The code differences I can see are that the file checking versions lets the files do the talking, while the database form trusts that the database is accurate and doesn't bother to check. (its just a url that gets passed to the browser of course.)
As well as what the other posters have said, the result of file_exists() is automatically cached by PHP to improve performance.
However, if you're already reading user info from the database, you may as well store the information in there. If the user is only allowed one avatar, you could just store a single bit in a column for "has avatar" (1/0), and then have the filename the same as the user id, and use something like SELECT CONCAT(IF(has_avatar, id, 'default'), '.png') AS avatar FROM users
You could also consider storing the actual image in the database as a BLOB. Put it in its own table rather than attaching it as a column to the user table. This has the benefit that it makes your forum very easy to back up - you just export the database.
Since your web server will already be doing a lot of (the equivalent of) file_exists() operations in the process of showing your web page, one more run by your script probably won't have a measurable impact. The web server will probably do at least:
- one for each subdirectory of the web root (to check existence and for symlinks)
- one to check for a .htaccess file for each subdirectory of the web root
- one for the existence of your script
This is not considering more of them that PHP might do itself.
In actual performance testing, you will discover file_exists to be very fast. As it is, in php, when the same url is "stat"'d twice, the second call is just pulled from php's internal stat cache.
And that's just in the php run scope. Even between runs, the filesystem/os will tend to aggressively put the file into the filesystem cache, and if the file is small enough, not only will the file exists test come straight out of memory, but the entire file will too.
Here's some real data to back my theory:
I was just doing some performance tests of linux command line utilities "find" and "xargs". In the proceeds, I performed a file exists test on 13000 files, 100 times each, in under 30 seconds, so thats averaging 43,000 stat tests per second, so sure, on the fine scale its slow if your comparing it to say, the time it takes to divide 9 by 8 , but in a real world scenario, you would need to be doing this an awful lot of times to see a notable performance problem.
If you have 43 thousand users concurrently accessing your page, during the period of a second, I think you are going to have much bigger concerns than the time it takes to copy the status of the existence of a file more-or-less out of memory on the average case scenario.
At least with PHP4, I've found that a call to a file_exists was definitely killing our application - it was made very repetidly deep in a library, so we really had to use a profiler to find it. Removing the call increased the computation of some pages a dozen times (the call was made verrry repetidly).
It may be possible that in PHP5 they cache file_exists, but at least with PHP4 that was not the case.
Now, if you are not in a loop, obviously, file_exists won't be a big deal.
file_exists() is not slow per se. The real issue is how your system is configured and where the performance bottlenecks are. Remember, databases have to store things on disk too, so either way you're potentially facing disk activity. On the other hand, both databases and file systems usually have some form of transparent caching to optimized repeat access.
You could easily go either way, since chances are your performance bottleneck will be elsewhere. The only place where I can see it being an obvious choice would be if you're on some kind of oversold shared hosting where there's a ton of disk contention, but maybe database access is on a separate cluster and faster (or vice versa).
In the past I've stored the image metadata in a database (including its name) so that we could generate useful stats. More importantly, storing image data (not the file itself, just the metadata) is conducive to change. What if in the future you need to "approve" the image, or you want to delete it without deleting the file?
As per the "default" avatar... well if the record isn't found for that user, just use the default one.
Either way, file_exists() or db, it shouldn't be much of a bottleneck to worry about. One solution, however, is much more expandable.
If performance is your only consideration a file_exists() will be much less expensive then a database lookup.
After all this is just a directory lookup using system calls. After the first execution of the script most of the relevent directory will be cached in storage so there is very little actual I/O involved, and, "file_exists()" is such a common operation that it and the underlying system calls will be highly optimised on any common php/os combination.
As John II noted. If extra functionality and user inteface features are a priority then a database would be the way to go.