The site I am designing allows users to upload images (PNG, JPEG, or GIF) to a servlet within my backend. This is what I've accomplished so far in terms of security...
- Validate the image client side by checking the file's extension. If the extension is valid, send it to the servlet for backend validation.
- Validate the mime type of the image and make sure it is either image/jpeg, image/gif, or image/png.
- Read the first 10 bytes of the image, convert them to hex, and validate that the hex matches the magic numbers of either a PNG, JPEG or GIF. Here's an example of the magic numbers I get when a user uploads a PNG - 89 50 4E 47 0D 0A 1A 0A.
So mime and magic number validation on the server side, and extension validation on the client side. Everything works great but I have two quick questions...
- Is there any purpose to send the file name to the servlet to check the extension server side since I'm already checking the mime and magic?
- What else should I do in terms of security, and what would you change about my current approach?
And please don't say "I don't think it's really necessary" to any security steps because my number one goal here is to learn. So even if there is only a .001% chance that my site could be at risk, I'd still like to learn the best way to protect myself. Thank you.
Validate the image client side by checking the file's extension. If the extension is valid, send it to the servlet for backend validation.
This may fail for non-Windows users, whose filetypes are not necessarily determined by an extension on the filename.
It can be useful to add a JS warning to say "this file doesn't end in .png/.gif/.jpeg/.jpg - are you sure it is an image?", but it's generally not a good idea to disallow an upload based on extension.
Validate the mime type of the image and make sure it is either image/jpeg, image/gif, or image/png.
Again there are some problems here. On Windows, the MIME type is retrieved from registry associations, which are variable and not always correct. For example IE commonly sends JPEGs as image/pjpeg, and Citrix users may find they get uploaded as image/x-citrix-pjpeg.
Since the media type is typically unused by upload scripts, there's little point reading/checking it. For the types here, I'd say your best bet would be to ignore the filename and MIME type; use only the magic number sniffing to determine format.
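As a rough illustration of the magic-number sniffing recommended above, here is a minimal Java sketch. The class and method names (`ImageSniffer`, `detect`) are made up for this example. The signatures are the standard ones: the 8-byte PNG header quoted in the question, `FF D8 FF` for JPEG, and `GIF87a`/`GIF89a` for GIF.

```java
/** Hypothetical helper: guesses the image format from the leading bytes only. */
final class ImageSniffer {
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] GIF87 = {'G', 'I', 'F', '8', '7', 'a'};
    private static final byte[] GIF89 = {'G', 'I', 'F', '8', '9', 'a'};

    /** Returns "png", "jpeg" or "gif", or null when the header matches none of them. */
    static String detect(byte[] head) {
        if (startsWith(head, PNG)) return "png";
        if (startsWith(head, JPEG)) return "jpeg";
        if (startsWith(head, GIF87) || startsWith(head, GIF89)) return "gif";
        return null;
    }

    // True when data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

The returned string can then drive the stored extension and the Content-Type used when serving the file, instead of anything the client claimed.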
What else should I do in terms of security
1) Be careful what name you use to store the file. Taking the user's submitted filename verbatim is dangerous because of directory traversal and special filenames and extensions (.htaccess, .jsp, etc.), and unreliable because file naming rules differ in complicated ways across platforms.
If you want to use the supplied name on the local filesystem at all it should be basenamed, slugified (replacing all but a whitelist of simple characters), length-limited, and the extension replaced/added from the detected filetype.
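The sanitising steps just listed could look roughly like this in Java. This is only a sketch: `UploadNames`, `safeName`, the 64-character limit and the `upload` fallback are all assumptions made for the example, and `detectedExt` is expected to come from your own content sniffing, not from the client.

```java
import java.util.Locale;

/** Hypothetical helper: turns a client-supplied filename into a conservative local name. */
final class UploadNames {
    private static final int MAX_STEM = 64; // assumed length limit

    static String safeName(String submitted, String detectedExt) {
        String name = submitted == null ? "" : submitted;
        // Basename: drop everything up to the last '/' or '\' (defeats ../ traversal).
        int slash = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        name = name.substring(slash + 1);
        // Discard the client's extension; the detected one is appended below.
        int dot = name.lastIndexOf('.');
        if (dot >= 0) name = name.substring(0, dot);
        // Slugify: keep only a small whitelist, collapse everything else to '_'.
        name = name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_-]+", "_");
        name = name.replaceAll("^_+|_+$", "");
        if (name.length() > MAX_STEM) name = name.substring(0, MAX_STEM);
        if (name.isEmpty()) name = "upload"; // e.g. ".htaccess" leaves nothing behind
        return name + "." + detectedExt;
    }
}
```

With this sketch, `safeName("C:\\Users\\me\\My Photo.JPG.jsp", "jpeg")` yields `my_photo_jpg.jpeg`, and `safeName("../../etc/passwd", "png")` yields `passwd.png`.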
Better is to store the file with a completely generated name (eg 17264.dat for the file related to item with primary key 17264 in the database); if you need to serve it up to browsers with a pretty filename you can use rewrites on the front-end web server, or a file-serving servlet, to make it visible as /images/17264/some_name.png.
2) Having image magic numbers doesn't guarantee that the file is an image. And even a valid image can carry other content in a different format at the same time (a 'chameleon' file).
For example, HTML-like content in a binary file can fool the dodgy MIME-sniffing in older versions of IE into treating it as HTML. Similarly, Flash could be tricked into loading a <cross-domain-policy> from XML embedded inside an image, and Java could load applets that were also GIFs.
One way of making this much harder is to load the image using a server-side graphics library and then re-save it, causing a round of recompression which will generally garble any parsable content in the file. The drawback is that for lossy formats like JPEG, recompressing costs some visual quality.
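A minimal sketch of that load-and-re-save step, using the JDK's `javax.imageio.ImageIO`. The class name `ImageReencoder` is invented here, and writing everything out as PNG is my own assumption (it avoids a second lossy JPEG pass at the cost of larger files). Note that `ImageIO.read` only returns the first frame, so an animated GIF would lose its animation.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: decodes an upload and writes the pixels back out as a fresh PNG. */
final class ImageReencoder {
    static byte[] reencodeAsPng(byte[] upload) throws IOException {
        // ImageIO.read returns null when no registered reader understands the data.
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(upload));
        if (decoded == null) throw new IOException("not a decodable image");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Only the decoded pixels are written; bytes outside the image data are not copied.
        if (!ImageIO.write(decoded, "png", out)) throw new IOException("no PNG writer available");
        return out.toByteArray();
    }
}
```

This makes smuggled content much less likely to survive, but it is not a guarantee, so it works best together with the other measures here. Check the size limits from point 3 before calling it.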
The ultimate solution is usually to give up and serve the image from a completely different hostname to the main site. Then if the attacker manages to get some XSS content into the file, it doesn't matter as there's nothing on the site it's living in to compromise, only other static images.
3) If you do load the image server-side, for (2) or other reasons, ensure that the image size - both file size and width/height size - is reasonable before attempting to load it. Otherwise you can be hit by decompression bombs filling up your memory and causing denial of service.
Also, if you do this, make sure to keep your image library and language runtime (e.g. Java's Graphics2D/ImageIO) up to date. There have been image-handling vulnerabilities in these libraries before.
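For the size check in point 3, Java's `ImageReader` can report width and height from the header without decoding the pixel data. Here is a sketch under assumed limits: `ImageLimits`, `MAX_BYTES` and `MAX_SIDE` are illustrative names and values, not recommendations.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical pre-check: rejects oversized uploads before any full decode. */
final class ImageLimits {
    static final int MAX_BYTES = 5 * 1024 * 1024; // assumed file size limit
    static final int MAX_SIDE = 4096;             // assumed width/height limit

    static boolean withinLimits(byte[] upload) throws IOException {
        if (upload.length > MAX_BYTES) return false;
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(upload))) {
            if (in == null) return false;
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) return false; // no reader recognises the format
            ImageReader reader = readers.next();
            try {
                reader.setInput(in, true, true);
                // These calls parse the header only; the pixel data is not decoded.
                int w = reader.getWidth(0);
                int h = reader.getHeight(0);
                return w > 0 && h > 0 && w <= MAX_SIDE && h <= MAX_SIDE;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Only uploads that pass this check would then be handed to a full decode such as `ImageIO.read`.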
I love your question! Steps one, two, and three are excellent for security. Good work!
1) Is there any purpose to send the file name to the servlet to check the extension server side since I'm already checking the mime and magic?
No, not really. The extension is a meaningless token that only has value when attempting to interpret the data contained in the file. Your client side validation can be easily bypassed by even novice attackers, but I would still do it because not only can you save yourself some bandwidth by weeding out the least competent script kiddies, but you can also provide quicker error messages to users making an honest mistake. You are checking for the "magic numbers" server side, which is the right way to do it. It doesn't mean there isn't evil code, but it certainly makes embedding the evil code harder. You'll never stop the elite forever, but you can slow them down and stop everyone else.
2) What else should I do in terms of security, and what would you change about my current approach?
Your current approach is good. I would consider adding a file size restriction, enforced both client-side and server-side. The client-side check can be easily defeated, but again it saves you bandwidth and costs the attacker time. Images should not reasonably be more than a couple of MB unless you're building a photo editing app or something similar.
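On the server side, one way to enforce the limit is to stop reading as soon as the upload exceeds it, rather than trusting a client-supplied Content-Length. A minimal sketch, where `BoundedReader` and `readAtMost` are names I made up for the example:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads a stream fully, but gives up once it grows past maxBytes. */
final class BoundedReader {
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            // Abort early so an oversized upload never sits fully in memory.
            if (total > maxBytes) throw new IOException("upload exceeds " + maxBytes + " bytes");
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

If your container supports Servlet 3.0 multipart handling, the `@MultipartConfig` annotation's `maxFileSize` and `maxRequestSize` settings may let the container enforce a similar cap for you.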
Something else I would be careful about is which applications you process the photo with. Some photo programs have vulnerabilities that can allow an attacker to get a remote shell on your server if the photo is opened with that application. This is rare, but it does happen (this falls into your .001% chance). Because of this, be careful with any code that you write to process the photos, and with any applications you let open them. That is a deep subject. If you want to learn more about writing secure code, I highly recommend Secure Coding by Robert Seacord. I not only learned a lot about code security, but also about writing less buggy code.