Best regex to catch XSS (Cross-site Scripting) attack (in Java)?

Jeff actually posted about this in Sanitize HTML. But his example is in C# and I'm actually more interested in a Java version. Does anyone has a better version for Java? Does his example is good enough that I could just convert it directly from C# to Java?

[Update] I have put a bounty on this question because SO wasn't as popular as today when I asked the question (*). As for anything related to security, the more people look into it, the better it is!

(*) In fact, I think it was still in closed beta

Answers


Don't do this with regular expressions. Remember, you're not protecting just against valid HTML; you're protecting against the DOM that web browsers create. Browsers can be tricked into producing valid DOM from invalid HTML quite easily.

For example, see this list of obfuscated XSS attacks. Are you prepared to tailor a regex to prevent this real world attack on Yahoo and Hotmail on IE6/7/8?

<HTML><BODY>
<?xml:namespace prefix="t" ns="urn:schemas-microsoft-com:time">
<?import namespace="t" implementation="#default#time2">
<t:set attributeName="innerHTML" to="XSS&lt;SCRIPT DEFER&gt;alert(&quot;XSS&quot;)&lt;/SCRIPT&gt;">
</BODY></HTML>

How about this attack that works on IE6?

<TABLE BACKGROUND="javascript:alert('XSS')">

How about attacks that are not listed on this site? The problem with Jeff's approach is that it's not a whitelist, as claimed. As someone on that page adeptly notes:

The problem with it, is that the html must be clean. There are cases where you can pass in hacked html, and it won't match it, in which case it'll return the hacked html string as it won't match anything to replace. This isn't strictly whitelisting.

I would suggest a purpose built tool like AntiSamy. It works by actually parsing the HTML, and then traversing the DOM and removing anything that's not in the configurable whitelist. The major difference is the ability to gracefully handle malformed HTML.

The best part is that it actually unit tests for all the XSS attacks on the above site. Besides, what could be easier than this API call:

public String toSafeHtml(String html) throws ScanException, PolicyException {

    Policy policy = Policy.getInstance(POLICY_FILE);
    AntiSamy antiSamy = new AntiSamy();
    CleanResults cleanResults = antiSamy.scan(html, policy);
    return cleanResults.getCleanHTML().trim();
}

Need Your Help

Bind to collection commands in data template

wpf binding command datatemplate

I have a list box with a collection as its data context and I have defined a data template to have specific view of collection items.

Why does my event property code cause a stack overflow exception?

c# winforms events properties user-controls

I'm making a custom button (Winforms Control Library) and have the code below so that all mouseenter will be added to all controls in my button. When I run it, it causes a stack overflow exception. I

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.