XSS prevention and general sanitization

Posted by Jad on October 21, 2007

Today, while I was lurking on IRC, someone asked about field sanitization and how to avoid XSS attacks (cross-site scripting, for those who are wondering), something every one of us should think about when developing an application. The truth is that while CakePHP does an amazing job of making you ‘forget’ about SQL injection (since it takes care of that right out of the box), it neither deters nor filters other attacks, like the infamous XSS, unless you ask it to.

I won’t go over the different kinds of possible attacks, since a lot has already been documented. To make it short: if your application uses forms or cookies, or accepts parameters directly from the URL, and you haven’t thought about this, it’s time you started doing some research. You should never trust your users! Before I jump into XSS prevention with CakePHP, I will quickly introduce the methods made available by the Sanitize library and show how some of them do more than just one kind of filtering. It’s worth noting up front that field validation is sometimes enough to avoid lots of problems, but you always have those fields that have no real validation rules… that’s where you need to sanitize.

Sanitize library

1. clean($data, $connection = 'default')

This one makes the data safe for insertion into all kinds of databases. As you may have guessed, the connection parameter is optional and defaults to your main database connection. I am not enough of an expert in all the database sources Cake supports to assure you that nothing will slip through, but I am confident the developers are experts at what they do and have taken special care to make their framework as secure as possible for the sloppiest of us.

The interesting part of this method is that it applies the following filters, in this order: ‘odd_spaces’ (0xCA), ‘encode’ (HTML characters: <, >, &, etc.), ‘dollar’ (none other than the $ sign), ‘carriage’ (\r), ‘unicode’ (numeric entities of the form &#[0-9]+;), ‘escape’ (the one that really makes it SQL safe) and finally ‘backslash’.

Some examples:

//user enters in the URL: site.com/products/category/1'; DROP TABLE 'users';
$sanitized = Sanitize::clean($params['pass']); //just like that will take care of it
//or, if you don't want it to filter HTML elements:
$sanitized = Sanitize::clean($params['pass'], array('encode' => false));
Note that both parameters accept strings and arrays. If a string is passed for the second parameter, all settings will be set to true and the value will be considered the name of the database configuration to use.

The default settings are:

array(
   'connection' => 'default',
   'odd_spaces' => true,
   'encode' => true,
   'dollar' => true,
   'carriage' => true,
   'unicode' => true,
   'escape' => true,
   'backslash' => true
);
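To make the string-versus-array behaviour of the second parameter a bit more concrete, here is a minimal standalone sketch of how such an argument could be normalized. It is not Cake's actual code: resolveCleanOptions() is a hypothetical helper name I made up for illustration, and only the default values are taken from the list above.

```php
<?php
// Hypothetical helper, NOT part of CakePHP: shows one way a second
// parameter that may be a string or an array could be normalized.
function resolveCleanOptions($options = array())
{
    $defaults = array(
        'connection' => 'default',
        'odd_spaces' => true,
        'encode' => true,
        'dollar' => true,
        'carriage' => true,
        'unicode' => true,
        'escape' => true,
        'backslash' => true,
    );

    // A plain string is treated as the database configuration name.
    if (is_string($options)) {
        $options = array('connection' => $options);
    }

    // Caller-supplied keys override the defaults; the rest stay true.
    return array_merge($defaults, $options);
}

$fromString = resolveCleanOptions('mysybasedb');
$fromArray = resolveCleanOptions(array('encode' => false));

var_dump($fromString['connection']); // string(10) "mysybasedb"
var_dump($fromArray['encode']);      // bool(false)
```

Passing a string only changes the connection name, so every filter stays switched on, which matches the behaviour described above.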

2. escape($data, $connection = 'default')

Yes, the same one used by Sanitize::clean(). Actually, that’s where half of the database filtering magic happens. Depending on the database driver used, it escapes unsafe characters, quotes values and/or empty strings, and replaces null values with ‘NULL’.

Usage is pretty simple:

//with the same example as above, which I agree ain't the best kind
$sanitized = Sanitize::escape($params['pass'][0], 'mysybasedb');
Unlike clean(), this one only takes strings for both parameters.

3. formatColumns(&$model)

That’s the one that takes care of the second half of the database sanitization magic. When the model’s data is not empty, it forces every value to respect the column definition obtained from your database driver’s ‘columns’ attribute. It is used by model methods like save().

Usage is pretty self-explanatory.

4. html($string, $remove = false)

Here again, that’s one of the methods used by Sanitize::clean(). When the second parameter is set to true, it returns the string after running it through strip_tags(). In the default mode (false), it replaces each of the following characters with its HTML entity:

& % < > " ' ( ) + -
respectively giving:
&amp; &#37; &lt; &gt; &quot; &#39; &#40; &#41; &#43; &#45;
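If you want to see that kind of character-to-entity replacement in isolation, here is a small standalone sketch in plain PHP. It does not call the Sanitize library: encodeHtmlChars() is a hypothetical name, and the map is my own assumption of what the replacement table looks like, built from the standard entity names and ASCII codes.

```php
<?php
// Hypothetical stand-in for the default (non-stripping) mode: each
// listed character is swapped for an HTML entity in a single pass.
function encodeHtmlChars(string $text): string
{
    $map = array(
        '&' => '&amp;',
        '%' => '&#37;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
        '(' => '&#40;',
        ')' => '&#41;',
        '+' => '&#43;',
        '-' => '&#45;',
    );

    // strtr() never rescans text it has already replaced, so the '&'
    // inside a freshly written entity is not encoded a second time.
    return strtr($text, $map);
}

echo encodeHtmlChars('<b>Tom & "Jerry"</b>'), "\n";
// &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
```

The single-pass replacement is the design point here: chaining one str_replace() per character would work too, but only if the ampersand is handled first.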
5. paranoid($string, $allowed = array())

By default, this one removes every non-alphanumeric character ([^a-zA-Z0-9]). You can extend this regular expression by passing your exceptions in the second parameter, like this:

$sanitized = Sanitize::paranoid("Let's allow single quote, spaces and underscores ", array("'", '_', ' '));
It will automatically take care of escaping those characters for you and insert them right here:
[^ $allowed a-zA-Z0-9]
Notice that the spaces around $allowed are not part of the real pattern; I have included them just to make it easier on the eye. You can imagine all kinds of things you can do with that.
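Here is a rough standalone sketch of that whitelist idea in plain PHP, in case you want to play with it outside of Cake. keepOnly() is a hypothetical function name, and using preg_quote() for the escaping is my assumption, not necessarily what the library does internally.

```php
<?php
// Hypothetical re-creation of the whitelist approach: every character
// that is neither alphanumeric nor explicitly allowed is removed.
function keepOnly(string $string, array $allowed = array()): string
{
    $extra = '';
    foreach ($allowed as $char) {
        // Escape characters that are special inside a regex, e.g. ']' or '-'.
        $extra .= preg_quote($char, '/');
    }

    return preg_replace('/[^' . $extra . 'a-zA-Z0-9]/', '', $string);
}

echo keepOnly("Let's go, now_ok!", array("'", '_', ' ')), "\n"; // Let's go now_ok
echo keepOnly('a-b;c'), "\n";                                   // abc
```

Escaping the allowed characters matters: an unescaped ']' or '-' passed by the caller would otherwise change the meaning of the character class.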

6. The strip methods

stripImages($string) : as its name indicates, it strips all image tags from the HTML.

stripScripts($string) : removes all stylesheets and scripts

stripWhitespace($string) : filters out all new lines, carriage returns, tabs and double spaces

stripAll($string) : applies all of the above
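To give an idea of what whitespace stripping of this kind does to a string, here is a minimal sketch in plain PHP. collapseWhitespace() is a hypothetical name and the two patterns are my own assumption based on the description above, not the library's source.

```php
<?php
// Hypothetical sketch: drop new lines, carriage returns and tabs,
// then squeeze every run of two or more spaces into a single space.
function collapseWhitespace(string $text): string
{
    $text = preg_replace('/[\n\r\t]+/', '', $text);

    return preg_replace('/ {2,}/', ' ', $text);
}

echo collapseWhitespace("line one\n\tline   two"), "\n"; // line oneline two
```

Note that because new lines are removed rather than replaced by a space, words on either side of a line break end up glued together, so this kind of filter suits single-line fields better than free text.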

Nice, no? Well that ain’t all. What if you want to strip all text presentational tags (bold, italic, underline, etc.)? Sanitize::stripTags() to the rescue.

$sanitized = Sanitize::stripTags($string, 'b', 'strong', 'i', 'u', 'code');
The string to clean should always be the first parameter; after that, you can add as many tags as you need. It will automatically take care of finding all the <b> and </b> for you while also matching the ones that carry attributes, like <code class="php"> - now don’t tell me that ain’t wonderful in itself?

Just note that you shouldn’t give it something like ‘blockquote class’, only the HTML tag name, because it appends the value as is to the regex and your ‘ class’ will stop the closing tag pattern from matching.
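The tag-name-only rule is easier to see with a small standalone sketch. removeTags() below is a hypothetical function in plain PHP and the two patterns are my own approximation of the approach, so treat it as an illustration rather than the library's implementation.

```php
<?php
// Hypothetical sketch: remove the opening and closing tags for each
// given tag name while keeping the text between them.
function removeTags(string $html, string ...$tags): string
{
    foreach ($tags as $tag) {
        $name = preg_quote($tag, '/');
        // Opening tags, with or without attributes: <b>, <code class="php">
        $html = preg_replace('/<' . $name . '\b[^>]*>/i', '', $html);
        // Closing tags: </b>, </code>
        $html = preg_replace('/<\/' . $name . '\b[^>]*>/i', '', $html);
    }

    return $html;
}

echo removeTags('<p><b>Bold</b> and <code class="php">echo 1;</code></p>', 'b', 'code'), "\n";
// <p>Bold and echo 1;</p>

// Passing 'blockquote class' only matches the opening tag, because no
// closing tag ever looks like </blockquote class>.
echo removeTags('<blockquote class="x">q</blockquote>', 'blockquote class'), "\n";
// q</blockquote>
```

The word boundary (\b) after the tag name is there so that asking for ‘b’ does not also eat <blockquote>.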

That’s all there is to know about the available methods; knowing when and how to use them is what’s important.

XSS prevention

So, to check against real XSS examples, I headed to ha.ckers.org - undoubtedly the best resource you can find online when it comes to that kind of attack. First off, I tested those examples by using them in the URL (passing them as params keys and/or values) and wasn’t surprised that Cake catches all of them without any extra work on my part, nada! So there, one problem solved.

What about form inputs? As you might have guessed, those are not automatically sanitized for XSS; it’s up to you to do that according to your needs. It could be as simple as:

foreach ($this->params['form'] as $key => $param)
{
  $this->params['form'][$key] = Sanitize::clean($param);
  //or $this->params['form'][$key] = Sanitize::html($param);
  //or $this->params['form'][$key] = Sanitize::stripTags($param, 'script');
}
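Form data is often nested (one array per model, for example), so a flat loop may not reach every value. Here is a minimal standalone sketch of walking a nested array and filtering every string in it. sanitizeFields() is a hypothetical helper, and htmlspecialchars() is only a stand-in filter I picked so the example runs without Cake; you would plug in whichever Sanitize method fits your needs.

```php
<?php
// Hypothetical helper: apply a filter to every string in a possibly
// nested array, leaving keys and non-string values untouched.
function sanitizeFields($value, callable $filter)
{
    if (is_array($value)) {
        $clean = array();
        foreach ($value as $key => $item) {
            $clean[$key] = sanitizeFields($item, $filter);
        }
        return $clean;
    }

    return is_string($value) ? $filter($value) : $value;
}

// Stand-in filter (an assumption): encode HTML special characters.
$escape = function (string $text): string {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
};

$form = array('User' => array('name' => '<script>x</script>', 'age' => 3));
$safe = sanitizeFields($form, $escape);

echo $safe['User']['name'], "\n"; // &lt;script&gt;x&lt;/script&gt;
```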
I didn’t have time to check for attacks using cookies, but I will let you know when I do. If you have done any testing like that, please share. If I have misunderstood any of the above methods, do NOT hesitate to let me know. Security ain’t something that can be taken care of ‘later on’.

Don’t let people poison your cakes!

Comments

  1. Charlei Fri, 26 Oct 2007 10:25:22 EDT

    Nice article! Used it quite a lot already.

  2. Jad Fri, 26 Oct 2007 19:46:34 EDT

    @Charlei: glad it was useful :)
