$secure_text = strip_tags($original_text, '<b><i>'); // we only allow the <b> and <i> tags, everything else will be removedDo you feel peaceful, calming sense of safety? You shouldn't ;)
The problem with
strip_tags()
is that it only removes the unwanted HTML elements, but it doesn't do anything with the attributes of the allowed elements; and the HTML attributes can be javascript event handlers, this way they can contain malicious code, therefore even a tame <b>
tag can be dangerous. For example if the input is <b onclick="alert('PWNED')">click me</b>
then the onclick
attribute won't be removed if you simply sanitize the input with strip_tags($input, '<b><i>')
and the attacker successfully made a javascript injection.You may think that you can write a simple function to properly remove the dangerous attributes of the allowed elements, and use that instead of
strip_tags()
, but there are 2 problems with that:- you can't do that safely with regexp
- if you take into account the
<a>
tag, then the problems become more complicated: if you allow links in the user input - why not, the user must be able to put links into his forum post - then the obviously neededhref
attribute can also contain malicious code. Example:<a href="javascript:alert('PWNED')">click me</a>
- and this javascript will be executed by the browser. It's one more thing to take care about.
To sum up,
strip_tags()
is not enough. Let's see what else we have.If you are up-to-date with the well-known PHP libraries, then you have surely heard about HTMLPurifier. It's known to be a very mature tool to securely handle the user input. This is the first thing you should have in mind. Secondly, your template engine may also be responsible for properly escape the unsecure data. I didn't want to check all template engines out there, so I picked Twig - which I have had to use recently - and tried out what it knows. I have made the testing using the following PHP script:
<html>
<head>
<title>JavaScript Injection testing</title>
</head>
<body>
<h1>strip_tags()</h1>
<?php echo strip_tags($_GET['txt'], '<b><a>') ?>
<h1>HTMLPurifier</h1>
<?php
require 'path/to/htmlpurifier/HTMLPurifier.auto.php';
$purifier = new HTMLPurifier(array(
'HTML.AllowedElements' => array('b', 'a')
));
echo $purifier->purify($_GET['txt']);
?>
<h1>Twig</h1>
<?php
require 'path/to/twig/Twig/Autoloader.php';
Twig_Autoloader::register();
$loader = new Twig_Loader_String();
$twig = new Twig_Environment($loader);
$template = $twig->loadTemplate('{{ input|striptags("<b><a>")|raw }}');
$template->display(array('input' => $_GET['txt']));
?>
<form action="">
<input name="txt"/>
<input type="submit"/>
</form>
</body>
</html>
After downloading HTMLPurifier and Twig, you can simply try it out.
For testing, I used the following form data:
<b onclick="alert('PWNED')">click me</b> | <a href="javascript:alert('PWNED')">funny pics here</a>
So I tested both the javascript event handler removal and the malicious
<a href="javascript:...">
value removal on strip_tags()
, HTMLPurifier and Twig. Nothing much to say about it, both strip_tags()
and Twig failed to avoid both javascript injections. HTMLPurifier did a good job as expected.Summary:
- If you want to allow some HTML in the untrusted input, then don't rely on
strip_tags()
, use HTMLPurifier instead. - If you want to use Twig then escape the input in your controller using HTMLPurifier, before rendering, and don't rely on the
striptags
filter of Twig, which in fact doesn't do anything else just callsstrip_tags()
. (Otherwise Twig is not a bad choice for templating in my opinion) - You may provide a custom formatting syntax for your user and completely disallow HTML, for example BB codes and wiki syntax can be an option in some cases, but due to the lack of WYSIWYG editors for those "languages" it's not possible in a lot of cases nowadays.
Actually there are WYSIWYG editors for various syntaxes, see for example: http://wysiwygbbcode.codeplex.com/
ReplyDeleteSo in my opinion this is the best solution so far.
This comment has been removed by a blog administrator.
ReplyDelete