Sanitize Strings for Legal Variable Names in PHP

I have code like the following, which declares a class whose name is built from a retrieved string. The problem is that the string may contain characters that PHP does not accept in a class name. Is there a good way to sanitize the string before using it as a class name?

$retrieved_string = 'some unformatted string; it may contain illegal characters to be passed as a class name.';

$strMyScript = basename(__FILE__, ".php"); 
$strMyScript = sanitize_variable($strMyScript);
$strClassName = sanitize_variable($retrieved_string);

eval('
    class ' . $strMyScript . '_' . $strClassName . ' extends AnotherClass {
        // some code here
    }
');

function sanitize_variable($string) {
    // sanitize the string
}

Answers


First decide whether you need a filter or a validator. A validator returns true/false; you can then raise an exception, show the user an error, or just ignore the file. A filter instead removes the offending characters from the input string.

public function sanitize($input)
{
    $pattern = '/[^a-zA-Z0-9]/';

    return preg_replace($pattern, '', (string) $input);
}

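You might also want to support Unicode. The version below uses Unicode character properties when PCRE supports them and falls back to the ASCII pattern otherwise: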

public function sanitize($input)
{
    if (!@preg_match('/\pL/u', 'a'))
    {
        $pattern = '/[^a-zA-Z0-9]/';
    }
    else
    {
        $pattern = '/[^\p{L}\p{N}]/u';
    }
    return preg_replace($pattern, '', (string) $input);
}
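
As a rough illustration of what the Unicode pattern keeps, here is a standalone sketch. The function name filter_alnum_unicode is made up for this example, and it assumes a PHP build whose PCRE has Unicode property support. One extra case it covers: with the /u modifier, preg_replace() returns null when the input is not valid UTF-8, so the sketch falls back to the ASCII pattern then.

```php
<?php

// Hypothetical standalone version of the Unicode-aware filter above.
function filter_alnum_unicode(string $input): string
{
    // Keep Unicode letters (\p{L}) and numbers (\p{N}); drop everything else.
    $result = preg_replace('/[^\p{L}\p{N}]/u', '', $input);

    // With /u, preg_replace() returns null if $input is not valid UTF-8.
    // Assumed policy: fall back to the ASCII-only pattern in that case.
    if ($result === null) {
        $result = (string) preg_replace('/[^a-zA-Z0-9]/', '', $input);
    }

    return $result;
}
```

Accented letters survive the Unicode pattern, whereas the ASCII pattern would strip them byte by byte.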

Issues also to consider:

  • Do you want to allow whitespace? If so, add a space to the character class in the $pattern variables.
  • Are the filenames in a language other than English? Then you will need some locale-specific handling to adjust $pattern accordingly.
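
The whitespace point could look something like this. It is only a sketch: the function name filter_alnum and the $allowSpaces parameter are invented for the example.

```php
<?php

// Hypothetical variant of the ASCII filter that can keep spaces.
function filter_alnum(string $input, bool $allowSpaces = false): string
{
    // Adding a literal space to the negated class keeps spaces in the result.
    $pattern = $allowSpaces ? '/[^a-zA-Z0-9 ]/' : '/[^a-zA-Z0-9]/';

    return (string) preg_replace($pattern, '', $input);
}
```

Keep in mind that a space is never legal inside a class name, so the space-preserving mode only makes sense for other uses of the filtered string.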

HTH


You can check if a string is a valid identifier (class, variable, or function name) using:

if (preg_match("/^[_a-zA-Z][_a-zA-Z0-9]*$/", $received_string)) {
    // valid name
} else {
    // invalid name
}
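
If you prefer to reject bad input rather than branch on it, the same check can be wrapped in a function that throws. This is a minimal sketch; the name assert_valid_identifier is hypothetical. It also adds the D modifier, because without it "$" matches before a trailing newline, so a name like "foo\n" would pass.

```php
<?php

// Hypothetical wrapper: reject invalid names instead of silently altering them.
function assert_valid_identifier(string $name): string
{
    // D (PCRE_DOLLAR_ENDONLY) makes "$" match only at the very end of the string.
    if (preg_match('/^[_a-zA-Z][_a-zA-Z0-9]*$/D', $name) !== 1) {
        throw new InvalidArgumentException("Not a valid identifier: {$name}");
    }

    return $name;
}
```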

The PHP authors provide a regular expression for this; see the manual entry on classes:

<?php

if (preg_match('/^([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)$/', $strClassName)) {
    // etc.
}

It's the same with function names or any other label.

If you're looking to sanitize the string, you could remove everything that's not [a-zA-Z0-9_\x7f-\xff] and then validate the result against ^([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)$. The difference is that digits are allowed characters, but a class or function name may not start with one.
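
A minimal sketch of that strip-then-validate approach follows. The function names sanitize_label and is_valid_label are made up here, and prefixing an underscore when the result is empty or starts with a digit is just one possible policy. It does not check for reserved words such as "class".

```php
<?php

// Hypothetical helper: strip characters that cannot appear in a PHP label.
function sanitize_label(string $input): string
{
    // Keep only the characters allowed by the manual's label pattern.
    $clean = (string) preg_replace('/[^a-zA-Z0-9_\x7f-\xff]/', '', $input);

    // A label may not be empty or start with a digit.
    // Assumed policy: prefix an underscore in those cases.
    if ($clean === '' || preg_match('/^[0-9]/', $clean) === 1) {
        $clean = '_' . $clean;
    }

    return $clean;
}

// Hypothetical validator using the same pattern as the manual.
function is_valid_label(string $name): bool
{
    // D keeps "$" from matching before a trailing newline.
    return preg_match('/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/D', $name) === 1;
}
```

With this, sanitize_label('9lives') gives '_9lives', and anything sanitize_label() returns passes is_valid_label().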

