The problem with PHP

27 June 2018 | Tags: Blog, Security

It feels strange to criticise PHP after developing a substantial (for me) project in it. Obviously I like the language, and the recent performance improvements have given it a significant boost. But there are a few things that grate on the nerves, hum discordantly in the back of my head and make me wonder about its long-term future. Here are some of them, presented in no particular order:

No native support for UTF-8

This is the future, people. There is no reasonable excuse for this anymore. Every sane web developer now works in UTF-8, 100% of the time, to avoid the terrible encoding problems of the past. But if you want to force UTF-8 in PHP you need to use the multibyte-safe variants of the string functions and keep specifying UTF-8 encoding in function calls throughout your application. This is not a fail-safe approach; it is ‘fail-dangerous’ design and it needs to be fixed.
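
To make the complaint concrete, here is a minimal sketch of how the byte-oriented string functions and their mbstring counterparts disagree on the same UTF-8 string. It assumes the mbstring extension is loaded; the sample word and variable names are purely illustrative.

```php
<?php
// "naïve" written with a Unicode escape so the example does not depend
// on how this source file happens to be encoded.
$word = "na\u{EF}ve";

// Byte-oriented functions count bytes, not characters.
$bytes = strlen($word);               // 6, because 'ï' takes two bytes in UTF-8
$chars = mb_strlen($word, 'UTF-8');   // 5

// substr() can cut a multibyte character in half and leave invalid UTF-8.
$brokenPrefix = substr($word, 0, 3);
$safePrefix   = mb_substr($word, 0, 3, 'UTF-8');   // "naï"

$brokenIsValid = mb_check_encoding($brokenPrefix, 'UTF-8');   // false
$safeIsValid   = mb_check_encoding($safePrefix, 'UTF-8');     // true

// Case conversion needs the multibyte variant as well.
$upper = mb_strtoupper($word, 'UTF-8');   // "NAÏVE"
```

Calling mb_internal_encoding('UTF-8') once can spare you the repeated encoding argument, but you still have to remember to reach for the mb_ variant every time, which is the real problem.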

Give us a global flag to force strict type checking

If you want to code securely you will be doing a lot of type checking on parameters. Since PHP 7.0 we do have a strict type directive, set by a single line declaration at the top of a script, but the implementation is madness. Say you have a class file and have dutifully specified the required parameter types in all the methods. Those types are only strictly enforced for calls made from scripts that contain the strict type declaration; calls from anywhere else have their scalar arguments silently coerced where possible. Yep, you have to make the declaration in all of your scripts, or it doesn’t work (too bad if you’re writing a library that will be used by third party code). You can also change the type of a parameter within the body of the function without triggering an error. Basically, weak typing is the default behaviour. It’s more ‘fail-dangerous’ design.

Seriously, if the author of a function has set type requirements, how does it help security or reliability if third parties are free to ignore them, resulting in implicit type conversions? It wouldn't be so bad if you could just set the directive once in the header file of a project, but you can't. You have to set it in every single file individually. 
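
A small sketch of what this looks like in practice. The two functions are hypothetical, and the file deliberately omits the strict type declaration so that calls made from it run in the default weak mode.

```php
<?php
// No declare(strict_types=1) at the top of this file, so every call made
// from here uses the default weak (coercive) mode.

function addIntegers(int $a, int $b): int
{
    return $a + $b;
}

// The numeric strings are silently converted to integers.
$sum = addIntegers('5', '10');   // int(15)

function relabel(int $id): string
{
    // The declared type is only checked on entry; inside the body the
    // parameter can be reassigned to a completely different type.
    $id = 'item-' . $id;
    return $id;
}

$label = relabel(7);   // "item-7"
```

If the calling file began with declare(strict_types=1), the addIntegers('5', '10') call would throw a TypeError instead. The author of addIntegers() has no say in the matter; it is entirely up to whoever writes the calling file.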

The official documentation needs an upgrade

Some of it is fine, but coverage of some of the newer stuff and some key extensions is pretty scrappy; in my opinion it's getting worse. Anything that has a security application or implication requires crystal clear documentation and guidance on appropriate usage. For example, given the importance of PDO in preventing SQL injection (this being a web-oriented language), you’d think it would be well documented with examples, but no, it's horrible. Essential language features should be accompanied by quality documentation, no exceptions.

We need an image library that supports colour profiles

The ‘default’ GD library does not, so if you scale or compress an image the profile is removed and a lot of the colour gets washed out, potentially leaving you with a dull, flat and considerably uglier image (you might be able to avoid this by ensuring that all the images you use have been converted to sRGB). We need something better that we can depend on being installed.

Give us better tools for basic data validation

Yes, we have some, but they are either inconsistent, ‘fail-dangerous’ by design, or so badly documented that people are bound to misuse them:

  • If you want to test that something consists only of simple alphabetical characters you may be tempted to reach for ctype_alpha(), but what is considered a valid alphabetical character is affected by the locale setting. This makes the results unpredictable from the developer’s point of view. Yes, you can write some regex to do exactly what you want, but why should you have to?
  • Similarly, testing hexadecimal strings with is_numeric() gives different results depending on what version of PHP is running (< 7.0 returns true, higher versions return false).
  • ctype_alnum() and ctype_digit() have an additional quirk, which is that if you pass an integer to these functions instead of a string it will be evaluated as the character specified by the corresponding ASCII code. Is this in any way helpful?
  • strip_tags() is potentially hazardous. If your data includes an unencoded ‘<’ symbol directly followed by text, it will tend to chop all the text that occurs thereafter (unless the symbol is HTML entity encoded). Basically you should be using this function on text nodes that have been entity encoded, but the manual doesn’t tell you that. It also doesn’t protect you from code injected into the attributes of allowed tags.
  • filter_var() and filter_input() come with a variety of filters that can be used to sanitise or validate data, and these have several shortcomings and quirks to be aware of. The names of the filters are somewhat misleading and some of them are very poorly documented. You’ll need to scrape through the source code to find out exactly what they do. If you do not specify a filter to apply to your data (e.g. you forgot to pass the second parameter), the default behaviour of these functions is to do nothing and not warn you. You must specify a filter or your data will be returned to you in an unfiltered state! More fail-dangerous.
  • FILTER_SANITIZE_STRING can have unpredictable results similar to strip_tags(); data containing an unencoded ‘<’ may cause subsequent plain text content to be cut. While this filter will usually encode quotes by default, under some PHP configurations it may not do so, which could create an SQL injection point. From a developer’s perspective it is unreliable and fail-dangerous.
  • FILTER_SANITIZE_EMAIL, FILTER_VALIDATE_EMAIL, FILTER_SANITIZE_URL and FILTER_VALIDATE_URL sound like they would be perfect solutions for cleaning up user input to your web application, but they aren't. All these filters do is strip characters that are not permitted in email addresses or URLs, or check that your input conforms to the RFC specifications for them. As single quotes are legitimate characters in these types of input, you must still escape the "sanitised" and "validated" input to make it safe for use in a database query. Now you can say ‘well people should know better’, but they don’t, because a) PHP is (almost) always used in database-driven web applications and b) the documentation is not clear about what each filter actually does and, just as importantly, does not do, or about the security implications of that.
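
Several of the quirks in this list can be seen in a few lines. This is a rough sketch rather than a recommended validation layer: isAsciiAlpha() and isDigitString() are hypothetical helpers of my own, not PHP built-ins, and the results in the comments assume PHP 7.0 or later.

```php
<?php
// Locale-independent check for plain ASCII letters (hypothetical helper).
function isAsciiAlpha($value): bool
{
    return is_string($value) && preg_match('/\A[A-Za-z]+\z/', $value) === 1;
}

// Digit check that rejects integers outright, so the "integer is treated
// as an ASCII code" quirk never comes into play (hypothetical helper).
function isDigitString($value): bool
{
    return is_string($value) && ctype_digit($value);
}

$alpha  = isAsciiAlpha('Hello');        // true
$accent = isAsciiAlpha("na\u{EF}ve");   // false, whatever the locale
$digits = isDigitString('2018');        // true
$intArg = isDigitString(53);            // false: not a string at all

// Hexadecimal strings are no longer treated as numeric from PHP 7.0 on.
$hexIsNumeric = is_numeric('0x1A');     // false on PHP 7.0 and later

// filter_var() with no filter argument hands the input back untouched.
$payload  = '<script>alert(1)</script>';
$filtered = filter_var($payload);       // identical to $payload

// A validated email address can still contain a single quote.
$email = filter_var("o'brien@example.com", FILTER_VALIDATE_EMAIL);

// An unencoded '<' directly followed by text is treated as the start of
// a tag, so the text after it tends to disappear.
$stripped = strip_tags('cost <100 dollars today');
```

The quoted email address is the important one: it passes validation and still has to be escaped, or better, bound as a parameter, before it goes anywhere near a database query.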

Copyright, all rights reserved.