On Charsets, Timezones, and Hashes

Character sets, time zones and password hashes are pretty much the bane of my life. Whenever something breaks in a particularly spectacular fashion, you can be sure that one of those three is, in some way, responsible. Or DNS, of course. Apparently the software developers Just Don’t Get It™. Granted, they are pretty complex topics. I’m not expecting anyone to care about the difference between ISO-8859-15 and ISO-8859-1, know about UTC’s subtleties or be able to implement SHA-1 using a ball of twine.

What I do expect, is for sensible folk to follow these very simple guidelines. They will make your (and everyone else’s) life substantially easier.

Use UTF-8..

Always. No exceptions. Configure your text editors to default to UTF-8. Make sure everyone on your team does the same. And while you’re at it, configure the editor to use UNIX-style line-endings (newline, without useless carriage returns).

..or not

Make sure you document the cases where you can’t use UTF-8. Write down and remember which encoding you are using, and why. Remember that iconv is your friend.

Store dates with time zone information

Always. A date/time is entirely meaningless unless you know which time zone it’s in. Store the time zone. If you’re using some kind of retarded age-old RDBMS which doesn’t support date/time fields with TZ data, then you can either store your dates as a string, or store the TZ in an extra column. I repeat: a date is meaningless without a time zone.

While I’m on the subject: store dates in a format described by ISO 8601, ending with a Z to designate UTC (Zulu). No fancy pansy nonsense with the first 3 letters of the English name of the month. All you need is ISO 8601.

Bonus tip: always store dates in UTC. Make the conversion to the user time zone only when presenting times to a user.

Don’t rely on platform defaults

You want your code to be cross-platform, right? So don’t rely on platform defaults. Be explicit about which time zone/encoding/language/.. you’re using or expecting.

Use bcrypt

Don’t try to roll your own password hashing mechanism. It’ll suck and it’ll be broken beyond repair. Instead, use bcrypt or PBKDF2. They’re designed to be slow, which will make brute-force attacks less likely to be successful. Implementations are available for most sensible programming environments.

If you have some kind of roll-your-own fetish, then at least use an HMAC.

Problem be gone

Keeping these simple guidelines in mind will prevent entire ranges of bugs from being introduced into your code base. Total cost of implementation: zilch. Benefit: fewer headdesk incidents.

— Elric