Archive for regular expressions
Yesterday, I talked about how to get the most out of regular expressions in PHP. I needed to dig deep into PHP's regular expression syntax because I had to write some regular expressions that deal with Unicode characters.
After much reading, I believed that I knew everything that I needed. I started writing some regex strings and testing the code. Unfortunately, every time I ran a test with a string that contained Unicode characters, the match failed. When I removed the Unicode characters from the string and tested again, it would work. I was baffled.
Continue reading “Unicode Support on CentOS 5.2 with PHP and PCRE”
Since beginning work on my DNS Yogi site, I’ve had to write numerous regular expressions to match all sorts of string fragments. I quickly ran into problems when I realized that I needed to add support for Unicode characters, since certain TLD registrars support registrations with non-Latin characters.
The main issue is that there are multiple regular expression engines. PHP uses a flavor of the PCRE (Perl Compatible Regular Expressions) engine. Each engine, and each variant of an engine, handles regular expression syntax slightly differently. I needed to find out exactly how the PHP regular expression engine worked, and finding that information was not easy.
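
To make the Unicode side of this concrete, here is a minimal PHP sketch of what a Unicode-aware pattern can look like with the preg functions. It relies on the `u` pattern modifier (treat pattern and subject as UTF-8) and on the `\p{L}` and `\p{N}` Unicode property escapes, both of which PCRE provides. The function names `pcre_supports_unicode` and `is_label_like` are made up for illustration, and the character set in the second pattern is only an assumption about what a name fragment might contain, not a real validation rule for any TLD.

```php
<?php
// Hypothetical helper: reports whether this PHP build's PCRE can compile
// a UTF-8 pattern that uses Unicode property escapes.
function pcre_supports_unicode()
{
    // preg_match() returns false when the pattern fails to compile;
    // the @ operator hides the warning PHP emits in that case.
    // "\xC3\xA9" is the UTF-8 encoding of "é", written as bytes so the
    // check does not depend on how this source file is saved.
    return @preg_match('/^\p{L}+$/u', "\xC3\xA9") === 1;
}

// Hypothetical helper: true when $label contains only letters and digits
// from any script, plus hyphens. Illustrative only.
function is_label_like($label)
{
    return preg_match('/^[\p{L}\p{N}-]+$/u', $label) === 1;
}

var_dump(pcre_supports_unicode());
var_dump(is_label_like("m\xC3\xBCnchen")); // "münchen" as UTF-8 bytes
var_dump(is_label_like('bad label'));      // contains a space
```

If the first check prints `false`, the Unicode features are unavailable in that build, which is worth ruling out before debugging the patterns themselves.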





