Built-in Unicode support is one of the features promised for PHP 6.
Earlier PHP versions can manipulate Unicode strings using the multi-byte string (mbstring) extension. However, this extension is not available in every PHP installation.
This class implements a clever alternative for manipulating Unicode text encoded as UTF-8. It uses the functions of PHP's PCRE extension.
This extension can apply regular expressions to UTF-8 strings and has been available since PHP 3.
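As a rough illustration of the PCRE-based approach described above, the sketch below counts characters and extracts a substring from a UTF-8 string using only preg_* functions. The function names are hypothetical and are not the package's actual API, and the type declarations assume PHP 7 or later, which is newer than the versions this description targets.

```php
<?php
// Hypothetical helpers for illustration only; not the package's real API.

// Count characters: with the "u" modifier, "." matches one whole UTF-8
// character instead of one byte ("s" lets "." also match line breaks).
function utf8_length_sketch(string $text): int
{
    $count = preg_match_all('/./us', $text);
    if ($count === false) {
        // PCRE refuses subjects that are not valid UTF-8 when "u" is set.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $count;
}

// Extract $length characters starting at character offset $start.
function utf8_substr_sketch(string $text, int $start, int $length): string
{
    // Splitting on an empty pattern in UTF-8 mode yields one array
    // element per character.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return implode('', array_slice($chars, $start, $length));
}
```

For example, utf8_length_sketch('héllo') gives 5 where strlen() would report 6 bytes, and utf8_substr_sketch('héllo wörld', 1, 4) gives 'éllo'. Splitting the whole string into an array is simple but memory-hungry for long inputs; it is shown here only because it keeps the sketch short.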
This package can be used to perform several types of manipulation operations on UTF-8 encoded and unencoded strings.
There are several classes:
a) One to compute the length and extract parts of a string encoded in UTF-8. It uses PCRE extension functions, so it does not rely on the multi-byte string extension.
b) Another to perform white space normalization, such as mapping any line break sequence to a simple line break, reducing multiple space or line break sequences to a single one, converting tabs to spaces and vice versa, etc.
c) Another to perform letter case normalization, such as converting between camel case words and words separated by underscore characters, etc.
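The white space and letter case normalizations listed above could be sketched roughly as follows. All function names and the exact rules (for instance, one space per tab, ASCII letters only) are assumptions made for illustration and do not reflect the package's actual classes or methods. The type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helpers for illustration only; not the package's real API.

function normalize_whitespace_sketch(string $text): string
{
    // Map CRLF and lone CR line breaks to a simple LF.
    $text = preg_replace('/\r\n?/', "\n", $text);
    // Convert each tab to one space (assumed rule).
    $text = str_replace("\t", ' ', $text);
    // Reduce runs of spaces to a single space.
    $text = preg_replace('/ {2,}/', ' ', $text);
    // Reduce runs of line breaks to a single line break.
    return preg_replace('/\n{2,}/', "\n", $text);
}

// ASCII-only assumption: "someCamelCase" becomes "some_camel_case".
function camel_to_underscore_sketch(string $name): string
{
    // Insert "_" before an uppercase letter that follows a lowercase
    // letter or digit, then lowercase the whole string.
    return strtolower(preg_replace('/(?<=[a-z0-9])([A-Z])/', '_$1', $name));
}

// ASCII-only assumption: "some_camel_case" becomes "someCamelCase".
function underscore_to_camel_sketch(string $name): string
{
    // Drop each "_" and uppercase the character that followed it.
    return preg_replace_callback(
        '/_([a-z0-9])/',
        function (array $m): string {
            return strtoupper($m[1]);
        },
        $name
    );
}
```

These patterns work at the byte level without the "u" modifier, which is safe for UTF-8 input because the bytes for ASCII white space and ASCII letters never appear inside a multi-byte UTF-8 sequence. Handling non-ASCII letters in the case conversions would need Unicode-aware matching and case mapping, which this sketch deliberately leaves out.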