A PHP4 Version for converting a utf8 string/text to unicode:
function utf8_to_unicode( $str ) {
$unicode = array();
$values = array();
$lookingFor = 1;
for ($i = 0; $i < strlen( $str ); $i++ ) {
$thisValue = ord( $str[ $i ] );
if ( $thisValue < ord('A') ) {
// exclude 0-9
if ($thisValue >= ord('0') && $thisValue <= ord('9')) {
// number
$unicode[] = chr($thisValue);
}
else {
$unicode[] = '%'.dechex($thisValue);
}
} else {
if ( $thisValue < 128)
$unicode[] = $str[ $i ];
else {
if ( count( $values ) == 0 ) $lookingFor = ( $thisValue < 224 ) ? 2 : 3;
$values[] = $thisValue;
if ( count( $values ) == $lookingFor ) {
$number = ( $lookingFor == 3 ) ?
( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 ):
( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
$number = dechex($number);
$unicode[] = (strlen($number)==3)?"%u0".$number:"%u".$number;
$values = array();
$lookingFor = 1;
} // if
} // if
}
} // for
return implode("",$unicode);
} // utf8_to_unicode
unicode_encode
(No version information available, might be only in CVS)
unicode_encode — Convert a unicode string in any encoding
Description
Takes a unicode string and converts it to a string in the specified encoding .
Parameters
- input
-
The unicode string that is converted.
- encoding
-
The new encoding for input .
- errmode
-
Conversion error mode. This parameter determines the action to take when the converter cannot convert a character. For a list of available modes, refer to unicode_set_error_mode(). If the parameter is not set, the global error mode is used.
Return Values
A string on success, or FALSE on failure.
Errors/Exceptions
Emits a E_WARNING level error if a converter cannot be created for the desired encoding .
Examples
Example#1 A unicode_encode() example
Note: The characters will be seen instead of entities in the output.
<?php
header ('Content-Type: text/plain; charset=ISO-8859-2');
$encoded = unicode_encode ('\u0150\u0179', 'ISO-8859-2');
echo 'Unicode semantics: ', ini_get ('unicode_semantics'), PHP_EOL;
echo 'The string itself:', $encoded, PHP_EOL;
echo 'The length of the string: ', strlen ($encoded);
?>
The above example will output something similar to:
Unicode semantics: 1 The string itself: ŐŹ The length of the string: 2
Notes
This function is EXPERIMENTAL. The behaviour of this function, the name of this function, and anything else documented about this function may change without notice in a future release of PHP. Use this function at your own risk.
unicode_encode
22-Feb-2007 06:45
23-Nov-2005 04:54
As for an example of the usage of the function unicode_encode:
<?php
header ('Content-Type: text/plain; charset=ISO-8859-2');
$encoded = unicode_encode ('\u0150\u0179', 'ISO-8859-2');
echo 'Unicode semantics: ', ini_get ('unicode_semantics'), PHP_EOL, 'The string itself: ';
printf ($encoded . PHP_EOL, '%s');
echo 'The length of the string: ', strlen ($encoded);
?>
The above example will output (please note that there will be characters instead of entities in the output):
Unicode semantics: 1
The string itself: ŐŹ
The length of the string: 2
