blechtrottel.net

XSLT in PHP4

UTF and ISO

Just like Sablotron, the other XML functions in PHP output only utf-8. This can lead to problems when your webpages are not encoded in utf-8 but contain XSLT in PHP: special characters will not be displayed correctly. You can, of course, let one page differ from all the others and serve it in utf-8. When that page uses an external stylesheet, however, Internet Explorer sometimes runs into problems. If, for example, the transformation results are in utf-8 while the external stylesheet (like all other pages) is in iso-8859-1, IE ignores the CSS. So it is worth taking a look at the different ways of decoding utf-8 in PHP.

Unfortunately, one thing is true here as well: the simplest functions are not included in every installation of PHP. So assessing a webspace starts with trial and error, or with the test package from blechtrottel brodaktschns, which can be found further down this page.
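The trial and error can be shortened a little by asking PHP which functions exist. The following is only a minimal sketch, not the test package itself; the helper name available_decoders() is made up for this illustration.

```php
<?php

// Hypothetical helper: reports which of the three decoding
// functions discussed on this page exist in this PHP installation.
function available_decoders()
{
    $candidates = array('utf8_decode', 'iconv', 'preg_replace');
    $result = array();
    foreach ($candidates as $name) {
        // function_exists() only checks for the function; it does not call it
        $result[$name] = function_exists($name);
    }
    return $result;
}

foreach (available_decoders() as $name => $ok) {
    print $name . '(): ' . ($ok ? 'available' : 'missing') . "\n";
}
```

Whatever is reported as available is a candidate for the methods described below.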

Here is the code for the different ways of decoding, which replace the last line of code on the previous page:

utf8_decode()

This is the simplest method, if the result is to be iso-8859-1.

<?php

print utf8_decode($html);

?>

iconv()

The advantage of this method is that the target encoding does not have to be iso-8859-1.

<?php

print iconv("utf-8", "iso-8859-1", $html);

?>
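As an illustration of a different target encoding, here is a small sketch that converts to iso-8859-15, which, unlike iso-8859-1, contains the euro sign. The sample string is invented for this example, and which encodings are actually available depends on the iconv library behind your PHP installation.

```php
<?php

// Hypothetical sample input: "Price: 5 €" encoded in utf-8
// (the euro sign is the three bytes E2 82 AC).
$html = "Price: 5 \xE2\x82\xAC";

// iso-8859-15 has a euro sign (byte A4); iso-8859-1 does not.
$latin9 = iconv("utf-8", "iso-8859-15", $html);

// Show the converted bytes as hex so the result does not depend
// on the encoding of the terminal or browser.
print bin2hex($latin9);
```

If the target encoding has no equivalent for a character, iconv() returns false, so it is a good idea to check the return value before printing.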

preg_replace()

If none of the above methods works, your only fallback is regular expressions. Every special character that is displayed wrongly gets replaced by the correct one. Our example works for the German umlauts and the ß (sharp s) in iso-8859-1. For other encodings and special characters, it is best to copy the wrong characters from the output that is not displayed correctly.

<?php

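// utf-8 byte pairs, written as hex escapes so that the patterns
// do not depend on the encoding this file is saved in
$utf8umlaute = array ('@\xC3\x84@',   // A umlaut
                      '@\xC3\xA4@',   // a umlaut
                      '@\xC3\x96@',   // O umlaut
                      '@\xC3\xB6@',   // o umlaut
                      '@\xC3\x9C@',   // U umlaut
                      '@\xC3\xBC@',   // u umlaut
                      '@\xC3\x9F@');  // sharp s

// the matching single bytes in iso-8859-1, in the same order
$isoumlaute = array ("\xC4", "\xE4", "\xD6", "\xF6", "\xDC", "\xFC", "\xDF");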

print preg_replace($utf8umlaute, $isoumlaute, $html);

?>

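For example, a German Ä encoded in utf-8 consists of two bytes, and in an iso-8859-1 webpage it shows up as Ã followed by a second character that often does not display at all. (The '@' characters in the sample code are the regex delimiters that every pattern in the array needs.)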
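The three methods can also be combined into one helper that uses whatever the webspace offers. This is only a sketch: the function name utf8_to_latin1() is made up, and the order of the attempts is a choice made here, not something prescribed above. iconv() is tried first because utf8_decode() raises a deprecation notice on recent PHP versions (8.2 and later).

```php
<?php

// Hypothetical helper, not part of the test package: converts a
// utf-8 string to iso-8859-1 with whichever method is available.
function utf8_to_latin1($html)
{
    if (function_exists('iconv')) {
        $converted = iconv("utf-8", "iso-8859-1", $html);
        if ($converted !== false) {
            return $converted;
        }
    }
    if (function_exists('utf8_decode')) {
        return utf8_decode($html);
    }
    // Last resort: replace the utf-8 byte pairs of the German
    // umlauts and the sharp s with their iso-8859-1 bytes.
    $utf8umlaute = array('@\xC3\x84@', '@\xC3\xA4@', '@\xC3\x96@', '@\xC3\xB6@',
                         '@\xC3\x9C@', '@\xC3\xBC@', '@\xC3\x9F@');
    $isoumlaute = array("\xC4", "\xE4", "\xD6", "\xF6", "\xDC", "\xFC", "\xDF");
    return preg_replace($utf8umlaute, $isoumlaute, $html);
}

// "Grüße" in utf-8, printed as hex after the conversion
print bin2hex(utf8_to_latin1("Gr\xC3\xBC\xC3\x9Fe"));
```

In the transformation script, the last line would then read print utf8_to_latin1($html); instead of calling one of the functions directly.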

Final remarks

The examples given here are, or have been, in use at blechtrottel brodaktschns. Our RSS newsfeed used to be parsed with Sablotron and converted to iso-8859-1 with regular expressions. As mentioned, our webspace now supports PHP5, and at the moment the newsfeed is parsed with libxslt. We stayed with the regular expressions, though.

To find out which method works on your own webspace, a test package is available for download as a ZIP. The php4xslttest folder has to be extracted and uploaded completely into the root directory (usually htdocs) of the webspace. Then call the file http://.../php4xslt/php4xslttest.html (the three dots stand for the address of the webspace):

[Screenshot: the test file php4xslttest.html in action]

The server in the screenshot supports XSLT with libxslt, re-encoding from utf-8 with utf8_decode() and, of course, preg_replace(). The other tests produce an error message.

These tests work just as well on PHP5.

Tips, wishes and complaints are welcome. We would like to thank everyone who has already commented and thus helped us refine these texts.

If you are looking for further PHP tutorials try Dynamic Web Pages. Even though this is a German site, most tutorials listed there are in English.
