Symmetri Developer Blog

October 22, 2007

MakeXml10Safe()

PHP, XML - By Shourov Bhattacharya

I am writing a web service that returns data as simple XML 1.0. The data is encoded as UTF-8, but some responses caused the XML parser on the client side (for example, the one built into Firefox) to fail with an "Invalid character in XML …" type of error.

It turns out that my data contains a number of control characters that lie outside the range of valid XML 1.0 characters (see http://www.w3.org/TR/REC-xml/#charsets):

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
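
To see which characters in a response fall outside this production, a small check along these lines may help. It is only a sketch: the function name FindInvalidXml10Chars is hypothetical, the input is assumed to be valid UTF-8, and the pattern relies on PCRE's /u modifier.

```php
<?php
/**
 * Hypothetical helper: list the characters in $input that fall outside
 * the XML 1.0 Char production, as hex strings of their UTF-8 bytes.
 * Assumes $input is valid UTF-8.
 */
function FindInvalidXml10Chars($input)
{
    // Negated character class built from the Char production.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $invalid = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $found = preg_match_all($invalid, $input, $matches);
    if ($found === false) {
        // preg_match_all() returns false on error, e.g. malformed UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Hex-encode each offending character so it can be logged safely.
    return array_map('bin2hex', $matches[0]);
}
```

An empty array means every character in the string is allowed by the production.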

The best solution I found was to write my own "replace" function that converts these characters into numeric character references:

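 /**
  * Escape special characters in a string to make it XML 1.0 safe.
  *
  * @param string $input string to clean
  * @return string cleaned string
  */
 function MakeXml10Safe($input)
 {
  // escape common characters; & must come first so that the entities
  // added by the later replacements are not escaped a second time
  $output = str_replace('&', '&amp;', $input);
  $output = str_replace('<', '&lt;', $output);
  $output = str_replace('>', '&gt;', $output);
  // remove line feeds; double quotes are needed for "\n" to mean a newline
  $output = str_replace("\n", '', $output);
  // escape control codes that are not valid XML 1.0
  // (note: XML 1.0 also disallows numeric references to these code points,
  // so a strict parser may still reject the result)
  $pattern = '/[\x00-\x08\x0B\x0C\x0E-\x1F]/';
  // a callback is required here: ord('$0') would be evaluated once, before
  // any matching, and always yield the code of the '$' character
  $output = preg_replace_callback($pattern, function ($match) {
   return '&#' . ord($match[0]) . ';';
  }, $output);

  return $output;
 }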
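
One caveat worth checking against your own parser: XML 1.0 applies the same Char production to numeric character references, so a reference such as &#1; can still be rejected as not well-formed. If that happens, dropping the offending characters is one alternative. The sketch below assumes valid UTF-8 input; StripXml10Invalid is a hypothetical name and not part of the function above.

```php
<?php
/**
 * Hypothetical variant: remove characters outside the XML 1.0 Char
 * production, then escape the markup characters &, < and >.
 * Assumes $input is valid UTF-8.
 */
function StripXml10Invalid($input)
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $invalid = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $clean = preg_replace($invalid, '', $input);
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Replacements run in array order, so & is escaped before the
    // entities for < and > are introduced.
    return str_replace(
        array('&', '<', '>'),
        array('&amp;', '&lt;', '&gt;'),
        $clean
    );
}
```

Unlike the escaping approach, this loses the control characters entirely, which may or may not be acceptable depending on what the client does with the data.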
