"Content is not allowed in prolog" when parsing perfectly valid XML on GAE

The "Content is not allowed in prolog" error typically occurs when the parser finds text that is not markup before the root element, most often stray characters in front of the XML declaration (<?xml ...?>). The prolog is everything that precedes the root element: the optional XML declaration, which gives the XML version and the encoding, plus any DOCTYPE, comments, and processing instructions.

The error can also occur if the XML document starts with a Byte Order Mark (BOM). The BOM is the Unicode character U+FEFF, which some text editors and other software write at the start of a file to signal its byte order and encoding. It is invisible in most editors, which is why the XML can look perfectly valid. If the bytes are decoded into a Java String first, the BOM usually survives as a leading U+FEFF character, and a parser reading from a Reader treats it as content.
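
To see the BOM case in isolation, here is a minimal sketch using the standard JAXP parser. The class name BomDemo and the helper parses are made up for this illustration; the only assumption is that the XML is held in a String and parsed through a StringReader, as in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
public class BomDemo {
    /** Returns true if the string parses as XML, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Malformed input, including "Content is not allowed in prolog".
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String clean = "<?xml version=\"1.0\"?><root/>";
        System.out.println(parses(clean));            // same document, no BOM
        System.out.println(parses("\uFEFF" + clean)); // same document, BOM in front
    }
}
```

With the JDK's built-in parser, the first call should succeed and the second should fail, even though the two strings look identical when printed.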

To fix the "Content is not allowed in prolog" error, you can try the following:

  1. Check the XML document for any characters before the <?xml ...?> declaration (or before the root element, if there is no declaration). If you find any, remove them and save the document.

  2. Check the XML document for a BOM at the beginning of the document. If you find a BOM, remove it and save the document.

  3. Make sure that the XML document is well-formed and follows the correct syntax.

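  4. Make sure that the XML document is really encoded the way its declaration says, and that your code decodes it with that same encoding. UTF-8 is the safest choice. A mismatch (for example, UTF-16 bytes read as UTF-8) puts unexpected characters in front of the declaration.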

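  5. If you are using the DocumentBuilder class to parse the XML document, note that DocumentBuilderFactory has no flag for ignoring content before the prolog (there is no setIgnoreProlog method). If you cannot fix the document at its source, strip the leading characters yourself before parsing. When the XML is already in a String, dropping everything before the first '<' removes a BOM or any other stray characters.

For example:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
dbf.setExpandEntityReferences(false);

// xml is the String you received; drop anything before the first '<' (BOM, stray text).
int start = xml.indexOf('<');
if (start > 0) xml = xml.substring(start);

DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(xml)));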
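
Because a BOM is invisible when you print or log the string, checks 1 and 2 above are easier if you dump the first few characters as code points. The following is only a sketch; PrologCheck and describeStart are hypothetical names chosen for this example.

```java
// Hypothetical helper class, not part of any library.
public class PrologCheck {
    /** Lists the first few characters of the input as U+XXXX code points. */
    static String describeStart(String xml, int count) {
        StringBuilder sb = new StringBuilder();
        int limit = Math.min(count, xml.length());
        for (int i = 0; i < limit; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) xml.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "\uFEFF<?xml version=\"1.0\"?><root/>";
        // A healthy document starts with U+003C ('<'); anything else is suspect.
        System.out.println(describeStart(xml, 3)); // prints "U+FEFF U+003C U+003F"
    }
}
```

Logging this for the exact string you pass to the parser on GAE shows whether something is sitting in front of the first '<'.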

I hope this helps! Let me know if you have any other questions.