[Solved] DOM parsing XML Error: Content is not allowed in prolog

The error message is:

Content is not allowed in prolog. Nested exception: Content is not allowed in prolog.

The usual explanation found online is that the content being parsed starts with a BOM. This marker is invisible; it is present only in the underlying byte stream. If the parser does not recognize and skip it, it counts as content in front of the XML declaration, which is what the error message complains about.
BOM stands for byte order mark. The UCS specification recommends transmitting the BOM before the byte stream so that the receiver can determine the byte order.
In fact, UTF-8 does not need a BOM to indicate byte order, but the BOM can still be used to signal the encoding. The UTF-8 encoding of the BOM is EF BB BF, so a receiver that gets a byte stream beginning with EF BB BF can conclude that the stream is UTF-8 encoded.
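
To confirm that a BOM really is the cause, the first three bytes of the input can be checked for EF BB BF. The following is only a minimal sketch; the class name `BomCheck` and the method `hasUtf8Bom` are made up for illustration and are not part of dom4j or the JDK.

```java
import java.nio.charset.StandardCharsets;

class BomCheck {
    // Hypothetical helper: true if the data starts with the UTF-8 BOM (EF BB BF).
    static boolean hasUtf8Bom(byte[] data) {
        return data != null
                && data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // U+FEFF encodes to EF BB BF in UTF-8, so this simulates a file saved with a BOM.
        byte[] withBom = "\uFEFF<root/>".getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "<root/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasUtf8Bom(withBom));    // true
        System.out.println(hasUtf8Bom(withoutBom)); // false
    }
}
```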

Solution:

If you are parsing a file:

Open the XML file in UltraEdit or EmEditor and use Save As. When saving, you can choose between UTF-8 without BOM and UTF-8 with BOM; pick UTF-8 without BOM.
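
If no such editor is at hand, the same cleanup can probably be done in code. The sketch below assumes the file is UTF-8 and small enough to read into memory; `BomFileCleaner`, `stripUtf8Bom`, and `removeBomFromFile` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class BomFileCleaner {
    // Returns the data without a leading UTF-8 BOM (EF BB BF), or the same array if there is none.
    static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    // Rewrites the file in place, but only if it actually starts with a BOM.
    static void removeBomFromFile(Path xmlFile) throws IOException {
        byte[] original = Files.readAllBytes(xmlFile);
        byte[] cleaned = stripUtf8Bom(original);
        if (cleaned.length != original.length) {
            Files.write(xmlFile, cleaned);
        }
    }
}
```

A call such as `BomFileCleaner.removeBomFromFile(Paths.get("config.xml"))` (the file name is only a placeholder) would then leave a file that starts directly with the XML content.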

 

If the content is returned from a remote request:

If you convert the returned stream into a String (with new String(...)), you will not see the BOM when you print it, but it is still there, so you have to extract just the part you need:

if (result != null && !result.isEmpty()) {
    int start = result.indexOf('<');    // first '<': where the XML begins
    int end = result.lastIndexOf('>');  // last '>': where the XML ends
    if (start != -1 && end > start) {
        // Keep only the markup; this drops the BOM and anything else outside it
        result = result.substring(start, end + 1);
    }
}
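
A narrower alternative to the substring approach is to remove only the leading U+FEFF character, which is how a decoded BOM shows up in a Java String, and leave the rest untouched. The sketch below uses the JDK's built-in DOM parser rather than dom4j so that it runs without extra dependencies; `BomSafeParser`, `stripLeadingBom`, and `parse` are hypothetical names, and the response is assumed to have been decoded as UTF-8 already.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BomSafeParser {
    // Removes a single leading U+FEFF (the decoded BOM); any other text is returned unchanged.
    static String stripLeadingBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    // Parses the XML text with the JDK DOM parser after dropping a leading BOM.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripLeadingBom(xml))));
    }
}
```

Unlike the substring version, this does not hide other stray content before the first `<`, so a response that is malformed for some other reason will still fail loudly.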

Some also say the error is caused by an old version of dom4j, but I checked and the version I use is 1.6.1, so I ruled that out. In practice, I still recommend developing against the latest stable version.