> Appendix F.1 of the XML specs presents examples about how to
> automatically detect the encoding of an entity from the first
> characters of an XML encoding declaration without a byte order mark.
> These examples include UTF-16BE and UTF-16LE. However, section 4.3.3
> says that entities encoded in UTF-16 MUST begin with a byte order
> mark.
That is strictly limited to the UTF-16 encoding, and excludes the
related UTF-16LE and UTF-16BE encodings, in which BOMs are not present.
Note that "UTF-16LE" does not mean "UTF-16 encoding whose BOM shows it
to be little-endian" but rather "UTF-16-like encoding in little-endian
order without a BOM." If U+FEFF appears at the beginning of a UTF-16LE
or UTF-16BE document, it is not a BOM but a ZWNBSP character (and
therefore the document cannot be well-formed XML).
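To make the distinction concrete, here is a small sketch using Python's
built-in codecs. The codec names "utf-16" and "utf-16-le" are Python's,
chosen only for illustration; nothing in the XML spec refers to them.
The same bytes are decoded twice, once under each label:

```python
# "<?" encoded as little-endian 16-bit code units, preceded by the bytes FF FE.
data = b"\xff\xfe" + "<?".encode("utf-16-le")

# Labelled UTF-16: the leading FF FE is a BOM. It selects the byte order
# and is not part of the decoded text.
as_utf16 = data.decode("utf-16")

# Labelled UTF-16LE: the byte order is fixed by the label, so the same
# two bytes decode to the character U+FEFF.
as_utf16le = data.decode("utf-16-le")

print(repr(as_utf16))
print(repr(as_utf16le))
```

Under the UTF-16LE label the decoded text starts with U+FEFF rather than
with "<", which is why such a document cannot begin with an XML
declaration.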
> In the light of the examples it seems that the intention of the specs
> is to demand a UTF-16 byte order mark only when no XML declaration is
> present. Is this interpretation of the specs correct?
No. If the encoding is UTF-16, a BOM is mandatory, whether or not an
XML declaration is present.
> If the answer is "no", I would suggest to remove the two incriminated
> examples from Appendix F.1 and to add an appropriate warning.
The examples are not in error, because they refer to the UTF-16LE and
UTF-16BE encodings rather than the UTF-16 encoding.
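As a rough illustration of how the Appendix F.1 cases sit side by side,
here is a simplified Python sketch. The function name sniff_encoding and
the returned labels are hypothetical, and only a subset of the F.1 table
is covered (EBCDIC and the unusual UCS-4 byte orders are left out):

```python
def sniff_encoding(prefix: bytes) -> str:
    """Guess an encoding family from the first four bytes of an entity.

    Simplified sketch of part of the XML 1.0 Appendix F.1 table;
    the name and the returned labels are illustrative only.
    """
    head = prefix[:4]
    # Four-byte patterns first, since FF FE 00 00 also starts with FF FE.
    if head in (b"\x00\x00\xfe\xff", b"\xff\xfe\x00\x00"):
        return "UCS-4 (BOM)"
    if head[:2] == b"\xfe\xff":
        return "UTF-16, big-endian (BOM)"
    if head[:2] == b"\xff\xfe":
        return "UTF-16, little-endian (BOM)"
    if head[:3] == b"\xef\xbb\xbf":
        return "UTF-8 (BOM)"
    # No BOM: look for the start of "<?xml" in 16-bit or 8-bit code units.
    if head == b"\x00<\x00?":
        return "UTF-16BE or similar (no BOM)"
    if head == b"<\x00?\x00":
        return "UTF-16LE or similar (no BOM)"
    if head == b"<?xm":
        return "UTF-8, ASCII or similar (no BOM)"
    return "unknown"


print(sniff_encoding("<?xml".encode("utf-16-be")))
print(sniff_encoding(b"\xff\xfe" + "<?xml".encode("utf-16-le")))
```

In the two no-BOM 16-bit cases the guess is only good enough to read the
encoding declaration, which then has to name the actual encoding.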
The Core WG will be adding language to 4.3.3 stating that UTF-16BE and
UTF-16LE are specifically not UTF-16.
John Cowan
[hidden email]
http://www.ccil.org/~cowan
I marvel at the creature: so secret and so sly as he is, to come sporting
in the pool before our very window. Does he think that Men sleep without
watch all night?