UTF-16 and Byte Order Mark


Dieter Köhler

Appendix F.1 of the XML specification gives examples of how to automatically
detect the encoding of an entity from the first characters of an XML
encoding declaration when no byte order mark is present.  These examples
include UTF-16BE and UTF-16LE.  However, section 4.3.3 says that entities
encoded in UTF-16 MUST begin with a byte order mark.
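
To make the conflict concrete, here is a minimal sketch of the kind of
detection Appendix F.1 describes, restricted to the UTF-16 cases (with the
UCS-4 byte order marks checked first so they are not mistaken for UTF-16).
The function name and the returned labels are my own, not taken from the
specification, and the other rows of the Appendix F.1 tables (UTF-8, EBCDIC,
UCS-4 without a mark) are simply lumped together as "other".

```python
def sniff_utf16(prefix: bytes) -> str:
    """Classify the first bytes of an entity (hypothetical helper).

    Only the UTF-16 rows of the Appendix F.1 tables are covered;
    everything else is reported as "other".
    """
    # UCS-4 byte order marks overlap with the UTF-16 ones, so test them first.
    if prefix.startswith(b"\x00\x00\xfe\xff") or prefix.startswith(b"\xff\xfe\x00\x00"):
        return "other"
    # Entity starts with a byte order mark.
    if prefix.startswith(b"\xfe\xff"):
        return "UTF-16BE (BOM)"
    if prefix.startswith(b"\xff\xfe"):
        return "UTF-16LE (BOM)"
    # No byte order mark: recognise "<?" encoded as two 16-bit code units.
    if prefix.startswith(b"\x00<\x00?"):
        return "UTF-16BE (no BOM)"
    if prefix.startswith(b"<\x00?\x00"):
        return "UTF-16LE (no BOM)"
    return "other"
```

The last two branches correspond to the two examples in question: they only
make sense if a UTF-16 entity may begin with "<?" instead of a byte order
mark.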

In light of these examples, the intention of the specification seems to be
to require a UTF-16 byte order mark only when no XML declaration is present.
Is this interpretation correct?

If the answer is "yes", I would suggest starting the second paragraph of
sect. 4.3.3 with: "In the absence of a text declaration (or an XML
declaration respectively) entities encoded in UTF-16 MUST ..."

If the answer is "no", I would suggest removing the two examples in
question from Appendix F.1 and adding an appropriate warning.

Dr. Dieter Köhler, M.A.
Wissenschaftlicher Assistent
Institut für Philosophie und
Studienzentrum Multimedia
Universität Karlsruhe (TH)

University address:
Institut für Philosophie der
Universität Karlsruhe (TH)
D-76128 Karlsruhe
Phone:       +49-(0)-721-608-2149
Direct Line: +49-(0)-721-608-7743
Fax:         +49-(0)-721-608-3084