
Empty xml:lang attributes validation

Stéphane Corlosquet
Hi,

On Mon, Nov 30, 2009 at 10:42 AM, Steven Pemberton <[hidden email]> wrote:
> > > There's something slightly odd with classifying people's names as being in
> > > the same language as the document:
> > >
> > >       <http://drupalrdf.openspring.net/user/20> a sioc:User ;
> > >            foaf:name "lucerukimam"@en .
> >
> > good point. This is inherited from the xml:lang="en" attribute of the html
> > tag of the document. Is there any way in RDFa to reset this language tag in
> > the markup? Is adding xml:lang="" to the tag containing the foaf:name the
> > right way to do it?
>
> Yes.

We fixed the Drupal RDFa markup to reset the language of the username as described above; see the example at [1]. The markup validates with the W3C validator [2]. However, we received a bug report [3] saying that Total Validator (http://www.totalvalidator.com/) does not validate the markup because of xml:lang="".

I had never heard of Total Validator, and I consider the W3C validator authoritative, but Total Validator states at [4] that it "uses the official W3C and ISO DTDs for HTML Validation and the tests are automated from these - in other words we haven't personally made up the validation rules or translated them. So if TV finds a problem on your page (or fails to find one) then it's highly likely that it's in the W3C DTD's and not a mistake in TV." Another interesting claim on that page is: "For example the W3C validator doesn't check the value of attributes and so will report success even when your page has many mistakes."

Can anyone confirm whether xml:lang="" is valid or not? The XML 1.0 specification [6] says it is valid, but I'm not sure whether this applies to XHTML+RDFa. And is the last claim, that the W3C validator reports success on invalid markup, true?

Steph.

Re: Empty xml:lang attributes validation

Johannes Koch
Hi Stephane

Stephane Corlosquet wrote:
> Can anyone confirm whether xml:lang="" is valid or not? The XML 1.0 [6] says
> it's valid but I'm not sure if this applies to XHTML+RDFa. Is the last claim
> regarding the W3C validator reporting success on invalid markup true?
[...]
> [6] http://www.w3.org/TR/REC-xml/#sec-lang-tag

Simple question, long answer (sorry, but sometimes life is not black or
white :-).

Indeed, the cited text (<http://www.w3.org/TR/REC-xml/#sec-lang-tag>) says:
| in addition, the empty string may be specified.

and later:
| In particular, the empty value of xml:lang is used on an element B to
| override a specification of xml:lang on an enclosing element A,
| without specifying another language.
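To make the inheritance rule concrete, here is a minimal Python sketch that walks up from an element to the nearest xml:lang declaration and returns it, treating the empty string as "no language" rather than as "not declared". It uses only xml.etree.ElementTree from the standard library; the function name effective_lang and the id attributes in the sample markup are hypothetical, and the sketch models plain XML inheritance, not a full RDFa processor.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_lang(root, target):
    """Return the in-scope xml:lang of `target`: a tag, "" (reset), or None."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    node = target
    while node is not None:
        if XML_LANG in node.attrib:
            # The nearest declaration wins, even when it is the empty string.
            return node.attrib[XML_LANG]
        node = parent.get(node)
    return None  # no xml:lang declared on the element or any ancestor

# Hypothetical sample markup modeled on the Drupal case; ids are illustrative only.
doc = ET.fromstring(
    '<html xml:lang="en">'
    '<p id="body">Some English text</p>'
    '<span id="name" xml:lang="">lucerukimam</span>'
    '</html>'
)
body = doc.find(".//*[@id='body']")
name = doc.find(".//*[@id='name']")
print(repr(effective_lang(doc, body)))
print(repr(effective_lang(doc, name)))
```

The paragraph inherits "en" from the html element, while the span's empty value cancels that inheritance without naming another language, which is exactly the behaviour the quoted spec text describes.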


However...

For XHTML 1.0 (somewhere in
<http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict>):
| xml:lang    language code (as per XML 1.0 spec)

and
| xml:lang    %LanguageCode; #IMPLIED

with
| <!ENTITY % LanguageCode "NMTOKEN">


Looking up NMTOKEN in XML 1.0 (<http://www.w3.org/TR/REC-xml/#nmtok>):
| Values of type NMTOKEN  MUST  match the Nmtoken  production

and
(<http://www.w3.org/TR/REC-xml/#NT-Nmtoken>):
| [7]  Nmtoken  ::=  (NameChar)+

<http://www.w3.org/TR/REC-xml/#NT-NameChar>:
| [4a] NameChar  ::=  NameStartChar | "-" | "." | [0-9] | #xB7 |
|                     [#x0300-#x036F] | [#x203F-#x2040]

(<http://www.w3.org/TR/REC-xml/#NT-NameStartChar>):
| [4]  NameStartChar  ::=  ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
|                          [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] |
|                          [#x37F-#x1FFF] | [#x200C-#x200D] |
|                          [#x2070-#x218F] | [#x2C00-#x2FEF] |
|                          [#x3001-#xD7FF] | [#xF900-#xFDCF] |
|                          [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

Because Nmtoken requires at least one NameChar, the empty string is
(formally) not an NMTOKEN, and so it is not a valid value for the
xml:lang attribute as defined in the XHTML 1.0 Strict DTD.
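The Nmtoken check can be sketched in Python by transcribing productions [4], [4a] and [7] quoted above into a regular expression. This is a rough illustration, not a validator: the names NAME_START_CHAR, NAME_CHAR and is_nmtoken are our own, and it ignores the attribute-value normalization a validating parser applies first.

```python
import re

# NameStartChar, production [4] of XML 1.0, as the body of a regex character class.
NAME_START_CHAR = (
    r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar, production [4a]: NameStartChar plus "-", ".", digits and a few ranges.
NAME_CHAR = NAME_START_CHAR + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Nmtoken, production [7]: ONE or more NameChar, so "" can never match.
NMTOKEN = re.compile("[" + NAME_CHAR + "]+")

def is_nmtoken(value):
    """True if `value` matches the Nmtoken production as a whole."""
    return NMTOKEN.fullmatch(value) is not None

for candidate in ["en", "en-US", "", "en US"]:
    print(repr(candidate), is_nmtoken(candidate))
```

The "+" in production [7] is the whole story: an empty attribute value has no NameChar to match, so a DTD that declares xml:lang as NMTOKEN rejects xml:lang="". Note that NMTOKEN is otherwise quite lax; a lone "-" would pass, since it says nothing about language codes.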


For XHTML languages based on XHTML Modularization (10 April 2001
version), xml:lang is mentioned in prose in
<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#s_commonatts>
| xml:lang (NMTOKEN)

and defined in the DTD module
(<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/dtd_module_defs.html#a_module_XHTML_Common_Attribute_Definitions>)
| xml:lang     %LanguageCode.datatype;  #IMPLIED

with
(<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/dtd_module_defs.html#dtdentry_LanguageCode.datatype>)
| <!ENTITY % LanguageCode.datatype "NMTOKEN" >

So, same result as for XHTML 1.0.


The revision (XHTML Modularization 1.1) mentions xml:lang in
<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/abstract_modules.html#s_commonatts>
| xml:lang (CDATA)

and in the DTD module
(<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/dtd_module_defs.html#a_module_XHTML_Common_Attribute_Definitions>)
| xml:lang     %LanguageCode.datatype;  #IMPLIED

with
(<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/dtd_module_defs.html#a_module_XHTML_Datatypes>):
| <!ENTITY % LanguageCode.datatype "CDATA" >

The XML schema module references xml:lang in
<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/schema_module_defs.html#a_module_XHTML_Datatypes>:
| <xs:attribute ref="xml:lang" />

from <http://www.w3.org/2001/xml.xsd>:

| The union allows for the 'un-declaration' of xml:lang with the empty
| string.
|
| Formal declaration in XSD source form
|
| <xs:attribute name="lang">
|  <xs:annotation>
|   <xs:documentation>
|    <div>
|
|      <h3>lang (as an attribute name)</h3>
|      <p>
|       denotes an attribute whose value
|       is a language code for the natural language of the content of
|       any element; its value is inherited.  This name is reserved
|       by virtue of its definition in the XML specification.</p>
|
|    </div>
|    <div>
|     <h4>Notes</h4>
|     <p>
|      Attempting to install the relevant ISO 2- and 3-letter
|      codes as the enumerated possible values is probably never
|      going to be a realistic possibility.
|     </p>
|     <p>
|      See BCP 47 at <a
| href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt">
|       http://www.rfc-editor.org/rfc/bcp/bcp47.txt</a>
|      and the IANA language subtag registry at
|      <a
| href="http://www.iana.org/assignments/language-subtag-registry">
|       http://www.iana.org/assignments/language-subtag-registry</a>
|      for further information.
|     </p>
|     <p>
|      The union allows for the 'un-declaration' of xml:lang with
|      the empty string.
|     </p>
|    </div>
|   </xs:documentation>
|  </xs:annotation>
|  <xs:simpleType>
|   <xs:union memberTypes="xs:language">
|    <xs:simpleType>
|     <xs:restriction base="xs:string">
|      <xs:enumeration value=""/>
|     </xs:restriction>
|    </xs:simpleType>
|   </xs:union>
|  </xs:simpleType>
| </xs:attribute>

So, in languages based on XHTML Modularization 1.1, the empty string is
(formally) DTD-valid and XML-Schema-valid.
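The union type in xml.xsd can be mirrored in a few lines of Python as a rough sketch: a value is accepted if it matches the lexical pattern XML Schema Part 2 gives for xs:language, or if it is the single enumerated value "". The name is_valid_xml_lang is our own, and the sketch skips the whitespace collapsing a real schema processor performs before matching.

```python
import re

# Lexical pattern of xs:language from XML Schema Part 2: Datatypes.
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

def is_valid_xml_lang(value):
    """Approximate the xml.xsd union: xs:language OR the empty string."""
    # Member type 1: xs:language (whitespace collapsing is not modelled here).
    if XS_LANGUAGE.fullmatch(value):
        return True
    # Member type 2: xs:string restricted to the single enumerated value "".
    return value == ""

for candidate in ["en", "en-US", "", "en_US"]:
    print(repr(candidate), is_valid_xml_lang(candidate))
```

The second member type exists only to admit the empty string, which is why schema validation against xml.xsd accepts the "un-declaration" that the NMTOKEN-based DTDs reject.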


In the DTD for "XHTML 1.1 + RDFa"
(<http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd>):

| xml:lang     %LanguageCode.datatype;  #IMPLIED

with (<http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod>)

| <!ENTITY % LanguageCode.datatype "CDATA" >

So, in "XHTML 1.1 + RDFa" the empty string is (formally) DTD-valid.
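As a compact recap of the DTD trace above, the following Python sketch records the declared type of xml:lang in each document type and derives whether xml:lang="" is DTD-valid. The dictionary and the function empty_value_allowed are illustrative helpers of our own; they encode only the declarations quoted in this message, not the output of any real validator.

```python
# Declared type of xml:lang in each document type, as quoted from the DTDs above.
XML_LANG_DTD_TYPE = {
    "XHTML 1.0 Strict": "NMTOKEN",
    "XHTML Modularization (2001 REC)": "NMTOKEN",
    "XHTML Modularization 1.1 (2008 REC)": "CDATA",
    "XHTML 1.1 + RDFa": "CDATA",
}

def empty_value_allowed(attr_type):
    """Is xml:lang="" DTD-valid for an attribute of this declared type?"""
    if attr_type == "CDATA":
        return True   # CDATA accepts any string, including the empty one
    if attr_type == "NMTOKEN":
        return False  # Nmtoken ::= (NameChar)+ needs at least one character
    raise ValueError(f"type not covered by this sketch: {attr_type}")

for doctype, attr_type in XML_LANG_DTD_TYPE.items():
    verdict = "valid" if empty_value_allowed(attr_type) else "invalid"
    print(f'{doctype:36} {attr_type:8} xml:lang="" is {verdict}')
```

In other words, whether a DTD-based checker flags xml:lang="" depends on which of these DTDs it validates against.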

--
Johannes Koch
In te domine speravi; non confundar in aeternum.
                             (Te Deum, 4th cent.)