notation3.py language tags

Andreas Radinger
Hi,

the N3 parser cannot handle many of the language tags defined by
BCP 47 [1], because its pattern accepts at most two subtags.
The parser only checks tags for well-formedness, not validity.
I therefore suggest applying the attached patch. It keeps that approach
but also accepts the longer language tag constructs described in [2].

Examples:
@zh-min-nan
@en-GB-boont-r-extended-sequence-x-private
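To see what the change does for these examples, the sketch below applies the old and new langcode patterns from the patch to each tag with match(). match() only needs a prefix of the input to succeed. The sketch prints how much of each tag each pattern consumes. OLD_LANGCODE, NEW_LANGCODE and consumed() are illustrative names, not part of notation3.py.

```python
import re

# langcode before the patch: one subtag plus at most one more.
OLD_LANGCODE = re.compile(r'[a-zA-Z0-9]+(-[a-zA-Z0-9]+)?')
# langcode after the patch: alphabetic first subtag plus up to seven more.
NEW_LANGCODE = re.compile(r'[a-zA-Z]+(-[a-zA-Z0-9]+){0,7}')


def consumed(pattern, tag):
    """Return the prefix of tag that pattern.match() consumes ('' if none)."""
    m = pattern.match(tag)
    return m.group(0) if m else ''


for tag in ['zh-min-nan', 'en-GB-boont-r-extended-sequence-x-private']:
    print(tag)
    print('  old:', consumed(OLD_LANGCODE, tag))
    print('  new:', consumed(NEW_LANGCODE, tag))
```

The old pattern stops after two subtags (zh-min, en-GB) and leaves the rest of the tag unread. The patched pattern consumes both tags completely. Whether the leftover text then causes a parse error depends on the surrounding parser code, which is not shown here.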

General syntax for language tags (every part except the language is optional):
language-extlang-script-region-variant-extension-privateuse
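For comparison, this structure can be spelled out as a stricter regular expression. The sketch below translates the "langtag" grammar of RFC 5646 (part of BCP 47) piece by piece. It is not part of the patch. It deliberately omits grandfathered tags such as "i-klingon". Like the parser, it checks well-formedness only, not whether the subtags are registered. All names in it are illustrative.

```python
import re

# Pieces of the RFC 5646 langtag grammar (grandfathered tags are NOT covered).
LANGUAGE = r'(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4,8})'  # incl. extlang
SCRIPT = r'[A-Za-z]{4}'
REGION = r'(?:[A-Za-z]{2}|[0-9]{3})'
VARIANT = r'(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})'
SINGLETON = r'[0-9A-WYZa-wyz]'  # any alphanumeric except x/X
EXTENSION = SINGLETON + r'(?:-[A-Za-z0-9]{2,8})+'
PRIVATEUSE = r'[xX](?:-[A-Za-z0-9]{1,8})+'

# language-extlang-script-region-variant-extension-privateuse
LANGTAG = (
    LANGUAGE
    + r'(?:-' + SCRIPT + r')?'
    + r'(?:-' + REGION + r')?'
    + r'(?:-' + VARIANT + r')*'
    + r'(?:-' + EXTENSION + r')*'
    + r'(?:-' + PRIVATEUSE + r')?'
)
# A tag is either a full langtag or a bare private-use tag like "x-private".
BCP47_TAG = re.compile(r'(?:' + LANGTAG + r'|' + PRIVATEUSE + r')')


def is_well_formed(tag):
    """Return True if the whole tag matches the (simplified) BCP 47 syntax."""
    return BCP47_TAG.fullmatch(tag) is not None


for tag in ['zh-min-nan', 'en-GB-boont-r-extended-sequence-x-private',
            'de-CH-1901', 'en-GB-boont-r', 'i-klingon']:
    print(tag, is_well_formed(tag))
```

A full grammar like this is stricter than the one-line pattern in the patch. For example, it rejects a dangling extension singleton such as the trailing "r" in "en-GB-boont-r". The patch instead keeps the parser's existing lightweight approach.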


Patch against notation3.py (rev 1.201) from CVS, tested with cwm 1.197
(from cwm-1.2.1):

--- notation3.py.orig   2011-12-09 11:56:44.000000000 +0100
+++ notation3.py        2011-12-09 15:02:54.000000000 +0100
@@ -99,7 +99,7 @@
 number_syntax = re.compile(r'(?P<integer>[-+]?[0-9]+)(?P<decimal>\.[0-9]+)?(?P<exponent>e[-+]?[0-9]+)?')
 digitstring = re.compile(r'[0-9]+')             # Unsigned integer
 interesting = re.compile(r'[\\\r\n\"]')
-langcode = re.compile(r'[a-zA-Z0-9]+(-[a-zA-Z0-9]+)?')
+langcode = re.compile(r'[a-zA-Z]+(-[a-zA-Z0-9]+){0,7}')
 #"


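The patched pattern is still a loose approximation of BCP 47. The sketch below assumes the parser calls match() at the offset just after the "@". It applies the pattern that way and then shows two cases where the pattern diverges from BCP 47. The N3 fragment and the test tags are made up for illustration. The nine-subtag tag is built from arbitrary extension subtags.

```python
import re

# langcode as defined by the patch above.
langcode = re.compile(r'[a-zA-Z]+(-[a-zA-Z0-9]+){0,7}')

# Hypothetical N3 fragment; the tag is matched starting right after '@'.
source = '"chat"@en-GB-boont ; '
start = source.index('@') + 1
m = langcode.match(source, start)
print(repr(m.group(0)), repr(source[m.end():]))  # 'en-GB-boont' ' ; '

# 1. Only the first eight subtags are consumed; '-hhh' is left over here,
#    although the tag is well-formed BCP 47.
nine_subtags = 'en-a-bbb-c-ddd-e-fff-g-hhh'
print(langcode.match(nine_subtags).group(0))  # en-a-bbb-c-ddd-e-fff-g

# 2. Subtag length is not bounded, although BCP 47 limits subtags
#    to 8 characters, so this tag is accepted in full.
print(langcode.match('english-verylongsubtag').group(0))
```

Both cases are rare in practice. They show where a stricter check would differ if one were ever needed.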
Best regards,
Andreas Radinger

[1] http://www.rfc-editor.org/rfc/bcp/bcp47.txt
[2] http://www.w3.org/International/articles/language-tags/

--
Dipl.-Ing. Andreas Radinger
Professur für Allgemeine BWL, insbesondere E-Business
e-business & web science research group
Universität der Bundeswehr München
 
e-mail: [hidden email]
www:    http://www.unibw.de/ebusiness/



Attachment: notation3.py.patch (482 bytes)