Simple way to include definitions of all XML entities

Jirka Kosek
Hi David,

I'm writing to you because you are the maintainer of

http://www.w3.org/2003/entities/

As more and more markup languages are defined not in terms of a DTD
but in modern schema languages like RELAX NG or W3C XML Schema,
there is a known problem with entity definitions. Such entities are
usually defined as part of the DTD, and if you are using a DTD everything
is OK. However, if you don't want to rely on a DTD, you can still include
the entity definitions alone, like:

<!DOCTYPE article [
  <!ENTITY % isoamsb PUBLIC
    "ISO 8879:1986//ENTITIES Added Math Symbols: Binary Operators//EN//XML"
    "http://www.w3.org/2003/entities/iso8879/isoamsb.ent">
  %isoamsb;
]>

The problem is that if you want to include more than one entity file,
your internal subset gets quite large.

Do you think it would be possible to create a "master" entity
definition file as part of the XML Entity Declarations project? This file
would include all the other entity definition files (or there could be a
separate "master" file for each set, like XHTML, MathML, ...), so that
referencing this file from your internal subset makes all entities
available to the document author.
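
A minimal sketch of the by-reference variant of such a "master" file, generated here with a few lines of Python. The function name `master_entity_file`, the list of sets, and the assumption that every set is published as `<base_url><name>.ent` are illustrative only: just the isoamsb URL appears in the example above, and public identifiers are left out rather than guessed.

```python
def master_entity_file(set_names, base_url=""):
    """Return DTD text that declares one external parameter entity per
    entity set and references it immediately, so that reading this
    single file pulls in every listed set."""
    lines = []
    for name in set_names:
        # Assumption: each set is available as <base_url><name>.ent
        lines.append(f'<!ENTITY % {name} SYSTEM "{base_url}{name}.ent">')
        lines.append(f"%{name};")
    return "\n".join(lines) + "\n"


# Hypothetical selection of sets; only isoamsb is named in the message.
master = master_entity_file(
    ["isoamsa", "isoamsb"],
    base_url="http://www.w3.org/2003/entities/iso8879/",
)
print(master)
```

A document would then declare and reference only this one file in its internal subset, instead of one parameter entity per set.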

The need for such a "master" entity file came from a discussion about
DocBook V5.0 combined with MathML.

If you think it would be possible to extend the current entity
definitions with this new feature, I will be happy to provide you with
further input and assistance.

TIA,

                                Jirka

--
------------------------------------------------------------------
   Jirka Kosek     e-mail: [hidden email]     http://www.kosek.cz
------------------------------------------------------------------
   Professional training and consulting in XML technologies.
      Have a look at our newly launched website http://DocBook.cz
        Detailed overview of training courses http://xmlguru.cz/skoleni/
------------------------------------------------------------------
Upcoming training dates: XML Schemas (including RELAX NG) 7.-9.11.
          *** DocBook 5.-7.12. *** XSL-FO 19.-20.12. ***
------------------------------------------------------------------


Re: Simple way to include definitions of all XML entities

David Carlisle

Hi,

Yes, I need to update those files for Unicode 4.x and other things, so
your question is well timed.

Would you want the "master" file for each group of entities to include
the others by external entity reference, much as (say) the XSLT2 map
files are set up, with the ISO 8879 set having iso8879map.xsl, which
xsl:includes isoamsamap.xsl and friends? Or would you want the "master"
file to have copies of the entity definitions?

Essentially anything along those lines is possible. As you may have seen,
the entities are all derived from the unicode.xml file, and how they are
grouped and split into different files is "just a bit of xsl".

If, for example, DocBook wanted some custom set of entities that was
larger than HTML but smaller than MathML, but consistent with both (so
far as that is possible :-), it would be easy to derive such a set by
adding some annotations to unicode.xml and then cranking the handle...

David

________________________________________________________________________
This e-mail has been scanned for all viruses by Star. The
service is powered by MessageLabs. For more information on a proactive
anti-virus service working around the clock, around the globe, visit:
http://www.star.net.uk
________________________________________________________________________

Re: Simple way to include definitions of all XML entities

Jirka Kosek
David Carlisle wrote:

> Would you want the "master" file for each group of entities to include
> the others by external entity reference, much as (say) the XSLT2 map
> files are set up, with the ISO 8879 set having iso8879map.xsl, which
> xsl:includes isoamsamap.xsl and friends? Or would you want the "master"
> file to have copies of the entity definitions?

Actually, this doesn't matter. But as you autogenerate the entity files,
it might be better to create a "master" file with copies of all entity
definitions instead of just references. Imagine someone who doesn't use
XML catalogs: they will benefit from getting all entity definitions in a
single HTTP response from the W3C web server.
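
A rough sketch of the "copies" variant, assuming the individual entity files contain only simple declarations of the form `<!ENTITY name "value">`. The helper name `merge_entity_sets` and the two tiny sample strings are hypothetical stand-ins for real entity files; this is not how the W3C files are actually generated (those are derived from unicode.xml with XSLT).

```python
import re

# Matches simple general-entity declarations such as
# <!ENTITY alpha "&#x3B1;">. Parameter entities (<!ENTITY % ...>) are
# deliberately not matched, because "%" cannot start the captured name.
ENTITY_DECL = re.compile(r'<!ENTITY\s+([^\s%]+)\s+"([^"]*)"\s*>')


def merge_entity_sets(sources):
    """Merge the declarations found in several entity-file texts into
    one flat file. The first declaration of a name wins, mirroring the
    XML rule that the first declaration of an entity is binding."""
    merged = {}
    for text in sources:
        for name, value in ENTITY_DECL.findall(text):
            merged.setdefault(name, value)
    return "".join(f'<!ENTITY {n} "{v}">\n' for n, v in merged.items())


# Illustrative stand-ins for two entity files (not their real content).
greek_sample = '<!ENTITY alpha "&#x003B1;">\n<!ENTITY beta "&#x003B2;">\n'
xhtml_sample = '<!ENTITY nbsp "&#160;">\n<!ENTITY alpha "&#945;">\n'

combined = merge_entity_sets([greek_sample, xhtml_sample])
print(combined)
```

Because the result contains no external references, a parser without XML catalogs needs only one fetch to resolve every entity.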

> Essentially anything along those lines is possible. As you may have seen,
> the entities are all derived from the unicode.xml file, and how they are
> grouped and split into different files is "just a bit of xsl".

Interesting, you even have TeX mappings. Great piece of work!

> If, for example, DocBook wanted some custom set of entities that was
> larger than HTML but smaller than MathML, but consistent with both (so
> far as that is possible :-), it would be easy to derive such a set by
> adding some annotations to unicode.xml and then cranking the handle...

I did some research in this area and, if the following XPath 2.0
expression isn't broken, I think there are no inconsistencies in the
mapping of entity names to Unicode characters (STIX entities are excluded).

distinct-values(//character//entity/@id[. != ''][../@set != 'STIX'][ . =
preceding::character/entity/@id[../@set != 'STIX']])
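
The same check can be sketched outside XPath as a cross-check. The Python below assumes only what the expression itself implies about unicode.xml: `character` elements containing `entity` elements with `id` and `set` attributes. The `id` attribute on `character`, the root element name, and all sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample in the assumed shape; not real unicode.xml content.
SAMPLE = """<charlist>
  <character id="c1"><entity id="alpha" set="setA"/><entity id="alpha" set="setB"/></character>
  <character id="c2"><entity id="dup" set="setA"/></character>
  <character id="c3"><entity id="dup" set="setB"/></character>
  <character id="c4"><entity id="stixonly" set="STIX"/></character>
  <character id="c5"><entity id="stixonly" set="STIX"/></character>
</charlist>"""


def ambiguous_entity_names(root, excluded_sets=("STIX",)):
    """Return {entity name: sorted character ids} for every entity name
    that is attached to more than one character, ignoring entities with
    an empty id, no set attribute, or a set listed in excluded_sets."""
    chars_by_name = defaultdict(set)
    for character in root.iter("character"):
        for entity in character.iter("entity"):
            name = entity.get("id", "")
            set_name = entity.get("set")
            if name and set_name is not None and set_name not in excluded_sets:
                chars_by_name[name].add(character.get("id"))
    return {n: sorted(c) for n, c in chars_by_name.items() if len(c) > 1}


conflicts = ambiguous_entity_names(ET.fromstring(SAMPLE))
print(conflicts)
```

An empty result on the real file would agree with the XPath expression returning an empty sequence. A name repeated in several sets under the same character is not reported, since it still maps to a single character.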

It is hard to predict which entities users will want to use: DocBook
uses ISO 8879 entities, but some people want to combine MathML with
DocBook, which leads to entities from ISO 8879 + ISO 9573 + MathML
extensions. It is also possible to mix XHTML (table model and forms)
with DocBook, so someone might cut and paste an XHTML entity into a
DocBook file.

Because of this, I think that what is really needed here is a real union
of all entity definitions (8879 + 9573(2003) + MathML + XHTML). If I
didn't make an error in the XPath expression above, this combined entity
set shouldn't contain any ambiguities.

So for use with DocBook (and probably with other vocabularies) it would
be great to have a single entity definition file with all entities
defined in one place.

I can imagine that other users might also benefit from merged entity
definitions just for MathML, ISO 8879 or XHTML.

Thanks,

                                Jirka

Re: Simple way to include definitions of all XML entities

David Carlisle


> Interesting, you even have TeX mappings. Great piece of work!

We (or rather, Sebastian) _started_ with TeX mappings, then SGML
mappings, then finally added some support for this young upstart
XML :-).


> I can imagine that other users might also benefit from merged entity
> definitions just for MathML, ISO 8879 or XHTML.


I'll see if I can steal some time in the next week or so to

(a) update the master file with some extra character definitions for
Unicode 4, some of which were added solely to support these entity sets,
e.g.
0237;LATIN SMALL LETTER DOTLESS J;Ll;0;L;;;;;N;;;;;
or perhaps
1D6A5;MATHEMATICAL ITALIC SMALL DOTLESS J;Ll;0;L;<font> 0237;;;;N;;;;;

and

(b) generate some "master" combined sets.
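
For readers unfamiliar with the two example records under (a): they use the semicolon-separated, 15-field layout of UnicodeData.txt. A small sketch of pulling out the fields relevant here follows; the names `UnicodeRecord` and `parse_unicode_data_line` are hypothetical, and only the code point, name, general category, and decomposition fields are decoded.

```python
from typing import NamedTuple, Optional, Tuple


class UnicodeRecord(NamedTuple):
    code_point: int
    name: str
    general_category: str
    decomposition_tag: Optional[str]  # e.g. "font", or None if untagged
    decomposition: Tuple[int, ...]    # code points of the mapping, if any


def parse_unicode_data_line(line):
    """Parse one UnicodeData.txt-style record (15 ';'-separated fields)."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    parts = fields[5].split()  # field 5: optional <tag> plus hex code points
    tag = None
    if parts and parts[0].startswith("<"):
        tag = parts[0].strip("<>")
        parts = parts[1:]
    return UnicodeRecord(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        decomposition_tag=tag,
        decomposition=tuple(int(p, 16) for p in parts),
    )


dotless_j = parse_unicode_data_line(
    "0237;LATIN SMALL LETTER DOTLESS J;Ll;0;L;;;;;N;;;;;")
italic_dotless_j = parse_unicode_data_line(
    "1D6A5;MATHEMATICAL ITALIC SMALL DOTLESS J;Ll;0;L;<font> 0237;;;;N;;;;;")
print(dotless_j)
print(italic_dotless_j)
```

The second record shows why the first matters: the italic form carries a `<font>` decomposition that points back to U+0237.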

David


Re: Simple way to include definitions of all XML entities

Jirka Kosek
David Carlisle wrote:

> I'll see if I can steal some time in the next week or so to
>
> (b) generate some "master" combined sets.

Thank you very much. I'm looking forward to the new combined sets.
