ANN: a tool that makes it easier to extract information out of XML Schemas

Costello, Roger L.
Hi Folks,

I created a tool that makes it easier for you to extract information
out of XML Schemas.

Here is the tool:

http://www.xfront.com/XML-Schema-Tool-for-Easy-Information-Extraction/index.html 

Motivation for the tool:

Here are a few examples of queries that I've needed to perform
on schemas in the past:

- What are all the elements and attributes that are declared
  to be of type xs:QName (or xs:string, or xs:gYear, etc.)?

- For simpleType A, what are its applicable facets? (Take
  into account the facets in all its ancestor simpleTypes)

- How many element declarations are in the schema? How many
  complexType definitions? simpleTypes? attributes?

- How many lines of schema code are there?

With my tool it is easy to get answers to those questions.
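
To make the queries above concrete, here is a minimal Python sketch of how a few of them could be answered for a single schema document using only the standard library. It is not the tool announced here and says nothing about how that tool works. The sample schema, the function names `count_components` and `declared_with_type`, and the shortcut of comparing only the local part of the `type` attribute are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical single-file schema used only for this illustration.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="ref" type="xs:QName"/>
  <xs:element name="book">
    <xs:complexType>
      <xs:attribute name="year" type="xs:gYear"/>
      <xs:attribute name="kind" type="xs:QName"/>
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="Isbn">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>
"""


def count_components(root):
    """Count element/attribute declarations and type definitions."""
    wanted = {XS + n: n for n in ("element", "attribute", "complexType", "simpleType")}
    return Counter(wanted[node.tag] for node in root.iter() if node.tag in wanted)


def declared_with_type(root, local_type):
    """Names of elements and attributes whose type has the given local name.

    Assumption: the prefix on the type attribute is not resolved; only the
    part after the colon is compared, which is not namespace-safe.
    """
    hits = []
    for node in root.iter():
        if node.tag in (XS + "element", XS + "attribute"):
            if node.get("type", "").split(":")[-1] == local_type:
                hits.append(node.get("name"))
    return hits


root = ET.fromstring(SCHEMA)
counts = count_components(root)
qname_users = declared_with_type(root, "QName")
line_count = len(SCHEMA.splitlines())
print(counts, qname_users, line_count)
```

The sketch handles one document and ignores includes, imports, and namespace resolution, which is exactly where the difficulties listed below begin.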

Without this tool, it can be difficult to get the info you desire
from XML Schemas. Here are a few reasons for the difficulty:

1. The schema may be scattered over multiple files. So you have
   to search through multiple files to find the info you want.

2. A simpleType may be part of a long chain of restrictions, and the
   simpleTypes in that chain may be scattered over multiple files.
   That makes it difficult to know exactly what the net value space
   of the simpleType is.

3. Likewise, a complexType may be part of a long chain of derivations
   by extension and by restriction, and the complexTypes in that chain
   may be scattered over multiple files. That makes it difficult
   to know exactly what the final set of elements and attributes
   in a complexType is.

4. An element may be substituted. So, many different elements may
   be possible at a certain point in a schema.

5. Consider an element declaration with a type attribute. The type
   definition could be located in many places: in the document that
   contains the element declaration, in a document that it includes
   or imports, or in a document that those documents in turn include
   or import. It could also be in the document that included the
   document that contains the element declaration. And many more
   places. Ouch!

6. The elements and attributes in a no-namespace schema are
   part of one namespace when they are included by a schema with
   targetNamespace A and another namespace when they are included
   by a schema with targetNamespace B.
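
Points 1 and 2 can be illustrated with a small Python sketch that follows `xs:include` across files and then walks a chain of simpleType restrictions to collect the facets that apply. This is only a rough sketch under stated assumptions, not the announced tool: the two in-memory files, the type names `Code` and `ShortCode`, and the helpers `load_simple_types` and `applicable_facets` are hypothetical, the schemas have no target namespace, and `xs:import`, list and union types, and the special combining rules for `pattern` and `enumeration` are ignored.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical two-file schema, kept in memory so the sketch is self-contained.
FILES = {
    "main.xsd": """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="base.xsd"/>
  <xs:simpleType name="ShortCode">
    <xs:restriction base="Code">
      <xs:maxLength value="8"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
""",
    "base.xsd": """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="Code">
    <xs:restriction base="xs:string">
      <xs:minLength value="2"/>
      <xs:maxLength value="20"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
""",
}


def load_simple_types(start, files):
    """Follow xs:include from `start` and index named simpleTypes by name."""
    types, seen, pending = {}, set(), [start]
    while pending:
        name = pending.pop()
        if name in seen:
            continue  # guard against circular includes
        seen.add(name)
        root = ET.fromstring(files[name])
        for inc in root.findall(XS + "include"):
            pending.append(inc.get("schemaLocation"))
        for st in root.findall(XS + "simpleType"):
            types[st.get("name")] = st
    return types


def applicable_facets(type_name, types):
    """Walk the restriction chain upward; the most derived facet value wins."""
    facets = {}
    while type_name in types:
        restriction = types[type_name].find(XS + "restriction")
        if restriction is None:
            break  # list and union types are out of scope for this sketch
        for child in restriction:
            local = child.tag.split("}")[-1]
            if local in ("annotation", "simpleType"):
                continue  # not facets
            facets.setdefault(local, child.get("value"))
        type_name = restriction.get("base")
    return facets


types = load_simple_types("main.xsd", FILES)
facets = applicable_facets("ShortCode", types)
print(facets)
```

Here `ShortCode` keeps its own `maxLength` and inherits `minLength` from `Code` in the included file. A real schema adds imports, namespaces, and chameleon includes (point 6) on top of this, which is why the bookkeeping gets hard to do by hand.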

/Roger

Re: ANN: a tool that makes it easier to extract information out of XML Schemas

Michael Kay
Did you look at Saxon's SCM file format? It is an XML representation
of the schema component model. It will, for example, give you the
expanded content model of a complex type that is derived by extension.
It seems to me that if you want to present schema information in a
processable form, the SCM is the right model to use.

Michael Kay
Saxonica

On 19/05/2012 19:42, Costello, Roger L. wrote:

> Hi Folks,
>
> I created a tool that makes it easier for you to extract information
> out of XML Schemas.