Ext/Vers terminology with generic/xml split


Ext/Vers terminology with generic/xml split

David Orchard

I've updated the terminology section including diagrams to do the generic/xml split. 

1.1 Terminology

The terminology for describing languages, namespaces, constraints, evolvability, etc. follows. Let us consider an example: two systems need to exchange name information. Names may not be the perfect choice of example because of internationalization issues, but the example resonates strongly with a very large audience. The Name language is created for this exchange. [Definition: A producer is an agent that creates an instance.] [Definition: A production is the creation of an instance.] A producer produces an instance with the intent of conveying information. [Definition: A consumer is an agent that consumes an instance.] [Definition: A consumption is the processing of an instance of a language.] A consumer is impacted by the instance that it consumes. That is, it interprets that instance and bases future processing, in part, on the information that it believes was present in that instance. An instance can be consumed many times, by many consumers, and have many different impacts.

Generally, a language has one or more vocabularies that each may have multiple terms. Formally, [Definition: a language is an identifiable set of vocabulary terms with defined syntactic and semantic constraints.] By language, we mean the set of texts that are members of that language and used by a particular application. [Definition: A vocabulary is a set of terms.] The Name language consists of 3 terms: name, first, last. In order to identify the terms in the Name language in XML, a namespace is assigned to the terms. Other examples of vocabularies include the elements and attributes of XHTML 1.0 or the names of built-in functions in XPath 2.0. The Name language could also include terms from other vocabularies, such as Dublin Core or UBL. These terms each have their own namespaces, illustrating that a language can comprise vocabularies from multiple namespaces.

The Name language takes the 3 terms and specifies the constraint that a name consists of a first and a last. [Definition: A language has a set of constraints that apply to the vocabulary terms in the language.] These constraints can be defined in machine-processable syntactic constraint languages such as XML Schema, described in human-readable text such as HTML documents, or embodied in software. A language may or may not be defined by a schema in any particular schema language. The constraints on a language govern the membership of instances in the language, which may be considered the set of strings that are in the language.
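
As a minimal sketch of constraints "embodied in software", the check below decides whether a string is a member of the Name language. It assumes the constraint is exactly one first followed by one last inside a name, with no namespace; the function name is_member and the sample values are illustrative only and are not part of the finding.

```python
import xml.etree.ElementTree as ET


def is_member(text: str) -> bool:
    """Return True if `text` satisfies the illustrative Name constraint."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not well-formed XML, so not a member
    children = [child.tag for child in root]
    # Assumed constraint: a name consists of a first followed by a last.
    return root.tag == "name" and children == ["first", "last"]


valid = is_member("<name><first>Jane</first><last>Doe</last></name>")
missing_first = is_member("<name><last>Doe</last></name>")
print(valid, missing_first)
```

The same string could also be tested against the constraints of other languages, which is why membership is treated as a relationship between a text and a language rather than a property of the text alone.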

In general, the intended meaning of a vocabulary term is scoped by the language in which the term is found. However, there is some expectation that terms drawn from a given vocabulary will have a consistent meaning across all languages in which they are used. Confusion often arises when terms have inconsistent meanings across languages. The Name terms might be used in other languages, but it is generally expected that they will still be "the same" in some meaningful sense.

[Definition: Text is a specific, discrete sequence of characters]. Given that there are constraints on a language, any particular text may or may not be a member of that language. Indeed, a particular text may be a member of many languages, and many different texts may be members of a given language. The texts of a language are the units of exchange. Documents are texts of a language.

These terms and their relationships are shown in the attached diagram (ext-vers-generic-uml.png).

There are many different systems for exchanging texts in languages, such as SQL, Java, XML, ECMAScript, and C#. We will briefly describe some key refinements to our lexicon for XML. An XML language has a vocabulary that may use terms from one or more XML Namespaces (or none), each of which has a namespace name. [Definition: An XML language is an identifiable set of vocabulary terms with defined XML syntactic and semantic constraints.] By XML language, we mean the set of elements and attributes, or instances, used by a particular application. The Name language, consisting of name, first, and last, has a namespace assigned to its terms. We use the prefix "namens" to refer to that namespace. The Name language could also include terms from other vocabularies, such as Dublin Core or UBL. These terms each have their own namespaces, illustrating that a language can comprise vocabularies from multiple namespaces. An XML Namespace is a convenient container for collecting terms that are intended to be used together within a language or across languages. It provides a mechanism for creating globally unique names.

We shall use the term instance when speaking of sequences of characters (aka text) in XML. [Definition: An instance is a specific, discrete sequence of terms]. Documents are instances of a language. In XML, they must have a root element. A name document might have a name element as its root element. Alternatively, the name vocabulary may be used by another language, such as one for purchase orders. Purchase order documents may then contain name elements. Thus instances of a language are always part of a document and may be the entire document. XML instances (and all other instances of markup languages) consist of markup and content. In the name example, the first and last elements, including the end markers, are the markup. The values between the start and end markers are the content. An instance has an information model. There are a variety of data models within and outside the W3C; the one standardized by the W3C is the XML Infoset.
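
To make the markup/content distinction concrete, here is a small sketch that parses a Name instance and separates the two. The namespace name http://example.org/name is an assumption (the text only gives the prefix "namens"), and the values Jane and Doe are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed namespace name; the text only specifies the prefix "namens".
NAME_NS = "http://example.org/name"

instance = (
    f'<namens:name xmlns:namens="{NAME_NS}">'
    "<namens:first>Jane</namens:first>"
    "<namens:last>Doe</namens:last>"
    "</namens:name>"
)

root = ET.fromstring(instance)


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part that ElementTree prepends to tags."""
    return tag.split("}", 1)[-1]


# Markup: the element names (terms). Content: the values between the markers.
markup = [local_name(root.tag)] + [local_name(child.tag) for child in root]
content = [child.text for child in root]

print(markup)
print(content)
```

ElementTree reports each element as "{namespace name}local name", which reflects the point that the namespace, not the prefix, is what makes a term globally unique.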

The XML-related terms and their relationships are shown in the attached diagram (ext-vers-xml-uml.png).

A stylesheet processor is a consumer of the XML document that it is processing (the producer isn't mentioned); in the Web services context, the roles of producer and consumer alternate as messages are passed back and forth. Note that most Web service specifications provide definitions of inputs and outputs. By our definitions, a Web service that updates its output schema is considered a new producer, and a service that updates its input schema is considered a new consumer.

We now return to our discussion of languages in general. Extensibility is a property that enables evolvability of software. It is perhaps the biggest contributor to loose coupling in systems, as it enables the independent and potentially compatible evolution of languages. Languages are defined to be [Definition: Extensible if instances of the language can include terms from other vocabularies.] The Name language is extensible if it can include terms from other vocabularies, such as a new middle term.
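
A rough sketch of what extensibility looks like in an instance: the Name document below carries a middle term drawn from a second vocabulary, and a helper separates the Name terms from the foreign ones. Both namespace names and the helper split_terms are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://example.org/name"      # assumed namespace name for Name
EXT_NS = "http://example.org/name-ext"   # hypothetical extension vocabulary

extended = (
    f'<namens:name xmlns:namens="{NAME_NS}" xmlns:ext="{EXT_NS}">'
    "<namens:first>Jane</namens:first>"
    "<ext:middle>Q</ext:middle>"
    "<namens:last>Doe</namens:last>"
    "</namens:name>"
)


def split_terms(text: str):
    """Separate child terms in the Name namespace from terms of other vocabularies."""
    root = ET.fromstring(text)
    known, foreign = [], []
    for child in root:
        if child.tag.startswith("{"):
            ns, _, local = child.tag[1:].partition("}")
        else:
            ns, local = "", child.tag
        (known if ns == NAME_NS else foreign).append(local)
    return known, foreign


known, foreign = split_terms(extended)
print(known, foreign)
```

Whether a consumer accepts, ignores, or rejects the foreign term is a separate decision that the definition above does not settle.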


Attachments:
ext-vers-uml.violet (23K)
ext-vers-xml-uml.png (17K)
ext-vers-generic-uml.png (2K)

Re: Ext/Vers terminology with generic/xml split

Hoylen Sue


"David Orchard" <[hidden email]> writes:
> I've updated the terminology section including diagrams to do the
> generic/xml split.

I like the improvements, and wonder if it is valuable to
separate the XML and Name examples even further?  So the
text starts off with generic definitions, followed by
discussions in the context of XML, followed by the Name
example.

---

> [Definition: A producer is an agent...
> [Definition: A consumer is an agent...

These are "is a" relationships.  So "producer" and "consumer" should
be modelled as subtypes of "agent" in the UML diagram.

Then "production" would appear as a method in the Producer class, and
"consume" a method in the Consumer class.  Right now, the UML seems
a bit strange, because processes appear as classes.

The "membership" class seems odd too, since it is a property
rather than an actual object.  I think it should be
represented as a line between Instance and Language.
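
As a rough sketch of the modelling I have in mind (class and method names are only illustrative, and the un-namespaced Name document is an assumption), producer and consumer become subtypes of agent, with production and consumption as methods rather than classes:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


class Agent:
    """Base type: anything that acts on instances."""


class Producer(Agent):
    def produce(self, first: str, last: str) -> str:
        """Production: create an instance of the Name language."""
        return (
            f"<name><first>{escape(first)}</first>"
            f"<last>{escape(last)}</last></name>"
        )


class Consumer(Agent):
    def consume(self, instance: str) -> str:
        """Consumption: process an instance and return what was understood."""
        root = ET.fromstring(instance)
        return f"{root.findtext('first')} {root.findtext('last')}"


instance = Producer().produce("Jane", "Doe")
understood = Consumer().consume(instance)
print(understood)
```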

---

> Generally, a language has one or more vocabularies that each may
> have multiple terms.

The UML diagram shows a cardinality of zero or more (not
"one or more").

Change the "*" to "1..*".

---

> Formally, [Definition: a language is an identifiable set of
> vocabulary terms...

This text is correct, but the UML diagram does not represent
what the text says.

You want to allow for languages which use a proper subset of terms
from a vocabulary (i.e. not all the terms from a vocabulary).

The UML diagram shows that a language is made up of vocabularies, and
those vocabularies are made up of terms.  The UML diagram implies that
if a language uses a vocabulary, it (potentially) uses all the terms
from it -- there is no way to identify specific terms from the
vocabulary which are included/excluded from a particular language.

I think the arrow should go from Language to Term, not
from Language to Vocabulary.
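
A minimal sketch of that association, assuming a hypothetical "person" vocabulary that has one more term ("title") than the language uses; none of these class names come from the diagram:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    name: str
    vocabulary: str  # name of the vocabulary that owns the term


@dataclass
class Vocabulary:
    name: str
    terms: set = field(default_factory=set)

    def term(self, name: str) -> Term:
        """Create a term owned by this vocabulary and register it."""
        t = Term(name, self.name)
        self.terms.add(t)
        return t


@dataclass
class Language:
    name: str
    terms: set = field(default_factory=set)  # Language -> Term association

    def vocabularies(self) -> set:
        """Vocabularies are derived from the terms the language actually uses."""
        return {t.vocabulary for t in self.terms}


person = Vocabulary("person")
first, last, title = (person.term(n) for n in ("first", "last", "title"))

# The language picks a proper subset of the vocabulary: no "title".
name_language = Language("Name", {first, last})
print(name_language.vocabularies())
```

With the link on Term, the set of vocabularies a language draws on can still be derived, but the reverse (which terms are excluded) no longer has to be guessed.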

---

> In general, the intended meaning of a vocabulary term is scoped by
> the language in which the term is found. However, there is some
> expectation that terms drawn from a given vocabulary will have a
> consistent meaning across all languages in which they are
> used. Confusion often arises when terms have inconsistent meaning
> across language. The Name terms might be used in other languages,
> but it is generally expected that they will still be "the same" in
> some meaningful sense.

Readers who mistakenly believe that one XML namespace has only one
corresponding XML Schema might find this text reinforcing their
beliefs (they could easily interpret "in general" to mean "in all cases").

How about rewording it as:

  It is recommended practice that ...(above text)...
  However, it is permitted for different languages
  to associate different constraints to the same term.

---

> [Definition: Text is a specific, discrete sequence of
> characters].

Perhaps "text" should be "data" to be more friendly to
instances which are not normally considered as text
(e.g. binary)?  I don't feel strongly either way about this.

---

> [Definition: An instance is a specific, discrete sequence of terms].

This definition should also include content as part of an instance.
As it stands, it sounds like only the terms (i.e. markup) make up an
instance.

Are you using the word "discrete" to mean "completely separate and
unconnected" or in the mathematical sense of being "finite"?  If the
latter, I suggest using the word "finite" to be more precise.

---

In the UML diagrams, add cardinalities to the link from Language to
Constraint.

Language has zero or more constraints.  Zero sounds a bit
strange, but I guess it should be permitted.

---

Will the other terms that are in the current Draft TAG Finding
(namely: "backward compatible", "forward compatible",
"closed system" and "open system") be in the new
Terminology section too?

Hoylen
--
____________________________________________________________
[hidden email]                     http://www.hoylen.com/