RE: Component Values Must Be Context Independent


Jonathan Marsh-2

Thanks for your comment.  The WS Description Working Group tracked this issue as CR022 [1].

 

I assume you, as the editor who implemented it, are aware of the resolution.

 

Unless you let us know otherwise by the end of October, we will assume you agree with the resolution of this issue.

 

[1] http://www.w3.org/2002/ws/desc/5/cr-issues/issues.html#CR022

 


From: [hidden email] [mailto:[hidden email]] On Behalf Of Arthur Ryman
Sent: Thursday, April 20, 2006 4:20 PM
To: [hidden email]
Subject: Component Values Must Be Context Independent

 


Components can be brought into a component model instance through <import> and <include>. For scalability purposes, it is highly desirable for the value of a component to be independent of the context into which it was brought.

The use case is a development tool for SOA applications that needs to support hundreds or thousands of services. The tool needs to validate the service definitions. The requirement is that the time to do this be linear in the number of documents. We are currently experiencing performance problems validating large sets of WSDL 1.1 documents. We need a spec-compliant optimization for WSDL 2.0.

Ideally, a tool should be able to compute the components directly defined in a document without looking at any of the imports or includes. There are two problems now that prevent this:

1. In theory, we allow extensions that could alter the semantics of imported or included components. However, there is no requirement or use case for this flexibility, much less a realistic, compelling one. Note that this is a real problem in XML Schema: because of "features" such as chameleon includes and <redefine>, you need to know the context in which a document is included.

2. The current definition of component equivalence is recursive in the sense that to test if two components are equivalent, it is necessary to determine if all of the components they refer to are equivalent. In effect this means that you have to construct the entire component model instance in order to resolve the references to the other components.

Since WSDL documents typically include or import others, a collection of WSDL documents is likely to be moderately connected when viewed as a graph. Therefore, when you validate the collection, you end up processing a given document many times in general. You process it a number of times equal to the number of documents that refer to it directly or indirectly (+ 1). This is non-linear. The exact degree of non-linearity depends on how connected the graph is. Consider a simple chain of n WSDL documents.

A1 includes A2 includes A3 includes ... An

Validating A1 requires reading n documents.
Validating A2 requires reading n-1 documents.
...
Validating An requires reading 1 document.

Therefore validating the whole set of documents requires reading n + (n-1) + ... + 1 = n(n+1)/2 = O(n^2) documents, i.e. this is quadratic, not linear.

On the other hand, if the meaning of each document is independent of how it is used, then a smart tool could cache the results and only read n documents.
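
To make the chain arithmetic above concrete, here is a minimal Python sketch that only counts document reads. The include graph, the function names, and the idea of modelling "validation" as walking the include closure are all assumptions for illustration; nothing here is taken from a real WSDL tool.

```python
def make_chain(n):
    """Hypothetical include graph for A1 includes A2 includes ... An."""
    return {f"A{i}": [f"A{i + 1}"] if i < n else [] for i in range(1, n + 1)}


def reachable(graph, root):
    """Return root plus every document it includes, directly or indirectly."""
    seen, stack = set(), [root]
    while stack:
        doc = stack.pop()
        if doc not in seen:
            seen.add(doc)
            stack.extend(graph[doc])
    return seen


def reads_without_cache(graph):
    """Each validation re-reads every document in its include closure."""
    return sum(len(reachable(graph, doc)) for doc in graph)


def reads_with_cache(graph):
    """If a document's components do not depend on context, read it once."""
    parsed = set()
    reads = 0
    for doc in graph:
        for needed in reachable(graph, doc):
            if needed not in parsed:
                parsed.add(needed)
                reads += 1
    return reads


chain = make_chain(10)
print(reads_without_cache(chain))  # n(n+1)/2 for a chain of n documents
print(reads_with_cache(chain))     # n for the same chain
```

The cached variant is only valid if the parsed result of a document is the same no matter which document pulled it in, which is exactly the property being requested.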

The fix is as follows:

1. Add the following assertion. An extension MUST NOT affect the value of components that are added to the component model via <import> or <include>.
2. State the definition of component equivalence as follows. Two components are equivalent when:
        A) All of their child components are equivalent.
        B) All of their non-component properties are equal.
        C) All of their non-child component properties refer to components that have the same keys (e.g. names).
The difference is that to test for equivalence, you only have to look at a component's value-based properties and child components. You don't have to traverse the component graph, which might take you into another document. You only have to compare referred-to components via their keys.
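
The proposed definition can be sketched in Python as follows. The `Component` class and its field names are hypothetical, and pairing child components by their kind and key is an assumption of this sketch (it relies on keys being unique among siblings); the proposal itself only states conditions A, B and C.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    kind: str                                      # e.g. "Interface"
    key: tuple                                     # e.g. ("tns:Reservation",)
    values: dict = field(default_factory=dict)     # non-component properties
    children: list = field(default_factory=list)   # child components
    refs: dict = field(default_factory=dict)       # property -> key of referred component


def equivalent(a, b):
    """Key-based equivalence: never follows a reference into another document."""
    if a.kind != b.kind:
        return False
    if a.values != b.values:        # (B) non-component properties are equal
        return False
    if a.refs != b.refs:            # (C) references compared by key only
        return False
    if len(a.children) != len(b.children):
        return False
    # (A) child components are equivalent; paired by (kind, key) as an assumption
    others = {(c.kind, c.key): c for c in b.children}
    for child in a.children:
        other = others.get((child.kind, child.key))
        if other is None or not equivalent(child, other):
            return False
    return True


def make_interface(element_key):
    # Hypothetical Interface with one operation that refers to an element by key.
    op = Component("InterfaceOperation", ("opCheck",), {"style": "in-out"})
    op.refs["element"] = element_key
    return Component("Interface", ("tns:Reservation",), children=[op])


print(equivalent(make_interface(("ns:checkIn",)), make_interface(("ns:checkIn",))))
print(equivalent(make_interface(("ns:checkIn",)), make_interface(("ns:other",))))
```

The recursion only descends into child components, which live in the same document as their parent, so the comparison never needs the rest of the component model instance.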

We then add a statement to each component explicitly stating what its key values are. This is straightforward. We already implicitly defined keys when stating uniqueness rules, e.g. each Interface component in a Description component must have a unique {name}. The key is usually the {name} property. For Features and Properties, it is the {ref} property. The complete list is:

1. ElementDeclaration: {name}

2. TypeDefinition: {name}

3. Interface: {name}

4. InterfaceFault: {name}

5. InterfaceOperation: {name}

6. InterfaceMessageReference: {message label}

7. InterfaceFaultReference: {interface fault}.{name}, {message label}

8. Binding: {name}

9. BindingFault: {interface fault}.{name}

10. BindingOperation: {interface operation}.{name}

11. BindingMessageReference: {interface message reference}.{message label}

12. BindingFaultReference: {interface fault reference}.{interface fault}.{name}, {interface fault reference}.{message label}

13. Service: {name}

14. Endpoint: {name}

15. Feature: {ref}

16. Property: {ref}

In general, any extension component that might be referred to needs to define a key value, since that is how the reference is represented in the XML serialization.
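
One possible way for a tool to hold the key list above is a lookup table from component kind to property paths. The table name, the nested-dict representation of a component, and the reading of items 7 and 12 as two-part keys are assumptions of this sketch, not part of the proposal.

```python
# Each key is a tuple of values; each value is found by following a property path.
KEY_PATHS = {
    "ElementDeclaration": [("name",)],
    "TypeDefinition": [("name",)],
    "Interface": [("name",)],
    "InterfaceFault": [("name",)],
    "InterfaceOperation": [("name",)],
    "InterfaceMessageReference": [("message label",)],
    "InterfaceFaultReference": [("interface fault", "name"), ("message label",)],
    "Binding": [("name",)],
    "BindingFault": [("interface fault", "name")],
    "BindingOperation": [("interface operation", "name")],
    "BindingMessageReference": [("interface message reference", "message label")],
    "BindingFaultReference": [
        ("interface fault reference", "interface fault", "name"),
        ("interface fault reference", "message label"),
    ],
    "Service": [("name",)],
    "Endpoint": [("name",)],
    "Feature": [("ref",)],
    "Property": [("ref",)],
}


def key_of(kind, props):
    """Compute a component's key from its properties (nested dicts, hypothetical)."""
    def follow(path):
        value = props
        for step in path:
            value = value[step]
        return value
    return tuple(follow(path) for path in KEY_PATHS[kind])


fault_ref = {"interface fault": {"name": "tns:invalidData"}, "message label": "Out"}
print(key_of("InterfaceFaultReference", fault_ref))
```

An extension component that can be referred to would add its own entry to such a table.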

Arthur Ryman,
IBM Software Group, Rational Division

blog: http://ryman.eclipsedevelopersjournal.com/
phone: +1-905-413-3077, TL 969-3077
assistant: +1-905-413-2411, TL 969-2411
fax: +1-905-413-4920, TL 969-4920
mobile: +1-416-939-5063, text: [hidden email]