[Bug 25174] New: Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

            Bug ID: 25174
           Summary: Buffering with xsl:try wrapped around xsl:stream or
                    xsl:result-document
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSLT 3.0
          Assignee: [hidden email]
          Reporter: [hidden email]
        QA Contact: [hidden email]

This bug report is a result of [1] and the response from the working group in
[2]. The relevant quotes from the latter (minutes of 13 March 2014) are:

Key example (edited to be valid):
<xsl:try>
    <xsl:stream href="foo.xml">
        <xsl:apply-templates mode="streaming"/>
    </xsl:stream>
    <xsl:catch />
</xsl:try>


Key issue:
xsl:try requires stable roll-back in case an error occurs that is caught by
xsl:catch, even if that error is right at the beginning, i.e. when trying to
read the source document.

Quotes from the minutes:
"ABr: but if you only want to catch a failure in opening the document, you've
incurred the buffering cost for no benefit. And there's nowhere else to put the
xsl:try in this case."

"MK: yes, it would be nice to recover from a failure opening the input, without
having to buffer all the output."

"MK: this would suggest solution (3), an on-error output, but not sure how we
would define the semantics. Basically, catching errors that occur before any
output is written."

Suggested solutions (mark #3)
(1) disallow non-motionless expressions in xsl:try, which forces the programmer
to do a copy prior to the xsl:try
(2) define an extra attribute on xsl:output to make rollback behavior on
xsl:try optional
(3) make errors on xsl:stream uncatchable, or catchable only by a special
attribute "on-error"
(4) disallow xsl:stream inside xsl:try


-------------------------------------------------------

Wrapping xsl:try around xsl:stream effectively prevents (output) streaming,
because the whole output is required to be buffered. This is not a problem if
the result set is small, but if it is large, the buffering defeats the purpose
of streaming.

The same issue occurs when xsl:try is wrapped around xsl:result-document, with
one difference: the processor is not required to leave the result document in a
stable state, i.e., it is possible to start writing output to a result document
and _not_ roll back in case an error is raised. But this is not ideal either,
as the user will be left with a result document in an uncertain state.

Suggestion #3
The suggested resolution #3 above introduces a new attribute on xsl:stream,
on-error, which could take an expression. The special variables defined under
[3], like err:code and err:description are available inside this expression.
Example:

<xsl:stream href="foo.xml" on-error="my:report($err:code)">
    <xsl:apply-templates mode="streaming"/>
</xsl:stream>

There is one additional drawback here, however. The special variables in the
err: namespace are currently lexically scoped (see [3]), which means you will
have to pass each error variable. A solution to this is to (also) allow the
errors to be available as a map (as in $err:bag or $err:map), which gives
advantages in this scenario and related scenarios, or to allow the special
variables to be dynamically scoped, comparable to current-group. In the latter
case you can write:

<xsl:template match="/">
    <xsl:stream href="foo.xml" on-error="my:report-errors()">
        <xsl:apply-templates mode="streaming"/>
    </xsl:stream>
</xsl:template>

<xsl:function name="my:report-errors">
   <xsl:message select="$err:description" />
</xsl:function>

If we decide, however, to keep the current scope and variables for err:xxx, but
we adopt static AVTs from [4], there is another way out for programmers to
write this more effectively:

<xsl:variable name="errorhandler"
    select="'my:report-errors($err:code, $err:description, $err:value)'"
    static="yes" />

<xsl:template match="/">
    <xsl:stream href="foo.xml" on-error="{$errorhandler}">
        <xsl:apply-templates mode="streaming"/>
    </xsl:stream>
</xsl:template>

<xsl:function name="my:report-errors">
   <xsl:param name="errcode" />
   ....
   <xsl:message select="$errcode" />
   ....
</xsl:function>

IMO it is hard not to get enthusiastic about static AVTs: they seem to open up
a whole new level of abstraction through preprocessing macros that can greatly
simplify many typical use-cases (but that's another subject, again, see [4]).

Semantics for on-error: it will only catch errors that occur before the
processor starts reading the document, perhaps up until the root node, which is
in line with current rules (somewhere we say that buffering of the DTD, opening
comments etc. is required in streaming). Other errors ought to be caught the
normal way, using more fine-grained xsl:try/xsl:catch.

I propose to adopt this for xsl:stream and xsl:result-document, in the latter
case only to catch errors arising from the first attempt to write the result
document.

Recovery actions: when on-error is defined and called, we might introduce a
return value true/false that determines whether further processing should take
place or not, or add one more attribute: on-error-terminate="yes|no". We may
also decide on whether the result of on-error becomes part of the current
result tree or not.

See also: bug 25173.


[1] https://lists.w3.org/Archives/Member/w3c-xsl-wg/2014Mar/0012.html
[2] https://lists.w3.org/Archives/Member/w3c-xsl-wg/2014Mar/0014.html
[3] https://www.w3.org/TR/xslt-30/html/Overview.html#element-try
[4] https://www.w3.org/Bugs/Public/show_bug.cgi?id=24619

--
You are receiving this mail because:
You are the QA Contact for the bug.


[Bug 25174] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

--- Comment #1 from Abel Braaksma <[hidden email]> ---
Following discussion during the telcon of April 10, 2014, the WG asked me to
come up with a proposal. At that telcon, Michael Kay suggested allowing
xsl:catch inside an xsl:stream instruction.

I think that proposal has merit, because it uses existing syntax and has fewer
side-effect issues than the proposal in this bug report (which adds an
on-error attribute to xsl:stream).

Here's an outline of the xsl:catch proposal:

Allow the xsl:catch instruction to appear as a child of xsl:stream. It must be
the last child element (similar to xsl:try), except for xsl:fallback.

It will have an absent focus for both the sequence constructor and the select
attribute (reasoning: we try to catch an error on reading the input stream, so
there will be no node to process).

If this element is present, then, prior to processing the sequence constructor
in xsl:stream, the processor must attempt to read and buffer the streamed input
document up until the start of the root element (i.e., it must perform the same
action that would be required for has-children(root()), as described in bug
25173). If it fails, the appropriate error is raised, which can then be caught
by the xsl:catch element.

If the error is not caught, it will bubble up and can be caught by an enclosing
xsl:try/xsl:catch construct; the xsl:stream instruction is not evaluated
further.

If the error is caught, the sequence constructor of xsl:catch is processed and
the rest of the sequence constructor of xsl:stream will be ignored. No rollback
is necessary, because the body of xsl:stream has not yet been processed.

If no error is raised, the xsl:catch instruction is ignored and processing
continues as normal.

Notes:
Note 1: this is different from using fn:streaming-document-available, which
does not raise an error but simply returns true or false. Even when it returns
true, a subsequent read in an xsl:stream instruction using the same URI may
still raise an error.

Note 2: this instruction cannot be used to catch errors raised by the body of
the xsl:stream instruction, it will only catch errors resulting from the
initial reading of the streamed input document, i.e. when it fails to construct
a streamed document node.

Note 3: this instruction is defined to allow graceful degradation without
having to buffer the full result of streamed processing in case of a failure to
read the input document. If you do want to catch errors during streamed
processing, you can wrap the body of an xsl:stream element inside a regular
xsl:try/catch, but this will incur the penalty that the processor will be
required to buffer all output, which may be detrimental in certain streaming
scenarios.

Note 4: because xsl:catch has absent focus, its sweep and posture are
motionless and grounded.

Example:

<xsl:stream href="http://example.org/{$docname}.xml">
    <xsl:value-of select="count(//news)" />
    <xsl:catch errors="err:FODC0005">
       <xsl:text>Invalid docname specified.</xsl:text>
    </xsl:catch>
</xsl:stream>
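
For contrast, the alternative mentioned in Note 3 might look like the sketch
below. It uses the draft xsl:stream syntax from this thread; the mode name
"streaming" is borrowed from the earlier example and the fallback text is made
up for illustration. Here the xsl:try sits inside xsl:stream, so it covers
errors raised during streamed processing, at the cost that the processor has to
hold back the output of the xsl:try until it completes.

```xml
<xsl:stream href="foo.xml">
    <!-- Regular try/catch around the body: catches errors raised while
         processing the stream, but the output has to be buffered. -->
    <xsl:try>
        <xsl:apply-templates mode="streaming"/>
        <xsl:catch>
            <!-- Illustrative fallback content only. -->
            <xsl:text>Processing failed part-way through the input.</xsl:text>
        </xsl:catch>
    </xsl:try>
</xsl:stream>
```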

As an aside: it came to my attention that we do not currently specify the error
conditions for xsl:stream in regards to the input document. I assume they are
the same as for fn:doc?


[Bug 25174] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

--- Comment #2 from Michael Kay <[hidden email]> ---
I wonder if this can't be done simply by defining an error code that is
guaranteed to be used exclusively for errors encountered at the start of the
streaming operation, and then catching this specific error with a conventional
try/catch instruction around the xsl:stream? I.e. no new syntax, just new
semantics for a specific error code?


[Bug 25174] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

--- Comment #3 from Abel Braaksma <[hidden email]> ---
Re comment 2:

I thought of it, but I think we run into even more trouble, consider:

<xsl:try>
   <xsl:stream href="good.uri">
      <xsl:apply-templates select="x" />
   </xsl:stream>
   <xsl:catch errors="err:Special">
      <xsl:message select="'Rolled back'" />
   </xsl:catch>
</xsl:try>

<xsl:template match="x">
   <xsl:try>
      <xsl:value-of select="@y mod @z" />
      <xsl:stream href="{@baduri}">
         <xsl:apply-templates />
      </xsl:stream>
      <xsl:catch errors="*">
         <xsl:message select="'What is rolled back?'" />
      </xsl:catch>
   </xsl:try>
</xsl:template>

The second try/catch must somehow differentiate between
1) buffering and rollback current context in case of div by zero
2) poking @baduri for I/O and rolling back without applying templates
3) buffering apply templates and rolling back
4) if halfway streaming getting I/O error, rolling back apply templates

Even though implementations might be able to do this, I don't think it is good
to have one construct, with one and the same syntax, follow different semantics
and different rollback behavior for potentially the same (I/O) error.

Which is why I presently prefer the (slightly) different syntax (xsl:catch as
child of xsl:stream) where the position and focus of xsl:catch makes it
unambiguous what is going on and what is going to be caught and/or rolled back.


[Bug 25174] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174
Bug 25174 depends on bug 25173, which changed state.

Bug 25173 Summary: Test whether a streaming document is available through fn:streaming-document-available()
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25173

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED


[Bug 25174] [XSLT30] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

--- Comment #4 from Michael Kay <[hidden email]> ---
In discussion today we came up with the following approach.

We add an attribute recoverable=yes|no to xsl:result-document (and to
xsl:output for the principal result document). The semantics of this attribute
are that if recoverable=yes is specified, then any output written to the result
document during the course of an xsl:try must be "undoable" if an error occurs
during the xsl:try and is caught. (The implementation for undoing the bad
writes might either use a rollback/checkpoint mechanism, or a
buffering/delayed-write mechanism).

But if recoverable="no" is specified, then the following happens: the processor
is allowed to write output to the result document optimistically, and if a
failure occurs at a point where it has written output that needs to be undone,
then despite the fact that the error was caught, the xsl:result-document
instruction itself fails saying in effect that the contents of the
result-document are incorrect and unrecoverable. A try/catch around the
xsl:result-document instruction can catch this error, and determine that the
transformation should continue despite one of its result documents being
unusable.
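
As a rough sketch of the recoverable="no" case described above: the attribute
is the one proposed in this comment (not existing syntax), and the href value,
the select expressions and the messages are invented for illustration.

```xml
<!-- Outer try/catch: lets the transformation continue even if the
     result document turns out to be unusable. -->
<xsl:try>
    <xsl:result-document href="report.xml" recoverable="no">
        <xsl:try>
            <!-- Output may be written optimistically, without buffering. -->
            <xsl:apply-templates select="item"/>
            <xsl:catch>
                <!-- Catching here does not undo output already written;
                     under the proposal xsl:result-document itself then fails. -->
                <xsl:message select="'error while writing report.xml'"/>
            </xsl:catch>
        </xsl:try>
    </xsl:result-document>
    <xsl:catch>
        <xsl:message select="'report.xml is unusable, continuing'"/>
    </xsl:catch>
</xsl:try>
```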

If recoverable="no" is specified, then the user can still manually prevent
problems by doing local buffering of output using an explicit xsl:variable to
hold temporary results; the potential for try/catch to cause result document
corruptions occurs only when in final output state.
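
The local buffering mentioned here could be sketched as follows. The variable
name, the href value and the skipped element are hypothetical, and
recoverable is again the attribute proposed in this comment.

```xml
<xsl:result-document href="report.xml" recoverable="no">
    <!-- Explicit buffer: the try/catch runs while a temporary tree is
         being constructed, not in final output state. -->
    <xsl:variable name="buffered">
        <xsl:try>
            <xsl:apply-templates select="item"/>
            <xsl:catch>
                <skipped/>
            </xsl:catch>
        </xsl:try>
    </xsl:variable>
    <!-- Only the completed temporary tree is written to the result document. -->
    <xsl:copy-of select="$buffered"/>
</xsl:result-document>
```

The trade-off is the one this thread is about: the buffer is now explicit and
limited to the part the stylesheet author chose to protect.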

Users can also reduce the risk of xsl:stream causing problems by calling
stream-available() to test whether a stream is available before starting to
process it. There's still a risk of an I/O error on the stream later, and the
choice of whether to buffer output to make this I/O error recoverable is now
made at the level of the xsl:result-document instruction.
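
A guard of this kind might look like the sketch below. The function name is
written as in this comment (elsewhere in the thread it appears as
fn:streaming-document-available); that it takes the URI as its single argument
is an assumption, and the message text is illustrative.

```xml
<xsl:choose>
    <!-- Assumed signature: one URI argument, boolean result. -->
    <xsl:when test="stream-available('foo.xml')">
        <xsl:stream href="foo.xml">
            <xsl:apply-templates mode="streaming"/>
        </xsl:stream>
    </xsl:when>
    <xsl:otherwise>
        <xsl:message select="'foo.xml cannot be opened for streaming'"/>
    </xsl:otherwise>
</xsl:choose>
```

This only covers failures detectable up front; an I/O error later in the stream
can still occur inside the xsl:stream.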


[Bug 25174] [XSLT30] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

--- Comment #5 from Michael Kay <[hidden email]> ---
We could in addition allow recoverable="yes" on xsl:try in the case where the
result document is non-recoverable, to indicate that a section of code is
recoverable (i.e. buffering is required) even though the document as a whole is
not. (This could also be achieved by using an xsl:variable around that section
of code, but there was a feeling that this approach was too non-obvious).


[Bug 25174] [XSLT30] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

Michael Kay <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #6 from Michael Kay <[hidden email]> ---
I have written the solution of comment #4 into the spec for the working group
to review.


[Bug 25174] [XSLT30] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

--- Comment #7 from Michael Kay <[hidden email]> ---
At the telcon on 14 August we studied the new text, and the question was raised
again of whether to put the new attribute on xsl:result-document, xsl:try, or
both.

On reflection I'm inclined to put it on xsl:try. This has the advantage,
originally pointed out in comment 5, that it becomes more visible what the
recovery units are, and where buffering might be required to enable recovery.

If we were to add the attribute on xsl:result-document as well then I would
propose that this merely acts as a default for the value on xsl:try. However,
since the association of an xsl:result-document instruction to an xsl:try
instruction is dynamic, this adds another piece of dynamic context information
which I think we could well do without. Also, putting the attribute on
xsl:result-document makes it messy to define an equivalent for the principal
result tree: xsl:output is not really the right place as it's all about
serialization. So my proposal is to have it on xsl:try only.

The text of section 24.3 is largely still applicable, though it now becomes
logical to move it to 8.3.2.
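
With the attribute on xsl:try only, the key example from the original report
might be written as in this sketch. The attribute name recoverable is the one
used in this thread, and the message text is illustrative.

```xml
<!-- recoverable="no": the processor is not asked to buffer the output of
     this xsl:try in order to be able to roll it back. -->
<xsl:try recoverable="no">
    <xsl:stream href="foo.xml">
        <xsl:apply-templates mode="streaming"/>
    </xsl:stream>
    <xsl:catch>
        <xsl:message select="'could not process foo.xml'"/>
    </xsl:catch>
</xsl:try>
```

Following the semantics of comment 4, a failure after output has already been
written may still leave that output unrecoverable; recoverable="yes" would
request the buffered, undoable behavior instead.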


[Bug 25174] [XSLT30] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

Michael Kay <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Michael Kay <[hidden email]> ---
The WG accepted the proposal in comment #7.

There was some feeling that the attribute name "recoverable" could be improved.
People would prefer something that linked it to the recoverability of the final
result tree output, for example recover-output or rollback-output were
suggested. It was left to the editor to ponder upon.


[Bug 25174] [XSLT30] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25174

Michael Kay <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #9 from Michael Kay <[hidden email]> ---
The changes have been applied.
