[Bug 24317] New: [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug 24317] New: [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

            Bug ID: 24317
           Summary: [xslt 3.0] Parallel splitting: the dynamic
                    multi-coloured widgets problem
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSLT 3.0
          Assignee: [hidden email]
          Reporter: [hidden email]
        QA Contact: [hidden email]

There is a streaming use case that we currently cannot handle, exemplified by
the following stylesheet:

<xsl:for-each-group select="/*/widget" group-by="colour"
   bind-group="g" bind-grouping-key="k">
  <xsl:result-document href="{$k}.xml">
   <widgets colour="{$k}">
     <xsl:sequence select="$g"/>
   </widgets>
  </xsl:result-document>
</xsl:for-each-group>

It's easy to see that this is in principle capable of executing with bounded
memory (or at worst, memory proportional to the number of colours encountered).

We can handle the case where the set of colours is known statically, by using
xsl:fork, but I don't think we have a solution for the case where the set of
colours is not known in advance.

One solution might simply be to deem xsl:for-each-group/@group-by to be
streamable (grounded) provided that the body of the instruction is an
xsl:result-document instruction.

Or of course, we might decide that this requirement is out of scope.

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

--- Comment #1 from Michael Kay <[hidden email]> ---
To avoid complications (multiple downward selections), I should modify the
example either to do group-by="@colour", or to do select="/*/widget/copy-of()".
This doesn't affect the underlying problem.

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

--- Comment #2 from Michael Kay <[hidden email]> ---
I'm a bit inclined to make xsl:for-each-group/@group-by streamable under the
same conditions as xsl:for-each-group/@group-adjacent. This would be consistent
with our approach to distinct-values().

Otherwise there is no convenient way to do something like

<xsl:for-each-group select="*" group-by="colour"
  bind-group="g" bind-grouping-key="k">
  <a colour="{$k}" count="{count($g)}"/>
</xsl:for-each-group>

which is streamable for all practical purposes (it needs memory proportional to
the number of groups, which in many cases will be constant memory independent
of the size of the population).

In essence this is the same as xsl:fork, except the number of "prongs" in the
fork is not fixed statically by the stylesheet author, but instead is
one-per-group based on the number of distinct grouping key values found in the
data.

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

C. M. Sperberg-McQueen <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #3 from C. M. Sperberg-McQueen <[hidden email]> ---
We discussed this at the Prague face to face.

The parallel with xsl:fork seems a strong one, but the requirement to buffer
what might be a great deal of output worried some WG members.  

Pavel Labath suggested possibly an attribute on for-each-group to signal a
request for streaming treatment; an analogy with 'treat as' for static typing
was detected.

There was a good deal of sentiment for accepting the proposal made here, but we
decided to sleep on it and come back to this bug before making a final
resolution.

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

Abel Braaksma <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #4 from Abel Braaksma <[hidden email]> ---
I have some trouble understanding the implications mentioned in this bug report
without an example of input data. Is it possible to update with a (small) XML
source document?

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

--- Comment #5 from Michael Kay <[hidden email]> ---
In response to comment 4, a typical use case is to take an input document of
the form:

<widgets>
  <widget colour="red" affability="3.2"/>
  <widget colour="blue" affability="3.3"/>
  <widget colour="orange" affability="3.1"/>
  <widget colour="lilac" affability="3.5"/>
  <widget colour="red" affability="3.6"/>
  <widget colour="green" affability="3.1"/>
  <widget colour="green" affability="3.2"/>
  <widget colour="blue" affability="3.4"/>
</widgets>

where the set of colours is not known in advance, and produce output of the
form

<out>
  <group colour="red" count="2" average-affability="3.4"/>
  <group colour="blue" count="2" average-affability="3.3"/>
  <group colour="orange" count="1" average-affability="3.1"/>
  <group colour="lilac" count="1" average-affability="3.5"/>
  <group colour="green" count="2" average-affability="3.15"/>
</out>

A particular variant of the problem is where the output of processing each
group is a separate result document; in this case there is a streaming strategy
that needs no buffering (except I/O buffer space proportional to the number of
groups).

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

Michael Kay <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #6 from Michael Kay <[hidden email]> ---
The WG asked me to make a concrete proposal to meet this requirement.

I propose to use the construct:

<xsl:fork>
  <xsl:for-each-group select="/*/widget" group-by="colour">
    <xsl:result-document href="{current-grouping-key()}.xml">
      <widgets colour="{current-grouping-key()}">
        <xsl:sequence select="current-group()"/>
      </widgets>
    </xsl:result-document>
  </xsl:for-each-group>
</xsl:fork>

for this use case. The content model of xsl:fork is changed to allow either its
current content (xsl:sequence+), or a single xsl:for-each-group element;
wrapping xsl:for-each-group inside xsl:fork acts as a signal that the user
recognizes that buffering of output will be needed and accepts the
consequences.

The rules for streamability of xsl:for-each-group (19.4.8.19) rules 2 and 3
become:

2. If there is a group-by attribute and the xsl:for-each-group instruction is
not a child of xsl:fork, then roaming and free-ranging.
3. If there is a group-adjacent or group-by attribute that is not motionless,
then roaming and free-ranging.

The rules for streamability of xsl:fork (19.4.8.20) change to add a rule 0:

0. If the content of the xsl:fork instruction comprises a single
xsl:for-each-group instruction, then the posture and sweep of that instruction.

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

--- Comment #7 from Michael Kay <[hidden email]> ---
Proposal accepted, but with a possible addition to be investigated: allow
xsl:for-each inside the xsl:fork as well as xsl:for-each-group.

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

--- Comment #8 from Michael Kay <[hidden email]> ---
I'm going to push back on the idea of allowing xsl:for-each as a child
instruction. I don't think it achieves anything useful; I don't think the
streamability rules would be any different for an xsl:for-each appearing as a
child of xsl:fork than for any other xsl:for-each instruction.

I have applied the changes to allow xsl:for-each-group as a child of xsl:fork.

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

--- Comment #9 from Abel Braaksma <[hidden email]> ---
Upon reconsidering my suggestion during the latests telcon on xsl:for-each and
xsl:apply-templates as a child of xsl:fork - while I do think it has _some_
merit in explicitly telling the processor "I want to do a dynamic fork" - I
agree with your conclusion.

So I'll pushback on my own suggestion as well ;). Let's not extend the syntax
further unless absolutely necessary. If someone really wants to fork on each
node, you can now use a workaround through xsl:for-each-group with a group-size
of one.

--
You are receiving this mail because:
You are the QA Contact for the bug.

Reply | Threaded
Open this post in threaded view
|

[Bug 24317] [xslt 3.0] Parallel splitting: the dynamic multi-coloured widgets problem

Bugzilla from bugzilla@jessica.w3.org
In reply to this post by Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24317

Michael Kay <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Michael Kay <[hidden email]> ---
Looking at the discussion, we decided that we have reached consensus and can
mark this resolved.

--
You are receiving this mail because:
You are the QA Contact for the bug.