FF Priority experiment

FF Priority experiment

Greg Wilkins-3

Patrick,

As an h2 server developer, I am definitely reading your blog http://bitsup.blogspot.it/2015/01/http2-dependency-priorities-in-firefox.html

I thought I'd post some feedback here, as even if it is too late for this draft, I think it is good to capture the server-side view of your experiment.  Sorry this is a bit long... but it's hard stuff to summarise.

If I have understood your blog correctly, your strategy is to create 5 streams with priority frames that are not used for request/response streams, but are instead used as a kind of fixed reference point against which you can set the relative priorities of all other streams on the connection.  Given the design of the priority mechanism, I think it is a sensible approach to try, and it looks like it will greatly simplify priority handling.
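For concreteness, the shape of the scheme as I understand it is roughly the following. This is a sketch only - the stream IDs and weights below are hypothetical, not the actual values from the blog:

    // Five "anchor" streams created by PRIORITY frames only, never used
    // for requests; IDs and weights here are hypothetical.
    record PriorityFrame(int streamId, int dependsOn, int weight) {}

    PriorityFrame[] anchors = {
        new PriorityFrame(3,  0, 201),  // e.g. an urgent group
        new PriorityFrame(5,  0, 101),  // e.g. a normal group
        new PriorityFrame(7,  0,   1),  // e.g. a background group
        new PriorityFrame(9,  7,   1),  // e.g. speculative, under background
        new PriorityFrame(11, 3,   1),  // e.g. followers, under urgent
    };

    // Real request streams are then made dependent on one of the anchors:
    //   new PriorityFrame(13, 3, 32)  => "serve this in the urgent group"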

However, I'd like to use your proposal to illustrate that the priority mechanism does not look so great from the server side.

Essentially this approach is converting h2 relative priorities into absolute priorities.  But instead of being absolute for all browsers and all connections, these 5 absolute priorities will be defined for each and every connection from a FF-37 browser.  If all other browsers adopt a similar but varied approach, then for every connection accepted by the server, it can expect to have approximately 5 streams created and held for the entire life span of the connection, simply to define its own absolute priorities.

For a large server, that may be 50,000, 500,000 or even 5,000,000 complex stream objects created and held for some time, just to mimic a simple enum of absolute priority.  Worse still, since they are held for a while, in Java they will migrate from the young generation to the old generation, so garbage collection of these objects is going to be problematic, requiring full stop-the-world GCs rather than a no-pause incremental approach.

If these 5 fixed priority points were defined in the protocol, then none of these objects, nor the frames to create them, would be needed.  The wasted effort is such that we will probably have to look at some way of creating shared fake streams so that 5 objects can be shared by all connections.  The problem then is that we need to come up with browser-specific mappings for FF-37, FF-38, IE-xx, CR-yy etc.

In short, if it is simplest for the browser to reason about priority with several absolute priority points, why do we have a relative priority mechanism in the protocol?  Why not just define 5, 10 or 256 priorities and save creating fake streams that fill up the server's memory?  It does not matter that these values can't be compared between connections, because they never are anyway!
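If the protocol defined those fixed points itself, the server-side cost could be a single shared constant per level rather than five live stream objects per connection.  A sketch of what I mean (names illustrative):

    // An absolute priority is just a shared constant - no per-connection
    // objects, no frames to parse, nothing for the GC to chase.
    enum AbsolutePriority { URGENT, HIGH, NORMAL, LOW, BACKGROUND }

    // One small field per stream would replace the 5 anchor streams:
    //   stream.priority = AbsolutePriority.HIGH;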

Now, after these base priorities have been established (either by fake streams or, hopefully eventually, by protocol mandate), the server will eventually see some request frames.  Let's say the first request is low priority.  Having parsed the request, should the server wait to see if the next request is higher priority?  No - the server really wants to handle it ASAP, because its cache is hot with the request headers, so ideally we are going to execute it immediately and in parallel continue parsing and handling subsequent requests.

We are certainly not going to queue it just so we can parse some more frames looking for higher priority requests to handle, as that will just fill up our memory, add latency and let our caches get cold before handling the requests.
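In code terms, the serving loop we want is the obvious one - handle as we parse.  A sketch, with hypothetical frame/connection types:

    // Dispatch each request as soon as it is parsed, while its headers
    // are still hot in cache; keep parsing subsequent frames in parallel.
    while (connection.hasInput()) {
        Frame frame = connection.parseNextFrame();
        if (frame instanceof HeadersFrame request) {
            executor.execute(() -> handle(request));  // no priority queueing
        }
    }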

Now, some time later, we will parse another request off the connection.  Let's say this is a high priority request.  What are we to do with the lower priority request already being handled?  Suspend it?  Block it by de-prioritising its output?  No.  The low priority request will already have resources allocated to it - buffers, thread, database connection, stack space, application locks etc. - and the thread handling it will have a hot cache.  From the server's point of view, de-prioritising the request will only harm total throughput, as it will waste those resources and again let the cache get cold.  If enough low priority requests are de-prioritised, then we could have problems with database connection pool exhaustion etc.

Moreover, let's say the client has sent a batch of 20 requests in random priority order.  The absolute last thing the server wants to do is dispatch all 20 requests and let an output throttle (which is what the h2 priority mechanism is) determine which will progress.  This will result in a single connection having 20 threads from the same client, smearing themselves over all the server's cores and dirtying the caches of all the cores.  They will then contend for the same locks and data structures, so false sharing and parallel slowdown will result.

The ideal situation for a server is that a single thread/core can parse, handle, parse, handle all the requests from a single connection in order.  This would allow a single core with a hot cache to handle all the requests without any contention or false sharing.  The other cores on the server should be busy handling other connections.  Of course the world is never ideal, and some requests will take longer than others to handle and serve, so out-of-order processing must be allowed to happen.  So servers are going to need to use multiple threads to handle the connection, but potentially in a small sub-pool or with work stealing rather than mass dispatch.  Servers will be aiming to use <=4 threads to handle a connection, so they might be able to keep it on the same socket or CPU.  Thus of a 20-request batch, it may be that only 1 to 4 requests have actually been parsed and their priorities known.  In such a situation, the small high priority 20th request is rarely going to overtake the large low priority first request, unless the server has lots of spare capacity to steal lots of work.
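To sketch that bounded parallelism (the limit of 4 and every type here are illustrative, not Jetty's actual implementation):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Executor;
    import java.util.concurrent.Semaphore;

    // At most 4 handler threads per connection, so requests stay on a
    // few warm cores; extra requests queue until a permit frees up.
    final Semaphore permits = new Semaphore(4);
    final Queue<Request> pending = new ConcurrentLinkedQueue<>();

    // Simplified: a real implementation must re-check 'pending' when a
    // permit is released, and might work-steal when otherwise idle.
    void onRequestParsed(Executor executor, Request request) {
        if (!permits.tryAcquire()) {
            pending.add(request);          // queued regardless of priority
            return;
        }
        executor.execute(() -> {
            Request r = request;
            do {
                handle(r);                 // arrival order, not priority order
            } while ((r = pending.poll()) != null);
            permits.release();
        });
    }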

Perhaps instead of expecting the server to create an ordering of all requests in a batch, the client could itself refrain from sending requests as it parses HTML.  Instead it could collect the list itself, order it, and then send a batch of requests in priority order.  Yes, I know that means some extra latency, but if you want the server to order them, then it is going to have to assemble the list anyway, and that will involve latency and other issues.

One good feature of the 5 "fixed" node proposal is that it will allow priorities to work better for push.  The server will hopefully soon learn that the 20 requests are associated and thus be able to push them rather than wait for a batch of requests.  If every connection has the same 5 fixed nodes, then the push can use them to assign priorities to the pushed resources and push them in priority order.  Of course the problem is that if the "fixed" priority nodes differ from browser to browser, then the server will again have to look at synthetic absolute priorities and come up with a per-browser mapping back to their "fixed" priority nodes.

Anyway, enough rambling.  My objections are not aimed specifically at your 5-node idea; I think it is a reasonable way to use the proposed mechanism.  Rather, I see your proposed usage as an example of how the current mechanism is not suitable for scalable servers.

Jetty is currently intending to ignore priority.  We'd really like to give a client its data in priority order, but typically we have many thousands of other clients to worry about, and we are more concerned with fair share and maximum throughput between clients than with micro-managing the progress between requests from the same client.

cheers

--
Greg Wilkins <[hidden email]>  @  Webtide - an Intalio subsidiary
http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that scales
http://www.webtide.com  advice and support for jetty and cometd.

Re: FF Priority experiment

Martin Thomson-3
On 8 January 2015 at 08:43, Greg Wilkins <[hidden email]> wrote:
> it can expect to have approximately 5 streams created and held for the
> entire life span of the connection, simply to define its own absolute
> priorities.

Why not hold two structures?  Separating active streams and stream
priorities seems fairly easy to do.  Active streams are strictly
bounded, whereas stream priorities need garbage collection and all
that mess.
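
Something like the following is what I have in mind - a minimal sketch,
where Stream and the maps are hypothetical bookkeeping:

    import java.util.HashMap;
    import java.util.Map;

    // Priority tree nodes are cheap: just a parent and a weight.
    record PriorityNode(int parentStreamId, int weight) {}

    // Active streams: strictly bounded by SETTINGS_MAX_CONCURRENT_STREAMS,
    // each carrying the full request/response machinery.
    Map<Integer, Stream> activeStreams = new HashMap<>();

    // Priority tree: created from PRIORITY frames and pruned on its own
    // schedule, independent of the active stream lifecycle.
    Map<Integer, PriorityNode> priorityTree = new HashMap<>();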

And I don't think that anyone is going to expect a server to hold
responses.  The point of priority is to manage the case where all
those responses don't fit into the pipe.  That said, part of the point
of prioritization is to allow the server to take more responsibility
for performance at the client end.  If you want to do that, then at
least you have some information.

(Point taken about the complexity, btw.)


Re: FF Priority experiment

Greg Wilkins-3

Martin,

Well, there are options to try to optimise the situation, but as you say, I was generally trying to illustrate the complexity.

Your suggestion of separating active from passive streams is something we could consider.  However, initially all we will see is a priority frame, and we will not know if it is for a passive or an active stream, so we'd have to create a passive stream.  Then, if it turns out to be an active stream, we have to convert it from passive to active; so if we have more active than passive streams we may create more garbage and hit common stream maps more often, etc.

I also understand that the intent of the mechanism is targeted at the case where the available responses exceed the pipe.  But if I've got 5 streams of a 20-request batch active and they are saturating the pipe, then the last thing I want to do is launch the 15 other requests on the chance that one of them might be higher priority than the 5 that have already saturated the pipe.  I may do so if my server is idle and I've got nothing else to do, but the chances are that those first 5 are all that the server will consider, so if the top priority is in the next 15, then tough luck for the client.

Building a scalable server is all about garbage collection, mechanical sympathy, scheduling strategies, resource management, and lock contention (or lock-free algorithms).  These are all harder in http2, but the fewer connections give the potential for a better outcome, so I think it will be worth it.  Browser resource priority is just a long way down that list of server considerations.

If we just had simple low/normal/high absolute priorities on streams, then they would be much easier to apply when a server has 5 active requests saturating the link.  Patrick's proposal does help in this regard, as once we learn the FF-37 pattern we would be able to apply his 5 basic priorities to the currently active requests.  The problem is that it is a per-browser/per-connection pattern.

If the browsers are simplifying the problem to a few priority buckets, and the servers can only really apply a few priority buckets, then it seems strange for the client's buckets to talk to the server's buckets via a weighted directed acyclic graph.

cheers

--
Greg Wilkins <[hidden email]>  @  Webtide - an Intalio subsidiary
http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that scales
http://www.webtide.com  advice and support for jetty and cometd.

Re: FF Priority experiment

Jason Greene
In reply to this post by Greg Wilkins-3
Hi Greg,

I had a similar conversation with Patrick over twitter (which I should have posted here instead, since this topic is not compatible with a character limit!) https://twitter.com/jtgreene/status/553047149449973760

For our server (and proxy) implementation we have been assuming (hoping) priority was more of a micro-optimization, to give a slight advantage to a set of streams. However, this article gave me the impression that clients will be expecting the server to provide the same benefits that come from only asking for what you need when you actually need it (and in the order you need it). In particular, the notion of a speculative stream was worrisome, as the server will be misallocating resources not only to low priority work, but potentially to unnecessary work as well.

The strategy we were originally planning was to map the tree to numeric weights, and simply prioritize concurrent writes according to those weights at any given time. In practice, though, this will mean there is still lots of interleaving of low and high priority responses. A hypothetical example to consider is the simple static-content file serving case, where you have 2 competing threads doing file read calls, one associated with a low priority stream and the other with a high priority stream. The file system will be replying simultaneously (or at least near simultaneously), which means the resulting output will be interleaved, unless the server buffers the low priority thread, or it backs off / suspends the task.

The former has obvious memory limits that prevent it from being practical. The latter is possible, but depending on the ratio of data frame size to content size, it may or may not yield tangible benefits. A secondary issue with the latter is that the task is likely holding on to other server resources - minimally memory, but potentially other contended items like the connection pool example you used. Another potential problem is that we support limiting concurrent executions separately from stream limitations, which is yet another way that low priority requests can execute ahead of high priority requests.
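Concretely, the strategy amounts to something like this (a sketch with hypothetical types, not our actual code):

    import java.util.Comparator;
    import java.util.PriorityQueue;

    // Flatten the dependency tree to an effective weight per stream and
    // always write next from the heaviest stream that has data ready.
    PriorityQueue<Stream> writable = new PriorityQueue<>(
            Comparator.comparingInt(Stream::effectiveWeight).reversed());

    void onWritePossible() {
        Stream s = writable.poll();
        if (s == null) return;      // nothing ready to send
        s.writeNextDataFrame();     // one frame at a time...
        if (s.hasPendingData()) {
            writable.add(s);        // ...so responses still interleave
        }
    }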

Patrick’s reply seems to indicate this is expected (I’m sure he will correct me otherwise :). However, I am still worried that clients will have the wrong expectations here.

I am curious how other server vendors plan to handle priority.


--
Jason T. Greene
WildFly Lead / JBoss EAP Platform Architect
JBoss, a division of Red Hat



Re: FF Priority experiment

Patrick McManus-3
In reply to this post by Greg Wilkins-3
Hey Greg,

Before delving into the details of your post I wanted to clarify something about the post you mentioned that I think was generally misunderstood (the failure in communication no doubt being mine). My closing thoughts tried to point out that a sender cannot implement a simplified version of priority by globally comparing weights across streams without also considering dependencies - it just doesn't do what you might think at first glance, and it wasn't immediately obvious to a couple of implementers I've worked with. I wasn't using that post to advocate particular server implementations; I was just pointing out that dependency and weight aren't separable for the receiver.

On Thu, Jan 8, 2015 at 11:43 AM, Greg Wilkins <[hidden email]> wrote:

> Essentially this approach is converting h2 relative priorities into absolute priorities.

I'm not quite sure what you mean by that. H2 expresses priority via both dependencies and weights. The blog post uses them both.
 
> But instead of being absolute for all browsers and all connections, these 5 absolute priorities will be defined for each and every connection from a FF-37 browser.  If all other browsers adopt a similar but varied approach, then for every connection accepted by the server, it can expect to have approximately 5 streams created and held for the entire life span of the connection, simply to define its own absolute priorities.

There isn't any particular reason that a node in a dependency tree needs more than 5 bytes of information (plus pointers).
 

> In short, if it is simplest for the browser to reason about priority with several absolute priority points, why do we have a relative priority mechanism in the protocol?  Why not just define 5, 10 or 256 priorities and save creating fake streams that fill up the server's memory?  It does not matter that these values can't be compared between connections, because they never are anyway!


I think you've oversimplified the design. It allows the client to express "let group {L, F} proceed in parallel with group U, but within group {L, F}, L should always take precedence over F." Whether that's crazy-good or crazy-bad is something time will tell - but it isn't accurate to map it to a small set of priority points, afaict.
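In dependency terms that example looks roughly like this (stream IDs and weights hypothetical):

    // One anchor node A groups {L, F}; U sits beside it under the root.
    // Equal weights let A's subtree and U share bandwidth, while the
    // chain A -> L -> F means F is only served when L has nothing ready.
    record Dep(int stream, int parent, int weight) {}

    Dep a = new Dep(3, 0, 128);  // anchor for the {L, F} group
    Dep u = new Dep(5, 0, 128);  // proceeds in parallel with the group
    Dep l = new Dep(7, 3, 16);   // L, inside the group
    Dep f = new Dep(9, 7, 16);   // F, under L: L always takes precedence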

Anyhow, that's somewhat beside the point because the most convincing argument for an arbitrary dependency tree is the aggregator (not my use case from the blog) - or how to fairly combine multiple users. That's what sold me on it as a design. I'm just trying to meaningfully apply what we ended up with to what I know is a huge opportunity for responsiveness.

Fewer round trips in h2 will make it much more effective at utilizing available bandwidth, and we'll see those gains quickly. But the priority mechanism is, I think, the real long-term opportunity for responsive feel, and it's going to take multiple iterations from both servers and clients who want to invest in that process. The nice thing is I think we built a mechanism capable of expressing those iterations without pre-defining them now. That will help us evolve. In the absence of the aggregation use case I would have wanted to do that by extension, but not supporting aggregation was just a bug that had to be solved.
 
> We are certainly not going to queue it just so we can parse some more frames looking for higher priority requests to handle, as that will just fill up our memory, add latency and let our caches get cold before handling the requests.

I agree - priority is about choosing which of N things to transmit when you have more than one ready to go. Nobody wants idle bandwidth. There is nothing wrong with replying to a lower priority request before a higher one if the higher priority response is not available. Is the text unclear on that? Obviously, you also shouldn't over-fill your socket buffers, or you won't be able to react well when a higher priority item becomes available to transmit (this is a variation on the necessity of using modest frame sizes and, more broadly, on bufferbloat(tm)).
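A sketch of what that restraint looks like in a write loop (the Socket/Stream types and thresholds are illustrative):

    import java.util.PriorityQueue;

    // Keep only a shallow backlog in the socket buffer so a newly
    // writable high priority stream can jump the line quickly.
    static final int MAX_FRAME    = 16 * 1024;  // modest DATA frames
    static final int MAX_BUFFERED = 64 * 1024;  // shallow socket backlog

    void pump(Socket socket, PriorityQueue<Stream> writable) {
        while (socket.bufferedBytes() < MAX_BUFFERED) {
            Stream s = writable.poll();
            if (s == null) break;   // nothing ready: idle, don't over-buffer
            socket.write(s.nextDataFrame(MAX_FRAME));
            if (s.hasPendingData()) writable.add(s);
        }
    }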
 

> Now, some time later, we will parse another request off the connection.  Let's say this is a high priority request.  What are we to do with the lower priority request already being handled?  Suspend it?  Block it by de-prioritising its output?

Another option is buffering its output for at least a little while. This is what happens in H1 after all; it just happens in parallel socket buffers there (which are eventually emptied without priority), but buffering is buffering.

The extreme case is the low priority request is a 4GB iso and the subsequent high priority request is for 30KB of html. H2 defines multiplexing and priority to deal with that and implementations ignore it at their peril.
 
> The ideal situation for a server is that a single thread/core can parse, handle, parse, handle all the requests from a single connection in order.

A server that implements that approach still probably generates output faster than the network can absorb it - that means either a queue or backpressure. The priority information lets the server know how it can reorder the queue to give the end user the best experience. What the server does with that is a quality-of-implementation item.
 
> Yes, I know that means some extra latency, but if you want the server to order them, then it is going to have to assemble the list anyway, and that will involve latency and other issues.


I believe the best implementation does not add latency - it reorders the output queue and parallelizes/prioritizes execution of the transactions to the extent its quotas, priority information, and resources let it do so.
 


Re: FF Priority experiment

Greg Wilkins-3


On 8 January 2015 at 22:07, Patrick McManus <[hidden email]> wrote:
> Hey Greg,

> Before delving into the details of your post I wanted to clarify something about the post you mentioned that I think was generally misunderstood (the failure in communication no doubt being mine).

I was oversimplifying in my response, but yes, I understand that weights are only meaningful between siblings and not globally applicable.

I was really just using your blog as an example to expand on my vague thoughts of server-side issues.

> On Thu, Jan 8, 2015 at 11:43 AM, Greg Wilkins <[hidden email]> wrote:
>> Essentially this approach is converting h2 relative priorities into absolute priorities.
> I'm not quite sure what you mean by that. H2 expresses priority via both dependencies and weights. The blog post uses them both.

I think you are describing how you will use 5 fixed nodes in the tree to hang off your actual stream dependencies.  So I'm saying that those 5 nodes are essentially absolute priorities, expressed in both weights and relationships to each other. 

This is a good thing, as it may allow a server to reverse engineer a simple picture.  In this case, if all the streams from FF are given priorities relative to the 5 nodes, then that is something we might be able to use.  We might have a simple mapping of those 5 nodes to priorities URGENT, HIGH, NORMAL, LOW, BACKGROUND, and then use that simple linear rendering when picking which streams are allowed to use a limited transmit window.
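A sketch of that rendering (anchor stream IDs hypothetical - they would have to be learned per browser version, which is exactly the problem):

    import java.util.Map;

    enum Bucket { URGENT, HIGH, NORMAL, LOW, BACKGROUND }

    // Learned per browser version by observing its anchor streams:
    static final Map<Integer, Bucket> FF37_ANCHORS = Map.of(
            3, Bucket.URGENT, 5, Bucket.HIGH, 7, Bucket.NORMAL,
            9, Bucket.LOW, 11, Bucket.BACKGROUND);

    // Render a stream's position in the tree down to a linear priority
    // (Stream is a hypothetical server type).
    Bucket bucketOf(Stream s) {
        return FF37_ANCHORS.getOrDefault(s.parentStreamId(), Bucket.NORMAL);
    }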

The server doesn't really want a complete picture of priority.  All it really wants is a simple and cheap way of comparing the priorities of the few streams that have got through parsing and for which the application has produced some output.

Relating those few streams back to 5 fixed nodes is a good way of doing that, but it would be better if those 5 nodes were fixed for all connections, not just one connection.


 
> There isn't any particular reason that a node in a dependency tree needs more than 5 bytes of information (plus pointers).


Theoretically no, but reality is rarely very close to theory.  We could store the dependency tree in a parallel data structure to the actual streams, but then we have problems with double work, atomic updates, memory barriers etc.  In Java, if you want to write lock-free algorithms, then it is very difficult to deal with 2 objects instead of 1.  Or we could just create a fully capable stream object on the first frame.

My point is that in a server where every lock, every bad branch prediction, every map lookup and every memory barrier can be multiplied by 100s of thousands, it is rarely just a matter of saying "oh, it is just 5 more nodes of 5 bytes each per connection".



> I think you've oversimplified the design.

On purpose :)   The chances of the server being able to implement something simple are slim; something complex has no chance.


> The extreme case is the low priority request is a 4GB iso and the subsequent high priority request is for 30KB of html. H2 defines multiplexing and priority to deal with that and implementations ignore it at their peril.

That's not an extreme case.  I think of that as the simple case, and a server definitely has to let the 30KB response overtake the 4GB response.  That is basic multiplexing and should happen regardless of priority!

The extreme case is 8 x 0.5GB low priority requests followed by a high priority 30KB request.  Currently a server is committed to 6-fold parallelism per client, as that is the de facto connection limit of browsers.  The fact that h2 gives a multiplexed single connection does not mean that servers will be prepared to commit much more parallelism to an individual connection.  After parsing and launching 8 requests that are more than saturating the connection, the server is justified in using its memory and CPUs to process other connections rather than committing more resources on the chance that a high priority request might be following.  It may not even bother parsing the 9th request until progress is made on the first 8 (or until it has idle threads that might work-steal).

Basically, a client that is rendering an HTML page and firing off h2 requests for associated resources as it parses them cannot expect the server to completely unmuddle any inverted priorities in the ordering of those requests.

I think the client is going to have to at least do some basic ordering if it expects a busy server to have a chance of applying priorities.
 

>> The ideal situation for a server is that a single thread/core can parse, handle, parse, handle all the requests from a single connection in order.

> A server that implements that approach still probably generates output faster than the network can absorb it - that means either a queue or backpressure. The priority information lets the server know how it can reorder the queue to give the end user the best experience. What the server does with that is a quality-of-implementation item.

The point I'm making here is that a scalable server is going to use backpressure, so the queuing of requests is not necessarily all at the output stage where priorities can be applied.  In a batch of 20 requests you may have 5 queued to send output, 5 still in the application determining what the content is, 5 parsed but not yet dispatched to the application, and 5 unparsed in the original IO buffer.  The server does not have a smorgasbord of 20 response-ready streams to pick from in priority order.  In this case it has only 5 requests to pick from, and only 15 whose priority it actually knows.  The unknown 5 requests are going to have to wait (regardless of their priority) for other requests to complete, or hope the server implements work stealing when idle.  It is pretty much the same for any high priority requests in the 5 queued for the application.

Basically, clients can't just throw batches of requests at the server in arbitrary priority order and expect the server to be able to unscramble the order just because they also gave a detailed high-resolution picture of the relative priorities/weights of those requests.

Throttling low priority requests may be counterproductive, as they end up starving high priority requests of scarce resources, of which bandwidth is only one.


cheers

--
Greg Wilkins <[hidden email]>  @  Webtide - an Intalio subsidiary
http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that scales
http://www.webtide.com  advice and support for jetty and cometd.