multiplexing -- don't do it


multiplexing -- don't do it

Peter Lepeska
I'm new to this list but have been studying web performance over high-latency networks for many years, and multiplexing seems to me like the wrong way to go. The main benefit of multiplexing is to work around the 6-connections-per-domain limit, but it reduces transparency on the network, decreases the granularity/modularity of load balancing, increases object processing latency in general on the back end as everything has to pass through the same multiplexer, and introduces its own intractable inefficiencies. In particular, the handling of a low-priority in-flight object ahead of a high-priority object when packet loss is present is a step backwards from what we have today for sites that get beyond the 6-connections-per-domain limit via domain sharding. Why not just introduce an option in HTTP 2.0 that allows clients and servers to negotiate max concurrent connections per domain? When web sites shard domains, aren't they essentially telling the browser that they will happily accept lots more connections? I'm sure this suggestion has long since been shot down, but browsing around on the web I'm not finding it.
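(To make the proposal concrete: the negotiation described here could be as small as one header pair. The "Max-Concurrent-Connections" name below is purely hypothetical and not part of HTTP/1.1, SPDY, or any HTTP/2.0 draft; this is just a sketch of the idea in Python.)

    # Hypothetical sketch only: "Max-Concurrent-Connections" is an invented
    # header name used for illustration.

    def client_request_headers():
        # The client advertises how many parallel connections it is willing
        # to open to this host.
        return {"Host": "images.example.com",
                "Max-Concurrent-Connections": "24"}

    def server_choose_limit(request_headers, server_cap=16):
        # The server answers with the lower of the client's offer and its own
        # cap, and both sides use that instead of the hard-coded limit of 6.
        offered = int(request_headers.get("Max-Concurrent-Connections", "6"))
        return min(offered, server_cap)

    print(server_choose_limit(client_request_headers()))   # -> 16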

As for header compression, again this is a trade-off between transparency/multiple streams and bandwidth savings. But I'd think this group could come up with ways to reduce the bytes in the protocol (including cookies) without requiring the use of a single compression history, which is what results in an order-sensitive multiplexed stream.
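(For a concrete view of that trade-off, here is a minimal Python/zlib sketch contrasting a single shared compression history, which is roughly what makes the stream order-sensitive, with compressing each header block independently. The header strings are made up for the example.)

    import zlib

    headers1 = (b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"
                b"Cookie: session=abc123\r\nAccept: text/html\r\n\r\n")
    headers2 = (b"GET /logo.png HTTP/1.1\r\nHost: example.com\r\n"
                b"Cookie: session=abc123\r\nAccept: image/png\r\n\r\n")

    # Shared history: the second block compresses very well because it can
    # reference the first, but it cannot be decoded without the first.
    shared = zlib.compressobj()
    first = shared.compress(headers1) + shared.flush(zlib.Z_SYNC_FLUSH)
    second = shared.compress(headers2) + shared.flush(zlib.Z_SYNC_FLUSH)

    # Independent compression: each block is decodable on its own, at the
    # cost of a worse ratio.
    independent = zlib.compress(headers2)

    print(len(headers2), len(second), len(independent))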

Thanks,

Peter


On Thu, Mar 29, 2012 at 9:26 AM, Mike Belshe <[hidden email]> wrote:
I thought the goal was to figure out HTTP/2.0; I hope that the goals of SPDY are in-line with the goals of HTTP/2.0, and that ultimately SPDY just goes away.

Mike


On Thu, Mar 29, 2012 at 2:22 PM, Willy Tarreau <[hidden email]> wrote:
Hello,

after seeing all the disagreements expressed on the list these days
(including from me) about which features from SPDY we'd like to make
mandatory or not in HTTP, I'm thinking that part of the issue comes from
the fact that there are a number of different usages of HTTP right now,
all of them fairly legitimate.

First, I think everyone here agrees that something needs to be done to
improve end-user experience, especially on mobile networks. This is
reflected by all proposals, including the http-ng draft from 14 years
ago!

Second, the privacy issues are a mess because we're trying to address a
social problem by technical means. It's impossible to decide on a
protocol if we each give our own examples of what we'd like to protect
and what we'd prefer not to protect because protecting it would be
useless and possibly counter-productive.

And precisely, some of the disagreement comes from the fact that we're
trying to assess these impacts on the infrastructure we know today,
where they would obviously be a total breakage. As PHK said, a number
of sites will not want to afford crypto for privacy. I too know some
sites whose operating costs would increase significantly by doing so.
But what we're designing is not for now but for tomorrow.

What I think is that, in any case, we need a smooth upgrade path from
the current HTTP/1.1 infrastructure to what will constitute the web
tomorrow, without any big bang.

SPDY specifically addresses issues observed between the browser and the
server-side infrastructure. Some of its mandatory features are probably
not desirable past the server-side frontend *right now* (basically
whatever addresses latency and privacy concerns). Still, it would be
too bad not to make the server side infrastructure benefit from a good
lifting by progressively migrating from 1.1 to 2.0.

What does this mean? Simply that we have to consider HTTP/2.0 as a
subset of SPDY, or SPDY as an add-on to HTTP. And that makes a lot of
sense. First, SPDY already is an optimized messaging alternative to
HTTP. It carries HTTP/1.1 today, and it can just as well carry
HTTP/2.0, since we're supposed to maintain compatible semantics.

We could then get to a point where:
 - an http:// scheme indicates a connection to HTTP/1.x or 2.x server
 - an https:// scheme indicates a connection to HTTP/1.x or 2.x server
   via an SSL/TLS layer
 - a spdy:// scheme indicates a connection to HTTP/1.x or 2.x server
   via a SPDY layer

By having HTTP/2.0 upgradable from 1.1, this split is natural:

       +----------------------------+
       |       Application          |
       +----+-----------------------+
       | WS |     HTTP/2.0          |
       +----+--------------+        |
       |      HTTP/1.1     |        |
       |         +-----+---+--------+
       |         | TLS | SPDY       |
       +---------+-----+------------+   server-side
           ^        ^        ^
           |        |        |
           |        |        |
           |        |        |
       +---------+-----+------------+  user-agent
       |         | TLS | SPDY       |
       |         +-----+-------+----+
       |  HTTP/1.1, 2.0        |    |
       +-------------------+---+    |
       |                   |   WS   |
       |  Applications     +--------+
       |                            |
       +----------------------------+

The upgrade path would then be much easier:

 1) have browsers, intermediaries and servers progressively
    adopt HTTP/2.0 and support a seamless upgrade

 2) have browsers, some intermediaries and some servers
    progressively adopt SPDY for the front-line

 3) have a lot of web sites offer URLs as spdy:// instead of http://,
    and implement mandatory redirects from http:// to spdy:// like a
    few sites are currently doing (eg: twitter)

 4) have browsers at some point use SPDY as the default scheme
    for any domain name typed in the URL bar.

 5) have browsers at some point disable, by default, transparent
    support for the old http:// scheme (e.g. show a warning or require
    tweaking some settings). This will probably be 10-20 years from now.

Before we get to point 5, we'd have a number of sites running on the
new protocol, with an efficient HTTP/2.0 deployed at many places
including the backoffice, and with SPDY used by web browsers for
improved performance/privacy. That will not prevent specific agents
from still only using a simpler HTTP/2.0 for some uses.
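(The "seamless upgrade" in step 1 could reuse the HTTP/1.1 Upgrade mechanism that already exists; a rough sketch follows, with the "HTTP/2.0" token shown only as a placeholder since no upgrade token had actually been defined at the time.)

    # A client offers to switch protocols on its first request; a server that
    # does not understand the token simply ignores it and replies in HTTP/1.1.
    upgrade_request = (
        "GET / HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Connection: Upgrade\r\n"
        "Upgrade: HTTP/2.0\r\n"    # placeholder token, not a real registration
        "\r\n"
    )

    # A server that does understand it switches protocols, and the rest of
    # the connection speaks the new protocol.
    upgrade_response = (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Connection: Upgrade\r\n"
        "Upgrade: HTTP/2.0\r\n"
        "\r\n"
    )

    print(upgrade_request + upgrade_response)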

So I think that what we should do is to distinguish between what is
really desirable to have in HTTP and what is contentious. Everything
which increases costs or causes trouble for *some* use cases should
not be mandatory in HTTP but would be in the SPDY layer (as it is
today BTW).

I think that the current SPDY+HTTP mix has shown that the two protocols
are complementary and can be efficient together. Still we can significantly
improve HTTP to make both benefit from this, starting with the backoffice
infrastructure where most of the requests lie.

Willy





Re: multiplexing -- don't do it

Brian Pane-2
On Thu, Mar 29, 2012 at 7:07 PM, Peter L <[hidden email]> wrote:

> I'm new to this list but have been studying web performance over high
> latency networks for many years and multiplexing seems to me like the wrong
> way to go. The main benefit of multiplexing is to work around the 6
> connections per domain limit but it reduces transparency on the network,
> decreases the granularity/modularity of load balancing and increases object
> processing latency in general on the back end as everything has to pass
> through the same multiplexer, and introduces its own intractable
> inefficiencies. In particular the handling of a low priority in flight
> object ahead of a high priority object when packet loss is present is a step
> backwards from what we have today for sites that get beyond the 6
> connections per domain limit via domain sharding. Why not just introduce an
> option in HTTP 2.0 that allows clients and servers to negotiate max
> concurrent connections per domain? When web sites shard domains, aren't they
> essentially telling the browser that they will happily accept lots more
> connections? I'm sure this suggestion has long since been shot  down but
> browsing around on the web I'm not finding it.

There are a couple of practical problems that happen upon increasing
the number of concurrent connections:

- With N connections, the server or proxy has to be prepared to allocate
  N * receive_window_size bytes of memory for incoming packets.  Large
  values of N thus have a disadvantage for people operating high-traffic
  sites.  (A back-of-the-envelope sketch follows after this list.)

- Congestion control happens independently for each of N connections.
  While I'm not a proponent of artificially throttling HTTP to compensate
  for "bufferbloat," multiplexing N HTTP streams over one TCP connection
  does make for easier congestion control (especially in low-end client
  devices) than running N TCP connections.

- If the client and server have to negotiate a value for N, it may add
  an additional round trip.
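(To put rough numbers on the first point above: the window size and client count below are illustrative assumptions, not measurements.)

    # Back-of-the-envelope worst case for per-connection receive buffers.
    receive_window = 64 * 1024      # assume a 64 KB receive window
    clients = 10000                 # assume 10,000 concurrent clients

    for conns_per_client in (6, 24):
        worst_case = clients * conns_per_client * receive_window
        print("%2d conns/client -> %.1f GiB of potential receive buffers"
              % (conns_per_client, worst_case / 2.0 ** 30))
    #  6 conns/client -> 3.7 GiB
    # 24 conns/client -> 14.6 GiB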

Brian


Re: multiplexing -- don't do it

Ross Nicoll-2
In reply to this post by Peter Lepeska
I would consider the congestion/window benefits that Brian raised the main benefits, personally. To address a couple of the issues you've raised:

Transparency; whatever tools you're using to monitor network traffic will already be pulling individual connections out of a mixture of packets. It does mean those tools will need to pull apart another layer to be able to see the individual content being sent, but I'm not imagining this to be a significant hurdle. I would also point out that there's discussion about HTTP 2.0 being SSL only (and I believe "encouraged to be SSL where at all possible" is the main alternative), and that would impact network transparency a lot more than multiplexing!

Load balancing; do you have cases where individual user load requires balancing? I'm not aware of any common cases where high web server load is caused by individual users, but typically by a large number of users in aggregate. Multiplexing would mean each user's requests would likely go to a single server, but users should still be easily re-distributable across a cluster of servers.

Increased object processing latency; sorry, I'm not sure why this would be the case?

Ross




Re: multiplexing -- don't do it

Mike Belshe
In reply to this post by Peter Lepeska


On Fri, Mar 30, 2012 at 4:07 AM, Peter L <[hidden email]> wrote:
I'm new to this list but have been studying web performance over high latency networks for many years and multiplexing seems to me like the wrong way to go. The main benefit of multiplexing is to work around the 6 connections per domain limit but it reduces transparency on the network, decreases the granularity/modularity of load balancing and increases object processing latency in general on the back end as everything has to pass through the same multiplexer, and introduces its own intractable inefficiencies.

The CPU processing at the server is one thing we could optimize for.  Or we could optimize for users getting their pages faster.

Data suggests that your claims of inefficiency are simply incorrect.  But if you have a benchmark to report upon, we could discuss that.


 
In particular the handling of a low priority in flight object ahead of a high priority object when packet loss is present is a step backwards from what we have today for sites that get beyond the 6 connections per domain limit via domain sharding. Why not just introduce an option in HTTP 2.0 that allows clients and servers to negotiate max concurrent connections per domain?

As you can see from the data, websites are not having any trouble getting around the 6-connection limit already.

We could do this, but it would do nothing to make pages load faster or be lighter weight on the network.

 
When web sites shard domains, aren't they essentially telling the browser that they will happily accept lots more connections? I'm sure this suggestion has long since been shot  down but browsing around on the web I'm not finding it.

As for header compression, again this is a trade-off between transparency/multiple streams and bandwidth savings. But I'd think this group could come up with ways to reduce the bytes in the protocol (including cookies) without requiring the use of a single compression history, resulting in an order-sensitive multiplexed stream.

I'm not sure why you are opposed to compression.  We could reduce the bytes as well, and nobody is against that.

What is "transparency on the wire"?  You mean an ascii protocol that you can read?  I don't think this is a very interesting goal, as most people don't look at the wire.  Further, if we make it a secure protocol, its a moot point, since the wire is clearly not human readable.

mike




 



RE: multiplexing -- don't do it

Peter Lepeska
In reply to this post by Ross Nicoll-2

Responding to Ross and Brian's posts mainly here...

 

I agree that increasing concurrent connections will increase the burden on web servers, and that is a serious issue for sure, but since so many sites are already working around the 6-per-domain limit via sharding, most site owners are willing to accept higher numbers of TCP connections if it results in faster page loads. The prevalence of domain sharding is a kind of vote in the direction of increasing the per-domain limit.

 

Transparency:

- SPDY compresses HTTP headers using an LZ history-based algorithm, which means that previous bytes are used to compress subsequent bytes. So any packet capture that does not include all the traffic sent over that connection will be completely opaque -- there is no mathematical way to decode the HTTP. Even with all the traffic, a stream decoder will be a tricky thing to build, because packets depend on each other.

- Loss of transparency impacts intermediary devices (reverse proxies, caches, layer 7 switches, load balancers) as much as it does packet capture analysis. For load balancing, multiplexing requires maintaining state from one request to the next, so individual object requests from a given user will need to be handled by the same de-multiplexing server. In general, increasing session orientation reduces the scalability of the overall service. Also, failover is less graceful, as a load balancer will want to be more sure that the previously used server is in fact unavailable before routing to a new server.

- SSL kills transparency at the network level completely, but I also think that SSL should be considered orthogonal to performance, so that site owners can make a decision based on the cost, security, and performance tradeoffs of going to all-encrypted traffic. So while I agree it's related, it seems like we have to consider these things independently.

 

Increased Object Processing Latency:

- Multiplexing requires that objects are encoded serially -- encode(Object1), encode(Object2), encode(Object3) -- and then decoded in that same order. On a multi-core server, the three objects arrive truly concurrently, but due to multiplexing, Object2 and Object3 will need to wait while Object1 is encoded. For SPDY, that encode step involves running an LZ-type coding function, including searching the recent bytes for matches, so even on an unloaded server this can add ~milliseconds of latency.

- Multiplexing creates the need for session state. Access to this state needs to be synchronized; thread synchronization reduces parallelism and so impacts server scalability and per-object latency. (A sketch of this serialization point follows below.)

- CPU gains are increasingly achieved by adding cores, not by making existing cores go faster. Processes that can run concurrently are friendly to these advances (such as increasing concurrent TCP connections), whereas multiplexing goes in the opposite direction -- requiring thread synchronization and so increasing serialization and context switching.
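(As a rough sketch of that synchronization point: the lock, shared compressor, and frame layout here are illustrative, not SPDY's actual framing.)

    import struct
    import threading
    import zlib

    class MuxConnection:
        """Toy multiplexer: every stream shares one socket and one header
        compressor, so every sender has to pass through one lock."""

        def __init__(self, sock):
            self.sock = sock
            self.lock = threading.Lock()           # serializes all senders
            self.compressor = zlib.compressobj()   # shared history = shared state

        def send_headers(self, stream_id, raw_headers):
            with self.lock:                        # <-- the serialization point
                body = (self.compressor.compress(raw_headers)
                        + self.compressor.flush(zlib.Z_SYNC_FLUSH))
                frame = struct.pack("!II", stream_id, len(body)) + body
                self.sock.sendall(frame)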

 

Thanks,

 

Peter

 

 



RE: multiplexing -- don't do it

Peter Lepeska
In reply to this post by Peter Lepeska

Hi Steve,

 

I think I replied to your server load point in the other thread.

 

In response to this point on prioritization and SPDY...

 

> SPDY multiplexes the streams with consideration to priority, so this
> situation wouldn't happen (afaik).  The high priority object (coming
> over a higher priority stream) would preempt the lower-priority
> in-flight object as soon as it was requested.
>
> I think this actually may be better than existing HTTP:
>
> 1) In existing http, if you have 5 low-priority sessions fetching
> objects and you need to fetch 1 high priority object, you can create a
> high-priority session to fetch it.  By default (assuming no tcp window
> size or other manipulation) that tcp connection gets 1/6th the
> bandwidth (~17%).
>
> 2) With SPDY, assuming the same situation -- the high priority session
> would pre-empt all low-priority streams, so the high-priority stream
> would be getting ~100% of the bandwidth.

SPDY sits above a single TCP connection, so the order in which it sends data to the TCP stack is the order in which the receiver must read it on the other side. This means that for in-flight data (which can easily be an entire page's worth on high latency/bandwidth-product links), there is no prioritization. For example, let's say SPDY receives a small low-priority object from the back-end web server and pushes it to the TCP stack, and then receives a small high-priority object and pushes that to the TCP stack. Assuming the objects are small, these objects will go out as two separate TCP packets -- low and then high. If the first packet gets dropped, then even if the second packet makes it to the other side, it cannot be delivered to the browser until the first packet is retransmitted and received at the user device. In fact, because multiplexing makes the traffic opaque to intermediary devices, layer 7 switches that perform differential shaping for web performance (JavaScript before images, say) cannot enforce prioritization in the network when congested. Much better would be to have the low- and high-priority objects on two separate TCP connections so that when congested the switch can still provide bandwidth for the high-priority object.

A multiplexer sitting on top of TCP can only apply prioritization when it is processing two objects simultaneously, but since most web objects are small this is relatively rare.
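(The in-order delivery constraint described above can be shown with a toy receive buffer; segment numbers and contents are invented for the example.)

    # Toy model of TCP in-order delivery: segment 1 carries the low-priority
    # object, segment 2 the high-priority one, and segment 1 is lost.
    arrived = {2: b"high-priority object"}   # segment 1 never made it
    next_expected = 1
    deliverable = []

    while next_expected in arrived:          # TCP only releases in-order data
        deliverable.append(arrived.pop(next_expected))
        next_expected += 1

    print(deliverable)   # [] -- the high-priority object waits for the retransmit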

 

Thanks,

 

Peter

 

 

 

 

 

 

From: Steve Padgett [mailto:[hidden email]]
Sent: Friday, March 30, 2012 11:31 AM
To: Peter L
Subject: Re: multiplexing -- don't do it

 

Sure, no problem.

On Mar 30, 2012 4:34 PM, "Peter L" <[hidden email]> wrote:

Hi Steve,

 

Do you mind if I reply to your email and CC the list?

 

Thanks,

 

Peter

On Fri, Mar 30, 2012 at 3:49 AM, Steve Padgett <[hidden email]> wrote:

On Fri, Mar 30, 2012 at 4:07 AM, Peter L <[hidden email]> wrote:
> I'm new to this list but have been studying web performance over high
> latency networks for many years and multiplexing seems to me like the wrong
> way to go. The main benefit of multiplexing is to work around the 6
> connections per domain limit but it reduces transparency on the network,
> decreases the granularity/modularity of load balancing

On sites that have enough traffic to need load balancing, I suspect
that the # of concurrent client connections is several orders of
magnitude higher than the # of web servers - so the load should still
be evenly distributed.  Plus, one of the primary bottlenecks in load
balancers is the # of connections (both concurrent & new-per-second)
so having a 4x to 6x decrease in this would likely actually save a lot
of load balancer resources.


> and increases object
> processing latency in general on the back end as everything has to pass
> through the same multiplexer, and introduces its own intractable
> inefficiencies. In particular the handling of a low priority in flight
> object ahead of a high priority object when packet loss is present is a step
> backwards from what we have today for sites that get beyond the 6
> connections per domain limit via domain sharding.

SPDY multiplexes the streams with consideration to priority, so this
situation wouldn't happen (afaik).  The high priority object (coming
over a higher priority stream) would preempt the lower-priority
in-flight object as soon as it was requested.

I think this actually may be better than existing HTTP:

1) In existing http, if you have 5 low-priority sessions fetching
objects and you need to fetch 1 high priority object, you can create a
high-priority session to fetch it.  By default (assuming no tcp window
size or other manipulation) that tcp connection gets 1/6th the
bandwidth (~17%).

2) With SPDY, assuming the same situation -- the high priority session
would pre-empt all low-priority streams, so the high-priority stream
would be getting ~100% of the bandwidth.

I also agree with Brian on the additional issues that exist due to the
increasing the # of concurrent sessions...

Steve

 


Re: multiplexing -- don't do it

Brian Pane-2
In reply to this post by Peter Lepeska
On Friday, March 30, 2012, Peter L wrote:

Responding to Ross and Brian's posts mainly here...

 

I agree that increasing concurrent connections will increase the burden on web servers and that is a serious issue for sure but since so many sites are already working around the 6 per domain limit via sharding, most site owners are willing to accept higher numbers of TCP connections if it results in faster page loads. Prevalence of domain sharding is a kind of vote in the direction of increasing the per domain limit.


What I've found empirically is that most sites suffer from request serialization--i.e., insufficient parallelism--despite all the investment in domain sharding and image spriting. My article in last December's PerfPlanet calendar presents the data.

Transparency:

·         SPDY compresses HTTP headers using an LZ history based algorithm, which means that previous bytes are used to compress subsequent bytes. So any packet capture that does not include all the traffic sent over that connection will be completely opaque -- no mathematical way to decode the HTTP. Even with all the traffic, a stream decoder will be a tricky thing to build b/c packets depend on each other.


I know there's a SPDY decoder plugin for Wireshark, but I'll defer to people more knowledgeable about packet analysis tools to cover that area.
 

·         Loss of transparency impacts intermediary devices (reverse proxies, caches, layer 7 switches, load balancers) as much as it does packet capture analysis. For load balancing, multiplexing requires maintaining state from one request to the next so individual object requests from a given user will need to be handled by the same de-multiplexing server.


For load balancing, you just have to ensure that all packets from the same TCP connection go to the same place for L6-7 decoding. But that's already required for HTTP/1.x.  A L7 proxy or load balancer that terminates either HTTP or SPDY is then free to dispatch successive requests from the same client to different backend servers.
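(A minimal sketch of that affinity: a balancer only has to hash the TCP 4-tuple so that every packet of a given connection reaches the same terminating proxy; the backend names are made up.)

    import hashlib

    BACKENDS = ["proxy-a", "proxy-b", "proxy-c"]   # illustrative names

    def pick_backend(src_ip, src_port, dst_ip, dst_port):
        # All packets of one TCP connection hash to the same L7 proxy; that
        # proxy can then dispatch individual requests to any origin server.
        key = ("%s:%d->%s:%d" % (src_ip, src_port, dst_ip, dst_port)).encode()
        digest = hashlib.sha1(key).digest()
        return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

    print(pick_backend("198.51.100.7", 52344, "203.0.113.10", 80))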

 In general, increasing session orientation reduces the scalability of the overall service. Also, failover is less graceful as a load balancer will want to be more sure that the previously used server is in fact unavailable before routing to a new server.

·         SSL kills transparency at the network level completely but also I think that SSL should be considered as an orthogonal thing to performance. So that site owners can make a decision based on the cost, security, performance tradeoffs of going to all encrypted traffic. So while I agree it's related, it seems like we have to consider these things independently.

 

Increased Object Processing Latency:

·         Multiplexing requires that objects are encoded serially -- encode (Object1), encode (Object2), encode (Object3) -- and then decoded in that same order.


Object1, Object2, and Object3 need not be entire HTTP messages, though. In SPDY, unlike pipelined HTTP/1.1, a server can interleave little chunks of different responses.  That's what I consider SPDY's key design concept: not just multiplexing, but interleaving.

 On a multi-core server, the three objects arrive truly concurrently, but due to multiplexing Object2 and Object3 will need to wait while Object1 is encoded. For SPDY, that encode step involves running an LZ-type coding function including searching the recent bytes for matches so even on an unloaded server this can add ~milliseconds of latency.


The last time I looked at gzip perf, the cost was on the order of 50 clock cycles/byte on x86_64. (Anybody who's studied LZ perf more deeply, please jump in with more precise numbers.) Given 1KB of response headers, that works out to ~25 microseconds of latency at 2GHz, not milliseconds.

Having worked at a load balancer company in the past, I do agree that 25us is a material CPU cost, but it's nowhere near milliseconds.
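(Writing out the arithmetic behind that estimate; the 50 cycles/byte figure is Brian's and is taken at face value here.)

    cycles_per_byte = 50            # rough gzip cost estimate
    header_bytes = 1024             # ~1 KB of response headers
    clock_hz = 2 * 10 ** 9          # 2 GHz

    latency_seconds = cycles_per_byte * header_bytes / float(clock_hz)
    print("%.1f microseconds" % (latency_seconds * 1e6))   # ~25.6 us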

·         Multiplexing creates the need for session state. Access to this state needs to be synchronized, thread synchronization reduces parallelism and so impacts server scalability and per object latency.

·         CPU gains are increasingly achieved by adding cores and not making existing cores go faster. So processes that can run concurrently are friendly to these advances (such as increasing concurrent TCP connections) and multiplexing goes in the opposite direction -- requiring thread synchronization and so increasing serialization, and context switching.


With separate connections, though, you still have a serialization bottleneck at the NIC. The locking needed to serialize writes to the network doesn't go away if you forego multiplexing in favor of lots of connections; it just moves to the other side of the kernel/userspace boundary.

-Brian


Re: multiplexing -- don't do it

Roberto Peon-2


On Fri, Mar 30, 2012 at 6:17 PM, Brian Pane <[hidden email]> wrote:
On Friday, March 30, 2012, Peter L wrote:

Responding to Ross and Brian's posts mainly here...

 

I agree that increasing concurrent connections will increase the burden on web servers and that is a serious issue for sure but since so many sites are already working around the 6 per domain limit via sharding, most site owners are willing to accept higher numbers of TCP connections if it results in faster page loads. Prevalence of domain sharding is a kind of vote in the direction of increasing the per domain limit.


What I've found empirically is that most sites suffer from request serialization--i.e., insufficient parallelism--despite all the investment in domain sharding and image spriting. My article in last December's PerfPlanet calendar presents the data.

Prioritization is key to efficiently utilizing any form of parallel requests. It happens to be much, much more difficult if you're using separate connections, because there is no guarantee that they go to the same machine. As a result, your background image gets to clog the pipe instead of your browser getting the HTML and JS it needs to do initial layout, rendering, and resource discovery.

1 connection means that it becomes trivial to do prioritization properly.
I know that it was argued that 1 connection makes it more difficult because there is buffering and you can't revoke a write() to a socket. Experience so far hasn't borne this fear out, and even if it were true, if you can get some idea about the depth of your buffer, you can pace your output to ensure that you're never adding too much buffer depth at any point in time.
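(A minimal sketch of why a single connection makes this easy: with all streams behind one sender, choosing the next frame to write is just a priority-queue pop. Stream names and priorities are invented.)

    import heapq

    # (priority, arrival_order, stream, payload): lower number = higher
    # priority, arrival order breaks ties.
    pending = []
    responses = [(3, "background.png", b"...image bytes..."),
                 (0, "index.html",     b"<html>..."),
                 (1, "app.js",         b"function init(){}")]
    for order, (prio, stream, payload) in enumerate(responses):
        heapq.heappush(pending, (prio, order, stream, payload))

    while pending:
        prio, _, stream, payload = heapq.heappop(pending)
        print("send next frame from %s (priority %d)" % (stream, prio))
    # index.html goes out first, then app.js, background.png last.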
 

Transparency:

·         SPDY compresses HTTP headers using an LZ history based algorithm, which means that previous bytes are used to compress subsequent bytes. So any packet capture that does not include all the traffic sent over that connection will be completely opaque -- no mathematical way to decode the HTTP. Even with all the traffic, a stream decoder will be a tricky thing to build b/c packets depend on each other.


I know there's a SPDY decoder plugin for Wireshark, but I'll defer to people more knowledgeable about packet analysis tools to cover that area.

The OP is right about this, btw. Technically it is possible that you've flushed the window after 2k of completely new data, but there is no guarantee, and so interpreting a stream from the middle may be extremely difficult.

Seems like a fine tradeoff for the latency savings that we get on low-BW links, though.
 
 

·         Loss of transparency impacts intermediary devices (reverse proxies, caches, layer 7 switches, load balancers) as much as it does packet capture analysis. For load balancing, multiplexing requires maintaining state from one request to the next so individual object requests from a given user will need to be handled by the same de-multiplexing server.


For load balancing, you just have to ensure that all packets from the same TCP connection go to the same place for L6-7 decoding. But that's already required for HTTP/1.x.  A L7 proxy or load balancer that terminates either HTTP or SPDY is then free to dispatch successive requests from the same client to different backend servers.

Note that 'the same place' probably means the same IP, but there is no assurance that the same IP will mean the same machine or network adapter. With multiplexing over TCP (or any equivalent like SCTP), you're either guaranteed or at least much more likely to get locality for that user on one loadbalancer or machine.
 
Using fewer connections decreases vastly the amount of state necessary to do proper demux to the right server, and, as noted before, allows the LB or server to trivially do prioritization.


 In general, increasing session orientation reduces the scalability of the overall service. Also, failover is less graceful as a load balancer will want to be more sure that the previously used server is in fact unavailable before routing to a new server.

·         SSL kills transparency at the network level completely but also I think that SSL should be considered as an orthogonal thing to performance. So that site owners can make a decision based on the cost, security, performance tradeoffs of going to all encrypted traffic. So while I agree it's related, it seems like we have to consider these things independently.

 

Increased Object Processing Latency:

·         Multiplexing requires that objects are encoded serially -- encode (Object1), encode (Object2), encode (Object3) -- and then decoded in that same order.


Object1, Object2, and Object3 need not be entire HTTP messages, though. In SPDY, unlike pipelined HTTP/1.1, a server can interleave little chunks of different responses.  That's what I consider SPDY's key design concept: not just multiplexing, but interleaving.

Multiplexing doesn't have any effect on encoding/decoding unless you're using something that requires serialization (such as the gzip compressor in SPDY). In the case of SPDY, if you can jettison the header-stream compression or, better, find some compression method that doesn't have as stringent a compression requirement, you can avoid this issue.

Note anyway that the serialization requirement that gzip imposes in SPDY only affects the headers, and not the data for the request or response.
 

 On a multi-core server, the three objects arrive truly concurrently, but due to multiplexing Object2 and Object3 will need to wait while Object1 is encoded. For SPDY, that encode step involves running an LZ-type coding function including searching the recent bytes for matches so even on an unloaded server this can add ~milliseconds of latency.


The last time I looked at gzip perf, the cost was on the order of 50 clock cycles/byte on x86_64. (Anybody who's studied LZ perf more deeply, please jump in with more precise numbers.) Given 1KB of response headers, that works out to ~25 microseconds of latency at 2GHz, not milliseconds.

Having worked at a load balancer company in the past, I do agree that 25us is a material CPU cost, but it's nowhere near milliseconds.

·         Multiplexing creates the need for session state. Access to this state needs to be synchronized, thread synchronization reduces parallelism and so impacts server scalability and per object latency.

There are a lot of ways to skin a cat. One need not always resort to critical-section based synchronization. You could use epoll() instead, for instance, and process the client's connection within one thread.
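(A bare-bones sketch of that approach, using Python's select.epoll, which is Linux-only; the port number is arbitrary. One thread owns all of the connection state, so no locks are needed.)

    import select
    import socket

    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 8080))     # arbitrary illustrative port
    listener.listen(128)
    listener.setblocking(False)

    ep = select.epoll()                    # Linux-specific readiness API
    ep.register(listener.fileno(), select.EPOLLIN)
    conns = {}                             # per-connection state, no locking

    while True:
        for fd, events in ep.poll(1):
            if fd == listener.fileno():
                conn, _ = listener.accept()
                conn.setblocking(False)
                ep.register(conn.fileno(), select.EPOLLIN)
                conns[conn.fileno()] = conn
            elif events & select.EPOLLIN:
                data = conns[fd].recv(4096)
                if data:
                    # Stand-in for demultiplexing/dispatch; a real server
                    # would also buffer partial writes.
                    conns[fd].sendall(data)
                else:
                    ep.unregister(fd)
                    conns[fd].close()
                    del conns[fd]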

·         CPU gains are increasingly achieved by adding cores and not making existing cores go faster. So processes that can run concurrently are friendly to these advances (such as increasing concurrent TCP connections) and multiplexing goes in the opposite direction -- requiring thread synchronization and so increasing serialization, and context switching.


With separate connections, though, you still have a serialization bottleneck at the NIC. The locking needed to serialize writes to the network doesn't go away if you forego multiplexing in favor of lots of connections; it just moves to the other side of the kernel/userspace boundary.

Most of the improvements you would see here for the multi-CPU case should come in the form of multiqueue NICs which hash based on the TCP tuple (or similar) and stably select a queue which feeds a non-overlapping subset of CPUs to do processing of interrupts, etc.

Again, multiplexing as done in SPDY, because it is on one connection, requires less synchronization to handle the increased parallelism than HTTP does (the kernel has to synchronize more and more interactions as you increase the FD count). Even in the case where you decide to use more than a constant number of threads per core (anything else will suffer in throughput compared to that design on current kernels and hardware, from my experience), you will still have less contention, because you can manage it yourself with domain knowledge about the connection, user, problem, method, etc. that the kernel isn't privy to and should probably never be privy to.

-=R
 

-Brian



RE: multiplexing -- don't do it

Peter Lepeska
In reply to this post by Brian Pane-2

I agree that increasing concurrent connections will increase the burden on web servers and that is a serious issue for sure but since so many sites are already working around the 6 per domain limit via sharding, most site owners are willing to accept higher numbers of TCP connections if it results in faster page loads. Prevalence of domain sharding is a kind of vote in the direction of increasing the per domain limit.

What I've found empirically is that most sites suffer from request serialization--i.e., insufficient parallelism--despite all the investment in domain sharding and image spriting. My article in last December's PerfPlanet calendar presents the data.

 

Thanks for pointing me to your article. It's cool the way you attempt to filter out content interdependencies by looking at images only, though I'm sure that's not always accurate, because page load logic can be built to wait for images to load before issuing subsequent requests. It might be interesting to include all content types but count outstanding transactions per host and only include serialized sequences that already have 6 outstanding simultaneously, to filter your data. In any case, your result confirms what SPDY's test results suggest -- browsers are still bumping up against the per-domain limits.

 

----

On a multi-core server, the three objects arrive truly concurrently, but due to multiplexing Object2 and Object3 will need to wait while Object1 is encoded. For SPDY, that encode step involves running an LZ-type coding function including searching the recent bytes for matches so even on an unloaded server this can add ~milliseconds of latency.

The last time I looked at gzip perf, the cost was on the order of 50 clock cycles/byte on x86_64. (Anybody who's studied LZ perf more deeply, please jump in with more precise numbers.) Given 1KB of response headers, that works out to ~25 microseconds of latency at 2GHz, not milliseconds.

 

Having worked at a load balancer company in the past, I do agree that 25us is a material CPU cost, but it's nowhere near milliseconds.

 

I ran gzip at its default speed setting of 6 to encode 100 MB of highly compressible text (similar to HTTP) and encoded it in about 9 seconds with a 2.4 GHz CPU, so you are right that it only takes about 90us for a 1 KB response header. And this would be better at faster gzip settings. I was thinking of the time required to gzip the objects and not just the headers. I agree this seems small for the gzipping alone.
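For anyone who wants to reproduce the back-of-the-envelope numbers above, a rough micro-benchmark along these lines works (assumptions: zlib installed, link with -lz; note that compress2() starts from an empty dictionary on every call, unlike SPDY's shared per-connection history, so this only ballparks the raw deflate cost):

    /* Rough timing of deflate on ~1 KB of header-like text; illustrative
     * only -- results vary with zlib version, compression level and CPU. */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <zlib.h>

    int main(void)
    {
        char headers[1024];
        unsigned char out[2048];
        size_t len = 0;

        /* Fill the buffer with something vaguely header-shaped. */
        while (len + 64 < sizeof(headers))
            len += snprintf(headers + len, sizeof(headers) - len,
                            "Accept: text/html\r\nCookie: session=%zu\r\n", len);

        const int iters = 10000;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++) {
            uLongf outlen = sizeof(out);
            compress2(out, &outlen, (const Bytef *)headers, len, 6);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.1f us to deflate a %zu-byte header block at level 6\n",
               ns / iters / 1000.0, len);
        return 0;
    }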

 

----

·         Multiplexing creates the need for session state. Access to this state needs to be synchronized; thread synchronization reduces parallelism and so impacts server scalability and per-object latency.

·         CPU gains are increasingly achieved by adding cores and not making existing cores go faster. So processes that can run concurrently are friendly to these advances (such as increasing concurrent TCP connections) and multiplexing goes in the opposite direction -- requiring thread synchronization and so increasing serialization, and context switching.

With separate connections, though, you still have a serialization bottleneck at the NIC. The locking needed to serialize writes to the network doesn't go away if you forgo multiplexing in favor of lots of connections; it just moves to the other side of the kernel/userspace boundary.

 

I hadn't thought of it this way. At the NIC level, does it matter that packets are associated with different TCP connections? I'd think the serialization to the network would be insensitive to L4 headers but don't know enough about how this works.

 

Thanks,

 

Peter

 

 

From: Brian Pane [mailto:[hidden email]]
Sent: Friday, March 30, 2012 12:17 PM
To: Peter L
Cc: [hidden email]
Subject: Re: multiplexing -- don't do it

 

On Friday, March 30, 2012, Peter L wrote:

Responding to Ross and Brian's posts mainly here...

 

I agree that increasing concurrent connections will increase the burden on web servers and that is a serious issue for sure but since so many sites are already working around the 6 per domain limit via sharding, most site owners are willing to accept higher numbers of TCP connections if it results in faster page loads. Prevalence of domain sharding is a kind of vote in the direction of increasing the per domain limit.

 

What I've found empirically is that most sites suffer from request serialization--i.e., insufficient parallelism--despite all the investment in domain sharding and image spriting. My article in last December's PerfPlanet calendar presents the data.

 

Transparency:

·         SPDY compresses HTTP headers using an LZ history based algorithm, which means that previous bytes are used to compress subsequent bytes. So any packet capture that does not include all the traffic sent over that connection will be completely opaque -- no mathematical way to decode the HTTP. Even with all the traffic, a stream decoder will be a tricky thing to build b/c packets depend on each other.

 

I know there's a SPDY decoder plugin for Wireshark, but I'll defer to people more knowledgeable about packet analysis tools to cover that area.

 

·         Loss of transparency impacts intermediary devices (reverse proxies, caches, layer 7 switches, load balancers) as much as it does packet capture analysis. For load balancing, multiplexing requires maintaining state from one request to the next so individual object requests from a given user will need to be handled by the same de-multiplexing server.

 

For load balancing, you just have to ensure that all packets from the same TCP connection go to the same place for L6-7 decoding. But that's already required for HTTP/1.x.  A L7 proxy or load balancer that terminates either HTTP or SPDY is then free to dispatch successive requests from the same client to different backend servers.
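In other words, only the connection is pinned, not the requests. A toy sketch of that dispatch point follows (parse_next_request() and forward_to() are hypothetical helpers, and the backend addresses are made up):

    /* Toy L7 dispatch loop: requests parsed off one client connection
     * (HTTP/1.x or SPDY alike) can each go to a different backend; only
     * the TCP/TLS layer has to stay on this proxy. The helpers below are
     * hypothetical placeholders, not a real API. */
    struct request;                                   /* opaque parsed request */
    extern struct request *parse_next_request(int client_fd);
    extern void forward_to(const char *backend, struct request *req);

    static const char *backends[] = { "10.0.0.11", "10.0.0.12", "10.0.0.13" };

    void dispatch_connection(int client_fd)
    {
        unsigned rr = 0;
        struct request *req;

        while ((req = parse_next_request(client_fd)) != NULL) {
            /* successive requests from the same client fan out round-robin */
            forward_to(backends[rr++ % (sizeof(backends) / sizeof(backends[0]))], req);
        }
    }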

 

 In general, increasing session orientation reduces the scalability of the overall service. Also, failover is less graceful as a load balancer will want to be more sure that the previously used server is in fact unavailable before routing to a new server.

·         SSL kills transparency at the network level completely but also I think that SSL should be considered as an orthogonal thing to performance. So that site owners can make a decision based on the cost, security, performance tradeoffs of going to all encrypted traffic. So while I agree it's related, it seems like we have to consider these things independently.

 

Increased Object Processing Latency:

·         Multiplexing requires that objects are encoded serially -- encode (Object1), encode (Object2), encode (Object3) -- and then decoded in that same order.

 

Object1, Object2, and Object3 need not be entire HTTP messages, though. In SPDY, unlike pipelined HTTP/1.1, a server can interleave little chunks of different responses.  That's what I consider SPDY's key design concept: not just multiplexing, but interleaving.
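A toy illustration of the difference (this is not the actual SPDY framing, just the shape of the idea): each in-progress response is cut into small chunks that are written round-robin onto the single connection, so one large response cannot monopolize the pipe the way a pipelined HTTP/1.1 response does.

    #include <stdio.h>
    #include <string.h>

    #define CHUNK 8   /* deliberately tiny, to make the interleaving visible */

    struct response {
        int         stream_id;
        const char *body;
        size_t      off, len;
    };

    /* Stand-in for the real frame encoder plus socket write. */
    static void write_frame(int stream_id, const char *data, size_t n)
    {
        printf("stream %d: %zu bytes: %.*s\n", stream_id, n, (int)n, data);
    }

    static void interleave(struct response *rs, int nrs)
    {
        int progressed = 1;
        while (progressed) {
            progressed = 0;
            for (int i = 0; i < nrs; i++) {
                struct response *r = &rs[i];
                if (r->off >= r->len)
                    continue;
                size_t n = r->len - r->off;
                if (n > CHUNK)
                    n = CHUNK;
                write_frame(r->stream_id, r->body + r->off, n);
                r->off += n;
                progressed = 1;
            }
        }
    }

    int main(void)
    {
        struct response rs[2] = {
            { 1, "<html>...a long document body...</html>", 0, 0 },
            { 3, "body { color: #333; }",                   0, 0 },
        };
        for (int i = 0; i < 2; i++)
            rs[i].len = strlen(rs[i].body);
        interleave(rs, 2);
        return 0;
    }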

 

 On a multi-core server, the three objects arrive truly concurrently, but due to multiplexing Object2 and Object3 will need to wait while Object1 is encoded. For SPDY, that encode step involves running an LZ-type coding function including searching the recent bytes for matches so even on an unloaded server this can add ~milliseconds of latency.

 

The last time I looked at gzip perf, the cost was on the order of 50 clock cycles/byte on x86_64. (Anybody who's studied LZ perf more deeply, please jump in with more precise numbers.) Given 1KB of response headers, that works out to ~25 microseconds of latency at 2GHz, not milliseconds.

 

Having worked at a load balancer company in the past, I do agree that 25us is a material CPU cost, but it's nowhere near milliseconds.

 

·         Multiplexing creates the need for session state. Access to this state needs to be synchronized; thread synchronization reduces parallelism and so impacts server scalability and per-object latency.

·         CPU gains are increasingly achieved by adding cores and not making existing cores go faster. So processes that can run concurrently are friendly to these advances (such as increasing concurrent TCP connections) and multiplexing goes in the opposite direction -- requiring thread synchronization and so increasing serialization, and context switching.

 

With separate connections, though, you still have a serialization bottleneck at the NIC. The locking needed to serialize writes to the network doesn't go away if you forgo multiplexing in favor of lots of connections; it just moves to the other side of the kernel/userspace boundary.

 

-Brian


RE: multiplexing -- don't do it

Peter Lepeska
In reply to this post by Roberto Peon-2

 

" Prioritization is key in efficiently utilizing any for of parallel-requests. It happens to be much, much more difficult if you're using separate connections because there is no guarantee that they go to the same machine. As a result, your background image gets to clog the pipe instead of your browser getting the HTTP and JS it needs to do initial layout, rendering, and resource discovery."

The fact that SPDY enables the browser to tag a request as high priority when in fact a resource is blocking is awesome. This should be built into HTTP 2.0 with or without the multiplexing piece. It allows either the SPDY server or other intermediaries in the network (assuming the HTTP is not opaque) that are doing shaping to make better decisions. And only the browser has this information.

 

1 connection means that it becomes trivial to do prioritization properly.

I know that it was argued that 1 connection makes it more difficult because there is buffering and you can't revoke a write() to a socket. Experience so far hasn't borne this fear out, and even if it were true, if you can get some idea about the depth of your buffer, you can pace your output to ensure that you're never adding too much buffer-depth at any point in time.
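One way to get "some idea about the depth of your buffer" on Linux is SIOCOUTQ, which reports how many bytes are still sitting unsent in the socket send queue. A sketch of the pacing idea (the 16 KB threshold is an arbitrary illustrative policy, not a recommendation):

    /* Linux-specific sketch of buffer-depth pacing: before queuing more
     * low-priority frames, ask the kernel how much is still unsent in the
     * socket buffer, so a later high-priority frame isn't stuck behind
     * kilobytes of already-buffered background data. */
    #include <sys/ioctl.h>
    #include <linux/sockios.h>   /* SIOCOUTQ */

    static int send_queue_depth(int fd)
    {
        int unsent = 0;
        if (ioctl(fd, SIOCOUTQ, &unsent) < 0)
            return -1;           /* unknown; let the caller decide */
        return unsent;
    }

    /* Illustrative policy: only push low-priority data while less than
     * ~16 KB is waiting in the kernel send buffer. */
    static int ok_to_send_low_priority(int fd)
    {
        int depth = send_queue_depth(fd);
        return depth >= 0 && depth < 16 * 1024;
    }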

Have you done much testing with packet loss? As expressed earlier, my biggest concern is head-of-line blocking when low priority objects are at the head of the line. But it makes sense that you could mitigate this somewhat by making sure you don't buffer too much in the network.  Also, the approach I like of allowing a larger pool of persistent connections per domain increases the likelihood that a SYN or SYN-ACK will be dropped, which results in the disastrous 3 second reconnect delay.

Re: no decoder plugin for Wireshark.

"Seems like a fine tradeoff for the latency savings that we get on low-BW links, though."

I agree if SPDY could be applied to problematic links only. Otherwise it's a lot to give up if you spend your days troubleshooting networks and application performance.

"Again, multiplexing as it has been done with SPDY, since it is on one connection, requires less synchronization than it does for HTTP (the kernel has to do synchronization of various interactions as you increase the FD count) to handle the increased parallelism. Even in the case where you decide to use more than a constant number of threads per core (anything else will suffer in throughput compared to that design on current kernels, hardware from my experience), you will still have less contention because you can manage it yourself with domain knowledge about the connection, user, problem, method, etc. that the kernel isn't privy to and should probably never be privy to."

Are you saying a SPDY-enabled web server outperforms one without SPDY for a given web page b/c it converts many TCP connections into a single connection and this actually decreases kernel level thread contention more than the contention added by SPDY's user mode serialization logic? I wonder if this is what the adopters (Google and Twitter) have found so far.

Thanks,

Peter

 

 

On Fri, Mar 30, 2012 at 6:17 PM, Brian Pane <[hidden email]> wrote:

On Friday, March 30, 2012, Peter L wrote:

Responding to Ross and Brian's posts mainly here...

 

I agree that increasing concurrent connections will increase the burden on web servers and that is a serious issue for sure but since so many sites are already working around the 6 per domain limit via sharding, most site owners are willing to accept higher numbers of TCP connections if it results in faster page loads. Prevalence of domain sharding is a kind of vote in the direction of increasing the per domain limit.

 

What I've found empirically is that most sites suffer from request serialization--i.e., insufficient parallelism--despite all the investment in domain sharding and image spriting. My article in last December's PerfPlanet calendar presents the data.

 

Prioritization is key in efficiently utilizing any form of parallel requests. It happens to be much, much more difficult if you're using separate connections because there is no guarantee that they go to the same machine. As a result, your background image gets to clog the pipe instead of your browser getting the HTTP and JS it needs to do initial layout, rendering, and resource discovery.

 

1 connection means that it becomes trivial to do prioritization properly.

I know that it was argued that 1 connection makes it more difficult because there is buffering and you can't revoke a write() to a socket. Experience so far hasn't borne this fear out, and even if it were true, if you can get some idea about the depth of your buffer, you can pace your output to ensure that you're never adding too much buffer-depth at any point in time.

 

 

Transparency:

·         SPDY compresses HTTP headers using an LZ history based algorithm, which means that previous bytes are used to compress subsequent bytes. So any packet capture that does not include all the traffic sent over that connection will be completely opaque -- no mathematical way to decode the HTTP. Even with all the traffic, a stream decoder will be a tricky thing to build b/c packets depend on each other.

 

I know there's a SPDY decoder plugin for Wireshark, but I'll defer to people more knowledgeable about packet analysis tools to cover that area.

 

The OP is right about this, btw. Technically it is possible that you've flushed the window after 2k of completely new data, but there is no guarantee and so interpreting a stream in the  middle may be extremely difficult.
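A small zlib experiment shows the effect (assumptions: zlib installed, link with -lz; the header strings are made up): two header blocks are deflated on one shared stream, the way SPDY does, and then only the second block's compressed bytes are fed to an inflater, as if the capture had started late. The inflate fails, because the missing earlier bytes carry both the stream setup and the LZ history that the second block refers back to.

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        const char *h1 = "GET / HTTP/1.1\r\nHost: example.com\r\nCookie: abc\r\n\r\n";
        const char *h2 = "GET /s.css HTTP/1.1\r\nHost: example.com\r\nCookie: abc\r\n\r\n";
        unsigned char c1[512], c2[512], plain[512];

        /* One shared deflate stream for both header blocks, as in SPDY. */
        z_stream d = {0};
        deflateInit(&d, Z_DEFAULT_COMPRESSION);

        d.next_in = (Bytef *)h1; d.avail_in = strlen(h1);
        d.next_out = c1;         d.avail_out = sizeof(c1);
        deflate(&d, Z_SYNC_FLUSH);

        d.next_in = (Bytef *)h2; d.avail_in = strlen(h2);
        d.next_out = c2;         d.avail_out = sizeof(c2);
        deflate(&d, Z_SYNC_FLUSH);
        size_t len2 = sizeof(c2) - d.avail_out;
        deflateEnd(&d);

        /* Pretend the capture started late: we only have c2. */
        z_stream inf = {0};
        inflateInit(&inf);
        inf.next_in = c2;       inf.avail_in = len2;
        inf.next_out = plain;   inf.avail_out = sizeof(plain);
        int rc = inflate(&inf, Z_SYNC_FLUSH);
        printf("inflating the second block alone: rc=%d (%s)\n",
               rc, rc == Z_OK ? "decoded" : "not decodable");
        inflateEnd(&inf);
        return 0;
    }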

 

Seems like a fine tradeoff for the latency savings that we get on low-BW links, though.

 

 

·         Loss of transparency impacts intermediary devices (reverse proxies, caches, layer 7 switches, load balancers) as much as it does packet capture analysis. For load balancing, multiplexing requires maintaining state from one request to the next so individual object requests from a given user will need to be handled by the same de-multiplexing server.

 

For load balancing, you just have to ensure that all packets from the same TCP connection go to the same place for L6-7 decoding. But that's already required for HTTP/1.x.  A L7 proxy or load balancer that terminates either HTTP or SPDY is then free to dispatch successive requests from the same client to different backend servers.

 

Note that 'the same place' probably means the same IP, but there is no assurance that the same IP will mean the same machine or network adapter. With multiplexing over TCP (or any equivalent like SCTP), you're either guaranteed or at least much more likely to get locality for that user on one loadbalancer or machine.

 

Using fewer connections vastly decreases the amount of state necessary to do proper demux to the right server, and, as noted before, allows the LB or server to trivially do prioritization.

 

 

 In general, increasing session orientation reduces the scalability of the overall service. Also, failover is less graceful as a load balancer will want to be more sure that the previously used server is in fact unavailable before routing to a new server.

·         SSL kills transparency at the network level completely but also I think that SSL should be considered as an orthogonal thing to performance. So that site owners can make a decision based on the cost, security, performance tradeoffs of going to all encrypted traffic. So while I agree it's related, it seems like we have to consider these things independently.

 

Increased Object Processing Latency:

·         Multiplexing requires that objects are encoded serially -- encode (Object1), encode (Object2), encode (Object3) -- and then decoded in that same order.

 

Object1, Object2, and Object3 need not be entire HTTP messages, though. In SPDY, unlike pipelined HTTP/1.1, a server can interleave little chunks of different responses.  That's what I consider SPDY's key design concept: not just multiplexing, but interleaving.

 

Multiplexing doesn't have any effect on encoding/decoding unless you're using something that requires serialization (such as the gzip compressor in SPDY). In the case of SPDY, if you can jettison the header-stream compression or, better, find some compression method that doesn't have as stringent a serialization requirement, you can avoid this issue.

 

Note anyway that the serialization requirement that gzip imposes in SPDY only affects the headers, and not the data for the request or response.

 

 

 On a multi-core server, the three objects arrive truly concurrently, but due to multiplexing Object2 and Object3 will need to wait while Object1 is encoded. For SPDY, that encode step involves running an LZ-type coding function including searching the recent bytes for matches so even on an unloaded server this can add ~milliseconds of latency.

 

The last time I looked at gzip perf, the cost was on the order of 50 clock cycles/byte on x86_64. (Anybody who's studied LZ perf more deeply, please jump in with more precise numbers.) Given 1KB of response headers, that works out to ~25 microseconds of latency at 2GHz, not milliseconds.

 

Having worked at a load balancer company in the past, I do agree that 25us is a material CPU cost, but it's nowhere near milliseconds.

 

·         Multiplexing creates the need for session state. Access to this state needs to be synchronized; thread synchronization reduces parallelism and so impacts server scalability and per-object latency.

There are a lot of ways to skin a cat. One need not always resort to critical-section-based synchronization. You could use epoll() instead, for instance, and process the client's connection within one thread.

·         CPU gains are increasingly achieved by adding cores and not making existing cores go faster. So processes that can run concurrently are friendly to these advances (such as increasing concurrent TCP connections) and multiplexing goes in the opposite direction -- requiring thread synchronization and so increasing serialization, and context switching.

 

With separate connections, though, you still have a serialization bottleneck at the NIC. The locking needed to serialize writes to the network doesn't go away if you forgo multiplexing in favor of lots of connections; it just moves to the other side of the kernel/userspace boundary.

 

Most of the improvements you would see here for the multi-CPU case should come in the form of multiqueue NICs which hash based on the TCP tuple (or similar) and stably select a queue which feeds a non-overlapping subset of CPUs to do processing of interrupts, etc.

 

Again, multiplexing as it has been done with SPDY, since it is on one connection, requires less synchronization than it does for HTTP (the kernel has to do synchronization of various interactions as you increase the FD count) to handle the increased parallelism. Even in the case where you decide to use more than a constant number of threads per core (anything else will suffer in throughput compared to that design on current kernels and hardware, in my experience), you will still have less contention because you can manage it yourself with domain knowledge about the connection, user, problem, method, etc. that the kernel isn't privy to and should probably never be privy to.

 

-=R

 

 

-Brian

 


Re: multiplexing -- don't do it

Adrien de Croy
In reply to this post by Roberto Peon-2

------ Original Message ------
From: "Roberto Peon" [hidden email]

·         SPDY compresses HTTP headers using an LZ history based algorithm, which means that previous bytes are used to compress subsequent bytes. So any packet capture that does not include all the traffic sent over that connection will be completely opaque -- no mathematical way to decode the HTTP. Even with all the traffic, a stream decoder will be a tricky thing to build b/c packets depend on each other.


I know there's a SPDY decoder plugin for Wireshark, but I'll defer to people more 
knowledgeable about packet analysis tools to cover that area.

The OP is right about this, btw. Technically it is possible that you've flushed the window after 2k of completely new data, but there is no guarantee and so interpreting a stream in the  middle may be extremely difficult.

Seems like a fine tradeoff for the latency savings that we get on low-BW links, though.
 
I think it basically means compression or any transport level transform needs to be able to be switched off when debugging.  Which means optional/negotiated.
 
I have to analyse packet dumps of HTTP most days, as I'm sure do many others on this list.  We haven't yet evolved as a species to the stage where we don't make mistakes.
 
I think it's a vitally important facility for discovering implementation errors, which is required in many cases to resolve issues.
 
Adrien
 

 

Re: multiplexing -- don't do it

Alexey Melnikov
On 31/03/2012 00:52, Adrien W. de Croy wrote:

------ Original Message ------
From: "Roberto Peon" [hidden email]

·         SPDY compresses HTTP headers using an LZ history based algorithm, which means that previous bytes are used to compress subsequent bytes. So any packet capture that does not include all the traffic sent over that connection will be completely opaque -- no mathematical way to decode the HTTP. Even with all the traffic, a stream decoder will be a tricky thing to build b/c packets depend on each other.


I know there's a SPDY decoder plugin for Wireshark, but I'll defer to people more 
knowledgeable about packet analysis tools to cover that area.

The OP is right about this, btw. Technically it is possible that you've flushed the window after 2k of completely new data, but there is no guarantee and so interpreting a stream in the  middle may be extremely difficult.

Seems like a fine tradeoff for the latency savings that we get on low-BW links, though.
 
I think it basically means compression or any transport level transform needs to be able to be switched off when debugging.  Which means optional/negotiated.
I think it should be mandatory to implement (so no discovery of the feature is needed), but optional to use.

I have to analyse packet dumps of HTTP most days, as I'm sure do many others on this list.  We haven't yet evolved as a species to the stage where we don't make mistakes.
 
I think it's a vitally important facility for discovering implementation errors, which is required in many cases to resolve issues.
 
Adrien
 

 


Re: multiplexing -- don't do it

Adrien de Croy

------ Original Message ------
From: "Alexey Melnikov" <[hidden email]>
To: "Adrien W. de Croy" <[hidden email]>
Cc: "Roberto Peon" <[hidden email]>;"[hidden email]" <[hidden email]>
Sent: 31/03/2012 12:12:18 p.m.
Subject: Re: multiplexing -- don't do it
On 31/03/2012 00:52, Adrien W. de Croy wrote:

------ Original Message ------
From: "Roberto Peon" [hidden email]

·         SPDY compresses HTTP headers using an LZ history based algorithm, which means that previous bytes are used to compress subsequent bytes. So any packet capture that does not include all the traffic sent over that connection will be completely opaque -- no mathematical way to decode the HTTP. Even with all the traffic, a stream decoder will be a tricky thing to build b/c packets depend on each other.


I know there's a SPDY decoder plugin for Wireshark, but I'll defer to people more 
knowledgeable about packet analysis tools to cover that area.

The OP is right about this, btw. Technically it is possible that you've flushed the window after 2k of completely new data, but there is no guarantee and so interpreting a stream in the  middle may be extremely difficult.

Seems like a fine tradeoff for the latency savings that we get on low-BW links, though.
 
I think it basically means compression or any transport level transform needs to be able to be switched off when debugging.  Which means optional/negotiated.
I think it should be mandatory to implement (so no discovery of the feature is needed), but optional to use.
for something like TLS or gzip I've absolutely no problem with that. 
 

I have to analyse packet dumps of HTTP most days, as I'm sure do many others on this list.  We haven't yet evolved as a species to the stage where we don't make mistakes.
 
I think it's a vitally important facility for discovering implementation errors, which is required in many cases to resolve issues.
 
Adrien
 

 


Re: multiplexing -- don't do it

Mike Belshe


On Sat, Mar 31, 2012 at 1:19 AM, Adrien W. de Croy <[hidden email]> wrote:

------ Original Message ------
From: "Alexey Melnikov" <[hidden email]>
To: "Adrien W. de Croy" <[hidden email]>
Cc: "Roberto Peon" <[hidden email]>;"[hidden email]" <[hidden email]>
Sent: 31/03/2012 12:12:18 p.m.
Subject: Re: multiplexing -- don't do it
On 31/03/2012 00:52, Adrien W. de Croy wrote:

------ Original Message ------
From: "Roberto Peon" [hidden email][hidden email]

·         SPDY compresses HTTP headers using an LZ history based algorithm, which means that previous bytes are used to compress subsequent bytes. So any packet capture that does not include all the traffic sent over that connection will be completely opaque -- no mathematical way to decode the HTTP. Even with all the traffic, a stream decoder will be a tricky thing to build b/c packets depend on each other.


I know there's a SPDY decoder plugin for Wireshark, but I'll defer to people more 
knowledgeable about packet analysis tools to cover that area.

The OP is right about this, btw. Technically it is possible that you've flushed the window after 2k of completely new data, but there is no guarantee and so interpreting a stream in the  middle may be extremely difficult.

Seems like a fine tradeoff for the latency savings that we get on low-BW links, though.
 
I think it basically means compression or any transport level transform needs to be able to be switched off when debugging.  Which means optional/negotiated.
I think it should be mandatory to implement (so no discovery of the feature is needed), but optional to use.
for something like TLS or gzip I've absolutely no problem with that. 

Before thinking this way we should look at how well other mandatory but optional to use features have turned out.

One such example is pipelining.  Mandatory for a decade, but optional to implement. We still can't turn it on.

Another is chunked uploads.  Hugely valuable for data uploads from the client, mandatory to implement, but completely broken and unusable by browsers because of it.

Options simply don't work - we need to make this stuff mandatory from the get-go or it  is very likely to have the same result that we've seen in the past.

Finally, debugging is the wrong reason to make things optional at the protocol level. If you are in a position to turn off the flags, turn them off via cmdline tricks or whatnot.  It's fine.  If you're running a production trace, and we want the feature on (presumably the option to turn off was not used, right?), you're still going to need real debugging tools.  The right answer here is to invest in the tools - not to neuter the protocol features.

BTW - check out Chrome's about:net-internals.  It's not perfect, but it takes care of all this for you so that you can see the SPDY frames fully decoded.  It's not a panacea of tools - but this stuff is very implementable.

Mike

 
 

I have to analyse packet dumps of HTTP most days, as I'm sure do many others on this list.  We haven't yet evolved as a species to the stage where we don't make mistakes.
 
I think it's a vitally important facility for discovering implementation errors, which is required in many cases to resolve issues.
 
Adrien
 

 



Re: multiplexing -- don't do it

Julian Reschke
On 2012-03-31 01:53, Mike Belshe wrote:
> ...
> Before thinking this way we should look at how well other mandatory but
> optional to use features have turned out.
>
> One such example is pipelining.  Mandatory for a decade, but optional to
> implement. We still can't turn it on.
> ...

But then many people have it turned on, and it seems to be on by default
in Safari mobile. Maybe the situation is much better than you think.

> ...
> Options simply don't work - we need to make this stuff mandatory from
> the get-go or it  is very likely to have the same result that we've seen
> in the past.
> ...

I don't think that's correct.

Options do not work if and only if they are usually not switched on.

For instance, if header compression is optional, but common UAs will use
it by default, it *will* be implemented.

Also, this is a feature that can be trivially tested in a test suite.

Best regards, Julian


Re: multiplexing -- don't do it

Willy Tarreau-3
Hi Julian,

On Sat, Mar 31, 2012 at 08:57:03AM +0200, Julian Reschke wrote:

> On 2012-03-31 01:53, Mike Belshe wrote:
> >...
> >Before thinking this way we should look at how well other mandatory but
> >optional to use features have turned out.
> >
> >One such example is pipelining.  Mandatory for a decade, but optional to
> >implement. We still can't turn it on.
> >...
>
> But then many people have it turned on, and it seems to be on by default
> in Safari mobile. Maybe the situation is much better than you think.
>
> >...
> >Options simply don't work - we need to make this stuff mandatory from
> >the get-go or it  is very likely to have the same result that we've seen
> >in the past.
> >...
>
> I don't think that's correct.
>
> Options do not work if and only if they are usually not switched on.
>
> For instance, if header compression is optional, but common UAs will use
> it by default, it *will* be implemented.

Mike suggested that mandatory features must not necessarily be used but at
least be implemented. And in fact, whatever concerns the connection setup
from the client to the server has to be implemented; if a browser sends
compressed headers to a gateway that cannot decompress them, it will fail
(which was my reason to try something cheaper than zlib from the server's
point of view). Not supporting the other direction is not an issue, however,
since the server knows what the client supports.

Best regards,
Willy



Re: multiplexing -- don't do it

Poul-Henning Kamp
In message <[hidden email]>, Willy Tarreau writes:

>> For instance, if header compression is optional, but common UAs will use
>> it by default, it *will* be implemented.

I think for a facility like compression, it would be perfectly justified
to make it a "default-on" feature, which may cost an RTT to disable for
clients which don't grok it.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


Re: multiplexing -- don't do it

Willy Tarreau-3
On Sat, Mar 31, 2012 at 07:34:47AM +0000, Poul-Henning Kamp wrote:
> In message <[hidden email]>, Willy Tarreau writes:
>
> >> For instance, if header compression is optional, but common UAs will use
> >> it by default, it *will* be implemented.
>
> I think for a facility like compression, it would be perfectly justified
> to make it a "default-on" feature, which may cost an RTT to disable for
> clients which don't grok it.

I have no issue with that provided it's cheap. I'd like to note that the
additional RTT happens only if it's not supported by the server, in which
case the client's request is not understood. If we make compression cheap
enough for both ends, as we proposed and as Roy proposed in Waka (which
looks more advanced BTW), then there might not be any reason not to support
it and it would be even easier.

Willy



Re: multiplexing -- don't do it

Ian Fette (イアンフェッティ)

I am not sure what "easy" means. Easy to implement? Both are straightforward enough. Easy to debug? Meh. Our goal should be performance for billions of users, not the convenience of a group many orders of magnitude smaller.

On Mar 31, 2012 10:04 AM, "Willy Tarreau" <[hidden email]> wrote:
On Sat, Mar 31, 2012 at 07:34:47AM +0000, Poul-Henning Kamp wrote:
> In message <[hidden email]>, Willy Tarreau writes:
>
> >> For instance, if header compression is optional, but common UAs will use
> >> it by default, it *will* be implemented.
>
> I think for a facility like compression, it would be perfectly justified
> to make it a "default-on" feature, which may cost an RTT to disable for
> clients which don't grok it.

I have no issue with that provided it's cheap. I'd like to note that the
additional RTT happens only if it's not supported by the server, in which
case the client's request is not understood. If we make compression cheap
enough for both ends, as we proposed and as Roy proposed in Waka (which
looks more advanced BTW), then there might not be any reason not to support
it and it would be even easier.

Willy



Re: multiplexing -- don't do it

Willy Tarreau-3
On Sat, Mar 31, 2012 at 01:26:19AM -0700, Ian Fette (イアンフェッティ) wrote:
> I am not sure what "easy" means. Easy to implement? Both are
> straightforward enough. Easy to debug? Meh. Our goal should be performance
> for billions of users, not the convenience of a group many orders of magnitude smaller.

Sorry Ian, I meant "easy to process", which clearly fits in the performance
category.

Willy

