draft-ietf-httpbis-cache-digest-00 comments

draft-ietf-httpbis-cache-digest-00 comments

Martin Thomson-3
As I've said before, this is really interesting work, I'm very much
interested in seeing this progress.  However, I found a lot of issues
with the current draft.

The latest version seems to be a bit of a regression.  In particular,
the addition of all the flags makes it a lot more complicated, and I'm
already concerned about managing complexity here, especially since
this is an optimization.

The draft doesn't actually say where this frame should be sent - on a
stream that carries a request, or on stream 0.  This is important
because there are several mentions of origin.  For instance, the new
RESET flag talks about clearing digests for the "applicable origin".
That establishes a large point of confusion about the scope that a
digest is assumed to apply to; by their nature, this isn't necessarily
fatal, until you want to talk about RESET and COMPLETE.

To up-level a bit on this general issue, I'd like to see a better
formulated description of the information that clients and servers are
expected to maintain.  There seem to be multiple objects that are
stored, but I'm not sure what scope they are maintained in; is the
scope an origin?

Assuming a particular scope, are there two objects, or four?  That is,
could there be four stores:

1. assumed fresh by URL
2. assumed fresh by URL and etag
3. assumed stale by URL
4. assumed stale by URL and etag

Or are 1+2 and 3+4 combined?  The definition of RESET implies that all
four stores are cleared.  The definition of COMPLETE implies that only
1+2 or 3+4 are affected.

The draft doesn't talk about URL normalization.  That is a risk to the
feasibility of this; fail to do something sensible here and you could
get a lot of spurious misses.  Given that it is just an optimization,
we don't need 100% agreement for this to work, but saying something is
probably wise.  We can probably get away with making some lame
recommendations about how to encode a URL.  Here's a rough cut of
something that didn't make the draft deadline this time around:
https://martinthomson.github.io/http-miser/#rfc.section.2.1

I don't see any value in COMPLETE.  Even if we accept that there is
only one connection from this client to this server, the benefit in
knowing that the digest is complete is marginal at best.  Is there
just one extra resource missing, or thousands?  As such, it changes
the probability by some unknown quantity, which isn't actionable.

Can a frame with the RESET flag include a digest as well?

N and P could fit into a single octet.  Since you are into the flags
on the frame anyway, reduce N and P to 4 bits apiece and use flags to
fill the upper bits as needed.  But I don't think that they will be
needed.  At the point that you have more than 2^16 entries in the
digest, you are probably not going to want to use this.  Even with a
tiny P=3 - which is too high a false positive probability to be useful
- with N=2^16 you still need 32K to send the digest.  You could safely
increase the floor for P before you might need or want higher bits
(and make the minimum higher than 2^0, which is probably too high a
false-positive probability in any case).
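As a rough sanity check on those numbers (my own back-of-envelope estimate, not from the draft), assuming a Golomb-coded set costs roughly P+2 bits per entry (P binary remainder bits plus an average ~2-bit unary quotient):

```python
# Back-of-envelope GCS size estimate (assumption: ~P+2 bits per entry,
# i.e. P binary remainder bits plus an average ~2-bit unary quotient).
def gcs_size_bytes(log2_n: int, p_bits: int) -> int:
    n = 2 ** log2_n
    return n * (p_bits + 2) // 8

# With N = 2^16 entries and P = 3 the digest is already tens of
# kilobytes, in line with the ~32K figure above.
print(gcs_size_bytes(16, 3))  # 40960 bytes under this estimate
```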

Is the calculation of N really round(log2(urls.length))?  I thought
that you would want to use ceil() instead.  Is the word "up" missing
from step 1 in Section 2.1.1?

The draft never actually mentions that it uses [Rice-]Golomb-coding
until the appendix.  Including a reference to the Rice paper would
help people implementing this understand where this comes from, as
well as leading them to being able to find the relevant research.
(nit: Spelling Golomb correctly would help here.)
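For readers who reach this point before the appendix, a minimal sketch of Rice coding (Golomb coding with a power-of-two divisor), just to show the shape of the encoding; this is illustrative, not the draft's exact bit layout:

```python
# Rice-encode a single value with divisor 2^p_bits: a unary quotient
# (q ones, then a zero) followed by a p_bits-wide binary remainder.
def rice_encode(value: int, p_bits: int) -> str:
    q, r = divmod(value, 1 << p_bits)
    return "1" * q + "0" + format(r, f"0{p_bits}b")

print(rice_encode(9, 2))  # 9 = 2*4 + 1 -> "110" + "01" -> "11001"
```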

Re: draft-ietf-httpbis-cache-digest-00 comments

Kazuho Oku
Hi,

Thank you for your comments.

The comments below are mine, and Mark might have different opinions.

2016-07-13 11:18 GMT+09:00 Martin Thomson <[hidden email]>:

> As I've said before, this is really interesting work, I'm very much
> interested in seeing this progress.  However, I found a lot of issues
> with the current draft.
>
> The latest version seems to be a bit of a regression.  In particular,
> the addition of all the flags makes it a lot more complicated, and I'm
> already concerned about managing complexity here, especially since
> this is an optimization.
>
> The draft doesn't actually say where this frame should be sent - on a
> stream that carries a request, or on stream 0.

In section 2.1 the draft states: "A CACHE_DIGEST frame can be sent
from a client to a server on any stream in the “open” state."  My
understanding is that it would be enough to clarify that the frame
should be sent on a stream that carries a request, as well as when it
should be sent.

> This is important
> because there are several mentions of origin.  For instance, the new
> RESET flag talks about clearing digests for the "applicable origin".
> That establishes a large point of confusion about the scope that a
> digest is assumed to apply to; by their nature, this isn't necessarily
> fatal, until you want to talk about RESET and COMPLETE.
>
> To up-level a bit on this general issue, I'd like to see a better
> formulated description of the information that clients and servers are
> expected to maintain.  There seem to be multiple objects that are
> stored, but I'm not sure what scope they are maintained in; is the
> scope an origin?

Yes.

> Assuming a particular scope, are there two objects, or four?  That is,
> is there could be four stores:
>
> 1. assumed fresh by URL
> 2. assumed fresh by URL and etag
> 3. assumed stale by URL
> 4. assumed stale by URL and etag
>
> Or are 1+2 and 3+4 combined?  The definition of RESET implies that all
> four stores are cleared.  The definition of COMPLETE implies that only
> 1+2 or 3+4 are affected.

There are four objects, which are grouped into two.

Your reading is correct: the RESET flag clears all of them, and the
COMPLETE flag applies to either 1+2 or 3+4.
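To make that concrete, here is a minimal sketch of the four per-origin stores and what RESET does to them (the class and attribute names are mine, not the draft's):

```python
# Illustrative sketch of the four per-origin stores; names are
# hypothetical, not taken from the draft.
class OriginCacheDigest:
    def __init__(self):
        self.fresh_by_url = set()        # 1. assumed fresh by URL
        self.fresh_by_url_etag = set()   # 2. assumed fresh by URL and etag
        self.stale_by_url = set()        # 3. assumed stale by URL
        self.stale_by_url_etag = set()   # 4. assumed stale by URL and etag

    def reset(self):
        # RESET clears all four stores at once
        self.fresh_by_url.clear()
        self.fresh_by_url_etag.clear()
        self.stale_by_url.clear()
        self.stale_by_url_etag.clear()
```

COMPLETE, by contrast, would mark either the fresh pair (1+2) or the stale pair (3+4) as exhaustive.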

> The draft doesn't talk about URL normalization.  That is a risk to the
> feasibility of this; fail to do something sensible here and you could
> get a lot of spurious misses.  Given that it is just an optimization,
> we don't need 100% agreement for this to work, but saying something is
> probably wise.  We can probably get away with making some lame
> recommendations about how to encode a URL.  Here's a rough cut of
> something that didn't make the draft deadline this time around:
> https://martinthomson.github.io/http-miser/#rfc.section.2.1

Thank you for the suggestion.

I have mixed feelings about this; in section 2.2.1 the current draft
says the "Effective Request URI of a cached response" should be used.

So the cache digest would work without URL normalization if both of
the following conditions are met:
* the client caches a response without normalizing the request URI into some other form
* the server looks up the cache digest using the exact URI that a client would send

For example, if an HTML page with a script tag specifying
/%7Efoo/script.js is served to the client, then the draft expects the
client to use that value (including %7E) as the key, and the server to
test the digest using the exact same form.

The pro of this approach is that it would be easier to implement.  The
con is that it would be fragile due to the lack of normalization.

And I agree with you that, if we go without normalization, we should
warn users that the paths must be identical in terms of octets.
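A tiny sketch of what "identical in terms of octets" means in practice; the hashing scheme here is hypothetical, only the octet-sensitivity matters:

```python
import hashlib

# Hypothetical digest key: a hash over the exact effective-request-URI
# octets, with no normalization applied.
def digest_key(effective_request_uri: str) -> bytes:
    return hashlib.sha256(effective_request_uri.encode("utf-8")).digest()

# "/%7Efoo/script.js" and "/~foo/script.js" identify the same resource,
# but without normalization they produce different keys, so the server
# would see a spurious miss:
a = digest_key("https://example.com/%7Efoo/script.js")
b = digest_key("https://example.com/~foo/script.js")
print(a != b)  # True
```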

> I don't see any value in COMPLETE.  Even if we accept that there is
> only one connection from this client to this server, the benefit in
> knowing that the digest is complete is marginal at best.  Is there
> just one extra resource missing, or thousands.  As such, it changes
> the probability by some unknown quantity, which isn't actionable.

I do find value in COMPLETE.

For a server whose primary goal is to minimize bandwidth consumption,
and whose secondary goal is to minimize latency, it is wise to push
only the responses that are known NOT to be cached by the client.

That's what the COMPLETE flag can be used for.  Without the flag, a
server can only tell that a response is already cached or that it
_might_ be cached; it can never be certain that a response is missing.
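A sketch of that server-side decision, with the digest reduced to a plain set of keys and a hypothetical helper name:

```python
# Hypothetical push decision for a bandwidth-sensitive server.
def should_push(url: str, digest_keys: set, digest_is_complete: bool) -> bool:
    if url in digest_keys:
        return False  # already (or at least probably) cached
    # A miss only proves "not cached" when the digest is COMPLETE;
    # otherwise the client might still have the response.
    return digest_is_complete

print(should_push("/app.js", {"/style.css"}, True))   # True: safe to push
print(should_push("/app.js", {"/style.css"}, False))  # False: might be cached
```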

> Can a frame with the RESET flag include a digest as well?

Yes. That is the intention of the draft.

> N and P could fit into a single octet.  Since you are into the flags
> on the frame anyway, reduce N and P to 4 bits apiece and use flags to
> fill the upper bits as needed.  But I don't think that they will be
> needed.  At the point that you have more than 2^16 entries in the
> digest, you are probably not going to want to use this.  Even with a
> tiny P=3 - which is too high a false positive probability to be useful
> - with N=2^16 you still need 32K to send the digest.  You could safely
> increase the floor for P before you might need or want higher bits
> (and make the minimum higher than 2^0, which is probably too high a
> false-positive probability in any case).

I would argue that P=1 would still be useful in some cases.  For
example, if 10 resources are missing on the client side, a server
using P=1 could expect to detect 5 of them as missing and push them.
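The arithmetic behind that example, under the usual reading that P corresponds to a false-positive probability of 2^-P:

```python
# Expected number of truly-missing resources a server identifies as
# missing, assuming a false-positive probability of 2^-p per lookup.
def expected_detected(missing: int, p: int) -> float:
    false_positive = 2 ** -p
    return missing * (1 - false_positive)

print(expected_detected(10, 1))  # 5.0: half of 10 missing resources
```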

And considering that we would need a read-n-bits operation while
decoding the Golomb-encoded values anyway, I do not see a strong
reason to squash N and P into a single octet.

> Is the calculation of N really round(log2(urls.length)).  I thought
> that you would want to use ceil() instead.  Is the word "up" missing
> from step 1 in Section 2.1.1?

The draft has intentionally been written to use round().

The numbers that matter when using Golomb-coded sets are:

    P: the divisor used to split each value into a unary-encoded quotient and a binary-encoded remainder
    N*P: the range of the encoded values

For efficiency, both P and N*P must be powers of 2.

To encode efficiently, the actual probability should be close to the
value implied by P.  That in turn means that N*P should be
round_to_power_of_two(urls.length * P) rather than
round_up_to_power_of_two(urls.length * P).
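A quick illustration of the difference (the helper names mirror the ones above; the URL count is made up):

```python
import math

def round_to_power_of_two(x: float) -> int:
    # nearest power of two
    return 2 ** round(math.log2(x))

def round_up_to_power_of_two(x: float) -> int:
    return 2 ** math.ceil(math.log2(x))

# With 2500 URLs and P = 32, rounding to the nearest power of two keeps
# N*P close to urls.length * P; rounding up nearly doubles the range,
# which wastes roughly a bit per encoded value:
print(round_to_power_of_two(2500 * 32))     # 65536
print(round_up_to_power_of_two(2500 * 32))  # 131072
```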

> The draft never actually mentions that it uses [Rice-]Golomb-coding
> until the appendix.  Including a reference to the Rice paper would
> help people implementing this understand where this comes from, as
> well as leading them to being able to find the relevant research.
> (nit: Spelling Golomb correctly would help here.)

I agree. Thank you for noticing that!

--
Kazuho Oku

Re: draft-ietf-httpbis-cache-digest-00 comments

Mark Nottingham-2
Sorry for the delay, been travelling, then stuck, then sick, then catching up.


> On 14 Jul 2016, at 5:24 PM, Kazuho Oku <[hidden email]> wrote:
>
> Hi,
>
> Thank you for your comments.
>
> The comments below are mine, and Mark might have different opinions.
>
> 2016-07-13 11:18 GMT+09:00 Martin Thomson <[hidden email]>:
>> As I've said before, this is really interesting work, I'm very much
>> interested in seeing this progress.  However, I found a lot of issues
>> with the current draft.
>>
>> The latest version seems to be a bit of a regression.  In particular,
>> the addition of all the flags makes it a lot more complicated, and I'm
>> already concerned about managing complexity here, especially since
>> this is an optimization.
>>
>> The draft doesn't actually say where this frame should be sent - on a
>> stream that carries a request, or on stream 0.
>
> In section 2.1 the draft states: A CACHE_DIGEST frame can be sent from
> a client to a server on any stream in the “open” state. My
> understanding is that it would be enough to indicate that the frame
> should be sent on a stream that carries a request as well as when it
> should be sent.

Right after that, it says:

> ... and conveys a digest of the contents of the client’s cache for associated stream.

That probably should say "contents of the client's cache for the *origin* of the associated stream".

The other obvious design would be to put them on stream 0 and then have an explicit Origin field.

Do we anticipate C_D being sent before a stream is opened for a given origin?


>> This is important
>> because there are several mentions of origin.  For instance, the new
>> RESET flag talks about clearing digests for the "applicable origin".
>> That establishes a large point of confusion about the scope that a
>> digest is assumed to apply to; by their nature, this isn't necessarily
>> fatal, until you want to talk about RESET and COMPLETE.
>>
>> To up-level a bit on this general issue, I'd like to see a better
>> formulated description of the information that clients and servers are
>> expected to maintain.  There seem to be multiple objects that are
>> stored, but I'm not sure what scope they are maintained in; is the
>> scope an origin?
>
> Yes.

+1. We should rewrite to clarify this. See also <https://github.com/httpwg/http-extensions/issues/216>.


>> Assuming a particular scope, are there two objects, or four?  That is,
>> is there could be four stores:
>>
>> 1. assumed fresh by URL
>> 2. assumed fresh by URL and etag
>> 3. assumed stale by URL
>> 4. assumed stale by URL and etag
>>
>> Or are 1+2 and 3+4 combined?  The definition of RESET implies that all
>> four stores are cleared.  The definition of COMPLETE implies that only
>> 1+2 or 3+4 are affected.
>
> There are four objects, which are grouped into two.
>
> Your reading is correct that RESET flag clears all of them, and that
> the COMPLETE flag implies to either 1+2 or 3+4.

+1


>> The draft doesn't talk about URL normalization.  That is a risk to the
>> feasibility of this; fail to do something sensible here and you could
>> get a lot of spurious misses.  Given that it is just an optimization,
>> we don't need 100% agreement for this to work, but saying something is
>> probably wise.  We can probably get away with making some lame
>> recommendations about how to encode a URL.  Here's a rough cut of
>> something that didn't make the draft deadline this time around:
>> https://martinthomson.github.io/http-miser/#rfc.section.2.1
>
> Thank you for the suggestion.
>
> I have a mixed feeling about this; in section 2.2.1 the current draft
> says "Effective Request URI of a cached response" should be used.
>
> So the cache digest would work without URL normalization if both of
> the following conditions are met:
> * if the client caches a response NOT normalizing the request URI into some form
> * if the server looks up the cache digest using a URI that a client would send
>
> For example, if a HTML with a script tag specifying /%7Efoo/script.js
> is served to the client, then the draft excepts the client to use that
> value (including %7E) to be used as the key, and that the server
> should test the digest using the exact same form.
>
> The pros of this approach would be that it would be easier to
> implement. The cons is that it would be fragile due to no
> normalization.
>
> And I agree with you that in case we go without normalization we
> should warn the users that the paths should be same in terms of
> octets.

My inclination would be to do no more normalisation than caches are normally doing, at least to start with.


>> I don't see any value in COMPLETE.  Even if we accept that there is
>> only one connection from this client to this server, the benefit in
>> knowing that the digest is complete is marginal at best.  Is there
>> just one extra resource missing, or thousands.  As such, it changes
>> the probability by some unknown quantity, which isn't actionable.
>
> I do find value in COMPLETE.
>
> For a server with the primary goal to minimize B/W consumption and the
> second goal to minimize latency, it is wise to push responses that are
> known NOT to be cached by a client.
>
> That's what the COMPLETE flag can be used for. Without the flag, a
> server can only tell if a response is already cached or _might_ by
> cached.
>
>> Can a frame with the RESET flag include a digest as well?
>
> Yes. That is the intention of the draft.
>
>> N and P could fit into a single octet.  Since you are into the flags
>> on the frame anyway, reduce N and P to 4 bits apiece and use flags to
>> fill the upper bits as needed.  But I don't think that they will be
>> needed.  At the point that you have more than 2^16 entries in the
>> digest, you are probably not going to want to use this.  Even with a
>> tiny P=3 - which is too high a false positive probability to be useful
>> - with N=2^16 you still need 32K to send the digest.  You could safely
>> increase the floor for P before you might need or want higher bits
>> (and make the minimum higher than 2^0, which is probably too high a
>> false-positive probability in any case).
>
> I would argue that P=1 would still be useful in some cases. For
> example if 10 resources are missing on the client side, it would mean
> that a server can detect 5 of them missing and push them in case P=1
> is used.
>
> And considering the fact that we would nevertheless have read-n-bits
> operation while decoding the Golomb-encoded values, I do not see
> strong reason to squash N and P into a single octet.

+1

>> Is the calculation of N really round(log2(urls.length)).  I thought
>> that you would want to use ceil() instead.  Is the word "up" missing
>> from step 1 in Section 2.1.1?
>
> That draft has intentionally been written to use round.
>
> The numbers that matter when using Golomb-coded sets are:
>
>    P: divisor used to divide bits that are unary-encoded and binary-encode
>    N*P: range of the encoded values
>
> For efficiency, both P and N*P must be powers of 2.
>
> To encode effectively, the real probability should be near to the
> value of P. And that in turn means that N*P should be
> round_to_power_of_two(urls.length * P) rather than
> round_up_to_power_of_two(urls.length * P).
>
>> The draft never actually mentions that it uses [Rice-]Golomb-coding
>> until the appendix.  Including a reference to the Rice paper would
>> help people implementing this understand where this comes from, as
>> well as leading them to being able to find the relevant research.
>> (nit: Spelling Golomb correctly would help here.)
>
> I agree. Thank you for noticing that!

+1, see <https://github.com/httpwg/http-extensions/issues/230>

>
> --
> Kazuho Oku

--
Mark Nottingham   https://www.mnot.net/