cache freshness / age calcs

Adrien de Croy

Hi all

I've been poring through draft-ietf-httpbis-p6-cache-07 trying to figure
out what to do in a particular case.

This case is where the origin server has a clock that is way out of
whack, and specifies expiry times close to the level of error.  No Age
header, so presumably not from some intermediary cache.

The documentation for calculation of the apparent age states:

"A response's age can be calculated in two entirely independent ways:

   1.  now minus date_value, if the local clock is reasonably well
       synchronized to the origin server's clock.  If the result is
       negative, the result is replaced by zero.

   2.  age_value, if all of the caches along the response path implement
       HTTP/1.1."
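The two estimates in the quoted text can be written out directly. Below is a minimal Python sketch, not code from the draft. The function names are hypothetical, times are assumed to be seconds since the epoch, and a missing Age header is assumed to count as 0.

```python
def age_from_clock(now: float, date_value: float) -> float:
    """Method 1: now minus date_value, clamped at zero.

    Only meaningful if the local clock is reasonably well synchronised
    with the origin server's clock.
    """
    return max(0.0, now - date_value)


def age_from_header(age_value: float) -> float:
    """Method 2: trust the Age header. Only valid if every cache on the
    response path implements HTTP/1.1 (and therefore maintains Age)."""
    return age_value


# Origin clock 30 s behind ours: method 1 reports 30 s of "age" that is
# really clock skew; the missing Age header (assumed 0) reports none.
print(age_from_clock(now=1_000_030.0, date_value=1_000_000.0))  # 30.0
print(age_from_header(0.0))                                     # 0.0
```

Nothing in either function can detect whether the clocks are synchronised. That is the gap the question below is about.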

The obvious question is: what if the clocks are not reasonably well
synchronised, and 2 doesn't hold either (not every cache is HTTP/1.1, or
there is no Age header)?  How can you even tell whether a clock is
synchronised or not?  In that case, does the spec simply not attempt to
specify how to calculate age?

In my problem case, using now - date_value places a huge skew on the
time at which the stored response can no longer be considered fresh.  It
significantly reduces the effectiveness of the cache.  Obviously the
resolution is to get the server admin to fix their clock, but that's an
uphill battle.
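To put numbers on the problem, suppose (hypothetically) that the origin's clock is 10 minutes slow and the origin grants a 5-minute freshness lifetime. The Date-based age alone then makes a brand-new response stale on arrival. The sketch below ignores transit time and the Age header. The function name and figures are made up for illustration.

```python
def fresh_on_arrival(skew: float, freshness_lifetime: float,
                     receive_time: float = 1_000_000.0) -> bool:
    """Is a brand-new response fresh the moment it arrives?

    skew > 0 means the origin's clock is behind the local clock by that
    many seconds; skew < 0 means it is ahead.
    """
    # The origin stamps Date using its own (skewed) clock.
    date_value = receive_time - skew
    # Method 1 from the draft: now minus date_value, clamped at zero.
    apparent_age = max(0.0, receive_time - date_value)
    # If the lifetime came from Expires - Date, both values are on the
    # origin's clock, so the skew cancels there but not in the age.
    return freshness_lifetime > apparent_age


print(fresh_on_arrival(skew=600.0, freshness_lifetime=300.0))   # False
print(fresh_on_arrival(skew=0.0, freshness_lifetime=300.0))     # True
print(fresh_on_arrival(skew=-600.0, freshness_lifetime=300.0))  # True
```

A slow origin clock inflates the age, while a fast one is hidden by the clamp to zero. That asymmetry is why the calculation errs on the side of staleness.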

I would presume that if you get a response with a Date header and no
Age header from an HTTP/1.1 origin server, then you should take the Date
header as an indication of the local clock at that server (ignoring RTT
and time to generate).  If there is an Age header, you should consider
the Date header to be some date in the past from which the Age was
subsequently calculated (e.g. caches don't update Date headers to their
own value when serving from cache - or do they?).  Is this why the age
is calculated as the larger of the received age value and the apparent
age, rather than the sum of the two?
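For comparison, RFC 2616 (section 13.2.3) combines the two estimates by taking the larger one. It then adds the request/response round trip and the time spent in the cache. Below is a Python transcription of those formulas. The dataclass is a hypothetical container, and a missing Age header is assumed to count as 0. One reading of why the spec uses max rather than a sum is that an Age value from an HTTP/1.1 cache already covers the time since the origin generated the response. Adding the Date-based estimate on top would count that interval twice.

```python
from dataclasses import dataclass


@dataclass
class StoredResponse:
    request_time: float      # local clock when the request was sent
    response_time: float     # local clock when the response was received
    date_value: float        # Date header, on the origin's clock
    age_value: float = 0.0   # Age header; 0 if absent (assumption)


def current_age(r: StoredResponse, now: float) -> float:
    # How old the response looks when comparing clocks.
    apparent_age = max(0.0, r.response_time - r.date_value)
    # Take the larger of the two estimates rather than their sum.
    corrected_received_age = max(apparent_age, r.age_value)
    # Conservatively charge the whole round trip as extra age.
    response_delay = r.response_time - r.request_time
    corrected_initial_age = corrected_received_age + response_delay
    # Time the response has since spent in this cache.
    resident_time = now - r.response_time
    return corrected_initial_age + resident_time


r = StoredResponse(request_time=100.0, response_time=102.0,
                   date_value=95.0, age_value=3.0)
print(current_age(r, now=110.0))  # 17.0
```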

If a response didn't come from a cache, it cannot have been generated
before it was requested.  Therefore calculating an apparent age is
misguided if there is no Age header.  A conservative view of the age of
a response not from a cache should therefore be bounded by the age of
the request, rather than by the difference in clocks (which can be large).
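The bound proposed here can be sketched as follows. This is a hypothetical function, not the draft's rule. It assumes that a missing Age header really means no cache was involved, and a hidden HTTP/1.0 cache that doesn't emit Age would break that assumption.

```python
from typing import Optional


def initial_age_proposed(request_time: float, response_time: float,
                         date_value: float,
                         age_value: Optional[float]) -> float:
    """Initial age under the proposal in this post (not the draft's rule).

    age_value is None when the response carried no Age header.
    """
    response_delay = response_time - request_time
    if age_value is None:
        # Assumed to come straight from the origin: it cannot have been
        # generated before the request was sent, so the round trip bounds
        # its age and the (possibly skewed) Date header is ignored.
        return response_delay
    # With an Age header, fall back to the usual conservative calculation.
    apparent_age = max(0.0, response_time - date_value)
    return max(apparent_age, age_value) + response_delay


# Origin clock 600 s slow; the response took 1 s to arrive.
print(initial_age_proposed(1000.0, 1001.0, 401.0, None))  # 1.0
print(initial_age_proposed(1000.0, 1001.0, 401.0, 5.0))   # 601.0
```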

What do browsers commonly do in this case?

Regards

Adrien

--
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com


Re: cache freshness / age calcs

nicolas.alvarez@gmail.com
Adrien de Croy wrote:
> This case is where the origin server has a clock that is way out of
> whack, and specifies expiry times close to the level of error.  No Age
> header, so presumably not from some intermediary cache.

Thanks to our stupid government, expect to see that kind of problem from
the user agent side soon.

This month, Argentina switches to DST, once again with the exact switch
date announced only one or two days in advance. Some provinces will refuse;
which ones is still uncertain, as they're still discussing it.

Last year, there was *no* update for Microsoft Windows to fix the timezone
database for Argentina; Microsoft couldn't develop, test, and deploy an
update out in time. And even if they did, Windows has one timezone for the
entire country, so there is just no way it would automatically work for all
provinces.

The tzdata database used in most Unix systems has always had one timezone
per province. But the tzdata maintainers will (again!) have to watch the
news and the congress website frequently to get an update out as fast as
possible once the final law is published. Last year, one province gave
final notice that it wouldn't switch only a few hours in advance, keeping
the tzdata maintainers quite busy... And then more time will pass until
Linux distros package the update and users upgrade.

Guess what most people will do, especially those who use Windows (vast
majority) and who probably won't get an update? Change the *clock* manually
(not the timezone), screwing up the computer's notion of what UTC is.


And I'm sure HTTP caches and cookie expiration timestamps would be some of
the many things that will have problems. Do major browsers currently adjust
cookie expiration timestamps for differences between local time and server
time? And would they adjust for differences as large as *two* hours?
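To illustrate the cookie question, one hypothetical way a user agent could compensate is to keep the lifetime the server intended (Expires minus the server's Date) and re-anchor it on the local clock. This is only a sketch under that assumption. It says nothing about what any browser actually does, and the function name is made up.

```python
def local_cookie_expiry(expires_server: float, server_date: float,
                        local_receive_time: float) -> float:
    """Hypothetical skew correction for a cookie's expiry time.

    expires_server and server_date are on the server's clock;
    local_receive_time is when the response arrived, on the local clock.
    """
    remaining = expires_server - server_date  # lifetime the server intended
    return local_receive_time + remaining


# The server grants a 1-hour cookie; the user has moved the local clock
# 2 hours ahead (7200 s) instead of fixing the timezone.
server_date = 1_000_000.0
expires = server_date + 3600.0
local_now = server_date + 7200.0

print(expires > local_now)  # False: taken literally, already expired
print(local_cookie_expiry(expires, server_date, local_now) > local_now)  # True
```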



Re: cache freshness / age calcs

Mark Nottingham
In reply to this post by Adrien de Croy
Hey,

I've been meaning to raise an issue along these lines recently as  
well, so your message is very well-timed.

WRT "well-synchronised": a lot of the text in RFC 2616's caching section,
especially around age calculation, was explanatory, not spec text. Most
of it has been removed in p6-07, but some still remains, and my take is
that this is one example of it.

As you've noticed, the algorithm is very conservative; i.e., it will
always err on the side of considering something older than it actually
is. This is annoying when the freshness lifetime is short (or the skew
very large) since, as you point out, it makes responses that could have
been served from cache effectively uncacheable.

Since transit time is already accounted for here (related issue:
<http://tools.ietf.org/wg/httpbis/trac/ticket/29>), it seems like the wild
card that you have to deal with is HTTP/1.0 caches not emitting Age headers.

Just having a 1.1 origin isn't necessarily good enough, because there
could (in theory) be a 1.0 cache interposed somewhere along the way;
remember, 1.0 caches aren't required to emit Age or Via, and while the
next hop towards the UA *should* record the 1.0 hop in its Via header
(presuming it's 1.1), as we know, not everyone sends Via.

I'm wondering if it's good enough to specify that if:
    - your next hop is a proxy AND it sends a Via header that's all 1.1, OR
    - your next hop is the origin and the Via header, if present, is all 1.1,
you can calculate age using the Age header alone, without trying to account
for hidden 1.0 caches in the chain (using Date).
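A rough sketch of that rule follows. The function names are hypothetical, and the Via parsing is deliberately simplified: it splits on commas, so a comment containing a comma would confuse it. "All 1.1" is read literally, meaning every received-protocol version must be exactly 1.1.

```python
from typing import Optional


def via_all_http11(via: Optional[str]) -> bool:
    """True if every hop recorded in a Via header used HTTP/1.1.

    Simplified parsing (assumption): entries are split on commas, and the
    first token of each entry is its received-protocol, e.g. "1.1" or
    "HTTP/1.0". A missing or empty header counts as no recorded hops.
    """
    if not via:
        return True
    for entry in via.split(","):
        tokens = entry.split()
        if not tokens:
            continue
        version = tokens[0].rsplit("/", 1)[-1]  # drop optional "HTTP/"
        if version != "1.1":
            return False
    return True


def may_trust_age_alone(next_hop_is_proxy: bool, via: Optional[str]) -> bool:
    """Proposed rule: when True, skip the Date-based apparent age."""
    if next_hop_is_proxy:
        # The proxy must send Via, and every hop in it must be 1.1.
        return bool(via) and via_all_http11(via)
    # Next hop is the origin: Via is optional, but if present must be all 1.1.
    return via_all_http11(via)


def corrected_received_age(trust_age: bool, apparent_age: float,
                           age_value: float) -> float:
    # Under the proposal, a trusted path lets Age stand on its own.
    return age_value if trust_age else max(apparent_age, age_value)


print(may_trust_age_alone(True, "1.1 proxy.example, HTTP/1.1 cdn.example"))  # True
print(may_trust_age_alone(True, "1.0 fred, 1.1 p.example.net"))             # False
print(may_trust_age_alone(True, None))                                       # False
print(may_trust_age_alone(False, None))                                      # True
```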

This does have the potential to mess up in a few circumstances, but  
AFAIK 1.0 caches will produce Age anyway; e.g. Squid. Most of the  
other caches deployed are going to be either accelerators/CDNs (which  
already do unholy things with Date and Age; see Edith Cohen's paper  
from a while back), or interception caches, which are responsible for  
any problems they cause anyway.

Thoughts? An alternative would be to reduce the Date portion of the
calculation to a SHOULD-level requirement.

On 12/10/2009, at 2:12 PM, Adrien de Croy wrote:

>
> Hi all
>
> I've been poring through draft-ietf-httpbis-p6-cache-07 trying to  
> figure out what to do in a particular case.
>
> This case is where the origin server has a clock that is way out of  
> whack, and specifies expiry times close to the level of error.  No  
> Age header, so presumably not from some intermediary cache.
>
> The documentation for calculation of the apparent age states:
>
> "A response's age can be calculated in two entirely independent ways:
>
>  1.  now minus date_value, if the local clock is reasonably well
>      synchronized to the origin server's clock.  If the result is
>      negative, the result is replaced by zero.
>
>  2.  age_value, if all of the caches along the response path implement
>      HTTP/1.1."
>
> The obvious question being what if the clocks are not reasonably  
> well synchronised, and 2 doesn't hold either (either not all HTTP/
> 1.1 or no Age header)?  How can you even tell if a clock is  
> synchronised or not?  In that case does the spec not attempt to  
> specify how to calculate age?
>
> In my problem case, using now - date_value places a huge skew on the  
> time at which the stored response can no longer be considered  
> fresh.  It significantly reduces the effectiveness of the cache.  
> Obviously the resolution is to get the server admin to fix their  
> clock, but that's an uphill battle.
>
> I would presume that if you get a response with a Date header, and  
> no Age header from a 1.1 O-S, then you should presume that the Date  
> header is an indication of the local clock at that server (ignoring  
> RTT and time to generate).  If there is an age header, you should  
> consider the Date header to be some date in the past from which the  
> Age was subsequently calculated (e.g. caches don't update Date  
> headers to their own value when serving from cache - or do they?).    
> Is this why the age is calculated as the larger of the received age  
> value vs the apparent age rather than the sum of the 2?
>
> If a response didn't come from a cache, it cannot have been  
> generated before it was requested.  Therefore calculating an  
> apparent age is misguided if there is no age header.  A conservative  
> view of the age of a response not from cache should therefore be  
> bounded by the age of the request, rather than the difference in  
> clocks (which can be large).
>
> What do browsers commonly do in this case?
>
> Regards
>
> Adrien
>
> --
> Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
>
>

--
Mark Nottingham