Firefox addin to replace 404 pages with archived pages from wayback machine

Firefox addin to replace 404 pages with archived pages from wayback machine

Noah Mendelsohn
See [1].

I thought this might be of some interest to the TAG. Seems to me that this
is OK insofar as the addin is a modification to a user agent, and is
presumably activated only with the user's consent.

Nonetheless, this seems to embody a slightly skewed view of Web protocols: if
I as a URI authority serve a new or updated page, your browser will do what
I intend and show the user that new content. If I delete a page, the
browser will not honor that deletion, but will show content anyway. This
seems to me just a bit of a slippery slope. A 404 is just as meaningful in
Web protocols (no such page) as a 200 IMO.

I'm not proposing that the TAG do anything about this or devote significant
time to it right now, just pointing it out in case it's of interest.

Thank you.

Noah


[1]
http://gadgets.ndtv.com/apps/news/firefox-will-try-to-show-you-saved-archive-of-a-page-instead-of-404-error-869482


Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Brian Kardell


On Fri, Aug 5, 2016 at 12:23 PM, Noah Mendelsohn <[hidden email]> wrote:
> See [1].
>
> I thought this might be of some interest to the TAG. Seems to me that this is OK insofar as the addin is a modification to a user agent, and is presumably activated only with the user's consent.
>
> Nonetheless, this seems to embody a slightly skewed view of Web protocols: if I as a URI authority serve a new or updated page, your browser will do what I intend and show the user that new content. If I delete a page, the browser will not honor that deletion, but will show content anyway. This seems to me just a bit of a slippery slope. A 404 is just as meaningful in Web protocols (no such page) as a 200 IMO.
>
> I'm not proposing that the TAG do anything about this or devote significant time to it right now, just pointing it out in case it's of interest.
>
> Thank you.
>
> Noah
>
> [1] http://gadgets.ndtv.com/apps/news/firefox-will-try-to-show-you-saved-archive-of-a-page-instead-of-404-error-869482



Noah,

The UA actually shows a prompt when it encounters a 404 if there is a version in Wayback [1]. It seems to me that both Wayback and the UA are acting entirely within their appropriate boundaries, are they not? Your deletion is indeed honored, but if someone archived the page, it is indeed archived. If you set up your server not to be archived, it shouldn't be (though it really still could be). If the UA offers help in finding that archived copy, that seems not a lot different from all sorts of browser features (like a search toolbar). Am I misunderstanding something?
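
A minimal sketch of the check-then-prompt flow described above, in Python. It assumes the JSON shape returned by the Internet Archive's availability endpoint (`https://archive.org/wayback/available?url=...`), with an `archived_snapshots.closest` object. The function names and the prompt wording are hypothetical and are not taken from the add-on itself.

```python
import json
from typing import Optional


def closest_snapshot_url(availability_json: str) -> Optional[str]:
    """Return the closest archived snapshot URL, or None if there is none.

    Assumes a response shaped like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    data = json.loads(availability_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def prompt_for_status(status: int, availability_json: str) -> Optional[str]:
    """Return a prompt to show the user, or None to leave the page alone.

    The server's 404 response is still what gets rendered; the prompt
    only offers the archived copy (hypothetical wording).
    """
    if status != 404:
        return None
    snapshot = closest_snapshot_url(availability_json)
    if snapshot is None:
        return None
    return f"This page is missing. View a saved version? {snapshot}"
```

In this sketch nothing is replaced silently: a non-404 response or a missing snapshot yields no prompt at all, and the user decides whether to follow the archived link.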



Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Herbert van de Sompel
Hi all,

I would like to take the opportunity to mention a few things in this regard:

(*) The plug-in uses the Internet Archive's Wayback collection to find old pages, routinely called Mementos in the web archiving community. Note that there are many more web archives around the world and that an aggregator service exists; see http://timetravel.mementoweb.org. This aggregator also supports the Memento "Time Travel for the Web" protocol (RFC 7089) and exposes APIs that allow looking for Mementos across many archives; see http://timetravel.mementoweb.org/guide/api/ . In order to get broader web archive coverage, the plug-in could use this aggregator service.

(*) In the Hiberlink project, we studied approaches to ameliorate the link rot problem. One outcome was the notion of Robust Links, basically an approach to decorate links in HTML as a means to allow revisiting linked content in case a link has died or the linked content has changed. The link decoration uses HTML5 data- attributes, which can be made actionable using simple JavaScript.
- For a motivation regarding Robust Links, see http://robustlinks.mementoweb.org/about/
- For the Link Decoration spec, see http://robustlinks.mementoweb.org/spec/
- For an example paper that shows Robust Links at work, see http://dx.doi.org/10.1045/november2015-vandesompel
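
As a rough illustration of the link decoration idea, here is a small Python sketch that collects decorated links from an HTML fragment. The `data-versiondate` attribute is the one named elsewhere in this thread; `data-versionurl` is included on the assumption that the Link Decoration spec defines it for a specific archived snapshot, so check the spec before relying on it. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List


class RobustLinkParser(HTMLParser):
    """Collect <a> elements that carry Robust Links decoration."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Only keep links decorated with at least a version date.
        if "href" in attr and "data-versiondate" in attr:
            self.links.append({
                "href": attr["href"],
                "versiondate": attr["data-versiondate"],
                # Assumed optional attribute: a specific archived snapshot.
                "versionurl": attr.get("data-versionurl") or "",
            })


def robust_links(html: str) -> List[Dict[str, str]]:
    """Return the decorated links found in an HTML fragment."""
    parser = RobustLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

A client that finds such a link dead could then fall back to the snapshot URL, or look up an archived copy near the recorded date.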

Cheers

Herbert

On Fri, Aug 5, 2016 at 10:56 AM, Brian Kardell <[hidden email]> wrote:


> On Fri, Aug 5, 2016 at 12:23 PM, Noah Mendelsohn <[hidden email]> wrote:
> > See [1].
> >
> > I thought this might be of some interest to the TAG. Seems to me that this is OK insofar as the addin is a modification to a user agent, and is presumably activated only with the user's consent.
> >
> > Nonetheless, this seems to embody a slightly skewed view of Web protocols: if I as a URI authority serve a new or updated page, your browser will do what I intend and show the user that new content. If I delete a page, the browser will not honor that deletion, but will show content anyway. This seems to me just a bit of a slippery slope. A 404 is just as meaningful in Web protocols (no such page) as a 200 IMO.
> >
> > I'm not proposing that the TAG do anything about this or devote significant time to it right now, just pointing it out in case it's of interest.
> >
> > Thank you.
> >
> > Noah
> >
> > [1] http://gadgets.ndtv.com/apps/news/firefox-will-try-to-show-you-saved-archive-of-a-page-instead-of-404-error-869482
>
> Noah,
>
> The UA actually shows a prompt when it encounters a 404 if there is a version in Wayback [1]. It seems to me that both Wayback and the UA are acting entirely within their appropriate boundaries, are they not? Your deletion is indeed honored, but if someone archived the page, it is indeed archived. If you set up your server not to be archived, it shouldn't be (though it really still could be). If the UA offers help in finding that archived copy, that seems not a lot different from all sorts of browser features (like a search toolbar). Am I misunderstanding something?





--
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
http://orcid.org/0000-0002-0715-6126

==

Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Felix Miata-2
In reply to this post by Noah Mendelsohn
Noah Mendelsohn composed on 2016-08-05 12:23 (UTC-0400):
> See [1].

> I thought this might be of some interest to the TAG. Seems to me that this
> is OK insofar as the addin is a modification to a user agent, and is
> presumably activated only with the user's consent.
 
> Nonetheless, this seems to embody a slightly skewed view of Web protocols: if
> I as a URI authority serve a new or updated page, your browser will do what
> I intend and show the user that new content. If I delete a page, the

Though you are within your rights to do so, nevertheless, you shouldn't cause
URIs to disappear:
https://www.w3.org/Provider/Style/URI.html

Those who *need* to be able to make URIs disappear, need to prevent them from
being archived in the first place.

> browser will not honor that deletion, but will show content anyway. This
> seems to me just a bit of a slippery slope. A 404 is just as meaningful in
> Web protocols (no such page) as a 200 IMO.
 
> I'm not proposing that the TAG do anything about this or devote significant
> time to it right now, just pointing it out in case it's of interest.
 
> Thank you.
 
> Noah

> [1]
> http://gadgets.ndtv.com/apps/news/firefox-will-try-to-show-you-saved-archive-of-a-page-instead-of-404-error-869482

The "No More 404s" add-on that page describes doesn't even show up in add-on
search results in Firefox ESR 45.

There are alternatives, e.g.:
https://addons.mozilla.org/en-US/firefox/addon/open-in-wayback-machine/
--
"The wise are known for their understanding, and pleasant
words are persuasive." Proverbs 16:21 (New Living Translation)

 Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata  ***  http://fm.no-ip.com/


Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Patrick McManus

On Fri, Aug 5, 2016 at 3:47 PM, Felix Miata <[hidden email]> wrote:
> The "No More 404s" add-on that page describes doesn't even show up in add-on
> search results in Firefox ESR 45.


No More 404s is part of Test Pilot - https://testpilot.firefox.com/

Test Pilot is a sort of aggregator add-on for features that are being evaluated and evolved for possible inclusion in mainline Firefox at a later time. Early-adopter stuff.

-Patrick


Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Herbert van de Sompel
In reply to this post by Noah Mendelsohn
On Fri, Aug 5, 2016 at 10:23 AM, Noah Mendelsohn <[hidden email]> wrote:
> See [1].
>
> I thought this might be of some interest to the TAG. Seems to me that this is OK insofar as the addin is a modification to a user agent, and is presumably activated only with the user's consent.
>
> Nonetheless, this seems to embody a slightly skewed view of Web protocols: if I as a URI authority serve a new or updated page, your browser will do what I intend and show the user that new content. If I delete a page, the browser will not honor that deletion, but will show content anyway. This seems to me just a bit of a slippery slope. A 404 is just as meaningful in Web protocols (no such page) as a 200 IMO.


The Memento Extension for Chrome (http://bit.ly/memento-for-chrome) handles 404s and much more. It covers archived resources in many web archives; see http://timetravel.mementoweb.org/about/. And its behavior is completely under the control of the user because it works by right-clicking links or pages.

Right-clicking yields a Memento menu with several options:
* Get near current date: retrieves the most recently archived resource, and hence can be used to address a 404.
* Get near saved date: retrieves an archived resource with an archival datetime closest to the date set in a calendar picker.
* Get near memento-datetime: if the page is itself an archived resource in a web archive, retrieves an archived version of a linked resource with an archival datetime closest to the date expressed in the page's Memento-Datetime header.
* Get near page date: retrieves an archived resource with a datetime closest to the page datetime, if it is provided in a machine-readable manner.
* Get near link date: retrieves an archived resource with a datetime closest to the date expressed in the data-versiondate link decoration attribute, as defined in http://robustlinks.mementoweb.org/spec/
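
The menu options above differ only in where the target date comes from (today, a calendar picker, a Memento-Datetime header, the page date, or a link's data-versiondate). The selection step they share can be sketched in Python as picking the archived datetime nearest to that target. This is an illustration of the idea, not the extension's actual code, and the function name is hypothetical.

```python
from datetime import datetime
from typing import List, Optional


def closest_memento(mementos: List[datetime], target: datetime) -> Optional[datetime]:
    """Return the archival datetime nearest to the target, or None if empty.

    All datetimes are assumed to be in the same timezone convention
    (all naive or all aware) so that they can be subtracted.
    """
    if not mementos:
        return None
    # Smallest absolute distance in time wins; ties go to the earlier list entry.
    return min(mementos, key=lambda m: abs(m - target))
```

With the current date as the target, this reduces to "most recently archived", which is the option that addresses a 404.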

Note that the Memento protocol is not only for web archives. It can also be supported by version control systems, wikis, etc. For example, the W3C wiki and all versions of the W3C specs are accessible using Memento. Using, e.g. Memento for Chrome, one can seamlessly navigate to the version of a wiki page or W3C spec as it was at a certain date. And, of course to versions of linked resources, using right-click as described above. Using the Time Travel API, see http://timetravel.mementoweb.org/guide/api/, one can use a URI of this form to get to a version of a W3C spec as it existed at a given date: http://timetravel.mementoweb.org/memento/20031112/https://www.w3.org/TR/webarch/
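
The URI form quoted above (base, then an eight-digit date, then the original URI) can be assembled with a few lines of Python. This is only a sketch built from that one example; the helper name is hypothetical, and other datetime precisions the service may accept are not covered.

```python
from datetime import date

# Base taken from the example URI in this message.
TIMETRAVEL_BASE = "http://timetravel.mementoweb.org/memento"


def timetravel_uri(when: date, original_uri: str) -> str:
    """Build a Time Travel URI of the form <base>/<YYYYMMDD>/<original URI>."""
    return f"{TIMETRAVEL_BASE}/{when.strftime('%Y%m%d')}/{original_uri}"


# Reproduces the example above for the W3C Web architecture document.
print(timetravel_uri(date(2003, 11, 12), "https://www.w3.org/TR/webarch/"))
```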

Cheers

Herbert


 
> I'm not proposing that the TAG do anything about this or devote significant time to it right now, just pointing it out in case it's of interest.
>
> Thank you.
>
> Noah
>
> [1] http://gadgets.ndtv.com/apps/news/firefox-will-try-to-show-you-saved-archive-of-a-page-instead-of-404-error-869482





Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Noah Mendelsohn
Just to be clear, I said in my initial post that as long as the addin or
feature was running at the user's request with clear indication of which
content is from 404 pages, I don't think there's a violation of Web arch.

My point was that whether or not deleting pages (i.e. taking URIs that
were 200 or 3XX and making them 404) is something we discourage on policy
grounds, it's a supported and important part of Web arch. When I make a
page 404 I usually have good reasons, and in general I expect users to see
the page I return with the 404. I believe that's what the pertinent
specifications call for, and it should remain the default behavior of user agents.

Thank you.

Noah

On 8/5/2016 4:58 PM, Herbert Van de Sompel wrote:

> On Fri, Aug 5, 2016 at 10:23 AM, Noah Mendelsohn <[hidden email]> wrote:
>
>     See [1].
>
>     I thought this might be of some interest to the TAG. Seems to me that
>     this is OK insofar as the addin is a modification to a user agent, and
>     is presumably activated only with the user's consent.
>
>     Nonetheless, this seems to embody a slightly skewed view of Web
>     protocols: if I as a URI authority serve a new or updated page, your
>     browser will do what I intend and show the user that new content. If I
>     delete a page, the browser will not honor that deletion, but will show
>     content anyway. This seems to me just a bit of a slippery slope. A 404
>     is just as meaningful in Web protocols (no such page) as a 200 IMO.
>
>
> The Memento Extension for Chrome (http://bit.ly/memento-for-chrome) handles
> 404 and much more. It covers archived resources in many web archives, see
> http://timetravel.mementoweb.org/about/. And its behavior is completely
> under control of the user because it works by right-clicking links or pages.
>
> Right clicking yields a Memento menu with several options:
> * Get near current date: Retrieves the most recently archived resource, and
> hence can be used to address 404.
> * Get near saved date: retrieves an archived resource with archival
> datetime closest to the date set in a calendar picker
> * Get near memento-datetime: if the page is itself an archived resource in
> a web archive, retrieves an archived resource of a linked resource with
> archival datetime closest to the date expressed in the page's
> Memento-Datetime header.
> * Get near page date: retrieves an archived resource with a datetime
> closest to the page datetime if it is provided in a machine-readable manner
> * Get near link date: retrieves an archived resource with a datetime
> closest to date expressed in the data-versiondate link decoration
> attribute, as defined in http://robustlinks.mementoweb.org/spec/
>
> Note that the Memento protocol is not only for web archives. It can also be
> supported by version control systems, wikis, etc. For example, the W3C wiki
> and all versions of the W3C specs are accessible using Memento. Using, e.g.
> Memento for Chrome, one can seamlessly navigate to the version of a wiki
> page or W3C spec as it was at a certain date. And, of course to versions of
> linked resources, using right-click as described above. Using the Time
> Travel API, see http://timetravel.mementoweb.org/guide/api/, one can use a
> URI of this form to get to a version of a W3C spec as it existed at a given
> date:
> http://timetravel.mementoweb.org/memento/20031112/https://www.w3.org/TR/webarch/
>
> Cheers
>
> Herbert
>
>
>
>
>     I'm not proposing that the TAG do anything about this or devote
>     significant time to it right now, just pointing it out in case it's of
>     interest.
>
>     Thank you.
>
>     Noah
>
>
>     [1]
>     http://gadgets.ndtv.com/apps/news/firefox-will-try-to-show-you-saved-archive-of-a-page-instead-of-404-error-869482
>
>
>
>
> --
> Herbert Van de Sompel
> Digital Library Research & Prototyping
> Los Alamos National Laboratory, Research Library
> http://public.lanl.gov/herbertv/
> http://orcid.org/0000-0002-0715-6126
>
> ==


Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Melvin Carvalho


On 8 August 2016 at 17:02, Noah Mendelsohn <[hidden email]> wrote:
> Just to be clear, I said in my initial post that as long as the addin or feature was running at the user's request with clear indication of which content is from 404 pages, I don't think there's a violation of Web arch.
>
> My point was that whether or not deleting pages (i.e. taking URIs that were 200 or 3XX and making them 404) is something we discourage on policy grounds, it's a supported and important part of Web arch. When I make a page 404 I usually have good reasons, and in general I expect users to see the page I return with the 404. I believe that's what the pertinent specifications call for, and it should remain the default behavior of user agents.

Let's say a user has bookmarked a page for reference.  And that page has moved, but is yet archived.  I can see value for a user to see the material that she had seen before, from an archived version.

4xx is indicated to the user agent, and I think that fundamentally in web arch the user is the ultimate curator of the content presented.
 




Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Noah Mendelsohn

On 8/8/2016 11:58 PM, Melvin Carvalho wrote:
> Let's say a user has bookmarked a page for reference.  And that page has
> moved, but is yet archived.  I can see value for a user to see the material
> that she had seen before, from an archived version.

Your example is of a page you describe as "moved". Shouldn't the server
return 301 for that? (Or possibly 303 - See other)?

> 4xx is indicated to the user agent, and I think that fundamentally in web arch the user is the ultimate curator of the content presented.

Yes, as I acknowledged in my post, but from a protocol point of view 404
means "doesn't exist" (Not found). If there's a real need for "Not found,
but please offer users an old version if available" I would think a new 40X
code would be the more architecturally robust way of giving the server
control.
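
To make the distinction concrete, here is a hedged Python sketch of how a user agent's default handling might be tabulated. The 301 and 303 cases follow the codes mentioned above; the "offer an archived copy" case corresponds to the new 40X code suggested here, which does not exist, so it is passed in as a caller-supplied placeholder rather than a real status number. All names are hypothetical.

```python
from enum import Enum
from typing import FrozenSet


class Action(Enum):
    SHOW_RESPONSE = "show the server's response"
    FOLLOW_REDIRECT = "follow the Location header"
    OFFER_ARCHIVE = "show the response and offer an archived copy"


def default_action(status: int,
                   archive_hint_codes: FrozenSet[int] = frozenset()) -> Action:
    """Pick a default user-agent action for a response status.

    archive_hint_codes holds placeholder codes standing in for the
    proposed (not standardized) "not found, but offer an old version" status.
    """
    if status in (301, 303):
        return Action.FOLLOW_REDIRECT
    if status in archive_hint_codes:
        return Action.OFFER_ARCHIVE
    # Everything else, including 404, is shown as the server sent it.
    return Action.SHOW_RESPONSE
```

Under this sketch a plain 404 is always rendered as returned, and an archive offer happens by default only when the server opts in through the placeholder code.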

Noah


Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Marcos Caceres-4
On August 17, 2016 at 12:31:40 AM, Noah Mendelsohn ([hidden email]) wrote:
> > 4xx is indicated to the user agent, and I think that fundamentally in web arch the user
> > is the ultimate curator of the content presented.
>
> Yes, as I acknowledged in my post, but from a protocol point of view 404
> means "doesn't exist" (Not found). If there's a real need for "Not found,
> but please offer users an old version if available" I would think a new 40X
> code would be the more architecturally robust way of giving the server
> control.


Perhaps, but that would take years to roll out. Also, if the page has moved,
then 301 or 307 would apply (or 200 OK, with "we trashed this page, gasp!").
If a user wants to find a page that now returns 404, they can manually go to
a web archive to find it.

The Firefox extension is just helping with that step: the point is to
revive the page despite (in spite of?) what the server wants. The
extension and the user don't care and are, in fact, directly trying to
subvert the server on purpose (by design).

Thus, it seems like a waste of time to work around this with new
codes, when the old ones are suitable - and it would just start a
small arms race against web archives and extensions that facilitate
finding lost content.


Re: Firefox addin to replace 404 pages with archived pages from wayback machine

Melvin Carvalho
In reply to this post by Noah Mendelsohn


On 16 August 2016 at 16:26, Noah Mendelsohn <[hidden email]> wrote:

> On 8/8/2016 11:58 PM, Melvin Carvalho wrote:
> > Let's say a user has bookmarked a page for reference.  And that page has
> > moved, but is yet archived.  I can see value for a user to see the material
> > that she had seen before, from an archived version.
>
> Your example is of a page you describe as "moved". Shouldn't the server return 301 for that? (Or possibly 303 - See other)?

I think 301 or 303 would be considered a best practice. And servers are sometimes incentivized to do this through search engine traffic, I suppose.

But it may be the case that the server owner just doesn't have time to do that.

I'm trying to figure out which web architectural principle this is most applicable to. In my mind I'm thinking about the decentralized property of the web. The server acts as definitive, but the user, at their discretion, might want to override that. Perhaps this is indeed a slippery slope. I'm curious as to where you think it might end; what sprang to mind was some kind of spaghetti of status codes negotiated between server and client.

I like the idea of the user being able to override or augment the decisions made on the server side with those on the client. I've not fully thought through the possible downsides of this, and would be interested to hear if there are thoughts on this.
 

> > 4xx is indicated to the user agent, and I think that fundamentally in web arch the user is the ultimate curator of the content presented.
>
> Yes, as I acknowledged in my post, but from a protocol point of view 404 means "doesn't exist" (Not found). If there's a real need for "Not found, but please offer users an old version if available" I would think a new 40X code would be the more architecturally robust way of giving the server control.
>
> Noah