URI length statistics "in the wild"?


URI length statistics "in the wild"?

Dan Brickley-2
Hi folks

Some topics seem peculiarly ill-suited for Web searches - hence this
mail. I am looking for data on typical lengths of URIs, in particular
as they're used in the public Web. Breakdown by scheme would be nice,
but anything would be a start.

Context for this enquiry is an investigation into the use of
mechanisms like QR Codes and also audio encodings (eg.
http://github.com/diva/digital-voices/ ) as a way of passing URIs
around, eg. to a smartphone from a media centre. I'd like to know
what's out there, what's feasible to encode using these techniques,
and what the official limits are. In
http://tools.ietf.org/html/rfc3986 I don't see much about URI length
except in the reg-name portion.

So - what are the official limits? what are the practical limits (eg.
imposed by common implementations)? Can we say that 99.9% of URIs in
the public Web are shorter than ...X chars?
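
A figure of the "99.9% are shorter than X" kind could be computed from any list of collected URIs roughly as follows. This is only a minimal sketch: the function name and the sample URIs are made up for illustration, lengths are counted in characters rather than bytes, and the nearest-rank percentile method is one choice among several.

```python
import math

def length_percentile(uris, pct):
    """Smallest length L such that at least pct% of the URIs have len <= L.

    Uses the nearest-rank method; lengths are in characters, not bytes.
    """
    if not uris:
        raise ValueError("need at least one URI")
    lengths = sorted(len(u) for u in uris)
    rank = max(1, math.ceil(pct / 100 * len(lengths)))
    return lengths[rank - 1]

# Hypothetical sample; a real survey would read URIs from a crawl or logs.
sample = [
    "http://example.org/",
    "http://example.org/a/b/c",
    "xmpp:user@example.org",
    "http://example.org/search?q=" + "x" * 200,
]
print(length_percentile(sample, 50))    # median-ish length
print(length_percentile(sample, 99.9))  # length covering 99.9% of the sample
```

With so few samples the 99.9th percentile is simply the longest URI; the number only becomes meaningful on a large crawl.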

Ideally barcode and audio encodings wouldn't impose arbitrary limits;
however it would be good to document what folk can expect to
encounter, if only for sensible testing of error correction, reader
accuracy etc.

Thanks for any pointers,

Dan


Re: URI length statistics "in the wild"?

Gannon Dick
Hi Dan,

I feel your pain.  My Web Mail fetched your missive with a GET in (a downright pathological) 515 Bytes.

If I might make a suggestion though: Apache may have some hints to the "standards" in their documentation on how to parse server logs.
--Gannon

--- On Thu, 4/8/10, Dan Brickley <[hidden email]> wrote:

> From: Dan Brickley <[hidden email]>
> Subject: URI length statistics "in the wild"?
> To: [hidden email]
> Date: Thursday, April 8, 2010, 5:06 AM
> Hi folks
>
> Some topics seem peculiarly ill-suited for Web searches - hence this
> mail. I am looking for data on typical lengths of URIs, in particular
> as they're used in the public Web. Breakdown by scheme would be nice,
> but anything would be a start.
>
> Context for this enquiry is an investigation into the use of
> mechanisms like QR Codes and also audio encodings (eg.
> http://github.com/diva/digital-voices/ ) as a way of passing URIs
> around, eg. to a smartphone from a media centre. I'd like to know
> what's out there, what's feasible to encode using these techniques,
> and as well as what the official limits are. In
> http://tools.ietf.org/html/rfc3986 I don't see much about URI length
> except in the reg-name portion.
>
> So - what are the official limits? what are the practical limits (eg.
> imposed by common implementations)? Can we say that 99.9% of URIs in
> the public Web are shorter than ...X chars?
>
> Ideally barcode and audio encodings wouldn't impose arbitrary limits;
> however it would be good to document what's folk can expect to
> encounter, if only for sensible testing of error correction, reader
> accuracy etc.
>
> Thanks for any pointers,
>
> Dan


     


Re: URI length statistics "in the wild"?

Erik van der Poel
In reply to this post by Dan Brickley-2
The typical length of a URL as found in HTML on the Web is around 64
bytes. I don't know what the average and median are because I bucketed
the stats in powers of 2 (i.e. ..., up to 32, up to 64, 128, etc). The
peak for these buckets was 64.
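
The power-of-two bucketing described above can be illustrated with a short sketch. It is not the code used for the original statistics; the function names and the sample lengths are assumptions made for the example.

```python
from collections import Counter

def bucket_upper_bound(length):
    """Smallest power of two >= length (lengths 0 and 1 share bucket 1)."""
    if length <= 1:
        return 1
    return 1 << (length - 1).bit_length()

def bucket_histogram(lengths):
    """Count how many lengths fall into each 'up to 2^k' bucket."""
    return Counter(bucket_upper_bound(n) for n in lengths)

# Hypothetical URL lengths in bytes.
hist = bucket_histogram([19, 33, 64, 65, 2048, 2049])
for bound in sorted(hist):
    print(f"up to {bound}: {hist[bound]}")
```

Bucketing like this keeps the histogram small, at the cost noted above: the exact mean and median can no longer be recovered from the counts.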

There is a sharp drop at 2048. This makes sense because MSIE's limit
in HTTP requests is 2048. Firefox and Chrome do not appear to have
limits. (I gave up trying when I reached 32k.)

MSIE's limit in URLs in HTML is 4096 characters. This is not the same
as 4096 bytes. MSIE uses UTF-16 internally. I used IDNA to find this
limit.
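
The difference between a limit in characters and a limit in bytes shows up as soon as a URL contains non-ASCII characters, for instance in an internationalized host name. The sketch below only measures one string three ways; the host name is a made-up example, and it says nothing about how MSIE itself counts.

```python
def url_sizes(url):
    """Measure one URL in code points, UTF-16 code units and UTF-8 bytes."""
    return {
        "code_points": len(url),
        "utf16_units": len(url.encode("utf-16-le")) // 2,
        "utf8_bytes": len(url.encode("utf-8")),
    }

# Hypothetical URL with a non-ASCII host label.
sizes = url_sizes("http://bücher.example/")
print(sizes)
```

A limit of 4096 UTF-16 code units can therefore correspond to noticeably more than 4096 bytes once the URL is serialized as UTF-8.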

Good luck with your barcode/audio efforts,

Erik



Re: URI length statistics "in the wild"?

Bob Aman-2
In reply to this post by Dan Brickley-2
> Context for this enquiry is an investigation into the use of
> mechanisms like QR Codes

Something of a tangent:

I hesitate to suggest this, since I hate URL shorteners, but in the
context of QR codes, they don't seem too bad.  In my experience,
camera phones have much better success rates reading QR codes
containing smaller payloads, and I think that outweighs the
disadvantage of URL shorteners.  Plus I'm sure many of the people
using QR codes to transmit links are going to be interested in the
analytics that URL shorteners can provide since there won't exactly be
a referrer.
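
One way to see why shorter payloads scan more easily: a longer URI forces a higher QR "version", and each version adds four modules per side, so the individual squares get smaller at a given print size. The sketch below assumes byte mode at error-correction level L; the capacity figures are the ones given in commonly published QR capacity tables for versions 1-10 and should be checked against ISO/IEC 18004 before relying on them. The function names are made up for the example.

```python
# Assumed byte-mode capacities (in bytes) at error-correction level L,
# for QR versions 1 through 10. Verify against the standard before use.
BYTE_CAPACITY_L = [17, 32, 53, 78, 106, 134, 154, 192, 230, 271]

def smallest_version(payload: bytes):
    """Lowest QR version (1-10) whose level-L byte capacity fits the payload."""
    for version, capacity in enumerate(BYTE_CAPACITY_L, start=1):
        if len(payload) <= capacity:
            return version
    return None  # needs a version above 10, outside this small table

def modules_per_side(version):
    """Side length of the symbol in modules: 21 for version 1, +4 per version."""
    return 17 + 4 * version

for uri in (b"http://example.org/", b"http://example.org/" + b"x" * 45):
    v = smallest_version(uri)
    print(len(uri), "bytes -> version", v, "->", modules_per_side(v), "modules per side")
```

Under these assumptions a short URI of under 20 bytes fits a 25x25 symbol, while a typical 64-byte URI already needs 33x33.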

-Bob


Re: URI length statistics "in the wild"?

Dan Brickley-2
(thanks to Gannon and Erik too for replies)

On Thu, Apr 8, 2010 at 5:03 PM, Bob Aman <[hidden email]> wrote:

>> Context for this enquiry is an investigation into the use of
>> mechanisms like QR Codes
>
> Something of a tangent:
>
> I hesitate to suggest this, since I hate URL shorteners, but in the
> context of QR codes, they don't seem too bad.  In my experience,
> camera phones have much better success rates reading QR codes
> containing smaller payloads, and I think that outweighs the
> disadvantage of URL shorteners.  Plus I'm sure many of the people
> using QR codes to transmit links are going to be interested in the
> analytics that URL shorteners can provide since there won't exactly be
> a referrer.

Yes, well, even when not using tinyurl.com or bit.ly, I think it quite
likely that people encoding URIs in QR Codes will have a strong incentive
to keep them short. The most likely uses are for homepages and blogs of
people and businesses, or lookups into databases (books, inventory etc). I
expect the same is likely to be true of audio encodings. So rather than
have a hard cut-off, just let the quality tail off naturally so that
above-average lengths are possible, just less reliable, and short
URIs are rewarded.

My specific interest is in stuffing links like
xmpp:[hidden email] into machine-detectable form, eg. on a TV
screen. In testing with a 3G iPhone (ie. before they improved the lens
and focussing, I'm told) I get reasonable performance, although not
great.

Examples for the curious:
http://www.flickr.com/photos/danbri/4358715185/
http://www.flickr.com/photos/danbri/4360377558/
http://www.flickr.com/photos/danbri/4401213326/
http://www.flickr.com/photos/danbri/4382103516/ (as you can see, I'm
happily sacrificing data reliability for prettiness, by including
needless eye candy in the codes using
http://lapin-bleu.net/riviera/?p=138 and the audio codec stuff has
similar aesthetic biases...).

...in this particular scenario, a media centre box is passing its
Jabber/XMPP URI to a nearby smartphone. We had a little discussion on
the XMPP-social list about whether passing a certificate fingerprint
in the link might also be possible, but that's perhaps a bit
ambitious. To know really what makes sense I want to run some tests on
different cameras and displays, but also to gather some more general
info about typical URI lengths.

cheers,

Dan


URIs in QR (was: URI length statistics "in the wild"?)

Erik Wilde-3
hello all.

> Yes, well if not using tinyurl.com or bit.ly.com, I think quite likely
> that people encoding URIs in QR Codes will have a strong incentive to
> keep them short. Most likely uses are for homepages, blogs, of people
> and businesses, or lookups into databases (books, inventory etc).

interestingly, goo.gl (google's URI shortener) not only provides short
URIs, you can also get the QR for any short URI by just appending .qr to
it (which simply redirects to the google chart API using QR mode):

http://twitter.com/dret/status/11797022951

i have recently become very interested in QR, and from what i've found
out so far, while QR encoding/decoding (text2QR) is well-defined, the
landscape of what to expect and how to process it is very messy. many QR
programs fail to recognize and/or properly decode URIs, using ad-hoc
methods and/or not recognizing URIs that are not http: URIs.
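
Recognizing a URI of any scheme, rather than only http:, does not need much more than the scheme grammar from RFC 3986 (a letter followed by letters, digits, "+", "-" or "."). The following is a minimal sketch of such a check, not what any particular QR reader does; the function name is made up.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

def uri_scheme(decoded_text):
    """Return the lower-cased scheme if the text starts like a URI, else None."""
    match = SCHEME_RE.match(decoded_text.strip())
    return match.group(1).lower() if match else None

for payload in ("HTTP://EXAMPLE.ORG/", "xmpp:user@example.org",
                "tel:+1-510-6432253", "hello world"):
    print(repr(payload), "->", uri_scheme(payload))
```

This is purely syntactic: a payload such as "note: buy milk" would also pass, so a reader would still want a list of schemes it knows how to hand off.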

since i am very interested in the mobile landscape and particularly in
the mobile web, i would like to ask if anybody is interested in starting
some activity (maybe a W3C incubator group) that would survey the
landscape of existing "standards" and tools, and maybe even come up with
some best practices for how to use QR codes in a web-friendly way.

for example, just for fun i came up with this new office door sign:

http://www.flickr.com/photos/dret/4498124087/

it is supposed to be interpreted like an email, which means it's plain
text that contains a URI (in this case a mailto: URI), and a
user-friendly agent should make that URI actionable. but is that a
reasonable assumption to make? not in the QR reader apps i am testing on
iPhone and Android, but it might help the mobile web quite a bit if
there were some best practices around how to use QR codes in a
web-friendly way. while embedding vcards in QR business cards and
similar self-contained data also would be important, the most crucial
thing to get right would be to have well-defined rules for how to deal
with URIs, both from the encoding point of view (can i use plain text
with URIs in it?) and the user agent point of view (QR scanners should
have configurable ways of how to dispatch URIs to applications).
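
The "plain text with URIs in it" case plus a configurable dispatch step could look roughly like the sketch below. Everything in it is illustrative: the regular expression is a loose heuristic, the handler table stands in for whatever per-scheme configuration a platform offers, and the address is a placeholder.

```python
import re

# Loose heuristic: a scheme, a colon, then a run of non-space characters.
URI_IN_TEXT = re.compile(r"\b([A-Za-z][A-Za-z0-9+.\-]*):[^\s<>\"]+")

# Hypothetical user-configurable mapping from scheme to application.
HANDLERS = {
    "http": "browser",
    "https": "browser",
    "mailto": "mail client",
    "tel": "dialer",
    "xmpp": "chat client",
}

def find_actions(text):
    """Return (uri, handler) pairs for every URI with a configured scheme."""
    actions = []
    for match in URI_IN_TEXT.finditer(text):
        handler = HANDLERS.get(match.group(1).lower())
        if handler:
            # Drop sentence punctuation that trails the URI in running text.
            actions.append((match.group(0).rstrip(".,;:!?)"), handler))
    return actions

door_sign = "Erik Wilde\nmailto:dret@example.org\noffice hours: Tue 2-3pm"
print(find_actions(door_sign))
```

Filtering on configured schemes is what keeps a line like "office hours: Tue" from being treated as a link.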

kind regards,

erik wilde   tel:+1-510-6432253 - fax:+1-510-6425814
        [hidden email]  -  http://dret.net/netdret
        UC Berkeley - School of Information (ISchool)


Re: URIs in QR (was: URI length statistics "in the wild"?)

Sandro Hawke
> hello all.
>
> > Yes, well if not using tinyurl.com or bit.ly.com, I think quite likely
> > that people encoding URIs in QR Codes will have a strong incentive to
> > keep them short. Most likely uses are for homepages, blogs, of people
> > and businesses, or lookups into databases (books, inventory etc).
>
> interestingly, goo.gl (google's URI shortener) not only provides short
> URIs, you can also get the QR for any short URI by just appending .qr to
> it (which simply redirects to the google chart API using QR mode):
>
> http://twitter.com/dret/status/11797022951
>
> i have recently become very interested in QR, and from what i've found
> out so far, while QR encoding/decoding (text2QR) is well-defined, the
> landscape of what to expect and how to process it is very messy. many QR
> programs fail to recognize anr/or properly decode URIs, using ad-hoc
> methods and/or not recognizing URIs that are not http: URIs.
>
> since i am very interested in the mobile landscape and particularly in
> the mobile web, i would like to ask if anybody is interested in starting
> some activity (maybe a W3C incubator group) that would survey the
> landscape of existing "standards" and tools, and maybe even come up with
> some best practices for how to use QR codes in a web-friendly way.
>
> for example, just for fun i came up with this new office door sign:
>
> http://www.flickr.com/photos/dret/4498124087/

Cute.  Worked fine using my android phone aimed at my laptop screen.  I
suppose now I'll be needing a desktop app (maybe part of the screenshot
system) that recognizes QR codes in photos and videos displayed on my
desktop.   :-)

I don't personally know anything about QR codes, and this is the first
I've heard of a need for standardization, but as a W3C staff member, my
ears perked up there.  Yes, this looks like it might well be a good fit
for W3C.  It could be an incubator, like you suggest, or, if it's pretty
clear what a solution looks like, it could be a Submission and then a
Working Group.

Again, I don't know anything about QR codes, but I'm a bit surprised the
folks who defined them haven't tackled this problem.  Perhaps this needs
a new mix of URI and QR expertise.

     -- Sandro



Re: URIs in QR (was: URI length statistics "in the wild"?)

Dan Brickley-2
On Thu, Apr 8, 2010 at 6:28 PM, Sandro Hawke <[hidden email]> wrote:

>> hello all.
>>
>> > Yes, well if not using tinyurl.com or bit.ly.com, I think quite likely
>> > that people encoding URIs in QR Codes will have a strong incentive to
>> > keep them short. Most likely uses are for homepages, blogs, of people
>> > and businesses, or lookups into databases (books, inventory etc).
>>
>> interestingly, goo.gl (google's URI shortener) not only provides short
>> URIs, you can also get the QR for any short URI by just appending .qr to
>> it (which simply redirects to the google chart API using QR mode):
>>
>> http://twitter.com/dret/status/11797022951

Nice, hadn't seen that. In a similar vein, the BBC /programmes team
have wired up every page there to have a QR version also,
http://2d-code.co.uk/bbc-qr-code/ (also cute,
http://2d-code.co.uk/bbc-logo-in-qr-code/ ). For example
http://www.bbc.co.uk/programmes/b00lbpcy/qrcode which should get you
to http://www.bbc.co.uk/programmes/b00lbpcy and therefore to
http://www.bbc.co.uk/programmes/b00lbpcy.rdf

>> i have recently become very interested in QR, and from what i've found
>> out so far, while QR encoding/decoding (text2QR) is well-defined, the
>> landscape of what to expect and how to process it is very messy. many QR
>> programs fail to recognize anr/or properly decode URIs, using ad-hoc
>> methods and/or not recognizing URIs that are not http: URIs.

Yes, the use of URIs in QR struck me as rather ad-hoc too, relying on
heuristics too often.

>> since i am very interested in the mobile landscape and particularly in
>> the mobile web, i would like to ask if anybody is interested in starting
>> some activity (maybe a W3C incubator group) that would survey the
>> landscape of existing "standards" and tools, and maybe even come up with
>> some best practices for how to use QR codes in a web-friendly way.

I've often thought we should do this, but only recently got my hands
dirty coding (integrated a QR reader into an iPhone app).

I don't know the status of QR Codes as a spec, in terms of things like
public availability of the standards docs, royalty free status, etc.
etc. http://en.wikipedia.org/wiki/QR_Code#Standards suggests "NTT
docomo has established de facto standards for the encoding of URLs,
contact information, and several other data types" but the use of
"URL" rather than "URI" doesn't fill me with confidence that the work
is 100% done. But even having a 'state of the landscape' document
(rather than something standards-like) could be a huge help for
everyone. Although an XG is a good idea and I'd try to participate,
for now I'd settle for a fact-pooling Wiki page...

>> for example, just for fun i came up with this new office door sign:
>>
>> http://www.flickr.com/photos/dret/4498124087/
>
> Cute.  Worked fine using my android phone aimed at my laptop screen.  I
> suppose now I'll be needing a desktop app (maybe part of the screenshot
> system) that recognizes QR codes in photos and videos displayed on my
> desktop.   :-)

Nice idea :)

Have a play with http://github.com/diva/digital-voices/
http://www.youtube.com/watch?v=cjnPwV6yP6o
http://www.ics.uci.edu/~lopes/dv/dv.html too, especially the birdsong
encoding I think is quite lovely -
http://www.ics.uci.edu/~lopes/dv/BirdInfo.wav - you could have similar
software scanning audio going through your machine looking for secret
codes in radio jingles, youtube videos etc...

> I don't personally know anything about QR codes, and this is the first
> I've heard of a need for standardization, but as a W3C staff member, my
> ears perked up there.  Yes, this looks like it might well be a good fit
> for W3C.  It could be an incubator, like you suggest, or, if it's pretty
> clear what a solution looks like, it could be a Submission and then a
> Working Group.

I think there's a lot to be gained here around mobile apps, eg. to
avoid a proliferation of barcodes outside points of interest (shops,
restaurants etc) I hope we'll see a move towards a single barcode that
is for a page describing the location, and embedded RDFa to make that
description machine-readable.

Can you advise on the options when a W3C work item is heavily based
around work already standardised elsewhere? Are there formal
conventions/constraints already, or just social conventions like -
play nice and establish friendly liaison?

> Again, I don't know anything about QR codes, but I'm a bit surprised the
> folks who defined them haven't tackled this problem.  Perhaps this needs
> a new mix of URI and QR expertise.

A lot of QR use does seem to be to discover identifiers. My guess is
that heuristics are doing 80%+ of the work just fine, so there hasn't
been a strong driver to 'catch up with the paperwork' and say exactly
how this should work.

My guess is that the missing doc isn't any new standard, but more a
kind of implementor's report from the field, giving best practice
advice - eg. about the amount of data you can stuff into QR Codes and
similar, and different phones' ability to handle this, about exact
detail of URI syntax, and perhaps other considerations like making
sure there is a 'mobile OK' and ideally RDF-enhanced version of anything
that is linked to. Plus perhaps conventions around distinguishing URI
codes for things (supply chain stuff, book IDs) from documents about
those things...

cheers,

Dan


Re: URIs in QR

Erik Wilde-3
In reply to this post by Sandro Hawke
hello.

>> http://www.flickr.com/photos/dret/4498124087/
> Cute.  Worked fine using my android phone aimed at my laptop screen.  I
> suppose now I'll be needing a desktop app (maybe part of the screenshot
> system) that recognizes QR codes in photos and videos displayed on my
> desktop.   :-)

my prediction and hope is that QR support will actually become standard
in mobile platforms, so that data input via QR is supported by the
platform itself. for this to become a reality, there should be stable
and well-defined rules for what to expect and how to process it, and
currently that is not really the case outside of vendor-specific
environments.

> I don't personally know anything about QR codes, and this is the first
> I've heard of a need for standardization, but as a W3C staff member, my
> ears perked up there.  Yes, this looks like it might well be a good fit
> for W3C.  It could be an incubator, like you suggest, or, if it's pretty
> clear what a solution looks like, it could be a Submission and then a
> Working Group.

i'd go for a 1 year incubator and then take it from there. this year
could be spent with just collecting information about conventions and
implementations and coming up with recommendations about how to best act
in this existing environment. if there's interest, that could lead to a
more active role by coming up with new conventions or standards,
whatever might be the most appropriate way to proceed.

personally, i'd love to see the W3C take on a more strategic role in a
variety of areas, and the mobile web certainly is one of them. while
premature standardization might be a risky route to take, recognizing an
area and fostering cooperation in that area might be a very useful and
important first step to take.

> Again, I don't know anything about QR codes, but I'm a bit surprised the
> folks who defined them haven't tackled this problem.  Perhaps this needs
> a new mix of URI and QR expertise.

ntt docomo has done a lot of work in this area but mostly for their own
ecosystem, and a lot of that is not very well documented and probably
not even all that well-defined (maybe it is, but definitely not in an
open way that you can easily find). while japanese carriers have been
busy designing and building valuable services for a mobile ecosystem,
european and american carriers have mostly been busy trying to extract
as much money as possible out of their customers by coming up with
countless new "plans", and they still haven't caught on to the idea that
by allowing people to do more things with their phones, the mobile
ecosystem will grow very naturally, and so will their profits.

as yet another tangent: facebook recently started experimenting with QR,
and of course all facebook QRs point to facebook profile pages. so i
think we are at the brink of seeing much more QR usage, and trying to
help developers to better navigate that new landscape could be a very
useful thing to do. google sent out a bunch of QR stickers to many
businesses which of course pointed to google landing pages (and then
apparently filtered access to that URI by browser type as reported on
http://wapreview.com/blog/?p=5834, but that seems to be fixed now), so
we do see very relevant companies recognizing the value of producing
real-world entry points into the web.

cheers,

erik wilde   tel:+1-510-6432253 - fax:+1-510-6425814
        [hidden email]  -  http://dret.net/netdret
        UC Berkeley - School of Information (ISchool)


Re: URIs in QR

Erik Wilde-3
In reply to this post by Dan Brickley-2
hello all.

> Yes, the use of URIs in QR struck me as rather ad-hoc too, relying on
> heuristics too often.

this certainly is the case, and while i haven't done any systematic study,
the few apps that i tried on mobile devices all seemed to use their own
methods and conventions, and most of them were not as robust and generic
and configurable as they should be (if they were trying to be useful as
general-purpose QR apps). i would be hard-pressed to exactly define what
"robust and generic and configurable" means, though, and it seems to me
that answering this question could be very valuable.

> I've often thought we should do this, but only recently got my hands
> dirty coding (integrated a QR reader into an iPhone app).

i still hope to see QR support popping up in mobile OS platforms, and
since there's an open one out there, if somebody is really adventurous,
that would be a very interesting exercise to go through...

> I don't know the status of QR Codes as a spec, in terms of things like
> public availability of the standards docs, royalty free status, etc.
> etc. http://en.wikipedia.org/wiki/QR_Code#Standards suggests "NTT
> docomo has established de facto standards for the encoding of URLs,
> contact information, and several other data types" but the use of
> "URL" rather than "URI" doesn't fill me with confidence that the work
> is 100% done. But even having a 'state of the landscape' document
> (rather than something standards-like) could be a huge help for
> everyone. Although an XG is a good idea and I'd try to participate,
> for now I'd settle for a fact-pooling Wiki page...

my take on this is that wikis tend to be much less useful than actual
deliverables with a well-defined set of what should be covered, and by
when. maybe we could have a QR dinner at www2010 and see how we might
best proceed with this?

> Have a play with http://github.com/diva/digital-voices/
> http://www.youtube.com/watch?v=cjnPwV6yP6o
> http://www.ics.uci.edu/~lopes/dv/dv.html too, especially the birdsong
> encoding I think is quite lovely -
> http://www.ics.uci.edu/~lopes/dv/BirdInfo.wav - you could have similar
> software scanning audio going through your machine looking for secret
> codes in radio jingles, youtube videos etc...

i did not know of this. pretty amazing. so, instead of looking only at QR,
it would probably make sense to say what works best in real-world
representations of "web-friendly handles", and then QR or audio would
just be different representations. QR is ISO and thus may be a little
further along than the audio encodings, but generally speaking,
connecting the real world and the web should probably work in the same
way for both representation types.

> I think there's a lot to be gained here around mobile apps, eg. to
> avoid a proliferation of barcodes outside points of interest (shops,
> restaurants etc) I hope we'll see a move towards a single barcode that
> is for a page describing the location, and embedded RDFa to make that
> description machine-readable.

this idea will make facebook or google very happy. have a sole provider
of identity and everything goes through them. i think how web-friendly
real-world encodings will be used should not be prescribed in any way,
but we need to make sure that the web architecture parts of it work in a
well-defined way.

> Can you advise on the options when a W3C work item is heavily based
> around work already standardised elsewhere? Are there formal
> conventions/constraints already, or just social conventions like -
> play nice and establish friendly liaison?

my take is that this is basically the same as almost all web stuff using
unicode; unicode is stable and that's all that matters. the same can be
said for QR. it's an ISO standard and that's all that matters (maybe not
quite, given IP issues, but at least from the technical perspective).

> A lot of QR use does seem to be to discover identifiers. My guess is
> that heuristics are doing 80%+ of the work just fine, so there hasn't
> been a strong driver to 'catch up with the paperwork' and say exactly
> how this should work.

all of the apps i tried work in some cases and not in others. many get
URI decoding wrong with non-ASCII URIs or even with "+" signs in them.
many don't recognize non-HTTP(S) URIs. many have no way of associating
schemes with apps. all of this is not rocket science, but it might help
the ecosystem a lot to tell developers what they're supposed to do if
they want to support not only QR scanning, but also the mobile web.
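
The "+" problem mentioned above is easy to reproduce. In a generic URI a "+" is just a "+"; only HTML form encoding treats it as a space. The sketch below uses Python's standard urllib.parse to show the difference, along with the effect of decoding percent-escapes with the wrong character encoding. The URIs are examples only.

```python
from urllib.parse import unquote, unquote_plus

tel_uri = "tel:+1-510-6432253"

# Correct for generic URIs: "+" is left alone.
print(unquote(tel_uri))

# Form-style decoding turns "+" into a space and corrupts the number.
print(unquote_plus(tel_uri))

# Percent-encoded non-ASCII: UTF-8 is the usual assumption for decoding.
escaped = "http://example.org/b%C3%BCcher"
print(unquote(escaped))                      # decoded as UTF-8
print(unquote(escaped, encoding="latin-1"))  # wrong encoding gives mojibake
```

A reader that applies form-style decoding to every scanned URI will break tel: links exactly as shown in the second line of output.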

cheers,

dret.


Re: URIs in QR (was: URI length statistics "in the wild"?)

Sampo Syreeni
In reply to this post by Erik Wilde-3
On 2010-04-08, Erik Wilde wrote:

> interestingly, goo.gl (google's URI shortener) not only provides short
> URIs, you can also get the QR for any short URI by just appending .qr
> to it (which simply redirects to the google chart API using QR mode):
> [...]

If somebody is truly interested in audio coding of Linked Addresses, I
could perhaps lend a little bit of assistance. But since that stuff goes
well into the DSP department and I'm somebody you'd want to keep well off
your critical path, interested folks should contact me privately and
not expect too much.

(Quite a lot of data can be put into audible frequencies. It's the
practical and protocol issues like coding, error correction, framing,
synch, expected propagation conditions and so on which make the problem
convoluted. With a reasonable design, a typical room and two consumer
grade mobile phones at no more than 1m apart from each other, I see no
reason why 1200-4800bps one-way contacts couldn't routinely be achieved
at 0.5-1.0 second latency or so. Might require some asking around,
though, since I'm only privy to the theory and not the practice.)
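
To put those bit rates next to the URI lengths discussed earlier in the thread, here is a back-of-the-envelope sketch of how long one URI would take to send. The overhead ratio (extra bits for framing and error correction) is a pure assumption for illustration, and the function name is made up.

```python
def transfer_seconds(payload_bytes, bits_per_second,
                     overhead_ratio=0.0, latency_s=0.0):
    """Time to send a payload one way.

    overhead_ratio is the assumed fraction of extra bits added by framing
    and error correction (1.0 means the bit count doubles).
    """
    bits = payload_bytes * 8 * (1 + overhead_ratio)
    return latency_s + bits / bits_per_second

# A 64-byte URI at the two ends of the quoted range, with no overhead.
print(round(transfer_seconds(64, 1200), 3))
print(round(transfer_seconds(64, 4800), 3))

# Same URI with an assumed 100% coding overhead and 0.5 s latency.
print(round(transfer_seconds(64, 1200, overhead_ratio=1.0, latency_s=0.5), 3))
```

Even with generous overhead, a typical URI would go across in a second or two at these rates.
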
--
Sampo Syreeni, aka decoy - [hidden email], http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2


Re: URI length statistics "in the wild"?

Mark Nottingham-2
In reply to this post by Dan Brickley-2
Somewhat related, see <http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-09#section-4.1.2>, last paragraph.

Cheers,




--
Mark Nottingham     http://www.mnot.net/



Re: URIs in QR

Martin J. Dürst
In reply to this post by Erik Wilde-3
Hello everybody,

Living in Japan, I'm a bit surprised that QR codes suddenly seem to be
so 'in'. There are definitely quite a few around here in Japan, but it's
not as if all the walls are covered with them.

The extraction of URIs from QR codes seems to work about as well (or not
well) as extraction of URIs from email. Definitely your email reader's
mileage may vary quite a bit with respect to what it recognizes and what
not.

As for 'tying URIs/URI schemes to applications', some advice (and maybe
actual standards work if that proves necessary) about how to avoid the
problems with URI scheme proliferation due to application proliferation
(what started with iTunes and now continues with iPhone and so on) would
be very helpful. I don't see many other items of work for this area,
because even in the PC world, it's the user's business how to
configure this association.

When QR codes were reasonably new here in Japan (must have been quite a
few years back), I looked into what W3C might be able (or need) to do
for them, but at that time, I got told that there was no pressing need
for further standardization. I don't know whether this might have
changed in the meantime.

What I seem to remember was that QR codes were defined only for ASCII
and for Shift_JIS; ideally, they should use UTF-8, of course.
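
The choice of character encoding also changes how many bytes a given piece of text occupies in the code. The sketch below simply encodes one string with Python's built-in codecs and reports the sizes; it makes no claim about which encodings a particular QR reader or the QR standard actually supports, and the URL is a made-up example.

```python
def encoded_sizes(text):
    """Byte length of text in each encoding, or None if it cannot be encoded."""
    sizes = {}
    for codec in ("ascii", "shift_jis", "utf-8"):
        try:
            sizes[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None
    return sizes

# Hypothetical IRI with a Japanese path segment.
print(encoded_sizes("http://example.jp/日本語"))
```

For Japanese text Shift_JIS is the more compact of the two, which may be part of why it was chosen originally; UTF-8 costs a few more bytes but covers every script.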

Regards,    Martin.

On 2010/04/09 3:40, Erik Wilde wrote:

> hello all.
>
>> Yes, the use of URIs in QR struck me as rather ad-hoc too, relying on
>> heuristics too often.
>
> this certainly is the case, and why i haven't done any systematic study,
> the few apps that i tried on mobile devices all seemed to use their own
> methods and conventions, and most of them were not as robust and generic
> and configurable as they should be (if they were trying to be useful as
> general-purpose QR apps). i would be hard-pressed to exactly define what
> "robust and generic and configurable" means, though, and it seems to me
> that answering this question could be very valuable.
>
>> I've often thought we should do this, but only recently got my hands
>> dirty coding (integrated a QR reader into an iPhone app).
>
> i still hope to see QR support popping up in mobile OS platforms, and
> since there's an open one out there, if somebody is really adventurous,
> that would be a very interesting exercise to go through...
>
>> I don't know the status of QR Codes as a spec, in terms of things like
>> public availability of the standards docs, royalty free status, etc.
>> etc. http://en.wikipedia.org/wiki/QR_Code#Standards suggests "NTT
>> docomo has established de facto standards for the encoding of URLs,
>> contact information, and several other data types" but the use of
>> "URL" rather than "URI" doesn't fill me with confidence that the work
>> is 100% done. But even having a 'state of the landscape' document
>> (rather than something standards-like) could be a huge help for
>> everyone. Although an XG is a good idea and I'd try to participate,
>> for now I'd settle for a fact-pooling Wiki page...
>
> my take on this is that wikis tend to be much less useful than actual
> deliverables with a well-defined set of what should be covered, and by
> when. maybe we could have a QR dinner at www2010 and see how we might
> best proceed with this?
>
>> Have a play with http://github.com/diva/digital-voices/
>> http://www.youtube.com/watch?v=cjnPwV6yP6o
>> http://www.ics.uci.edu/~lopes/dv/dv.html too, especially the birdsong
>> encoding I think is quite lovely -
>> http://www.ics.uci.edu/~lopes/dv/BirdInfo.wav - you could have similar
>> software scanning audio going through your machine looking for secret
>> codes in radio jingles, youtube videos etc...
>
> i did not know of this. pretty amazing. so, instead of looking at QR,
> probably it would make sense to say what works best in real-world
> representations of "web-friendly handles", and then QR or audio would
> just be different representations. QR is ISO and thus may be a little
> further along than the audio encodings, but generally speaking,
> connecting the real world and the web should probably work in the same
> way for both representation types.
>
>> I think there's a lot to be gained here around mobile apps, eg. to
>> avoid a proliferation of barcodes outside points of interest (shops,
>> restaurants etc) I hope we'll see a move towards a single barcode that
>> is for a page describing the location, and embedded RDFa to make that
>> description machine-readable.
>
> this idea will make facebook or google very happy. have a sole provider
> of identity and everything goes through them. i think how web-friendly
> real-world encodings will be used should not be prescribed in any way,
> but we need to make sure that the web architecture parts of it work in a
> well-defined way.
>
>> Can you advise on the options when a W3C work item is heavily based
>> around work already standardised elsewhere? Are there formal
>> conventions/constraints already, or just social conventions like -
>> play nice and establish friendly liaison?
>
> my take is that this is basically the same as almost all web stuff using
> unicode; unicode is stable and that's all that matters. the same can be
> said for QR. it's an ISO standard and that's all that matters (maybe not
> quite, given IP issues, but at least from the technical perspective).
>
>> A lot of QR use does seem to be to discover identifiers. My guess is
>> that heuristics are doing 80%+ of the work just fine, so there hasn't
>> been a strong driver to 'catch up with the paperwork' and say exactly
>> how this should work.
>
> all of the apps i tried work in some cases and not in others. many get
> URI decoding wrong with non-ASCII URIs or even with "+" signs in them.
> many don't recognize non-HTTP(S) URIs. many have no way of associating
> schemes with apps. all of this is not rocket science, but it might help
> the ecosystem a lot to tell developers what they're supposed to do if
> they want to support not only QR scanning, but also the mobile web.
>
> cheers,
>
> dret.
>
>

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:[hidden email]


Re: URIs in QR

Erik Wilde-3
hello martin.

> Living in Japan, I'm a bit surprised that QR codes suddenly seem to be
> so 'in'. There are definitely quite a few around here in Japan, but it's
> not as if all the walls are covered with them.

i don't think there is any sudden change, but maybe because we missed
the "QR on feature phones" phase, now the move to catch up with
smartphones is a bigger leap.

> The extraction of URIs from QR codes seems to work about as well (or not
> well) as extraction of URIs from email. Definitely your email reader's
> mileage may vary quite a bit with respect to what it recognizes and what
> not.

agreed. and email programs certainly also have their problems with
properly recognizing, decoding, and dispatching URIs.

> As for 'tying URIs/URI schemes to applications', some advice (and maybe
> actual standards work if that proves necessary) about how to avoid the
> problems with URI scheme proliferation due to application proliferation
> (what started with iTunes and now continues with iPhone and so on) would
> be very helpful. I don't see many other items of work for this area,
> because even in the PC world, it's the user's business how to
> configure this association.

the one thing that's different between email and QR is capacity and
packaging. email has MIME, so if i am attaching my vcard then there's a
way to do this and to indicate the type. there is no such
packaging for QR. there are some conventions (business card
conventions, for example, seem to exist in a variety of flavors), but
email has a more robust structure for that. i see three main areas:

- how to encode and decode and process URIs: this mainly looks at how to
properly encode and decode and process URIs, so that at least in theory
QR readers behave well for http: and tel: and sms: and geo:

- how to encode and decode and process "rich text": no idea whether this
should even be possible, but if something like email-style rich content
(like http://www.flickr.com/photos/dret/4498124087/) should be
encouraged, then it would be good to have some recommendations for this
as well.

- how to encode and decode and process self-contained data: this
probably gets tricky because of size limitations. however, something like a
business card probably is useful to have as QR, but in this scenario we
need some way to indicate a content type (maybe...).
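
For the first area, a minimal sketch of what "behaving well" for http:, tel:, sms: and geo: could look like, written in Python purely for illustration. The handler names and the functions classify_payload and decoded_path are hypothetical and not taken from any existing QR reader. The sketch uses the scheme to pick a handler, falls back to plain text when no known scheme is present, and leaves a literal "+" alone when decoding percent-escapes, which is one of the cases reported as going wrong earlier in this thread.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical mapping from URI scheme to the kind of app that handles it.
HANDLERS = {
    "http": "browser",
    "https": "browser",
    "tel": "dialer",
    "sms": "messaging",
    "geo": "maps",
}


def classify_payload(payload):
    """Return (handler, value) for the text decoded from a QR code.

    Anything without a known scheme is reported as plain text rather
    than being guessed at with heuristics.
    """
    text = payload.strip()
    try:
        scheme = urlsplit(text).scheme.lower()
    except ValueError:
        # urlsplit rejects some malformed inputs; treat those as text.
        return ("text", text)
    handler = HANDLERS.get(scheme)
    if handler is None:
        return ("text", text)
    return (handler, text)


def decoded_path(uri):
    """Percent-decode the path of a URI for display.

    unquote() keeps a literal "+"; unquote_plus() would turn it into a
    space, which is only appropriate for form-encoded query strings.
    """
    return unquote(urlsplit(uri).path)
```

With this, classify_payload("tel:+1-555-0100") is routed to the "dialer" handler with the "+" intact, while a payload such as "just some text" is returned as plain text.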

> When QR codes were reasonably new here in Japan (must have been quite a
> few years back), I looked into what W3C might be able (or need) to do
> for them, but at that time, I got told that there was no pressing need
> for further standardization. I don't know whether this might have
> changed in the meantime.

this to a large extent depends on your personal opinions about what and
when to standardize. but from what i have experienced just from looking
at what's out there and how it is working, it seems to me that the current
situation certainly is less than ideal, creating unpredictable results
and thus unsatisfactory user experiences when you try to use QR codes.

> What I seem to remember was that QR codes were defined only for ASCII
> and for Shift_JIS; ideally, they should use UTF-8, of course.

i still have some basic QR reading to do. my current understanding is
that on the lowest level (the ISO QR standard), QR can encode four
different content types, numeric, ASCII, binary, and kanji. this alone
may be unfortunate because then you cannot encode general unicode unless
you create a convention to use binary and UTF-8 (for example). i am
still catching up in these areas, but it seems to me that these are
exactly the things that need to be clarified and documented. maybe it
needs just clarification and documentation, but maybe there also is a
need for producing new conventions; we'll see.
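
To make the "binary plus UTF-8" convention concrete, here is a rough Python sketch of how an encoder might choose a mode for a given string. The function name pick_mode is hypothetical. Note that the mode referred to above as ASCII is, in the standard, a 45-character alphanumeric set (digits, upper-case letters, space and a few symbols) rather than full ASCII, so an ordinary lower-case http URI already ends up in byte mode. Kanji mode and the actual bit packing are left out, and the use of UTF-8 in byte mode is exactly the convention being assumed here, which a reader would have to know about or guess.

```python
# The 45 characters allowed in QR alphanumeric mode (upper case only).
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")
DIGITS = set("0123456789")


def pick_mode(text):
    """Return (mode, data) for text, preferring the most compact mode.

    The returned bytes are the character data only; packing them into
    QR codewords is outside the scope of this sketch.
    """
    if text and all(ch in DIGITS for ch in text):
        return ("numeric", text.encode("ascii"))
    if text and all(ch in ALPHANUMERIC for ch in text):
        return ("alphanumeric", text.encode("ascii"))
    # Assumed convention: everything else goes into byte mode as UTF-8.
    return ("byte", text.encode("utf-8"))
```

For example, pick_mode("HTTP://EXAMPLE.ORG/A") selects alphanumeric mode, while the same URI in lower case, or any IRI containing non-ASCII characters, falls through to byte mode.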

as an example that still confuses me a bit: http://✪df.ws/ez9 is an IRI
and should somehow work, i guess. if i paste that into
http://www.mskynet.com/static/maestro it complains that this is not
ASCII (so both the URL and the RAW modes on that site seem to support
ASCII only). if i paste it into firefox address bar for producing a
google QR chart, it looks like this and sort of works:

http://chart.apis.google.com/chart?cht=qr&chs=150x150&choe=UTF-8&chld=H&chl=http%3A%2F%2F✪df.ws%2Fez9

it produces a URI that in some QR readers actually displays as the ✪,
but on my iphone, for example, forwarding that URI to safari then breaks
it. i don't know exactly where things are going wrong and it would take
a lot more testing to figure out what's going on in various scenarios
and what should be going on, but i think scenarios such as this should
be well-defined and documented, and since all of this is basically the
question how reliably web identifiers can be handled, the w3c might be a
good candidate to support such an activity.
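
One way to sidestep some of this is to convert the IRI to a plain ASCII URI before it goes into the QR code, so that neither the reader nor the browser has to guess an encoding. The following Python sketch shows the general idea under a few assumptions: iri_to_uri is a hypothetical helper, the host is converted with Python's built-in "idna" codec (which follows the older IDNA 2003 rules), the remaining components are percent-encoded as UTF-8, and any userinfo in the authority is dropped for brevity.

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Characters left untouched when percent-encoding; "%" is kept so that
# escapes already present in the IRI are not encoded a second time.
SAFE = "/%:@!$&'()*+,;=~-._"


def iri_to_uri(iri):
    """Return an ASCII-only URI for the given http(s)-style IRI."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Each non-ASCII label of the host becomes an "xn--" label.
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += ":%d" % parts.port
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE + "?")
    fragment = quote(parts.fragment, safe=SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

Applied to http://✪df.ws/ez9, this yields a URI whose host starts with "xn--" and that contains only ASCII characters, which could then be encoded in byte mode without any charset ambiguity. Whether readers should display the original or the converted form is a separate question.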

cheers,

dret.