encoding problem detection issue

Jonathan Grant
Hello

Could I ask for your help?

I followed this example:

http://www.w3.org/International/questions/qa-validator-charset-check.en

but it didn't catch the corrupt characters in the following page, any ideas?


http://man7.org/linux/man-pages/man1/hostname.1.html




See text below with ??


Information about the project can be found at
       ??http://net-tools.sourceforge.net/??.  If you have a bug report for
       this manual page, see ??http://net-tools.sourceforge.net/??.


The bytes seem to be some multi byte E2 9F A8

I'm not a member on this list, so please keep my email in replies.

Regards, Jonny


Re: encoding problem detection issue

Jukka K. Korpela
2015-02-09, 14:11, Jonathan Grant wrote:

> I followed this example:
>
> http://www.w3.org/International/questions/qa-validator-charset-check.en
>
> but it didn't catch the corrupt characters in the following page, any ideas?
>
>
> http://man7.org/linux/man-pages/man1/hostname.1.html

There are no corrupt characters there, as far as I can see. But some
characters there can be problematic in terms of font support; that’s a
completely different problem.

The page is declared as UTF-8 encoded, both in a <meta> tag and in an
HTTP header. And it appears to be actually UTF-8 encoded.
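
One way to check the "actually UTF-8 encoded" part yourself is a strict decode of the raw bytes. This is only a minimal sketch: the helper name is_valid_utf8 is made up for illustration, and the sample bytes are modeled on the line quoted below rather than fetched from the page.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Assumed sample: the reported bytes E2 9F A8 / E2 9F A9 around the URL.
sample = b"\xe2\x9f\xa8http://net-tools.sourceforge.net/\xe2\x9f\xa9"
# Cutting inside a three-byte sequence leaves malformed UTF-8.
truncated = sample[:2]

print(is_valid_utf8(sample))
print(is_valid_utf8(truncated))
```

A check like this reports only malformed byte sequences. It says nothing about whether a font can display the decoded characters.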

> See text below with ??
>
>
> Information about the project can be found at
>         ??http://net-tools.sourceforge.net/??.  If you have a bug report for
>         this manual page, see ??http://net-tools.sourceforge.net/??.
>
>
> The bytes seem to be some multi byte E2 9F A8

The character before the URL is “⟨” U+27E8 MATHEMATICAL LEFT ANGLE
BRACKET, which is E2 9F A8 in UTF-8 encoding; see
http://www.fileformat.info/info/unicode/char/27e8/index.htm
And the character after the URL is “⟩” U+27E9 MATHEMATICAL RIGHT ANGLE
BRACKET.
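
The mapping from code point to bytes can be confirmed with a few lines of Python. This is a sketch using only the standard library; the function name describe is an illustrative choice, not anything from the validator.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, UTF-8 bytes as hex) for one character."""
    code_point = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    utf8_hex = ch.encode("utf-8").hex(" ").upper()  # bytes.hex(sep) needs Python 3.8+
    return code_point, name, utf8_hex


for ch in ("\u27e8", "\u27e9"):
    print(*describe(ch))
```

For U+27E8 this prints the byte sequence E2 9F A8, matching the bytes reported in the question.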

At the level of character representation and use of characters in
(X)HTML, everything is correct; there is no error to report.

But font support is limited; the page
http://www.fileformat.info/info/unicode/char/27e8/fontsupport.htm
lists most of the fonts containing these characters (though it may lack
some very new or specialized fonts). Browsers generally indicate lack of
font support by displaying a small rectangle instead.

Moreover, it is questionable whether these characters, designated as
mathematical, should be used as URL delimiters.

It is much safer, and much more common, to use the ASCII characters “<”
and “>” as delimiters. In (X)HTML, you just need to remember to write
the former as &lt; due to (X)HTML syntax rules.
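
If the page is generated by a script, the escaping can be left to a library. A minimal sketch, assuming Python is used for generation; delimit_url is a hypothetical helper name.

```python
import html


def delimit_url(url: str) -> str:
    """Wrap url in ASCII angle brackets and escape the result for (X)HTML."""
    # html.escape rewrites "&", "<" and ">"; escaping ">" is optional but harmless.
    return html.escape(f"<{url}>", quote=False)


print(delimit_url("http://net-tools.sourceforge.net/"))
# → &lt;http://net-tools.sourceforge.net/&gt;
```

The resulting markup displays in a browser as the URL between plain “<” and “>”, which any font can render.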

> I'm not a member on this list, so please keep my email in replies.

OK.

Yucca



Re: encoding problem detection issue

Jonathan Grant

Many thanks for your reply.

Regards, Jon

On 16 February 2015 at 19:19, Jukka K. Korpela <[hidden email]> wrote:
> 2015-02-09, 14:11, Jonathan Grant wrote:
>
>> I followed this example:
>>
>> http://www.w3.org/International/questions/qa-validator-charset-check.en
>>
>> but it didn't catch the corrupt characters in the following page, any
>> ideas?
>>
>>
>> http://man7.org/linux/man-pages/man1/hostname.1.html
>
>
> There are no corrupt characters there, as far as I can see. But some
> characters there can be problematic in terms of font support; that’s a
> completely different problem.
>
> The page is declared as UTF-8 encoded, both in a <meta> tag and in an HTTP
> header. And it appears to be actually UTF-8 encoded.
>
>> See text below with ??
>>
>>
>> Information about the project can be found at
>> ??http://net-tools.sourceforge.net/??. If you have a bug report
>> for
>> this manual page, see ??http://net-tools.sourceforge.net/??.
>>
>>
>> The bytes seem to be some multi byte E2 9F A8
>
>
> The character before the URL is “⟨” U+27E8 MATHEMATICAL LEFT ANGLE BRACKET,
> which is E2 9F A8 in UTF-8 encoding; see
> http://www.fileformat.info/info/unicode/char/27e8/index.htm
> And the character after the URL is “⟩” U+27E9 MATHEMATICAL RIGHT ANGLE
> BRACKET.
>
> At the level of character representation and use of characters in (X)HTML,
> everything is correct; there is no error to report.
>
> But font support is limited; the page
> http://www.fileformat.info/info/unicode/char/27e8/fontsupport.htm
> lists most of the fonts containing these characters (though it may lack some
> very new or specialized fonts). Browsers generally indicate lack of font
> support by displaying a small rectangle instead.
>
> Moreover, it is questionable whether these characters, designated as
> mathematical, should be used as URL delimiters.
>
> It is much safer, and much more common, to use the ASCII characters “<” and
> “>” as delimiters. In (X)HTML, you just need to remember to write the former
> as &lt; due to (X)HTML syntax rules.
>
>> I'm not a member on this list, so please keep my email in replies.
>
>
> OK.
>
> Yucca
>
>