japanese encoding nightmare


japanese encoding nightmare

Paul Arenson
Hello

I came here via http://www.webstandards.org/learn/articles/askw3c/dec2002/

For a long time I have used Mozilla to create (or adapt other) web pages.


It has worked.  I went back and was surprised that it worked DESPITE different encodings I inadvertently used.

But recently I tried to make pages that did NOT work!!!!  I am not sure why, and so I am writing.


UNSUCCESSFUL EXAMPLE (Looks ok on desktop but not on server)
http://tokyoprogressive.org/why.html

CODE
 <meta content="text/html; charset=UTF-8" http-equiv="content-type">


Here are successful examples from the past:
                    - - - - - - - - - - - - -           

SUCCESSFUL EXAMPLE ONE (JAPANESE COMES OUT RIGHT)
http://www.tokyoprogressive.org/index/weblog/print/april-entries/

This was made via EXPRESSION ENGINE

I note I have both xml:lang and UTF-8.
I also note I am confused about the difference between character encoding and language, but anyway, it works.

CODE
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" lang="ja">
<head>
<title>April entries</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

                    - - - - - - - - - - - - -           




SUCCESSFUL EXAMPLE TWO
http://tokyoprogressive.org/indexoct2006.html

THIS WAS MADE BY HAND USING a CSS TEMPLATE.

I THOUGHT I did this in UTF-8, but no.
Mozilla even says it is UTF-8, but as you can see the code is Western.
In other words, why does it work?


CODE
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">

                    - - - - - - - - - - - - -           



SUCCESSFUL EXAMPLE THREE
http://tokyoprogressive.org/indexnov2006.html
Now here is one where I specified UTF-8 and it too is ok!

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">


SUCCESSFUL EXAMPLE FOUR (most bizarre?)
I even forgot to add the meta tag!!!
http://tokyoprogressive.org/

                    - - - - - - - - - - - - -           



PROBLEMS STARTED APPEARING WITH NEW PAGES

EXPERIMENT:

Method

Make a page in several encodings:
http://tokyoprogressive.org/a.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html; charset=ISO-2022-JP"

LOOKS OK ONLINE 
                    - - - - - - - - - - - - -           

  <meta content="text/html; charset=UTF-8" http-equiv="content-type">
DOES NOT LOOK OK ONLINE
                    - - - - - - - - - - - - -           
  <meta content="text/html; charset=Shift_JIS" http-equiv="content-type">
DOES NOT LOOK OK ONLINE
                    - - - - - - - - - - - - -           
  <meta content="text/html; charset=EUC-JP" http-equiv="content-type">
DOES NOT LOOK OK ONLINE
                    - - - - - - - - - - - - -           
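(Editorial aside: if you want to be certain the bytes on disk really match each declared charset, independent of what the editor claims, here is a minimal Python sketch; the file names and the sample Japanese sentence are made up for illustration.)

CODE
# Write the same page body in each encoding tested above, so the bytes
# on disk really match the charset named in the meta tag.
TEMPLATE = ('<meta content="text/html; charset={cs}" http-equiv="content-type">\n'
            '<p>日本語のテスト</p>\n')
for name, codec in [("ISO-2022-JP", "iso2022_jp"), ("UTF-8", "utf-8"),
                    ("Shift_JIS", "shift_jis"), ("EUC-JP", "euc_jp")]:
    with open("test-" + codec + ".html", "wb") as f:
        f.write(TEMPLATE.format(cs=name).encode(codec))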



CONCLUSION:

Can anyone tell me what is going on?


Thanks!


__/__/__/__/__/__/__/__/__/__/
Paul Arenson

EMAIL


__/__/__/__/__/__/__/__/__/__/



Re: japanese encoding nightmare

Karl Dubost


On 13 Nov 2006, at 10:50, Paul Arenson wrote:
> UNSUCCESSFUL EXAMPLE (Looks ok on desktop but not on server)
> http://tokyoprogressive.org/why.html
>
> CODE
>  <meta content="text/html; charset=UTF-8" http-equiv="content-type">

But this page is not in UTF-8 but in Shift_JIS.

Either you have to save your page as UTF-8 or change the encoding
information to
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=Shift_JIS">
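(For what it's worth, a rough way to check what a saved file really is, rather than what the editor claims, is to try decoding its bytes with a few candidates. A minimal Python sketch; the filename is the page under discussion, and iso-8859-1 is listed only to show that some decodings never fail, so success there proves nothing.)

# Report which candidate decodings the file's bytes survive.
candidates = ["utf-8", "shift_jis", "euc_jp", "iso2022_jp", "iso-8859-1"]

with open("why.html", "rb") as f:   # the page under discussion
    data = f.read()

for enc in candidates:
    try:
        data.decode(enc)
        print(enc, ": decodes cleanly")
    except UnicodeDecodeError as e:
        print(enc, ": fails at byte", e.start)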


> SUCCESSFUL EXAMPLE ONE (JAPANESE COMES OUT RIGHT)
> http://www.tokyoprogressive.org/index/weblog/print/april-entries/

Yes, the page is correctly UTF-8. Not valid, but UTF-8:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.tokyoprogressive.org%2Findex%2Fweblog%2Fprint%2Fapril-entries%2F

> This was made via EXPRESSION ENGINE
>
> I note I have both xml:lang and UTF-8.

xml:lang doesn't influence the display of the page. It is there, for
example, for triggering the right accent when passing the text through
a voice browser. Or to help translation engines (not sure they
implement it, though). Or to help a spelling checker choose the right
dictionary.

I would recommend that you stick to UTF-8; it would help keep
consistency in the way you serve the pages.

A cool plug-in could be developed and added to LogValidator:
        http://www.w3.org/QA/Tools/LogValidator/

Given a list of URIs, create a table with
uri   server_encoding   meta_encoding    guessed_encoding

Would someone on the list like to do that?
http://www.w3.org/QA/Tools/LogValidator/Manual-Modules
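(Not an actual LogValidator module, but a rough standalone sketch of that table in Python, for anyone who wants a starting point; the URI list is just a page from this thread, and the guessed_encoding column is left out because it would need a separate detection library.)

import re
from urllib.request import urlopen

URIS = ["http://tokyoprogressive.org/why.html"]

# Crude search for charset=... inside a meta element.
META_RE = re.compile(rb'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)

print("uri\tserver_encoding\tmeta_encoding")
for uri in URIS:
    with urlopen(uri) as resp:
        server = resp.headers.get_content_charset() or "(none)"
        body = resp.read()
    m = META_RE.search(body)
    meta = m.group(1).decode("ascii", "replace") if m else "(none)"
    print(uri + "\t" + server + "\t" + meta)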



> I THOUGHT I did this in UTF-8, but no.
> Mozilla even says it is UTF-8, but as you can see the code is
> western.
> In other words, why does it work?

Because browsers try to display broken pages (invalid, wrong encoding,
etc.), people who develop Web pages do not know that they have done
something wrong, and they do not fix it. IMHO it is a mistake on the
browsers' part.
It is fine to try to recover and display the page, but it is wrong to
do the recovery silently, as we never enter the cycle that helps
everyone fix things and have a better experience.

> SUCCESSFUL EXAMPLE FOUR (most bizarre?)
> I even forgot to add the meta tag!!!
> http://tokyoprogressive.org/

The server sends a default value, which usually has priority over
the information contained in the file.
The encoding declared in the file is only a hint, and the browser
_should_ follow what the server says.
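(A quick way to see what the server actually says, similar to what web-sniffer.net reports below: a small Python sketch using only the standard library; the URL is the test page under discussion.)

from urllib.request import urlopen

with urlopen("http://tokyoprogressive.org/a.html") as resp:
    # The Content-Type response header is the authoritative declaration.
    print("Content-Type:", resp.headers.get("Content-Type"))
    print("charset     :", resp.headers.get_content_charset())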


> Make a page in several  encodings
> http://tokyoprogressive.org/a.html
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> <html>
> <head>
>   <meta content="text/html; charset=ISO-2022-JP"
> LOOKS OK ONLINE

doesn't look ok for me.

But your server is configured in a strange way:

GET /a.html HTTP/1.1[CRLF]
Host: tokyoprogressive.org[CRLF]
Connection: close[CRLF]
Accept-Encoding: gzip[CRLF]
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[CRLF]
Accept-Language: fr,en;q=0.9,ja;q=0.9,de;q=0.8,es;q=0.7,it;q=0.7,nl;q=0.6,sv;q=0.5,nb;q=0.5,da;q=0.4,fi;q=0.3,pt;q=0.3,zh-Hans;q=0.2,zh-Hant;q=0.1,ko;q=0.1[CRLF]
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[CRLF]
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.7) Gecko/20060911 Camino/1.0.3 Web-Sniffer/1.0.24[CRLF]
Referer: http://web-sniffer.net/[CRLF]
[CRLF]


Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[CRLF]

You serve first iso-8859-1 and then utf-8 and then anything. Maybe  
one of the sources of your problems is there.

1. Change all your pages to one encoding only: UTF-8.
2. Change the configuration of your server to send only UTF-8.
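(How to do step 2 depends on the host. On Apache, for example, it is typically a one-line directive, AddDefaultCharset UTF-8, in the server configuration or an .htaccess file; ask the hosting company if you cannot edit that yourself.)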





--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
   QA Weblog - http://www.w3.org/QA/
      *** Be Strict To Be Cool ***





Re: japanese encoding nightmare

Paul Arenson

__/__/__/__/__/__/__/__/__/__/
Paul Arenson

EMAIL

PHONE &VOICE MAIL
1-617-379-0761 (U.S.)
090-4173-3873 (Japan)
paularenson (Skype)
__/__/__/__/__/__/__/__/__/__/





On Nov 13, 2006, at 10:22 PM, Karl Dubost wrote:


> On 13 Nov 2006, at 10:50, Paul Arenson wrote:
>> UNSUCCESSFUL EXAMPLE (Looks ok on desktop but not on server)
>>
>> CODE
>>  <meta content="text/html; charset=UTF-8" http-equiv="content-type">
>
> But this page is not in UTF-8 but in Shift_JIS.
>
> Either you have to save your page as UTF-8 or change the encoding information to
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=Shift_JIS">


It is?  I don't recall using that.  Hmmm.  And when I save to desktop, changing to Shift_JIS doesn't help, nor does looking at it on the web. Oh well....




>> SUCCESSFUL EXAMPLE ONE (JAPANESE COMES OUT RIGHT)
>
> Yes, the page is correctly UTF-8. Not valid, but UTF-8.

OK.....way back when I used the predecessor to Expression Engine, the encoding was something other than Unicode.  Then, when I upgraded to Unicode, I asked the guy who helped me, and he changed something in the program or on my server (using the database???). When he did that, the new pages, like the one above, came out good, though old pages did not.  Perhaps what he did to make Expression Engine work has to do with the server?

As I said, pages look good on my desktop but not on the server....




>> This was made via EXPRESSION ENGINE
>>
>> I note I have both xml:lang and UTF-8.
>
> xml:lang doesn't influence the display of the page. It is there, for example, for triggering the right accent when passing the text through a voice browser. Or to help translation engines (not sure they implement it, though). Or to help a spelling checker choose the right dictionary.
>
> I would recommend that you stick to UTF-8; it would help keep consistency in the way you serve the pages.



>> I THOUGHT I did this in UTF-8, but no.
>> Mozilla even says it is UTF-8, but as you can see the code is Western.
>> In other words, why does it work?
>
> Because browsers try to display broken pages (invalid, wrong encoding, etc.), people who develop Web pages do not know that they have done something wrong, and they do not fix it. IMHO it is a mistake on the browsers' part.
> It is fine to try to recover and display the page, but it is wrong to do the recovery silently, as we never enter the cycle that helps everyone fix things and have a better experience.
>
>> SUCCESSFUL EXAMPLE FOUR (most bizarre?)
>> I even forgot to add the meta tag!!!
>
> The server sends a default value, which usually has priority over the information contained in the file.
> The encoding declared in the file is only a hint, and the browser _should_ follow what the server says.

Yes, I guess in the CSS?
http://tokyoprogressive.org/style.css

But I do not see anything there....hmmmm?


Anyway, I am a bit lost.  Is this something that the person who adjusted my database did when he set it up for Expression Engine, and it affects all pages on the server?

How do I fix the server (it is a commercial hosting company)...

Thanks!








MORE TESTING

Paul Arenson
In reply to this post by Karl Dubost
I do not think it is the server, because I just tested two more files.  One was created before and called testz.  The other, called testzz, I created just now in Mozilla using UTF-8.  I uploaded both to two different servers and both came out wrong.


Is my Mozilla corrupted?

http://tokyoprogressive.org/testz.html
http://tokyoprogressive.org.uk/testz.html

http://tokyoprogressive.org/testzz.html
http://tokyoprogressive.org.uk/testzz.html

Going to bed, it is midnight here.  Good night, and thanks.


__/__/__/__/__/__/__/__/__/__/
Paul Arenson

EMAIL

PHONE &VOICE MAIL
1-617-379-0761 (U.S.)
090-4173-3873 (Japan)
paularenson (Skype)
__/__/__/__/__/__/__/__/__/__/





On Nov 13, 2006, at 11:40 PM, Greg Swaney wrote:

> I did a lot of poking and changing character sets on your account on Sunday, and it never showed the characters how they were supposed to be shown. What did w3 say?

> Paul Arenson wrote:
>> Hi Greg
>> Further to my Sunday post about files I create in various encodings using Mozilla looking OK on my desktop but not on the server: I wrote to w3.org and they advised me, but it is way over my head.
>> What I am guessing is that files created by Expression Engine output in Unicode (UTF-8), and somehow something on the server (the database?) tells the server to do something to the encoding.  Anyway, when I create a UTF-8 file on my desktop, it is served differently on the site.....
>> I still use Expression Engine, but also use my own pages.
>> Maybe I should contact the guy who set up Expression Engine for me?
>> I am totally lost....though perhaps it is simple?
>> Thanks!
>> paul
>> see below from the web person --> [hidden email]
>> thanks
>> Begin forwarded message:
>>> From: Karl Dubost <[hidden email]>
>>> Date: November 13, 2006 10:22:09 PM JST
>>> To: Paul Arenson <[hidden email]>
>>> Subject: Re: japanese encoding nightmare



>>> [...]


> --
> Greg Swaney
> NEXCESS.NET Internet Solutions
> 304 1/2 S. State St.
> Ann Arbor, MI 48104
> 1.866.NEXCESS


Re: japanese encoding nightmare

Daniel Barclay
In reply to this post by Paul Arenson

Paul Arenson wrote:
...

>>> CODE
>>>  <meta content="text/html; charset=UTF-8" http-equiv="content-type">
>>
>> but this page is not in utf-8 but in shift-jis
>
>> Either you have to save your page as utf-8 or change the encoding
>> information to
>> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=Shift_JIS">
>
>
> It is?  I don't recall using that.  hmmm.  And when i save to desktop,
> changing to shift jis doesn't help, nor does looking at it on the web.
> Oh well....

Remember that <META HTTP-EQUIV="..." ...> elements are not supposed
to be read by the browser when the browser retrieves the document
from a server.

Such META elements are for the server to read and use to construct
real HTTP header fields (if the server chooses that mechanism).

(When dereferencing a "file:..." URL, there is no explicit service,
so browsers are probably allowed to read META elements, but they
very well might not.)

Daniel










RE: japanese encoding nightmare

Mike Schinkel-2

Daniel Barclay wrote:

>> Remember that <META HTTP-EQUIV="..." ...> elements are not supposed
>> to be read by the browser when the browser retrieved the document
>> from a server.
>> Such META elements are for the server to read and use to construct
>> real HTTP header fields (if the server chooses that mechanism).

I recently read (from what I remember to be an authoritative source) that in
practice servers rarely ever read them, because of performance, so the browser
has to. (The only authoritative thing I can remember reading recently was
Weaving the Web, but I don't think TBL covered that in there. I wish my
memory were better..)

This http://www.w3.org/TR/html4/struct/global.html#adef-http-equiv says
(emphasis mine): "HTTP servers *MAY* use the property name specified by the
http-equiv attribute to create an [RFC822]-style header in the HTTP
response."  That would imply they might not, and if so the browser would
have to handle it, no?

Anyway, I just wanted to point this out (it is a shame the recommendation
didn't say "MUST" instead of "MAY").

-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org/




Re: japanese encoding nightmare

David Dorward-3

On Mon, Nov 13, 2006 at 06:11:18PM -0500, Mike Schinkel wrote:
> This http://www.w3.org/TR/html4/struct/global.html#adef-http-equiv says
> (emphasis mine): "HTTP servers *MAY* use the property name specified by the
> http-equiv attribute to create an [RFC822]-style header in the HTTP
> response."  That would imply they might not, and if so the browser would
> have to handle, no?

No, just that the server should use some other means to determine the
character encoding of the document (generally "use the configured
value").

> Anyway, just wanted to point this out (it is a shame the recommendation
> didn't say "MUST" instead of "MAY")

Ouch, every HTTPD an HTML parser? Ouch!

--
David Dorward                                      http://dorward.me.uk



JAPANESE WOES

Paul Arenson
In reply to this post by Paul Arenson

Thanking Greg at Nexcess.net and the many people at
[hidden email], such as Karl Dubost <[hidden email]>, etc.

SUMMARY
(1) I have done two tests of my problem of unreadable Japanese (where I
never had this problem before) and found that, working at home on a
Mac OS X machine creating files in Mozilla (which previously worked) and
uploading to tokyoprogressive.org and tokyoprogressive.org.uk (two
companies), both fail in all encodings of Japanese.

(2) I wrote to the w3.org list and requested help.  I got an
explanation, but it was above my head (sorry).

(3) I have tested at work on Windows 2000, and this time UTF-8 works on
both servers plus Google.  Shift_JIS works only on one of the servers.

(4) Conclusion?  Could there be something wrong with my Mac suddenly?
Should I try another Mac at home?  Could it be my internet provider?
The fact that the files work (all UTF-8 versions, at least) from the work
Windows machine (also Mozilla) and do not work from home seems to say
something happened to my Mac.


DETAILS BELOW FOR TESTING

LAST NIGHT FROM HOME (MAC/MOZILLA)

  http://tokyoprogressive.org/testz.html
  http://tokyoprogressive.org.uk/testz.html

  http://tokyoprogressive.org/testzz.html
  http://tokyoprogressive.org.uk/testzz.html


TODAY AT WORK
Then today, at work, on a Windows 2000 machine, I used Mozilla and
again created two files, this time a UTF-8 file and a Shift_JIS file.
(I prefer UTF-8 but wanted to check.) This time, more encouraging.
I uploaded to 3 places:

NO GOOD (SHIFT JIS)
http://docs.google.com/View?docid=dfztwqbx_31fcz6hv
GOOD
http://docs.google.com/View?docid=dfztwqbx_32p97g5t


NO GOOD
http://tokyoprogressive.org/shiftjis.html
GOOD
http://tokyoprogressive.org/uft8.html



GOOD
http://tokyoprogressive.org.uk/shiftjis.html
GOOD
http://tokyoprogressive.org.uk/uft8.html




There are questions as yet unclear about server configurations, but I
thought it significant that things have worked from this Windows
machine.


Thanks



> [...]




Re: japanese encoding nightmare

Daniel Barclay
In reply to this post by Mike Schinkel-2

Mike Schinkel wrote:
> Daniel Barclay wrote:
>
>>> Remember that <META HTTP-EQUIV="..." ...> elements are not supposed

I should narrow that to "some ... elements "

>>> to be read by the browser when the browser retrieved the document
>>> from a server.
>>> Such META elements are for the server to read and use to construct
>>> real HTTP header fields (if the server chooses that mechanism).
>
> I recently read (from what I remember to be an authoritative source) that in
> practice servers rarely ever read them because of performance so the browser
> has to.

In some cases, the browser is not even allowed to use them.

If the server indicates the content type and character encoding
("charset") in the HTTP response, the browser must use _that_ type and
charset and must _not_ use values from a <META HTTP-EQUIV="Content-Type"
...> element or anything else in the returned entity (document) to
determine the type and charset.  That is, the server's HTTP headers
override any specifications inside the entity.

A server is supposed to be able to change the encoding of a document as
long as it reports the encoding correctly in the Content-Type header.
It is not supposed to have to change any <META HTTP-EQUIV="Content-Type"
...> elements.

(Besides requiring any transcoding server to understand HTML, changing
such elements would be changing the _contents_ of the document, not just
changing its _encoding_ (changing the sequence of characters, not just
changing the bytes that encode the characters).)

If the browser ignored the Content-Type header from the server and read
a <META HTTP-EQUIV="Content-Type" ...> element, it might be trying to
use the wrong encoding.


I thought that any browser that behaved differently (say, IE 6,
which sometimes ignores "text/plain" from the server) violated some
specification.

However, looking at the HTML 4.01 specification, I only see wording
about servers' being allowed to read such elements:
- "HTTP servers use this attribute to gather information for HTTP
   response message headers"
- "HTTP servers may use the property name specified by the http-equiv
   attribute to create an [RFC822]-style header in the HTTP response."

Evidently my source was something else.  I don't remember which
document it was, so I don't know whether it was as authoritative as
a specification.  (I do think it was something from the W3C.)


Note that XML has a similar rule regarding the character encoding
specified inside an XML document in the XML declaration ("<?xml
encoding='...'?>").  If the character encoding is specified to the
XML processor at a higher level (e.g., via an HTTP Content-Type
header), then the processor must ignore the character encoding
specification in the XML declaration.

(Again, I can't find that in the XML specification itself, so I
can't currently vouch for the authoritativeness of my source.)


Of course, that's all about the content type and encoding.  Since I
don't recall my source, I can't say whether most HTTP-EQUIV elements
are like Content-Type (the browser must _not_ use them) or not (the
browser can use them).


> This http://www.w3.org/TR/html4/struct/global.html#adef-http-equiv says
> (emphasis mine): "HTTP servers *MAY* use the property name specified by the
> http-equiv attribute to create an [RFC822]-style header in the HTTP
> response."  That would imply they might not, and if so the browser would
> have to handle, no?

Not quite.

It's not a server's not reading HTTP-EQUIV information from inside an
HTML document that might imply that the browser should read it.

If the server read more-authoritative information from elsewhere (e.g.,
a server configuration file describing the documents to be served out)
and reported it in an HTTP header, then the browser should not ignore
its more-authoritative source (the server HTTP response header) and
instead read a less-authoritative source (the insides of the document).


However, it might be a server's not sending a header at all that implies
that the browser can (or maybe should) use HTTP-EQUIV information.

(I'm not sure that there's not a case where the server can choose to not
return a certain header and where the browser should take that lack of
a header as authoritative.)



Daniel





RE: japanese encoding nightmare

r12a
In reply to this post by Paul Arenson

Paul, read this and let me know if you still have questions:
 

Changing (X)HTML page encoding to UTF-8
http://www.w3.org/International/questions/qa-changing-encoding
 
RI



============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/


 


________________________________

        From: [hidden email] [mailto:[hidden email]] On Behalf Of Paul Arenson
        Sent: 13 November 2006 01:51
        To: [hidden email]
        Subject: japanese encoding nightmare

        [...]




RE: japanese encoding nightmare

Tex Texin

Hi Richard,
That page seems incomplete and potentially dangerous.

1) Simply saying to save as utf-8 ignores the problem of knowing which
encoding you are starting from.
Often text is thought to be iso-8859-1, Big5 or some other encoding when it
is actually windows-1252, Big5-HKSCS, or some variant or different encoding.
If the source encoding is incorrect, then the conversion to utf-8 may result
in the wrong characters and data loss.

The document should make sure users proactively identify the correct
encoding of the page before transcoding.


2) When converting text or HTML to utf-8, special consideration needs to be
given to URLs. A URL has 4 parts: scheme, domain, path and query.
Schemes are ASCII and not a problem to convert to utf-8, as they remain
ASCII. Domains and paths should be convertible to UTF-8.
(They will go through additional conversions to an ASCII form before going
over the wire.)

However, the query portion of a URL is not necessarily convertible to
Unicode. The query portion represents data that is used as a reference
within some other application pointed to by the remainder of the URL. That
application may require an encoding other than UTF-8, or the data may not
be textual at all.
Conversion to utf-8 may therefore damage the URL.

For example, I might have a CGI and database application based on
iso-8859-1.
The original URL might be the following contrived example (I left off the
scheme http: since it isn't a working URL): www.i18nguy.com/?find=café

In a page encoded as iso-8859-1 the e-acute will be represented by a single
byte as 0xE9.
The i18nguy.com cgi and database application will expect to match the byte
0xE9.

If the URL is transcoded to UTF-8, the character e-acute will become two
bytes and represented in the URL by hex encoding as %C3%A9.
The URL will no longer work unless the application is also modified to
expect UTF-8 values.

However, when the x(h)tml page is transcoded to utf-8, the embedded URLs may
be links to applications that we have no control over, and they may be
affected.

Therefore a more appropriate recommendation might be to first represent the
query portions of a URL by a hex-encoded form in the original encoding, and
then the page can be converted to utf-8.

E.g. convert www.i18nguy.com/?find=café to www.i18nguy.com/?find=caf%E9
Subsequent transcoding to utf-8 won't change the value %E9.

On the other hand, simply transcoding to utf-8 will give
www.i18nguy.com/?find=caf%C3%A9 which will break the link or reference the
incorrect value in the target application.
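(A quick illustration of the point in Python, using the contrived value from above; the comments show the exact bytes produced.)

from urllib.parse import quote

value = "café"
print(quote(value, encoding="iso-8859-1"))  # caf%E9    - what the Latin-1 application expects
print(quote(value, encoding="utf-8"))       # caf%C3%A9 - after transcoding to UTF-8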

====
Haven't we been over this ground before? Perhaps in one of the other
documents. The page should be updated.

tex

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Richard Ishida
Sent: Thursday, November 23, 2006 2:16 AM
To: 'Paul Arenson'; [hidden email]
Subject: RE: japanese encoding nightmare

[...]






Re: japanese encoding nightmare: conclusion

Paul Arenson
In reply to this post by Paul Arenson
Hi 

Way back a month ago I asked a question about why I was able to create a functioning web page on my Mac desktop that showed up wrong on my server.

You might be interested in this report.  It shows something very weird with one machine or program or set of programs...

The file was created as UTF-8, yet one of you mentioned that it was actually in a Japanese encoding (Shift_JIS).

Well, I found that uploaded to another company's server it had the same problem.  Duplicating the same thing on another Mac as well as a Windows machine, I subsequently found that creating a similar file worked on the desktops as well as the servers.

So I again went back to the offending Mac, created a new file, and got the same problem again.  When I sent that file to myself and picked it up on the other Mac (using email), then uploaded it to the web, it was fine.


Conclusion: something on that one Mac is corrupting the file.


It is mysterious and never happened before.  I tried downloading a new version of Mozilla midway between last time and now, and as I recall it did not change things.  Shall I conclude that something on my one Mac is corrupting things?


Anyway, your guess is as good as mine, but this does seem to be the problem with one Mac.  Would you guess I should reformat the thing, or do you have any idea what might cause the Mac/Mozilla/FTP program (one or all?) to mess up a file?


I can do any tests if anyone is interested.


Thanks

 

  
__/__/__/__/__/__/__/__/__/__/
Paul Arenson

EMAIL

PHONE &VOICE MAIL
1-617-379-0761 (U.S.)
090-4173-3873 (Japan)
paularenson (Skype)
__/__/__/__/__/__/__/__/__/__/




