How to convert things like &#x00E3; to utf-8?

Peng Yu
Hi,

For the following XML, I want to convert character references like &#x00E3; into UTF-8 characters.

http://ieeexplore.ieee.org/gateway/ipsSearch.jsp?sortfield=py&hc=1000&sortorder=desc&an=6706948

But I still see references like &#x00E3; in the output of the following commands. Does
anybody know the correct command to do the conversion? Thanks.

~$ curl "http://ieeexplore.ieee.org/gateway/ipsSearch.jsp?sortfield=py&hc=1000&sortorder=desc&an=6706948" > tmp1.xml
~$ tidy -q -xml --preserve-entities no --output-encoding utf8 tmp1.xml > tmp2.xml
~$ vim tmp1.xml
~$ grep Bilz tmp2.xml
<![CDATA[Bilz&#x00E3;  Ara&#x00FA; jo;  Liang Zhao]]>

--
Regards,
Peng

Re: How to convert things like &#x00E3; to utf-8?

Geoff McLane
Hi Peng,

Thank you for your inquiry.

Could you add it as an issue at
  https://github.com/htacg/tidy-html5/issues
where I am sure it will get more attention?

Please also include the version of tidy you used and the
expected output. Thanks.

Regards,
Geoff.

On 17/05/16 15:11, Peng Yu wrote:

> Hi,
>
> For the following xml, I want to convert things like &#x00E3; to utf-8.