
Can XmlDocument.Load() method handle unicode characters?

lamxing@gmail.com - 30 Jan 2007 20:51 GMT
Dear all,

     I've spent a long time trying to get the XmlDocument.Load method
to handle UTF-8 characters, but with no luck.  Every time it loads a
document that contains European characters (such as the one below, output
from the Google Maps API), it reports an invalid character at position
229, which I believe is the "ß" character.

     Can anyone point me in the right direction on how to load such
documents using the XmlDocument.Load() method, or suggest a better
way to do this?

      Thanks!

---------------sample XML file------------------
 <?xml version="1.0" encoding="UTF-8" ?>
- <kml xmlns="http://earth.google.com/kml/2.0">
- <Response>
 <name>germaniastr 134, berlin berlin</name>
- <Status>
 <code>200</code>
 <request>geocode</request>
 </Status>
- <Placemark>
 <address>Germaniastraße 134, 12099 Tempelhof, Berlin, Germany</address>
- <AddressDetails Accuracy="8" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
- <Country>
 <CountryNameCode>DE</CountryNameCode>
- <AdministrativeArea>
 <AdministrativeAreaName>Berlin</AdministrativeAreaName>
- <SubAdministrativeArea>
 <SubAdministrativeAreaName>Berlin</SubAdministrativeAreaName>
- <Locality>
 <LocalityName>Berlin</LocalityName>
- <DependentLocality>
 <DependentLocalityName>Tempelhof</DependentLocalityName>
- <Thoroughfare>
 <ThoroughfareName>Germaniastraße 134</ThoroughfareName>
 </Thoroughfare>
- <PostalCode>
 <PostalCodeNumber>12099</PostalCodeNumber>
 </PostalCode>
 </DependentLocality>
 </Locality>
 </SubAdministrativeArea>
 </AdministrativeArea>
 </Country>
 </AddressDetails>
- <Point>
 <coordinates>13.399486,52.464476,0</coordinates>
 </Point>
 </Placemark>
 </Response>
 </kml>
Bjoern Hoehrmann - 31 Jan 2007 06:07 GMT
* lamxing@gmail.com wrote in microsoft.public.dotnet.xml:
>      I've spent a long time trying to get the XmlDocument.Load method
>to handle UTF-8 characters, but with no luck.  Every time it loads a
>document that contains European characters (such as the one below, output
>from the Google Maps API), it reports an invalid character at position
>229, which I believe is the "ß" character.

Then it is most likely that your document is not UTF-8 encoded. You will
have to check which bytes are actually at that position, e.g. using a
hex editor (e.g., use File.OpenFile ... /e:Binary in Visual Studio). If
the ß is encoded as two bytes C3 9F then that's either not the offending
character, or you have other encoding problems (for example, you might
have told the XML processor the document is US-ASCII encoded).

Note that loading XML documents in Internet Explorer and copying and
pasting the results does not help in any way to debug this kind of
problem; compressing the document and uploading it to some web server
is a more sensible approach.
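
For readers without a hex editor at hand, the byte check described above can also be done in code. The following is only a minimal C# sketch, not part of the original reply: the names ByteCheck, HexAround and IsValidUtf8 are made up for illustration, and the sample bytes are built in memory instead of being fetched from the real URL. It shows ß as C3 9F in UTF-8 versus a single DF byte in ISO-8859-1 (Windows-1252 uses the same byte for ß), and uses a strict UTF8Encoding to test whether a buffer really is UTF-8.

```csharp
using System;
using System.Text;

public static class ByteCheck
{
    // Returns the bytes around a zero-based offset as hex, e.g. "61-C3-9F".
    public static string HexAround(byte[] data, int offset, int radius)
    {
        int start = Math.Max(0, offset - radius);
        int end = Math.Min(data.Length, offset + radius + 1);
        return BitConverter.ToString(data, start, end - start);
    }

    // True when the whole buffer decodes as strict UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        try
        {
            // Second argument (throwOnInvalidBytes) makes the decoder throw
            // instead of silently substituting invalid sequences.
            new UTF8Encoding(false, true).GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // "\u00DF" is ß; the escape avoids depending on the source file encoding.
        byte[] asUtf8 = Encoding.UTF8.GetBytes("stra\u00DFe");
        byte[] asLatin1 = Encoding.GetEncoding("iso-8859-1").GetBytes("stra\u00DFe");

        Console.WriteLine(HexAround(asUtf8, 4, 1));   // 61-C3-9F
        Console.WriteLine(IsValidUtf8(asUtf8));       // True
        Console.WriteLine(HexAround(asLatin1, 4, 1)); // 61-DF-65
        Console.WriteLine(IsValidUtf8(asLatin1));     // False
    }
}
```

For a real response, the byte array could come from something like WebClient.DownloadData(url). Keep in mind that the position in the parser's error message is a character position, so it lines up with a byte offset only as long as everything before it is plain ASCII.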
Signature

Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

lamxing@gmail.com - 31 Jan 2007 07:47 GMT
Thanks for your reply, Björn.  Since this file is coming from a
dynamic URL online, I just used the XmlDocument.Load(URL) method to
load the XML file.  In this case, how do I tell the XML processor what
encoding the file uses before I load the document?  I've saved the
sample XML file (dynamically generated by Google Maps) using IE's
File->Save As..., and uploaded the file to
http://www.usctimes.com/gmap/geo.xml .  It seems to open fine in the
browser; does that mean anything?

Martin Honnen - 31 Jan 2007 13:09 GMT
> Since this file is coming from a
> dynamic URL online, I just used the XmlDocument.Load(URL) method to
> load the xml file.  In this case, how do I tell the XML processor what
> encoding the file would be before I load the document?  

You don't have to specify the encoding. Pass the URL to the Load method
and the XML parser will check the XML declaration for the declared
encoding, or check for a byte order mark, and will then decode the
bytes served into characters based on that information. If that is not
possible, you get an error.
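
To illustrate the point about the declaration, here is a small C# sketch (an illustration only, not code from this thread; DeclaredEncodingDemo and TryLoad are hypothetical names). It always writes the same ISO-8859-1 bytes, including one ß, and only changes what the XML declaration claims. When the declaration matches the bytes the load should succeed; when it claims UTF-8 the parser is expected to fail with an XmlException of the kind reported in this thread.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class DeclaredEncodingDemo
{
    // Encodes a tiny document as ISO-8859-1 bytes and tries to parse it
    // while the XML declaration claims the encoding named in `declared`.
    public static bool TryLoad(string declared)
    {
        string xml = "<?xml version=\"1.0\" encoding=\"" + declared + "\"?>"
                   + "<a>stra\u00DFe</a>";
        byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(xml);
        try
        {
            using (MemoryStream stream = new MemoryStream(bytes))
            {
                // Loading from a stream lets the parser pick the encoding
                // from the BOM or the declaration, as with Load(url).
                new XmlDocument().Load(stream);
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryLoad("ISO-8859-1")); // declaration matches the bytes
        Console.WriteLine(TryLoad("UTF-8"));      // declaration contradicts the bytes
    }
}
```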

> I've saved the
> sample XML file (dynamically generated by Google Maps) using IE's
> File->Save As..., and uploaded the file to
> http://www.usctimes.com/gmap/geo.xml .  It seems to open fine in the
> browser; does that mean anything?

It also loads fine with .NET and the Load method of
System.Xml.XmlDocument so that file is properly encoded. And .NET parses
it just fine (tested with .NET 1.x and 2.0).

Signature

    Martin Honnen --- MVP XML
    http://JavaScript.FAQTs.com/

lamxing@gmail.com - 31 Jan 2007 16:46 GMT
Hi Martin,

      Thanks for the test result.  It seems that if I load the file I
saved earlier using XmlDocument.Load(), it works fine.  But when I
try to load the dynamically generated file directly from Google Maps'
server, it causes the "invalid character in the given encoding,
line 1, position 228" error.  Does that mean Google Maps uses the wrong
encoding for that XML file?  I don't think I can post the complete
Google Maps link here as the URL contains the Google Maps API key.  But
the URL goes something like this:
http://maps.google.com/maps/geo?q=germaniastr%20134,%20berlin%20berlin&output=xml&key=GOOGLEKEY


     Any thoughts?

Chris

Martin Honnen - 31 Jan 2007 16:58 GMT
> It seems that if I load the file I
> saved earlier using XmlDocument.Load(), it works fine.  But when I
> try to load the dynamically generated file directly from Google Maps'
> server, it causes the "invalid character in the given encoding,
> line 1, position 228" error.  Does that mean Google Maps uses the wrong
> encoding for that XML file?

It means that the XML is not properly encoded.

lamxing@gmail.com - 31 Jan 2007 22:16 GMT
Martin,

Do you have any suggestions on how I can load this dynamic file, or how
to make the XML document properly encoded?

Thanks!
Bjoern Hoehrmann - 01 Feb 2007 00:15 GMT
* lamxing@gmail.com wrote in microsoft.public.dotnet.xml:
>Do you have any suggestions on how I can load this dynamic file, or how
>to make the XML document properly encoded?

If the XML document is really not properly encoded, you should contact
Google to have their service fixed. Until then all you can do is try to
fix the XML document before parsing. For example, you could remove all
non-ASCII octets or you could transcode the document from Windows-1252
to UTF-8 using System.Text.Encoding.
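
Both workarounds can be sketched in a few lines of C#. This is a hedged illustration rather than a tested fix for the Google service: Transcoder, LoadTranscoded and StripNonAscii are hypothetical names, the input is an in-memory byte array (in practice it might come from WebClient.DownloadData), and the demo uses ISO-8859-1 because it is available on every .NET runtime and encodes ß with the same byte as Windows-1252. On .NET Framework, Encoding.GetEncoding(1252) can be passed instead.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class Transcoder
{
    // Re-encodes raw bytes from their actual encoding to UTF-8, then parses them.
    // After conversion the bytes agree with an encoding="UTF-8" declaration.
    public static XmlDocument LoadTranscoded(byte[] raw, Encoding actual)
    {
        byte[] utf8 = Encoding.Convert(actual, Encoding.UTF8, raw);
        XmlDocument doc = new XmlDocument();
        using (MemoryStream stream = new MemoryStream(utf8))
        {
            doc.Load(stream);
        }
        return doc;
    }

    // Cruder alternative: drop every byte above 0x7F. This loses the
    // non-ASCII characters entirely ("straße" becomes "strae").
    public static byte[] StripNonAscii(byte[] raw)
    {
        MemoryStream kept = new MemoryStream(raw.Length);
        foreach (byte b in raw)
        {
            if (b < 0x80) kept.WriteByte(b);
        }
        return kept.ToArray();
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        // Simulates a response that claims UTF-8 but is really single-byte encoded.
        byte[] raw = latin1.GetBytes(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<address>Germaniastra\u00DFe 134</address>");

        XmlDocument fixedDoc = LoadTranscoded(raw, latin1);
        Console.WriteLine(fixedDoc.DocumentElement.InnerText == "Germaniastra\u00DFe 134"); // True

        XmlDocument stripped = new XmlDocument();
        stripped.Load(new MemoryStream(StripNonAscii(raw)));
        Console.WriteLine(stripped.DocumentElement.InnerText); // Germaniastrae 134
    }
}
```

Transcoding keeps the ß, so it is the better choice whenever the real encoding is known; stripping is only a last resort.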
lamxing@gmail.com - 01 Feb 2007 18:25 GMT
Hi Björn, can you provide an example of how to save an online XML
document and transcode it to UTF-8 with System.Text.Encoding?  Thanks!
Helena Kotas [MSFT] - 06 Feb 2007 02:10 GMT
First you have to find out which encoding the dynamic document uses.
XmlDocument/XmlTextReader uses UTF-8 by default unless there is a BOM or an
encoding attribute in the XML declaration that says something else. Once you
find out the encoding, create a StreamReader over the input stream and
specify the document's encoding in its constructor. Then create an XmlReader
over this StreamReader and use XmlDocument.Load to load the document.

If you are sure that the document's encoding is indeed UTF-8 and there is an
invalid character in it, you can create an instance of UTF8Encoding that will
ignore invalid characters (see the UTF8Encoding constructor).
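
The steps above might look roughly like this in C#. Treat it as a sketch under assumptions rather than the definitive approach: ReaderLoad and its Load method are hypothetical names, the input stream is a MemoryStream standing in for the HTTP response, and XmlReader.Create requires .NET 2.0 or later. Because the parser receives already-decoded characters from the StreamReader, the encoding named in the XML declaration no longer decides how the bytes are read.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ReaderLoad
{
    // Decodes the stream with the encoding supplied by the caller, then
    // parses the resulting characters. Disposing the readers also closes `input`.
    public static XmlDocument Load(Stream input, Encoding encoding)
    {
        XmlDocument doc = new XmlDocument();
        using (StreamReader text = new StreamReader(input, encoding, false))
        using (XmlReader xml = XmlReader.Create(text))
        {
            doc.Load(xml);
        }
        return doc;
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        // Bytes that claim UTF-8 but contain a single-byte ß (0xDF).
        byte[] raw = latin1.GetBytes(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<address>Germaniastra\u00DFe 134</address>");

        // Case 1: the real encoding is known, so the text survives intact.
        XmlDocument exact = Load(new MemoryStream(raw), latin1);
        Console.WriteLine(exact.DocumentElement.InnerText == "Germaniastra\u00DFe 134"); // True

        // Case 2: insist on UTF-8 but tolerate bad bytes. With
        // throwOnInvalidBytes set to false the decoder does not throw; the
        // invalid byte is dropped or replaced, so the ß itself is lost.
        XmlDocument lenient = Load(new MemoryStream(raw), new UTF8Encoding(false, false));
        Console.WriteLine(lenient.DocumentElement.InnerText.IndexOf('\u00DF') < 0); // True
    }
}
```

The lenient variant gets the document loaded, but it hides the underlying problem, so naming the real encoding is preferable whenever it can be determined.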

-Helena

Tim Heap - 22 Mar 2007 13:33 GMT
Help!
I have the same problem and need to remove funny characters from my
source XML file. Can someone please supply an example?

Tim Heap
Software & Database Manager
POSTAR Ltd
www.postar.co.uk
tim@postar.co.uk
