Extra 'invisible' characters in soap packet

R. K. Wijayaratne - 21 Nov 2007 23:16 GMT
We are using .NET 2.0 and WSE 3.0 to call a Java web service. It sets
character limits on certain fields (e.g. max 100 chars) and if there
are more than the expected number, it throws an error. So what we do
is retrieve the data from the MSSQL database, truncate it to 100
characters if it is over the limit, and then call the web service.

The problem is sometimes extra 'invisible' characters get inserted
into the field data that take the field over the character limit.
These characters are there, but are not visible when I open the XML
log files in Notepad, Visual Studio and Altova XMLSpy. They are
visible when I open them in the free Context Editor.

For example note the extra 'Â' char in the field below, which takes
the character count to 101 and thus over the limit by 1.

<Neighbourhood>Elizabeth North is one of the older suburbs, with
development dating from the 1950s and 1960s, as mu</Neighbourhood>

It seems that these characters are 'invisible' to the .NET String
manipulation methods, which do not seem to count them when counting
characters for truncation.

Any ideas what is happening here???
Chris Mullins [MVP - C#] - 22 Nov 2007 00:01 GMT
Welcome to the Brave New Unicode World.

What's going on is that you have combining characters, "A" and "^", which
are actually in the string as two separate code points. When you "view" the
string, the display infrastructure turns that into a single grapheme and
shows it as a single character. This is by design.

The easiest thing to do is to stop counting characters and start counting
bytes. To get an accurate byte count, you need to know what encoding you're
using. Then you can ask the encoder, ".GetBytes()" (or ".GetByteCount()"),
and have it return you the byte count. Be careful that you don't just start
chopping bytes though, as you may end up cutting a surrogate pair in half,
and destroying your string.
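
A minimal sketch of that idea, assuming the limit is really a byte limit in a known encoding (the thread does not confirm which encoding the Java service expects). The class and method names are hypothetical; only Encoding.GetByteCount, String.Substring and Char.IsHighSurrogate are framework calls.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the .NET Framework.
public static class ByteLimit
{
    // Returns the longest prefix of 'text' whose encoded form fits in
    // maxBytes, without cutting a surrogate pair in half.
    public static string TruncateToByteLimit(string text, Encoding encoding, int maxBytes)
    {
        if (encoding.GetByteCount(text) <= maxBytes)
            return text;

        int length = text.Length;
        while (length > 0)
        {
            // A prefix that ends on a high surrogate would split a pair.
            if (char.IsHighSurrogate(text[length - 1]))
            {
                length--;
                continue;
            }
            if (encoding.GetByteCount(text.Substring(0, length)) <= maxBytes)
                break;
            length--;
        }
        return text.Substring(0, length);
    }
}
```

Calling ByteLimit.TruncateToByteLimit(value, Encoding.UTF8, 100) would keep the field within 100 UTF-8 bytes. The loop re-measures the prefix on every step, which is fine for short fields but not tuned for long text, and it does not keep a base character together with a following combining mark.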

The .Net classes that deal with this stuff start with the StringInfo class.
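
As a rough illustration of what StringInfo offers, the sketch below counts and truncates by text element rather than by char. The class and method names are hypothetical; StringInfo.LengthInTextElements and StringInfo.GetTextElementEnumerator are the framework members being shown.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the .NET Framework.
public static class TextElements
{
    // Number of text elements (roughly, user-perceived characters).
    public static int Count(string text)
    {
        return new StringInfo(text).LengthInTextElements;
    }

    // Keeps at most maxElements text elements and never splits one.
    public static string Truncate(string text, int maxElements)
    {
        StringBuilder result = new StringBuilder();
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(text);
        int kept = 0;
        while (kept < maxElements && elements.MoveNext())
        {
            result.Append(elements.GetTextElement());
            kept++;
        }
        return result.ToString();
    }
}
```

For "A" followed by a combining circumflex (U+0302), String.Length reports 2 while TextElements.Count reports 1. Whether this matches the limit depends on what the receiving service counts: if it counts bytes or code units, truncating by text element can still exceed the limit.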

The best place to start reading is Jon Skeet's primer on this stuff for
.NET developers:
http://www.yoda.arachsys.com/csharp/unicode.html

--
Chris Mullins

R. K. Wijayaratne - 22 Nov 2007 04:06 GMT
Hello,

Thanks for your helpful reply. Can I ask how we do what you have
suggested below?

   "To get an accurate byte count, you need to know what encoding
you're using."

Do we target UTF-8? Or do we need to find out what encoding the Java
web service uses and accommodate that (I think they are using ASCII)?

RKW.

R. K. Wijayaratne - 23 Nov 2007 04:13 GMT
Converting the string to ASCII before truncating did the trick:

   // Round-trip through ASCII; non-ASCII characters come back as '?'.
   Encoding asciiEnc = Encoding.ASCII;
   byte[] buffer = asciiEnc.GetBytes(myString);
   myString = asciiEnc.GetString(buffer);
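
A small sketch of what that round trip does, for anyone reusing it. The first method repeats the steps from the post. The second shows one possible source of a stray 'Â', which the thread does not confirm: UTF-8 bytes being read as ISO-8859-1. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the .NET Framework.
public static class AsciiRoundTrip
{
    // Same steps as the post above: encode to ASCII bytes, then decode.
    // Encoding.ASCII substitutes '?' for every char it cannot represent,
    // so the result has one ASCII char per original char.
    public static string ToAscii(string text)
    {
        Encoding asciiEnc = Encoding.ASCII;
        byte[] buffer = asciiEnc.GetBytes(text);
        return asciiEnc.GetString(buffer);
    }

    // Shows how UTF-8 encoded text looks when its bytes are decoded
    // as ISO-8859-1 instead of UTF-8.
    public static string Utf8ReadAsLatin1(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8);
    }
}
```

AsciiRoundTrip.ToAscii turns "café" into "caf?", so the conversion is lossy: accented letters and other non-ASCII characters are replaced, not preserved. AsciiRoundTrip.Utf8ReadAsLatin1 turns a single non-breaking space (U+00A0) into 'Â' followed by a non-breaking space, which is one way a 100-character field can appear to hold 101 characters in a viewer that guesses the wrong encoding.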
