
.NET Forum / .NET Framework / CLR / October 2006

Encoding difference in Vista breaks my app :(

Tim_Mac - 06 Oct 2006 15:15 GMT
hi,
just getting Vista up and running now for some dev testing, seems great so
far.
i have an app that uses MD5 hashing on the passwords.  when i run the app in
Vista, the hashing code below gives a different result to what XP/Server
2003 computes, it is probably because i'm using GetString on binary content
but i'm sure i got this code off an MS sample somewhere...

public static string EncryptMd5(string text)
{
   UTF8Encoding encoder = new UTF8Encoding();
   MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
   byte[] hashedBytes = md5Hasher.ComputeHash(encoder.GetBytes(text));
   return encoder.GetString(hashedBytes);
}

when this runs on Vista i get some extra unprintable characters included in
the string, which aren't there when i run it on XP.  when i debug in Vista,
these additional characters are interleaved between the XP set of
characters, rendered as squares in the debug window. not sure how they will
appear in this post but i'll include them here anyway:
Vista:     ��E�Y����j��\fg
XP:        EYj g

When i look at the individual bytes before GetString is called, the
unprintable ones are above int 179 which seems to be where the ASCII table
goes beyond normally used characters. ref: http://www.lookuptables.com/

as a short-term hack i can regex out anything above 179 but i would really
like to understand it!

any help is greatly appreciated.
tim
Tim_Mac - 06 Oct 2006 15:57 GMT
i've done some more testing and found that it wasn't safe to discard
anything above ASCII 179.
a working version is to open up the hashed string into a Char array, and
then check if the integer value of each one is 65533.  If it is, the
character should be discarded because it would not exist if you run the same
code on Server 2003 or XP.

/// <summary>
/// This function hashes the text to MD5, in binary format
/// </summary>
public static string EncryptMd5Binary(string text)
{
   UTF8Encoding encoder = new UTF8Encoding();
   MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
   byte[] hashedBytes = md5Hasher.ComputeHash(encoder.GetBytes(text));
   string hashedString = encoder.GetString(hashedBytes);
   string result = "";
   // strip out any chars whose int value is 65533 (U+FFFD, the Unicode
   // replacement character); this only happens when running on vista
   foreach (char c in hashedString.ToCharArray())
       if ((int)c != 65533)
           result += c;
   return result;
}

can anyone explain what the difference is?
thanks
tim
Jon Skeet [C# MVP] - 07 Oct 2006 08:00 GMT
> i've done some more testing and found that it wasn't safe to discard
> anything above ASCII 179.

Just as another point - there's no such thing as ASCII 179. ASCII is a
7-bit encoding.

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Mattias Sjögren - 06 Oct 2006 17:07 GMT
>it is probably because i'm using GetString on binary content
>but i'm sure i got this code off an MS sample somewhere...

Regardless where you got it from, your code is broken. If you want to
represent random binary data as a string, you should consider using
Base64 encoding.

Mattias

Mattias Sjögren [C# MVP]  mattias @ mvps.org
http://www.msjogren.net/dotnet/ | http://www.dotnetinterop.com
Please reply only to the newsgroup.

Jon Skeet [C# MVP] - 07 Oct 2006 07:59 GMT
> just getting Vista up and running now for some dev testing, seems great so
> far.
[quoted text clipped - 10 lines]
>  return encoder.GetString(hashedBytes);
> }

Well, that code is broken to start with. ComputeHash returns arbitrary
binary data, which is unlikely to be a valid UTF-8 encoded string.

You should use something like base64 encoding to convert *arbitrary*
binary data into a string.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information.

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Tim_Mac - 07 Oct 2006 11:24 GMT
hey guys thanks for the clarification.  i'll have to keep the code
operational because i can't re-encode the passwords with a correct
algorithm, without knowing the passwords.

jon i suppose you mean 'arbitrary' in a technical sense that i'm not aware
of.  since it's MD5 it always returns an identical hash of a given string.
according to the SDK, UTF8Encoding.GetString "Decodes a sequence of bytes
into a string", which is what i want, and it has worked correctly for years
except for this new platform difference between Vista and previous windows
versions.

i have another version of the code which yields Hex digits, and is obviously
much safer to use but i'll have to live with the current setup.
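
For readers following along, a hex-digit version might look something like the sketch below. This is not the code from the original application, which is not shown in the thread; the class and method names (HexHash, Md5Hex) are made up for illustration, and it assumes the input text is encoded as UTF-8 before hashing.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HexHash
{
    // Hypothetical helper: hashes the UTF-8 bytes of the text with MD5 and
    // returns the 16 hash bytes as 32 lowercase hex digits.
    public static string Md5Hex(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(input);
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                // "x2" formats each byte as exactly two hex digits.
                sb.Append(b.ToString("x2"));
            }
            return sb.ToString();
        }
    }
}
```

Because every byte maps to exactly two characters from 0-9 and a-f, the result never depends on how a decoder treats invalid byte sequences, so it should be the same on every platform.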

thanks again for your help,
tim

>> just getting Vista up and running now for some dev testing, seems great
>> so
[quoted text clipped - 22 lines]
> See http://www.pobox.com/~skeet/csharp/unicode.html for more
> information.
Jon Skeet [C# MVP] - 07 Oct 2006 21:51 GMT
> hey guys thanks for the clarification.  i'll have to keep the code
> operational because i can't re-encode the passwords with a correct
> algorithm, without knowing the passwords.
>
> jon i suppose you mean 'arbitrary' in a technical sense that i'm not aware
> of.

I mean that it's binary data with no significance as far as the UTF-8
encoding is concerned. It could be *any* binary data. Not every
sequence of binary data is a valid UTF-8 string.
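
A small sketch of that point, for anyone who wants to try it. The class and method names (Utf8RoundTripDemo, IsValidUtf8, SurvivesRoundTrip) are invented for this example. It relies on the two-argument UTF8Encoding constructor, whose second argument asks the decoder to throw on invalid bytes instead of silently substituting or dropping them.

```csharp
using System;
using System.Text;

public static class Utf8RoundTripDemo
{
    // True if a strict decoder accepts the bytes as well-formed UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        // (emit BOM: false, throw on invalid bytes: true)
        UTF8Encoding strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (ArgumentException)
        {
            // Invalid input surfaces as an ArgumentException
            // (DecoderFallbackException derives from it).
            return false;
        }
    }

    // True if decoding with the default (lenient) UTF8Encoding and then
    // re-encoding gives back exactly the original bytes.
    public static bool SurvivesRoundTrip(byte[] data)
    {
        UTF8Encoding lenient = new UTF8Encoding();
        byte[] reencoded = lenient.GetBytes(lenient.GetString(data));
        if (reencoded.Length != data.Length) return false;
        for (int i = 0; i < data.Length; i++)
        {
            if (reencoded[i] != data[i]) return false;
        }
        return true;
    }
}
```

Passing something like new byte[] { 0xFF, 0xFE, 0x41 } (0xFF can never appear in UTF-8) fails both checks, while the bytes of a real string pass both. A 16-byte MD5 hash will usually behave like the first case, which is why the decoded string loses information.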

> since it's MD5 it always returns an identical hash of a given string.
> according to the SDK, UTF8Encoding.GetString "Decodes a sequence of bytes
> into a string", which is what i want, and it has worked correctly for years
> except for this new platform difference between Vista and previous windows
> versions.

It didn't work "correctly". It may have done something repeatable when
presented with a sequence of bytes which was not a valid UTF-8 encoded
string, but I don't believe that behaviour was documented, and I don't
think it's unreasonable to change it in Vista.

It's like relying on the results of GetHashCode from one framework
version (or even one run) to another, or accessing the UI from a
different thread: you may get away with it for a while, but that
doesn't mean the code is correct, or that you should rely on it working
in the future.

> i have another version of the code which yields Hex digits, and is obviously
> much safer to use but i'll have to live with the current setup.

I would start migrating code away from the flawed behaviour ASAP, if I
were you. You may need to support two formats or something like that
for a while, but relying on *just* the band-aid could well cause more
problems down the line.

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Tim_Mac - 07 Oct 2006 23:48 GMT
hi jon,
you're absolutely right.  i can implement a correct solution in parallel and
add it in to a second password column in the database. eventually everyone
will have logged in to the site and i can disband the old code and badly
hashed passwords.
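
A rough sketch of how that parallel scheme could work at login time. Everything here is hypothetical: the PasswordRecord type, its two fields (standing in for the old and new database columns), and the VerifyAndUpgrade method are invented for illustration, and the two hash functions are passed in as delegates so the sketch makes no assumption about what the old or new algorithm actually is.

```csharp
using System;

// Hypothetical stand-in for one user row with both password columns.
public class PasswordRecord
{
    public string LegacyHash; // existing column, produced by the old code
    public string NewHash;    // new column, null until the user next logs in
}

public static class PasswordMigration
{
    // Checks the password against whichever hash is available. On a
    // successful legacy match it fills in the new hash so that later
    // logins no longer depend on the old code. The caller is expected
    // to save the record afterwards.
    public static bool VerifyAndUpgrade(
        PasswordRecord record,
        string password,
        Func<string, string> legacyHash,
        Func<string, string> newHash)
    {
        if (record.NewHash != null)
        {
            // Already migrated: only the new format is consulted.
            return string.Equals(record.NewHash, newHash(password),
                                 StringComparison.Ordinal);
        }

        if (!string.Equals(record.LegacyHash, legacyHash(password),
                           StringComparison.Ordinal))
        {
            return false;
        }

        // The plain-text password is only available at this moment,
        // so this is the one chance to compute the new hash.
        record.NewHash = newHash(password);
        return true;
    }
}
```

Once every active account has a value in the new column, the legacy column and the old hashing function can be dropped.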

thanks for the convincing argument!

tim

>> hey guys thanks for the clarification.  i'll have to keep the code
>> operational because i can't re-encode the passwords with a correct
[quoted text clipped - 35 lines]
> for a while, but relying on *just* the band-aid could well cause more
> problems down the line.
Chris Mullins - 07 Oct 2006 19:57 GMT
> [Broken Code on Vista]
>
[quoted text clipped - 5 lines]
> return encoder.GetString(hashedBytes);
> }

I'm afraid your code is broken, and it's got naught to do with Vista. The
bytes being returned from ComputeHash aren't UTF8 (or even UTF16) bytes.
They're just random bytes.

Try using:
public static string EncryptMd5(string text)
{
   UnicodeEncoding encoder = new UnicodeEncoding();
   MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
   byte[] hashedBytes = md5Hasher.ComputeHash(encoder.GetBytes(text));
   return Convert.ToBase64String(hashedBytes);
}

Also, was there a specific reason you were using the UTF8 encoder to get the
byte array? The native encoding is UTF16 (aka: Unicode Encoding), and if
there's no compelling reason to use something else, you should just use
that.
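
One thing to keep in mind when choosing the encoder: the hash is computed over bytes, so the same text encoded as UTF-8 and as UTF-16 generally produces different hashes, and stored hashes made with one will not match hashes made with the other. The sketch below illustrates this; the class and method names (EncodingChoiceDemo, Md5Base64) are made up for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class EncodingChoiceDemo
{
    // Hypothetical helper: encodes the text with the given encoding,
    // hashes the bytes with MD5, and returns the hash as Base64.
    public static string Md5Base64(string text, Encoding encoding)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(encoding.GetBytes(text));
            return Convert.ToBase64String(hash);
        }
    }
}
```

For example, "abc" is 3 bytes in UTF-8 but 6 bytes in UTF-16, so Md5Base64("abc", Encoding.UTF8) and Md5Base64("abc", Encoding.Unicode) give different strings. Whichever encoding is picked, it has to stay the same for as long as the stored hashes are in use.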

Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins

