.NET Forum / Languages / C# / January 2008
how to generate unique Hash Code for string
|
|
Thread rating:  |
Ashish Khandelwal - 21 Dec 2007 12:47 GMT As MSDN is not giving us guarantee upon uniqueness of Hash Code, so could any one suggest me that how to generate a unique Hash Code for same string always, and generate different-2 Hash Code Different-2 string.
Mannen - 21 Dec 2007 12:57 GMT > As MSDN is not giving us guarantee upon uniqueness of Hash Code, so > could any one suggest me that how to generate a unique Hash Code for > same string always, and generate different-2 Hash Code Different-2 > string. MSDN can't guarantee this, because it is not possible to give such a guarantee.
Consider that the size of the hash is an int, and therefore 32 bits only (or 64 bits on a 64-bit system). This means that the contents of a string must be "squeezed" into those bits, which in turn means a lot of data is lost. For any given string bigger than the size of the hash, total uniqueness cannot be guaranteed. I have never run into any problems with GetHashCode(), but if you have, consider implementing your own MD5 (or whatever you feel appropriate.)
Ashish Khandelwal - 21 Dec 2007 13:04 GMT Thanks for reply.
> but if you have, consider implementing your own MD5 (or whatever you feel > appropriate.) Yes i am facing such problem, for exp. i am giving you "blair" and "brainlessness" strings, these both strings are returning the same Hash Code using GetHashCode() method, but as you said, use MD5, so can you please give me some input related to MD5 like can i reply on the output of this, and also as you were saying "or whatever you feel appropriate" so here just same was my question that what should be the way to get unique Hash Code, any idea...
Jon Skeet [C# MVP] - 21 Dec 2007 13:05 GMT > As MSDN is not giving us guarantee upon uniqueness of Hash Code, so > could any one suggest me that how to generate a unique Hash Code for > same string always, and generate different-2 Hash Code Different-2 > string. Sure, so long as you've got an infinite range of numbers... My guess is that you haven't though.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet World class .NET training in the UK: http://iterativetraining.co.uk
Kerem Gümrükcü - 21 Dec 2007 13:10 GMT Hi Ashish,
use the Managed MD5 or SHA1 Class to make a unique hash of your string. See this, it works with file streams, but the technique is the same:
http://download.chip.eu/de/KHash-Tools-1.0_1317168.html
I wrote this, because i needed a very fast and console based file hashing tool. There was none simple in www, so i wrote this. It is open-source,...
Hashes generated with MD5 and or SHA1 are always unique!
Regards
Kerem
P.S: Where does your name come from...i am just curious,...
 Signature ----------------------- Beste Grüsse / Best regards / Votre bien devoue Kerem Gümrükcü Microsoft Live Space: http://kerem-g.spaces.live.com/ Latest Open-Source Projects: http://entwicklung.junetz.de ----------------------- "This reply is provided as is, without warranty express or implied."
Lasse Vågsæther Karlsen - 21 Dec 2007 13:22 GMT > Hi Ashish, > [quoted text clipped - 9 lines] > > Hashes generated with MD5 and or SHA1 are always unique! No, they're not
It's just very improbable that you can either find two files with different content and the same hash, or manage to change a file and keep its original hash.
But they're not unique.
The only guaranteed way to produce a unique hash for a stream of X bytes is just to copy the entire stream.
 Signature Lasse Vågsæther Karlsen mailto:lasse@vkarlsen.no http://presentationmode.blogspot.com/
Kerem Gümrükcü - 21 Dec 2007 13:24 GMT Hi Lasse,
>The only guaranteed way to produce a unique hash for a stream of X bytes is >just to copy the entire stream. and thats what i asume here and why i wrote this!
Regards
Kerem
 Signature ----------------------- Beste Grüsse / Best regards / Votre bien devoue Kerem Gümrükcü Microsoft Live Space: http://kerem-g.spaces.live.com/ Latest Open-Source Projects: http://entwicklung.junetz.de ----------------------- "This reply is provided as is, without warranty express or implied."
Jon Skeet [C# MVP] - 21 Dec 2007 14:37 GMT > >The only guaranteed way to produce a unique hash for a stream of X bytes is > >just to copy the entire stream. > > and thats what i asume here and why i wrote this! But that goes completely against your other statement of:
"Hashes generated with MD5 and or SHA1 are always unique!"
which is entirely false.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet World class .NET training in the UK: http://iterativetraining.co.uk
Lasse Vågsæther Karlsen - 21 Dec 2007 13:25 GMT >> Hashes generated with MD5 and or SHA1 are always unique! > [quoted text clipped - 8 lines] > The only guaranteed way to produce a unique hash for a stream of X bytes > is just to copy the entire stream. And of course then it wouldn't be named a "hash".
 Signature Lasse Vågsæther Karlsen mailto:lasse@vkarlsen.no http://presentationmode.blogspot.com/
seani - 21 Dec 2007 15:38 GMT > > Hi Ashish, > [quoted text clipped - 15 lines] > different content and the same hash, or manage to change a file and keep > its original hash. Unfortunately it's becoming more and more "probable" as time goes on...
http://www.cits.rub.de/MD5Collisions/
...but this sort of meddling is unlikely to significantly change the OPs best-fit solution
Peter Duniho - 21 Dec 2007 17:38 GMT > [...] > The only guaranteed way to produce a unique hash for a stream of X bytes > is just to copy the entire stream. Pedantically speaking, that's not exactly true. There are a number of algorithsm that generate unique representations for the same data. Unique, that is, in the sense that using _a given_ algorithm, different input data will always produce different output data.
That's why file compression works. In fact, compressing a file is one obvious way to generate a "unique hash for a stream of X bytes" that is not in fact a copy of the entire stream.
Of course, both a compressed version of the file and an exact copy of the file are both not really what we'd call a hash anyway. But if you're going to accept an exact representation of the file as a hash, you have to accept any mapping of that representation to some other representation as a hash as well. :)
Pete
Ashish Khandelwal - 24 Dec 2007 07:27 GMT Thanks all for reply...
Now on the same issus,
-----See below code, string str = "blair"; string strValue = "Ashish"; string str1 = "brainlessness"; string strValue1 = "Khandelwal"; int hash = str.GetHashCode() ; // Returns 175803953 int hash1 = str1.GetHashCode(); // Returns 175803953 Hashtable ht = new Hashtable(); ht.Add(hash ,strValue); ht.Add(hash1,strValue1); // ****ERROR**** string strTmp = (string) ht[str]; string strTmp1 = (string) ht[hash1];
In Above code when i try to call GetHashCode() for both str and str1, it returns me same Hash Code '175803953', and that's why when i try to add into hashtable, exception generates which is normal (i know we cannot add same key twice). Now.... see below code
string str = "blair"; string strValue = "Ashish"; string str1 = "brainlessness"; string strValue1 = "Khandelwal"; Hashtable ht = new Hashtable(); ht.Add(str,strValue); ht.Add(str1,strValue1);
the above code runs perfectly without any error, so now here i want to understand one thing, as HashTable calls GetHashCode() method to get the Hash Code of passed key and as we show in the 1st example that the both strings are generating the same Hash Code so why there is no exception in the 2nd example,
Does HashTable use some other algorithm to generate the Hash Code of passed key? if so, i think then its always better to assign object directly as a key in stand of first generate the Hash Code and then assign it to HashTable as a key.
(My main concentration on String as a Key)
Please help me to understand...
Lasse Vågsæther Karlsen - 03 Jan 2008 10:38 GMT >> [...] >> The only guaranteed way to produce a unique hash for a stream of X [quoted text clipped - 6 lines] > > That's why file compression works. In fact, compressing a file is one <snip>
You are right of course. Keeping information entropy but reducing the total size is a good way to produce a smaller, unique, version of the original data.
Though, the definition of a hash function is that it typically produces a finite and fixed range of output bits for an infinite (or rather, unfixed) amount of input bits. As such, compression is not really hashing, but yes, it can produce something that is unique, but occupy fewer bits than the original data.
 Signature Lasse Vågsæther Karlsen mailto:lasse@vkarlsen.no http://presentationmode.blogspot.com/
Peter Duniho - 03 Jan 2008 18:34 GMT >>> [...] >>> The only guaranteed way to produce a unique hash for a stream of X [quoted text clipped - 8 lines] > a finite and fixed range of output bits for an infinite (or rather, > unfixed) amount of input bits. As such, compression is not really hashing Neither is "to copy the entire stream", per your suggestion. :)
Which is why I wrote "if you're going to accept an exact representation of the file as a hash, you have to accept any mapping of that representation to some other representation as a hash as well". Suggesting that compression is a valid hash algorithm is only correct in a context where one assumes your suggestion is valid as well; it's not actually a context I personally think is worth considering, but I was happy to accept it on your behalf for the sake of discussion.
I'm more than happy to agree that neither "copy the entire stream" (your suggestion ) or "compress the stream" (my alternative suggestion) is actually a hash algorithm. :)
Pete
Mannen - 21 Dec 2007 13:35 GMT > Hi Ashish, > [quoted text clipped - 9 lines] > > Hashes generated with MD5 and or SHA1 are always unique! No, they are not always unique, which is why they are called "one-way hash algorithms". They are not reversible, because data is lost. The problem still exists, but becomes less frequent (since MD5, for example, uses 128 bits instead of GetHashCode()'s 32 bits on a 32-bit system).
The reason for hash collisions is that n bits cannot be fit into x bits where n > x. Thus, when you hash a string of say 256 bits into 32 bits, 224 bits get lost because there is (obviously) no way to fit 256 bits into 32. What you can do, however, is combine the 256 bits into the 32 bits in a certain way (ie, the algorithm) to make it more likely that a unique bitpattern is produced. This is what the hashing algorithms are all about.
http://en.wikipedia.org/wiki/MD5
> Regards > > Kerem > > P.S: Where does your name come from...i am just curious,... Kerem Gümrükcü - 21 Dec 2007 13:49 GMT Hi Mannen,
>What you can do, however, is combine the 256 bits into the 32 bits in a >certain way (ie, the algorithm) to make it more likely that a unique >bitpattern is produced. This is what the hashing algorithms are all about. Sure, then you have to implement your own hashing class that works with MD5 as its base. I am familliar with MD5 and SHA1, i know its "pitfalls",...
Regards
Kerem
 Signature ----------------------- Beste Grüsse / Best regards / Votre bien devoue Kerem Gümrükcü Microsoft Live Space: http://kerem-g.spaces.live.com/ Latest Open-Source Projects: http://entwicklung.junetz.de ----------------------- "This reply is provided as is, without warranty express or implied."
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|