.NET Forum / .NET Framework / New Users / September 2007

Object Hash of Contents

Evan Camilleri - 31 Aug 2007 17:58 GMT
What is the fastest way to get the 'hash' (or CRC32 or whatever) of the
contents of an object?

I don't care what's inside; I just want the 'hash' of its contents.

(Not to be confused with object.GetHashCode(), which gives a hash code of an
instance; I want the hash of the data contents.)

Evan
Peter Duniho - 31 Aug 2007 18:13 GMT
> What is the fastest way to get the 'hash' (or CRC32 or whatever) of the
> contents of an object

In both of your questions, you are overlooking a critical detail: there
is no consistent definition of an object's "contents".

You can come the closest, IMHO, for objects that can be serialized.
Then you can serialize them and use the resulting stream of data
(whether it's XML or binary) for your purposes.
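
A minimal sketch of the serialize-then-hash idea, written in Python for brevity rather than C#. The `Customer` type and the helper names are hypothetical, and canonical JSON is just one assumed choice of serialization. In .NET the same shape would be to serialize into a MemoryStream and hand the bytes to something like MD5.Create().ComputeHash.

```python
import hashlib
import json
import zlib
from dataclasses import dataclass, asdict


@dataclass
class Customer:  # hypothetical example type, not from this thread
    name: str
    age: int
    tags: list


def content_bytes(obj) -> bytes:
    """Serialize a dataclass instance to a canonical byte string."""
    # sort_keys and fixed separators keep the output stable for equal contents.
    text = json.dumps(asdict(obj), sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def content_md5(obj) -> str:
    """MD5 of the serialized contents, as a hex string."""
    return hashlib.md5(content_bytes(obj)).hexdigest()


def content_crc32(obj) -> int:
    """CRC-32 of the serialized contents, as an unsigned 32-bit integer."""
    return zlib.crc32(content_bytes(obj)) & 0xFFFFFFFF


a = Customer("Ann", 30, ["x", "y"])
b = Customer("Ann", 30, ["x", "y"])  # different instance, same contents
c = Customer("Ann", 31, ["x", "y"])  # different contents

print(content_md5(a) == content_md5(b))  # same contents, same hash
print(content_md5(a) == content_md5(c))  # different contents, different hash
```

The result is only as good as the definition of "contents" that the serializer imposes: anything it skips or cannot represent is invisible to the hash.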

But not all objects can be serialized, and unless you're talking about a
struct or class that only has value-type fields as data, there's not a
single block of data that represents the object.

If you can define "contents" in a way that allows you to address your
first question ("memory stream"), then the answer to the second ("hash")
can be discussed (though I think that answering "fastest" is
problematic, since any hash is going to have tradeoffs with respect to
speed versus how you intend to use it...the method that's truly
"fastest" may not work for your purposes).

Without a better definition and criteria for how you want to do this, I
don't think it's really possible to answer your question in a reasonable
way.

Pete
Michel Posseth  [MCP] - 01 Sep 2007 21:29 GMT
AFAIK

MD5 hash is fast and pretty reliable for its purpose; CRC is more reliable
but also much slower.

MD5 hashes are often used to quickly compare two binaries (are they the same or
not; ideal for update/file synchronization programs, etc.).
CRC is often used by compression programs (WinZip, WinRAR, etc.) to
check that files are exactly the same at the byte level (to check that the file
has not become corrupted during deflation).

HTH

Michel

Jon Skeet [C# MVP] - 01 Sep 2007 22:09 GMT
> MD5 hash is fast and pretty reliable for its purpose; CRC is more reliable
> but also much slower.

I'm pretty sure that's the wrong way round. In particular, CRCs don't
attempt to foil deliberate attempts to circumvent them.

> MD5 hashes are often used to quickly compare two binaries (are they the same or
> not; ideal for update/file synchronization programs, etc.).
> CRC is often used by compression programs (WinZip, WinRAR, etc.) to
> check that files are exactly the same at the byte level (to check that the file
> has not become corrupted during deflation).

Well, MD5 can be used for the latter as well, but is harder to
deliberately fool.

MD5 isn't the safest hash algorithm around - there are ways to break it
in certain circumstances - but it's a lot safer from a tampering point
of view than CRC.
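
One way to see why a CRC gives no protection against deliberate changes is sketched below, in Python for brevity. It relies on a known property of CRC-32: for messages of equal length, the checksum of their XOR can be computed from the individual checksums. The helper `xor_bytes` is a hypothetical name, and the messages are random placeholders.

```python
import hashlib
import os
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


# Three arbitrary messages of the same length.
a, b, c = (os.urandom(64) for _ in range(3))
combined = xor_bytes(a, b, c)

# CRC-32 is affine over XOR: the checksum of the combination can be
# predicted from the three individual checksums alone.
predicted_crc = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
crc_is_predictable = predicted_crc == zlib.crc32(combined)

# MD5 has no such structure: XOR-ing the digests does not give the
# digest of the combined message.
predicted_md5 = xor_bytes(*(hashlib.md5(m).digest() for m in (a, b, c)))
md5_is_predictable = predicted_md5 == hashlib.md5(combined).digest()

print("CRC-32 predictable:", crc_is_predictable)
print("MD5 predictable:   ", md5_is_predictable)
```

That algebraic structure is what makes a CRC good at catching accidental corruption and easy to steer on purpose.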

Signature

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Arne Vajhøj - 02 Sep 2007 01:52 GMT
>> MD5 hash is fast and pretty reliable for its purpose; CRC is more reliable
>> but also much slower.
>
> I'm pretty sure that's the wrong way round. In particular, CRCs don't
> attempt to foil deliberate attempts to circumvent them.

No doubt about the reliable part.

Your assumption about the speed part is the same one I had. CRC ought
to be much faster than MD5.

But apparently it is not.

#ZipLib CRC-32 is only about 5% faster than .NET MD5.

Maybe that implementation is not super good - Java CRC-32 is
40% faster than Java MD5, but still nowhere near the expected
difference.

I guess CRCs are really intended for hardware, not for
software.

BTW, according to Wikipedia, the CRC-32s I used are not
real CRCs, but are supposed to be faster than real CRCs, so ...
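
A rough way to repeat this kind of comparison on your own machine is sketched below, in Python rather than .NET or Java, so the numbers will not match the figures quoted above. The payload size, the repeat count and the helper name `time_checksum` are arbitrary assumptions, and the result depends heavily on the implementation and the hardware.

```python
import hashlib
import os
import time
import zlib


def time_checksum(func, data: bytes, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


payload = os.urandom(8 * 1024 * 1024)  # 8 MB of random data; size is arbitrary

crc_seconds = time_checksum(zlib.crc32, payload)
md5_seconds = time_checksum(lambda d: hashlib.md5(d).digest(), payload)

print(f"CRC-32: {crc_seconds:.4f} s")
print(f"MD5:    {md5_seconds:.4f} s")
```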

Arne
Michel Posseth  [MCP] - 02 Sep 2007 10:05 GMT
Strange...

Just one week ago, I was asked to create a network file synchronization
mechanism which did not care about file versions
("if the remote file differs from the local version, copy it locally"). As we are
talking about hundreds of files and a total size of over 100 MB,
I needed a fast way to check these files.

So I went digging on the web to find which algorithm would be the fastest, and found
my previous conclusion on various websites.

My project is finished and performs superbly, but you are telling me now that
CRC ought to be faster but less reliable?

Michel

Jon Skeet [C# MVP] - 02 Sep 2007 12:18 GMT
> Strange...
>
[quoted text clipped - 9 lines]
> My project is finished and performs superbly, but you are telling me now that
> CRC ought to be faster but less reliable?

It *may* be faster, depending on the exact implementation. However,
it's unlikely that the hash performance is going to be significant
compared with the IO cost. Hashing 100MB of data is likely to be very
quick with either algorithm.


Michel Posseth  [MCP] - 02 Sep 2007 14:58 GMT
> However,
> it's unlikely that the hash performance is going to be significant
> compared with the IO cost.

Yes... that is a good one... in my situation, the cost of copying a file
that did not need replacement.

However, it seems that I got my implementation right, as an MD5 hash would be more
reliable but probably a bit slower than a CRC,
and in my situation it turned out that this is exactly what I need, because it
is more costly to copy the file over the intranet to the client.

So I guess I had a lucky day when I wrote it :-)

Regards,

and thanks for sharing

Michel

Jon Skeet [C# MVP] - 02 Sep 2007 17:59 GMT
> > However,
> > it's unlikely that the hash performance is going to be significant
> > compared with the IO cost.
>
> Yes... that is a good one... in my situation, the cost of copying a file
> that did not need replacement.

What I meant is that the cost of reading a file in order to calculate
the hash is probably bigger than the computational cost of the hash.
Both MD5 and CRC will require the whole file to be read, so there's no
benefit there either.
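
To make the point concrete, here is a minimal Python sketch (not the code from Michel's project) that reads a file once in chunks and feeds the same bytes to both algorithms. The function name `file_checksums`, the chunk size and the throwaway demo file are assumptions for illustration.

```python
import hashlib
import os
import tempfile
import zlib


def file_checksums(path: str, chunk_size: int = 64 * 1024):
    """Read a file once, in chunks, and return (md5_hex, crc32)."""
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            md5.update(chunk)             # incremental MD5
            crc = zlib.crc32(chunk, crc)  # running CRC-32
    return md5.hexdigest(), crc & 0xFFFFFFFF


# Demo with a throwaway file so the sketch runs anywhere.
data = os.urandom(200_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    demo_path = tmp.name

md5_hex, crc = file_checksums(demo_path)
os.remove(demo_path)

print(md5_hex, f"{crc:08x}")
```

Either way every byte has to come off the disk or the network, so the choice of algorithm changes only the CPU part of the cost.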


Evan Camilleri - 04 Sep 2007 15:29 GMT
I actually wanted to see some code, since I cannot find how to get an MD5 or CRC
for the object's data.

Thanks

Evan

Jon Skeet [C# MVP] - 04 Sep 2007 15:37 GMT
On Sep 4, 3:29 pm, "Evan Camilleri" <e...@holisticrd.com.nospam>
wrote:
> I actually wanted to see some code, since I cannot find how to get an MD5 or CRC
> for the object's data.

As Peter said, there's no such concept as "the object's data" that
makes taking an MD5 hash sensible in all cases.

What would the MD5 of a NetworkStream be? Would you have to read all
its contents to find out?
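
A small Python sketch of that problem, using an in-memory `io.BytesIO` as an assumed stand-in for a network stream (a real one would also not be seekable). The name `md5_of_stream` is hypothetical.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size: int = 4096) -> str:
    """Hash a stream by reading it to the end; this consumes the stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


stream = io.BytesIO(b"payload arriving over the wire")  # stand-in for a network stream
digest = md5_of_stream(stream)
leftover = stream.read()  # nothing is left for the real consumer

print(digest)
print(leftover)
```

Computing the hash changed the state of the object being hashed, which is why "the hash of an object's contents" cannot be defined generically.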

Jon
