
.NET Forum / Languages / C# / July 2007


Compression size

VBA - 23 Jun 2007 03:48 GMT
I compressed a file with the GZipStream class and the result is larger than the
original file. How can this be? The original file is 737 KB and the "compressed"
file is 1.1 MB. Did I miss something, or is this normal with that compression class?

Signature

VBA

Andrew Robinson - 23 Jun 2007 06:41 GMT
What type of file are you compressing? Files that are already highly compressed,
such as images, may grow in size when compressed a second time. Also, the Microsoft
algorithms are not ideal but rather make an attempt to steer clear of patent
issues.

>I compressed a file with GZipStream class and is larger than the original
> file.... how can this be?, the original file is 737 KB and the
> "compressed"
> file is 1.1 MB. Did i miss something or is normal with that compression
> class?
VBA - 23 Jun 2007 06:52 GMT
First I compressed a txt file. I read that if a file is very small, compression
can make it larger, so then I tried an MP3 file (not sure if the file type
matters) of 3.4 MB, but it came out at 5.3 MB... so what's wrong?

Signature

VBA

> What type of file are you compressing? Some highly compressed files such as
> images may grow in size when compressed a second time. Also, the Microsoft
[quoted text clipped - 6 lines]
> > file is 1.1 MB. Did i miss something or is normal with that compression
> > class?
Scott C - 23 Jun 2007 08:10 GMT
> First I compressed a txt file, i read that if a file is very small , the
> compression can turn it larger in size, so then i tried with a mp3 file (not
> sure if the file type matters) of 3.4 Mb, but turned it to 5.3
> MB....so.....what's wrong??

MP3 is a compressed file... I bet you'd get better behavior with a 3.5
MB text file.

Scott
VBA - 23 Jun 2007 08:26 GMT
So can I only compress text files? I tried a while ago with a PDF and got the
same result, a bigger file, but I don't know if a PDF file is somehow
compressed already.

By the way, when compressing a file, should the resulting compressed file keep
the same file extension, or must I use something like *.Z?

Signature

VBA

> > First I compressed a txt file, i read that if a file is very small , the
> > compression can turn it larger in size, so then i tried with a mp3 file (not
[quoted text clipped - 5 lines]
>
> Scott
Marc Gravell - 23 Jun 2007 09:20 GMT
PDF can contain compressed graphics (and, IIRC, sometimes text), and
if it is encrypted the data can appear relatively random. Both of
these make it a poor choice for compression.

Put simply: some files compress very well indeed, and some don't. In
particular, those that are already compressed (or highly random) don't
tend to compress (and can get bigger).

Marc
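
The point above is easy to check by experiment. The following is a minimal sketch in Python rather than C#, on the assumption that the standard gzip module is a fair stand-in because it writes the same gzip format; exact byte counts will differ from what .NET's GZipStream produces. The sample data is made up for illustration: a repeated sentence for compressible input, and bytes from the OS random source as a stand-in for encrypted or already-compressed files.

```python
import gzip
import os

# Highly repetitive text: plenty of redundancy for the compressor to exploit.
text = b"the quick brown fox jumps over the lazy dog\n" * 2000

# Random bytes: a stand-in for encrypted or already-compressed input
# (MP3, JPEG, many PDFs). Same length as the text for a fair comparison.
noise = os.urandom(len(text))

packed_text = gzip.compress(text)
packed_noise = gzip.compress(noise)

print(f"text : {len(text)} -> {len(packed_text)} bytes")
print(f"noise: {len(noise)} -> {len(packed_noise)} bytes")
```

The text shrinks dramatically, while the random input comes out slightly larger than it went in, because the gzip header, trailer and block framing are added to data that could not be reduced.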
Peter Duniho - 23 Jun 2007 11:26 GMT
> [...]
> by the way, when compressing a file, the resulting compressed file  
> should be
> with the same file extension? or i must use something like *.Z ????

You can name the compressed file whatever you like.  Of course, using the  
GZipStream class, it's common to use the ".gz" extension for the output.  But  
there's no requirement that you do so.

Pete
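
A small sketch of why the name does not matter, again in Python's gzip module as an assumed stand-in for GZipStream: a gzip stream is recognised by the two magic bytes 0x1f 0x8b at the start of its header, not by the file extension. The file name "report.whatever" is a made-up example.

```python
import gzip
import os
import tempfile

payload = b"hello, gzip\n" * 100

# Deliberately odd extension: the name plays no part in decoding.
path = os.path.join(tempfile.mkdtemp(), "report.whatever")
with gzip.open(path, "wb") as f:
    f.write(payload)

# Read the raw bytes back: the gzip header starts with 0x1f 0x8b.
with open(path, "rb") as f:
    magic = f.read(2)

# Decompression works regardless of what the file is called.
with gzip.open(path, "rb") as f:
    restored = f.read()

print(magic.hex())
print(restored == payload)
```

Using ".gz" is still the friendlier choice, since other tools and people rely on the extension to guess what a file contains.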
Peter Duniho - 23 Jun 2007 11:28 GMT
> [...] Also, the Microsoft
> algorithms are not ideal but rather make an attempt to steer clear of  
> patent
> issues.

GZipStream may not implement an ideal algorithm, but since gzip itself is  
an open format, I doubt that patent issues are part of the question.
Tom Spink - 23 Jun 2007 10:35 GMT
> I compressed a file with GZipStream class and is larger than the original
> file.... how can this be?, the original file is 737 KB and the
> "compressed" file is 1.1 MB. Did i miss something or is normal with that
> compression class?

Hi VBA,

Random data is hard to compress, as compression techniques often work on
probabilities (e.g. Huffman encoding).  So encrypted files and already
compressed files, such as MP3s, JPEGs and GIFs, will compress poorly, if at all.

Text documents written in English, or files containing sparse data (such as
BMPs and certain executables) will compress fairly well.  It all depends on
the compression algorithm.

You should choose an algorithm that's appropriate to the type of data you're
trying to compress... a bad algorithm will almost certainly result in
larger files.

But like I said at the start random data is hard if not damn near impossible
to compress.

Signature

Tom Spink
University of Edinburgh
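
One way to put a number on "works on probabilities" is the Shannon entropy of the byte histogram. The sketch below (Python; the function name and sample strings are made up for illustration) measures it in bits per byte. It is only a rough indicator: it bounds what a byte-by-byte coder such as plain Huffman can achieve, while dictionary coders can do better on repeated strings.

```python
import math
import os
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0 to 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


english = b"It all depends on the compression algorithm. " * 500
noise = os.urandom(len(english))

print(round(entropy_bits_per_byte(english), 2))
print(round(entropy_bits_per_byte(noise), 2))
```

The English sample uses only a handful of distinct byte values with uneven frequencies, so it scores well under 8 bits per byte; the random sample sits very close to 8, which leaves a symbol coder essentially nothing to save.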

VBA - 23 Jun 2007 17:16 GMT
All that you are telling me looks very interesting :)
I just thought of a related question: how does WinZip work? I mean, you can put
any file in a WinZip file and compress it, and I read in a book that it uses a
similar compression algorithm. Is that another type of compression, or could you
write similar software in .NET using GZipStream?

Signature

VBA

> > I compressed a file with GZipStream class and is larger than the original
> > file.... how can this be?, the original file is 737 KB and the
[quoted text clipped - 17 lines]
> But like I said at the start random data is hard if not damn near impossible
> to compress.
Peter Duniho - 23 Jun 2007 18:24 GMT
> Looks very interesenting all that you are telling me :)
> I just now thought in a new question related it.... how does Winzip  
> work??

Two standard compression algorithms on which much (nearly all, actually,  
as far as I know) of our lossless compression tools are built on are  
Huffman encoding and the Lempel-Ziv-Welch algorithm.  I don't have  
specifics on the exact implementation of WinZip, but I gather that like  
all "zip" variations, it uses some forms of these algorithms.

If you want to have a better idea of how various compression schemes work,  
the place to start is reading about these basic algorithms.

> i mean you can put any file in a Winzip file and compress it, and i read  
> in a
> book that uses a similar compression algorithm, is that another type a
> compression or you could do a similar software in .NET using  
> GZipStream????

You can't "put any file in a Winzip file and compress it".  Typically,  
something like WinZip will try a variety of specific compression  
algorithms to see which performs best (each variation of a given algorithm  
may perform differently, depending on the content and structure of the  
data).  In some cases, no compression algorithm will reduce the size, or  
will reduce it significantly, and the original data will be used.  But  
inclusion of file headers and other information will increase the file  
size at least a little.

Note that the GzipStream class does not have the entire data before it  
must make decisions about how to compress the data.  As far as I know, it  
just uses a single "best general case" version of the "deflate" algorithm  
(based on Huffman and LZW).  In any case, it's guaranteed that GzipStream  
doesn't have the ability to pick from a variety of algorithms to use the  
best-performing one, as something like WinZip can.

Again, I don't know specifically how WinZip works, but all compression  
tools have this basic behavior.  There is not a single compression tool  
that is guaranteed to reduce the size of the data.

Pete
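
The "use the original data when compression does not help" idea can be sketched as follows. This is a hypothetical illustration in Python, not WinZip's actual logic: the function name pack and the "store"/"deflate" labels are made up here, although the ZIP format itself does define both a stored (uncompressed) method and a deflate method for entries.

```python
import os
import zlib


def pack(data: bytes) -> tuple[str, bytes]:
    """Return ("deflate", compressed) if that is smaller, else ("store", data)."""
    compressed = zlib.compress(data, 9)
    if len(compressed) < len(data):
        return "deflate", compressed
    # Compression did not pay off: keep the original bytes unchanged.
    return "store", data


for name, blob in [("zeros", bytes(4096)), ("random", os.urandom(4096))]:
    method, body = pack(blob)
    print(f"{name}: {method}, {len(blob)} -> {len(body)} bytes")
```

Note that this needs the whole input in hand before deciding, which is exactly what a streaming class does not have; and even a stored entry still costs a few bytes of header, so the container never comes out smaller than the data it could not compress.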
Arne Vajhøj - 02 Jul 2007 03:56 GMT
>> Looks very interesenting all that you are telling me :)
>> I just now thought in a new question related it.... how does Winzip
[quoted text clipped - 5 lines]
> specifics on the exact implementation of WinZip, but I gather that like
> all "zip" variations, it uses some forms of these algorithms.

Absolutely untrue.

LZ78 (LZW) is used in traditional Unix compress.

But ZIP and GZip use LZ77.

Both are often combined with either Huffman or arithmetic encoding.

BZip uses Burrows-Wheeler.

>> i mean you can put any file in a Winzip file and compress it, and i
>> read in a
[quoted text clipped - 17 lines]
> GzipStream doesn't have the ability to pick from a variety of algorithms
> to use the best-performing one, as something like WinZip can.

I would assume that WinZip only uses the possibilities within the
Zip format and not some custom format.

And deflate is still LZ77 not LZ78 (LZW).

Arne
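
To make the LZ77 versus LZW distinction concrete, here is a toy sketch of both in Python. These are teaching versions under simplifying assumptions, not the bit formats used by deflate or Unix compress (deflate, for instance, additionally Huffman-codes its LZ77 output), and all function names are made up. The LZ77 coder points back into a sliding window of recent input; the LZW coder emits indexes into an explicit dictionary that grows as it reads.

```python
def lz77_tokens(data: bytes, window: int = 255):
    """Toy LZ77: (offset, length, next_byte) triples over a sliding window."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Stop one byte early so a literal always follows the match.
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(tokens) -> bytes:
    out = bytearray()
    for off, length, nxt in tokens:
        for _ in range(length):
            out.append(out[-off])  # byte by byte, so overlapping copies work
        out.append(nxt)
    return bytes(out)


def lzw_codes(data: bytes):
    """Toy LZW: integer codes drawn from an explicit, growing dictionary."""
    table = {bytes([b]): b for b in range(256)}
    out, cur = [], b""
    for b in data:
        nxt = cur + bytes([b])
        if nxt in table:
            cur = nxt
        else:
            out.append(table[cur])
            table[nxt] = len(table)  # learn the new phrase
            cur = bytes([b])
    if cur:
        out.append(table[cur])
    return out


def lzw_decode(codes) -> bytes:
    if not codes:
        return b""
    table = {b: bytes([b]) for b in range(256)}
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        # A code may refer to the entry that is being defined right now.
        entry = table[code] if code in table else prev + prev[:1]
        out += entry
        table[len(table)] = prev + entry[:1]
        prev = entry
    return bytes(out)


sample = b"abababababababab"
print(lz77_tokens(sample))
print(lzw_codes(sample))
```

Both round-trip the same input, but the outputs look nothing alike: one is a list of back-references plus literals, the other a list of dictionary indexes.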
Peter Duniho - 02 Jul 2007 04:00 GMT
>>  Two standard compression algorithms on which much (nearly all,  
>> actually, as far as I know) of our lossless compression tools are built  
[quoted text clipped - 3 lines]
>
> Absolutely untrue.

Okay.

> LZ78 (LZW) is used in traditional Unix compress.
>
> But ZIP and GZip uses LZ77.
>
> Both often combined with either Huffman or Arithmetic encoding.

That's what I said.  I thought you said what I said was "absolutely  
untrue".

Maybe the word "absolutely" means something different in your native  
language?  Here, it's used to emphasize, rather than to negate.

Pete
Arne Vajhøj - 03 Jul 2007 03:27 GMT
>>>  Two standard compression algorithms on which much (nearly all,
>>> actually, as far as I know) of our lossless compression tools are
[quoted text clipped - 18 lines]
> Maybe the word "absolutely" means something different in your native
> language?  Here, it's used to emphasize, rather than to negate.

????

You said that nearly all lossless compression tools are built on LZW.

That is absolute untrue or complete bullshit or whatever you want
to call it.

I even explained why: ZIP and GZip do not use LZW. And they
are much more widely used than good old Unix compress.

Arne
Peter Duniho - 03 Jul 2007 03:35 GMT
> You said that nearly all lossless compression tools are build on LZW.

I wrote (and you quoted) "WinZip...uses some forms of these algorithms".

In what way is LZ77 (the algorithm you wrote is used with the ZIP format)  
_not_ "some form" of the LZW algorithm?

> That is absolute untrue or complete bullshit or whatever you want
> to call it.

My statement was just fine, and your own claims even confirm that.  You  
can continue to write asinine things like "absolute untrue" and "complete  
bullshit" as much as you like, there was nothing wrong with my post.  
Furthermore, your posts continue to insult without educating.

If you have an actual point, try making it without being such an a.s.

Thanks,
Pete
Arne Vajhøj - 04 Jul 2007 01:02 GMT
>> You said that nearly all lossless compression tools are build on LZW.
>
> I wrote (and you quoted) "WinZip...uses some forms of these algorithms".
>
> In what way is LZ77 (the algorithm you wrote is used with the ZIP
> format) _not_ "some form" of the LZW algorithm?

No.

Not code wise. Not patent wise. Not in any way.

>> That is absolute untrue or complete bullshit or whatever you want
>> to call it.
>
> My statement was just fine, and your own claims even confirm that.

Bullshit.

> Furthermore, your posts continue to insult without educating.

I have tried multiple times to explain to you that the most
widely used compression algorithms do not use LZW; they use
LZ77.

That is educational.

That you refuse to understand it does not make it less educational.

> If you have an actual point, try making it without being such an a.s.

It seems as if you just have difficulties understanding the point.

Arne
Peter Duniho - 04 Jul 2007 01:29 GMT
> It seems as if you just have difficulties understanding the point.

When you make a point that is comprehensible, then I will start worrying  
about whether I understand it.
Arne Vajhøj - 05 Jul 2007 00:37 GMT
>> It seems as if you just have difficulties understanding the point.
>
> When you make a point that is comprehensible, then I will start worrying
> about whether I understand it.

So you did not understand the following:

#> In what way is LZ77 (the algorithm you wrote is used with the ZIP
#> format) _not_ "some form" of the LZW algorithm?
#
#No.
#
#Not code wise. Not patent wise. Not in any way.

LZW is a completely different algorithm than LZ77. An implementation
will be different code. The infamous LZW patent does not apply to LZ77.

Is that difficult to understand?

Arne
Peter Duniho - 05 Jul 2007 00:54 GMT
> [...]
> LZW is a completely different algorithm than LZ77. An implementation
> will be different code. The infamous LZW patent does not apply to LZ77.

You have a very strange concept of these absolute terms you're using:  
"absolutely untrue", "complete bullshit", "completely different  
algorithm", etc.

LZW is _not_ a COMPLETELY different algorithm.  A COMPLETELY different  
algorithm would share absolutely zero similarities.

All of the algorithms spawned by Lempel and Ziv, including the LZW  
algorithm, share various similarities.  Some have more similarities in  
common than others, but they are ALL "some form" of each other.  They all  
share the same heritage, and in many ways address similar problems with  
similar approaches.  All of the LZ-based algorithms, being  
dictionary-based, are much more similar to each other than they are to,  
for example, Huffman encoding.

The question of a patent is completely irrelevant, by the way.  Even  
assuming that software patents make sense in the first place, it doesn't  
take much for a patent to be inapplicable to closely related code.  Most  
software patents are written narrowly, for the very reason that it's too  
easy to invalidate a broadly-written patent.  As such, relatively minor  
variations can result in two otherwise closely related algorithms not  
sharing patent protection (see MP3 versus other similar  
psychoacoustics-based audio compression algorithms, for example).

You seem to have this pathological need to find fault in whatever has been  
written, at least with respect to my own posts, regardless of how  
contrivedly narrow you have to interpret what was actually written, even  
to the point of completely ignoring whatever intent actually existed in  
what was written.

Frankly, I find _that_ to be "complete bullshit", and I'm sick and tired  
of it.  I go to a lot of trouble to make what I write as correct as I can,  
and to make it clear where my first-hand knowledge of something is vague  
or incomplete.  When someone posts a _valid_ correction to something I've  
written, I have no problem acknowledging my mistake, and I've posted my  
share of "mea culpas" here in this newsgroup and others.

I find your insistence on finding fault with my posts where no fault  
exists to be idiotic.  I wish you would cut it out.

Pete
Arne Vajhøj - 05 Jul 2007 01:06 GMT
> LZW is _not_ a COMPLETELY different algorithm.  A COMPLETELY different
> algorithm would share absolutely zero similarities.
[quoted text clipped - 6 lines]
> dictionary-based, are much more similar to each other than they are to,
> for example, Huffman encoding.

LZ77 and LZW are both dictionary based, but that does not make LZ77
a form of LZW.

> You seem to have this pathological need to find fault in whatever has
> been written, at least with respect to my own posts, regardless of how
> contrivedly narrow you have to interpret what was actually written, even
> to the point of completely ignoring whatever intent actually existed in
> what was written.

Let us take a step back.

You started by writing:

#Two standard compression algorithms on which much (nearly all,
#actually, as far as I know) of our lossless compression tools are built
#on are Huffman encoding and the Lempel-Ziv-Welch algorithm.

I replied:

#Absolutely untrue.
#
#LZ78 (LZW) is used in traditional Unix compress.
#
#But ZIP and GZip uses LZ77.

That is not an interpretation. What you wrote was plain wrong.

The most common compression tools do not use LZW.

> Frankly, I find _that_ to be "complete bullshit", and I'm sick and tired
> of it.  I go to a lot of trouble to make what I write as correct as I
> can, and to make it clear where my first-hand knowledge of something is
> vague or incomplete.  When someone posts a _valid_ correction to
> something I've written, I have no problem acknowledging my mistake, and
> I've posted my share of "mea culpas" here in this newsgroup and others.

Well in this case you have tried to cover your mistake with various
lame excuses:

#In what way is LZ77 (the algorithm you wrote is used with the ZIP
#format) _not_ "some form" of the LZW algorithm?

instead of just admitting that you remembered wrong regarding LZW.

Arne
Peter Duniho - 05 Jul 2007 01:29 GMT
> LZ77 and LZW are both dictionary based, but that does not make LZ77
> a form of LZW.

Why not?  Who are you that you get to define what "a form" is?  Why is  
your definition any more important or correct than mine?  Where is the  
"official" definition of "a form" on which you base your claim?

I have explained my basis for my usage of the phrase "some form" or "a  
form".  You have not bothered to explain your basis, but even if you  
should happen to, why would your explanation take priority over mine with  
respect to interpreting what *I* wrote?

You have a pretty arrogant view of your own importance in how language  
should be used, especially when it comes to the intent of someone _else's_  
use of language.

> [...]
> Well in this case you have tried to cover your mistake with various
> lame excuses:

Baloney.  I made no mistake, and I stand by my original post.  I am not  
trying to "cover" anything.  It is only your pathological need to find  
fault that has resulted in this inane sub-thread.

And inane it is.  Frankly, I'm a bit embarrassed to have even bothered  
feeding your troll-like behavior, and I'm done.

To anyone else who has rightly identified this as a useless sub-thread, I  
apologize for it and promise that my involvement with it, as well as more  
generally with Arne's continued insistence on finding fault where none  
exists, is over with.  Life's too short to waste time on idiotic stuff  
like this.

Pete
Arne Vajhøj - 02 Jul 2007 03:49 GMT
> But like I said at the start random data is hard if not damn near impossible
> to compress.

Some define random data as data that is incompressible ...

:-)

Arne




©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.