
.NET Forum / Languages / C# / November 2007

Testing File Format

Tom - 19 Nov 2007 01:19 GMT
Hi all,

I am looking for a smart way to ensure, within a C# method, that a file
is indeed a text file and not binary.

For example: Will "thisMysteryFile.dat" be legible if opened in a
RichTextBox ... or is it a binary file?

I have searched various methods in the string class and am having no
luck.

Under consideration >>

Open the file in a binary reader and then test either the first 1000
characters or up to the end of the file, and if any character is less
than 32 or greater than 127 ... then flag it as binary.

If not binary >> open in a RichTextBox
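
A minimal sketch of the check described above, for illustration only. The class and method names are hypothetical, it reads raw bytes with a FileStream rather than a BinaryReader, and it lets tab, line feed and carriage return through (an assumption that goes beyond the description; without it almost every multi-line text file would be flagged as binary).

```csharp
using System;
using System.IO;

static class ByteRangeCheck
{
    // Returns true if any of the first maxBytes bytes is outside 32..127,
    // apart from tab (9), line feed (10) and carriage return (13).
    public static bool LooksBinary(string path, int maxBytes)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] buffer = new byte[maxBytes];
            int total = 0;
            int n;
            // Read may return fewer bytes than requested, so loop until the
            // buffer is full or the end of the file is reached.
            while (total < buffer.Length &&
                   (n = stream.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += n;
            }

            for (int i = 0; i < total; i++)
            {
                byte b = buffer[i];
                bool isWhitespace = b == 9 || b == 10 || b == 13;
                if ((b < 32 && !isWhitespace) || b > 127)
                {
                    return true;
                }
            }
            return false;
        }
    }
}
```

A call such as ByteRangeCheck.LooksBinary("thisMysteryFile.dat", 1000) would then decide whether to hand the file to the RichTextBox. Note that any text containing characters above 127 is reported as binary by this rule.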

Can anyone tell me a more efficient way to accomplish this task?

Thanks !!
Peter Bromberg [C# MVP] - 19 Nov 2007 02:04 GMT
The first problem I see with the "under consideration" method is that there
are so many legitimate characters (mostly in languages other than English)
that will fall outside your ASCII code range. Unicode (which can certainly be
the contents of a "text file") supports 65536 characters.

--Peter
"Inside every large program, there is a small program trying to get out."
http://www.eggheadcafe.com
http://petesbloggerama.blogspot.com
http://www.blogmetafinder.com

Tom - 19 Nov 2007 03:43 GMT
Peter -- Thanks. Your comments have me thinking outside the match box
in which I was stuck. I'm now digging into the RichTextBoxStreamType
enumeration >> UnicodePlainText.

I'll experiment with this enumeration and see if loading a binary data
file throws an exception. All this RichTextBox stuff is new for me ...
so I have a lot to learn for sure.

Perhaps a restricted load of a tiny size for a preview and then have
control buttons with "Load Full File" or "Clear RichTextBox" options?

Avoiding the accidental loading of a huge binary data file is part of
my objective. The other part of the objective is read only viewing the
small parameter data file as part of a data run initialization.

I am always amazed at how another's input can cause me to refocus.
Darn trees ruining my view of the forest!! LOL

Have a great day. Thanks again!

-- Tom

Peter Duniho - 19 Nov 2007 04:05 GMT
> Peter -- Thanks. Your comments have me thinking outside the match box
> in which I was stuck. I'm now digging into the RichTextBoxStreamType
> enumeration >> UnicodePlainText.

If you do that, won't you limit your input to Unicode files?

I think that one approach would be to use a StreamReader to
automatically detect the encoding of the file for you, and then read
the first 1K or so, counting how many characters return true for the
Char.IsLetterOrDigit method and comparing that to the total number of
characters.

It still won't be perfect, but you should be able to come up with a
reasonably good heuristic for the ratio of alphanumeric characters to
other characters that you would expect to see in a text file.
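
A minimal sketch of that heuristic, not a definitive implementation. The class name, method names and the threshold parameter are hypothetical; a sensible cutoff has to be found by trying it on the kinds of files you actually expect.

```csharp
using System;
using System.IO;

static class TextHeuristic
{
    // Fraction of the first sampleSize characters that are letters or digits.
    // Returns 0 for an empty file.
    public static double AlphanumericRatio(string path, int sampleSize)
    {
        // Passing true asks StreamReader to detect the encoding from a byte
        // order mark; without one it falls back to UTF-8.
        using (StreamReader reader = new StreamReader(path, true))
        {
            char[] buffer = new char[sampleSize];
            int read = reader.ReadBlock(buffer, 0, buffer.Length);
            if (read == 0)
            {
                return 0.0;
            }

            int alphanumeric = 0;
            for (int i = 0; i < read; i++)
            {
                if (char.IsLetterOrDigit(buffer[i]))
                {
                    alphanumeric++;
                }
            }
            return (double)alphanumeric / read;
        }
    }

    // threshold is an assumed, tunable cutoff, not a value from this thread.
    public static bool LooksLikeText(string path, int sampleSize, double threshold)
    {
        return AlphanumericRatio(path, sampleSize) >= threshold;
    }
}
```

For example, TextHeuristic.LooksLikeText(path, 1024, 0.6) samples the first 1K characters; the 0.6 is only a starting guess. Files that fail the test can still be offered to the user for an override.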

Of course, you can still include the user in the determination.  For
example, run the above test and if the file passes go ahead and use it,
but if it fails provide the user with a chance to override your
analysis.  You could even do this just as you suggest: provide a brief
preview of the initial part of the file to the user so that they can
visually decide whether it's a file they want treated as text.

Caveat: I have basically no experience with non-alphabetic languages,
and I don't know if in a non-alphabetic language a word character would
be considered a "letter" for the purpose of the above test.  If that's
important to you, you'll want to verify that and/or find a form of
classification that will correctly detect those characters as text.

Pete
Tom - 19 Nov 2007 13:38 GMT
Pete --

Thank you! I am new to C# and I am exploring StreamReader a.s.a.p.

I work only in the English language and am not developing programs for
global distribution. Your methodology seems solid to this newb. Usage
of Char.IsLetterOrDigit would effectively provide some language
independence. That independence makes for a MUCH better tool than what
I had been focused upon.

Very, very thought provoking!

Again, thanks.  -- Tom

Mihai N. - 19 Nov 2007 07:41 GMT
> Unicode (which can certainly
> be the contents of a "text file" supports 65536 characters.

Unicode goes up to 10FFFF, which is a bit more than one million.
Other than that, very good warning :-)
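
A small illustration of the numbers involved, with hypothetical helper names. It shows the size of the code point range and that a code point above FFFF takes two C# char values (a surrogate pair), which matters for any test that walks a file one char at a time.

```csharp
using System;

static class UnicodeRange
{
    // Highest Unicode code point.
    public const int MaxCodePoint = 0x10FFFF;

    // Number of code points from 0 to 10FFFF inclusive: 1,114,112.
    // This count includes the surrogate range D800..DFFF.
    public static int CodePointCount()
    {
        return MaxCodePoint + 1;
    }

    // Number of UTF-16 char values needed for one code point:
    // 1 up to FFFF, 2 (a surrogate pair) above that.
    public static int Utf16Length(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint).Length;
    }
}
```

For characters outside the first 65536, the overload char.IsLetterOrDigit(string, int) classifies the whole surrogate pair rather than each half on its own.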

Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Tom - 19 Nov 2007 14:24 GMT
Hey folks --

I've been rethinking my usage of RichTextBox long and hard. At first
it seemed the do-all new magic class. For some tasks it is just that!
Accidentally opening a huge file from a ListView selection is
painfully slow and consumes resources like no tomorrow. Ouch.

What I really crave is a Text Viewer class without editing capability.
One that only loads a screen worth of text at a time. Where the thumb
is sized to reflect the file size and placement of the thumb loads
just that section of the data file. Like Petzold's painting with text
example from Programming Windows 95 ... only in .Net 2.0 C# and
integrated with a simpler TextBox? Or another text viewing control
that is more appropriate.

I'm still searching for such a Text Viewer. A search on "Thumb Size
.Net 2.0" led me to some graphics intensive TrackBarRenderer,
trackRectangle, thumbRectangle, etc. usage that goes way beyond the
WinForms book and C# Instructional Texts that I have. Certainly
steepening my learning curve!

My guess is someone has already duplicated that Petzold example in C#
2.0 and that I would learn more and faster from studying a guru's
coding than creating my own.

If anyone can point me towards such a useful, compact, and also
complex tool ... I would be without doubt grateful.

Thanks. -- Tom
Peter Duniho - 19 Nov 2007 18:46 GMT
> [...]
> I'm still searching for such a Text Viewer. A search on "Thumb Size
[quoted text clipped - 9 lines]
> If anyone can point me towards such a useful, compact, and also
> complex tool ... I would be without doubt grateful.

I'm not familiar with Petzold's examples, so I can't comment on that.  
As far as what you're asking about, I'm not aware of a specific
text-box implementation that does what you're talking about.  It
wouldn't be that hard to do, at least for the basic implementation
(duplicating the full functionality of the TextBoxBase classes would be
harder, but it sounds like you only need a minimal subset).

Interestingly, taking a suggestion from a different thread -- in which
someone suggested using a ListBox to implement a console-output-like
control -- you could use the DataGridView in a similar way, taking
advantage of its "VirtualMode" mechanism.  Using that, the control
handles all of the display and you provide the code that virtualizes
the data rather than having it all in memory at once.
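
A rough sketch of the "code that virtualizes the data" half of that idea, under stated assumptions. LineIndex is a hypothetical helper, not part of the framework; it assumes an ASCII or UTF-8 file without a byte order mark, where the byte 0x0A always marks a line break. It records where each line starts and reads a single line on demand.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

sealed class LineIndex
{
    private readonly string path;
    private readonly long length;
    private readonly List<long> lineStarts = new List<long>();

    // Scans the file once and stores the byte offset of every line start.
    public LineIndex(string path)
    {
        this.path = path;
        using (FileStream stream = File.OpenRead(path))
        {
            length = stream.Length;
            if (length > 0)
            {
                lineStarts.Add(0);
            }

            byte[] buffer = new byte[64 * 1024];
            long position = 0;
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    long next = position + i + 1;
                    // A newline that ends the file does not start a new line.
                    if (buffer[i] == (byte)'\n' && next < length)
                    {
                        lineStarts.Add(next);
                    }
                }
                position += read;
            }
        }
    }

    public int LineCount
    {
        get { return lineStarts.Count; }
    }

    // Reads one line from disk, without its trailing line break.
    public string ReadLine(int line)
    {
        long start = lineStarts[line];
        long end = line + 1 < lineStarts.Count ? lineStarts[line + 1] : length;
        byte[] bytes = new byte[end - start];
        using (FileStream stream = File.OpenRead(path))
        {
            stream.Seek(start, SeekOrigin.Begin);
            int total = 0;
            int n;
            while (total < bytes.Length &&
                   (n = stream.Read(bytes, total, bytes.Length - total)) > 0)
            {
                total += n;
            }
        }
        return Encoding.UTF8.GetString(bytes).TrimEnd('\r', '\n');
    }
}
```

To hook this up, a DataGridView with a single text column would have VirtualMode set to true and RowCount set to index.LineCount, and its CellValueNeeded handler would assign e.Value = index.ReadLine(e.RowIndex). Reopening the file for every line keeps the sketch short; a real viewer would probably keep the stream open and cache the visible rows.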

It could be overkill -- the DataGridView control has lots of stuff in
it that would be of no value for this purpose -- and you might have
trouble getting it to look just right, since the DataGridView does have
a specific look and I don't know if you could get rid of the elements
that would be distracting in this use.

But hey, when you're hacking stuff, you can't be picky.  :)

Pete
Tom - 20 Nov 2007 21:00 GMT
Pete --

Using a DataGridView is very thought provoking. Still beyond my beginner
status capabilities. The curiosity seed is however now planted and
when the right combination of skill accumulation and need occurs I
shall try to baby step my way into the Grid.

My data file picker/viewer is now working. :)

I went back to the RichTextBox usage. My attempts at "painting" text
just caused me headaches and errors. Mixing EventArgs from my ListView
object with PaintEventArgs to control the graphics is beyond my
understanding ... although I gave it many hours of effort.

What ended up being VERY helpful is >>

1) Using StreamReader.   ( ** Thank You ** for the suggestion !! )
2) Setting >> rtb.WordWrap = false; (rtb = RichTextBox object)
3) Implementing FontDialog for selecting the text attributes.
4) Using a line reading counter to avoid loading files that are too large.
5) Read only is set. Background is gray and I am ok with that.
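
The StreamReader and line-counter steps above can be sketched roughly as follows. The names are hypothetical and this is not the code from the post. The maxChars cap is an added assumption: a binary file may contain no line breaks at all, so counting lines alone would not stop the read.

```csharp
using System;
using System.IO;
using System.Text;

static class FilePreview
{
    // Returns the start of the file: at most maxLines lines and at most
    // maxChars characters, whichever limit is reached first.
    public static string ReadPreview(string path, int maxLines, int maxChars)
    {
        StringBuilder preview = new StringBuilder();
        // Passing true lets StreamReader pick the encoding from a byte order mark.
        using (StreamReader reader = new StreamReader(path, true))
        {
            int lines = 0;
            int c;
            while (lines < maxLines &&
                   preview.Length < maxChars &&
                   (c = reader.Read()) != -1)
            {
                preview.Append((char)c);
                if (c == '\n')
                {
                    lines++;
                }
            }
        }
        return preview.ToString();
    }
}
```

The result would be assigned to rtb.Text, with rtb.WordWrap = false and rtb.ReadOnly = true set once on the control; for example rtb.Text = FilePreview.ReadPreview(path, 40, 20000), where the 20000 is an arbitrary cap.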

The top left of my FilePicker is a Directory TreeView and the top
right is a File ListView. This top half acts like a weak version of
FileExplorer. The bottom splitter panel is a RichTextBox that is now
behaving very well!!

When I click on a huge binary data file >> the RichTextBox quickly
shows 40 lines of gibberish. No big deal and NOT having the
entire file load is NICE! That rtb.LoadFile() can be a pain.

Usually viewing 40 lines is plenty for me to validate that the correct
parameter file is indeed selected ... so it's a winner. I have some
future enhancement ideas too. A two thumb sliding bar controller to
select where to start and end file input is one such idea. I'd like
better list control too. Currently my File ListView does not sort by
variable columns. I'd like to be able to sort by time in addition to
the current ascending FileName ordering.

Amazing how difficult these simple tasks are for beginners. And how
IMPOSSIBLY difficult these tasks used to be. Wow!! I can only shake my
head at the effort needed to duplicate my WinForm 2.0 functionality
using older methods. Ouch!!

Single screen file selection and quick viewing without concern for
accidentally editing it is sweet. A very functional tool I shall use
aplenty.

For emphasis >> All the comments I received were helpful and kept me
digging when I felt like waving the white flag. Now I am all smiles.
Dare I look at my long list of other projects? Yikes !!!

A great day to all.

-- Tom


©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.