.NET Forum / .NET Framework / New Users / March 2008

StreamReader and NULL characters

brian.gabriel@gmail.com - 21 Mar 2008 15:39 GMT
I am trying to read a file that was produced on a mainframe; it is the
text output of a mainframe report.  I am able to read in part of the
file, but between each "page" of the report there is a NULL character.
The StreamReader appears to read up until it sees this character and
then stop, the end result being that most of the file does not get
read.

Here is the byte pattern that is at the end of each page: 0D 0A 00 0C
0D 0A

Is there an easy way to read in this file without having to put it
into a byte array and futzing with all of that?

I have tried the following:

StreamReader re = File.OpenText(SourceFile);
string sTemp = re.ReadToEnd();

And:

StreamReader re = File.OpenText(SourceFile);
string sTemp;
while ((sTemp = re.ReadLine()) != null)
{
    string sLine = sTemp;
    //etc...
}

Thanks,

Brian
brian.gabriel@gmail.com
Cowboy (Gregory A. Beamer) - 21 Mar 2008 16:04 GMT
NULL chars create a problem. One way to tackle this is to work with command
line tools to change the NULL char into another char, or even separate into
pages. This works well if the application in question is loading the data;
it is not a good solution for an application that reads the file(s) every
time someone uses a specific function, however.

You might be able to use a FileStream and pull the data in as binary. If
nothing else, you can look for the byte pattern 13-10-0-12-13-10 and replace
it with something like two CRLFs (13-10-13-10). The cleaned file can then be
read through the StreamReader. You can also keep everything binary, if you
feel comfortable in that world. :-)
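
A minimal sketch of that byte-level cleanup, assuming the file is small enough to hold in memory and is plain ASCII. The class and method names (PageBreakCleaner, ReplacePageBreaks) are made up for illustration and the sample bytes stand in for the real report:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class PageBreakCleaner
{
    // End-of-page marker quoted by the original poster: CR LF NUL FF CR LF.
    static readonly byte[] PageBreak = { 0x0D, 0x0A, 0x00, 0x0C, 0x0D, 0x0A };

    // Replacement: two CRLF pairs.
    static readonly byte[] TwoCrLf = { 0x0D, 0x0A, 0x0D, 0x0A };

    // Hypothetical helper: copies the input, swapping each page-break
    // marker for two CRLFs.
    public static byte[] ReplacePageBreaks(byte[] input)
    {
        List<byte> output = new List<byte>(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            if (MatchesAt(input, i, PageBreak))
            {
                output.AddRange(TwoCrLf);
                i += PageBreak.Length;
            }
            else
            {
                output.Add(input[i]);
                i++;
            }
        }
        return output.ToArray();
    }

    // True if 'pattern' occurs in 'data' starting at 'offset'.
    static bool MatchesAt(byte[] data, int offset, byte[] pattern)
    {
        if (offset + pattern.Length > data.Length) return false;
        for (int j = 0; j < pattern.Length; j++)
        {
            if (data[offset + j] != pattern[j]) return false;
        }
        return true;
    }

    static void Main(string[] args)
    {
        // "AB", page-break marker, "CD" (stand-in for File.ReadAllBytes(path))
        byte[] sample = { 65, 66, 0x0D, 0x0A, 0x00, 0x0C, 0x0D, 0x0A, 67, 68 };
        byte[] cleaned = ReplacePageBreaks(sample);

        using (MemoryStream stream = new MemoryStream(cleaned))
        using (StreamReader reader = new StreamReader(stream, Encoding.ASCII))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine("[{0}]", line);
            }
        }
    }
}
```

For a real file, the sample array would be replaced by something like File.ReadAllBytes(SourceFile); for very large reports a streaming version would be a better fit.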

Signature

Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

Subscribe to my blog
http://gregorybeamer.spaces.live.com/lists/feed.rss

or just read it:
http://gregorybeamer.spaces.live.com/

*************************************************

| Think outside the box!

*************************************************
Phil Wilson - 21 Mar 2008 20:55 GMT
From my mainframe days I can see that what you have there is Ebcdic, 0D =
CR, 0A = LF, 0C = form feed. I'm wondering if this means that you could use
the StreamReader ctor that takes an Encoding for an 8 bit Ebcdic and it
would just work.  Anyway, knowing that this is Ebcdic CR LF FF might help.
Signature

Phil Wilson
[MVP Windows Installer]

Norman Diamond - 24 Mar 2008 09:50 GMT
Those are ASCII control codes, copied into ANSI and ISO code pages.

In EBCDIC, 0x0D = CR but no one used it, 0x25 = LF but no one used it, 0x15
= NL (newline) and it was occasionally used, and 0x0C = FF and I don't know
if anyone used it.

So the original poster receives data that have already been converted from
EBCDIC to ASCII, but that's not the problem.  The problem is that
StreamReader chokes when it hits a 0x00 (NUL in both ASCII and EBCDIC).

Jon Skeet [C# MVP] - 24 Mar 2008 11:09 GMT
> The problem is that StreamReader chokes when it hits a 0x00 (NUL in
> both ASCII and EBCDIC).

StreamReader doesn't choke on null characters. Here's an example:

using System;
using System.IO;
using System.Text;

class Test
{
   static void Main(string[] args)
   {
       // A NUL B NUL C
       byte[] data = { 65, 0, 66, 0, 67 };
       
       using (MemoryStream stream = new MemoryStream(data))
       using (StreamReader reader = new StreamReader
                  (stream, Encoding.ASCII))
       {
           string line = reader.ReadLine();
           for (int i=0; i < line.Length; i++)
           {
               Console.WriteLine("{0}: {1}", i,
                                 line[i]=='\0'
                                 ? "NUL"
                                 : line[i].ToString());
           }
       }
   }
}

Signature

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk

Cowboy (Gregory A. Beamer) - 24 Mar 2008 13:59 GMT
He was choking on NULL char due to the way he was looping:

while (null != (line = reader.ReadLine()))
{
}

As soon as he hits a null char, he gets an end of read. I use this type of
loop as well, as it is rather simple, but it will choke on (char) 0.

A method I employed, at one time, for EBCDIC, is running everything binary
until it needed to be text. Much more efficient to stay in the binary world,
from a perf perspective, but also harder to program, as we do not think in
0s and 1s. :-)

Jon Skeet [C# MVP] - 24 Mar 2008 14:20 GMT
> He was choking on NULL char due to the way he was looping:
>
> As soon as he hits a null char, he gets an end of read.

He shouldn't do - my example shows two null characters being embedded in
a line read with a single call of ReadLine.

> I use this type of
> loop, as well, as it is rather simple, but it will choke on (char) 0.

Could you give an example of this, bearing in mind my earlier example?
Here's another example using the OP's sample data - again, it doesn't
show StreamReader failing with null characters:

using System;
using System.IO;
using System.Text;

class Test
{
   static void Main(string[] args)
   {
       // A B CR LF NUL FF CR LF C D
       byte[] data = { 65, 66, 0x0d, 0x0a, 0,
               0x0c, 0x0d, 0x0a, 67, 68};
       
       using (MemoryStream stream = new MemoryStream(data))
       using (StreamReader reader = new StreamReader(stream))
       {
           string line;
           while ((line=reader.ReadLine()) != null)
           {
               Console.WriteLine ("Next line:");
               for (int i=0; i < line.Length; i++)
               {
                   Console.WriteLine("{0}: {1}", i,
                                     line[i]=='\0'
                                     ? "NUL"
                                     : line[i].ToString());
               }
           }
       }
   }
}

> A method I employed, at one time, for EBCDIC, is running everything binary
> until it needed to be text. Much more efficient to stay in the binary world,
> from a perf perspective, but also harder to program, as we do not think in
> 0s and 1s. :-)

It also means that *you* need to worry about all the nasty issues of
text handling rather than getting the framework to do it.


brian.gabriel@gmail.com - 24 Mar 2008 17:07 GMT
Thanks Jon!

After seeing your example I created a much smaller text file with the
offending characters and was able to better debug the issue.  The
issue is not with the StreamReader, but with displaying the results in
a text box.  A simple Replace removed the offending characters and
everything is displaying properly.
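
For anyone finding this thread later, the Replace mentioned above might look roughly like the sketch below. The exact call used is not shown in the thread, so the class and method names (DisplayCleaner, StripNuls) are hypothetical, and the likely reason it helps (the text box stops displaying at the first NUL) is an assumption rather than something verified here:

```csharp
using System;

static class DisplayCleaner
{
    // Hypothetical helper: removes NUL characters so that the text
    // can be assigned to a text box without being cut short.
    public static string StripNuls(string text)
    {
        return text.Replace("\0", string.Empty);
    }

    static void Main(string[] args)
    {
        // "AB", CR LF, NUL, form feed, "CD"
        string raw = "AB\r\n\0\fCD";
        string cleaned = StripNuls(raw);

        Console.WriteLine("Raw length: {0}", raw.Length);
        Console.WriteLine("Cleaned length: {0}", cleaned.Length);
        Console.WriteLine("NUL still present: {0}", cleaned.IndexOf('\0') >= 0);
    }
}
```

The form feed is left in place here; it could be stripped the same way if it also causes display problems.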

Again thanks for the help.

Brian
brian.gabriel@gmail.com

Cowboy (Gregory A. Beamer) - 26 Mar 2008 01:19 GMT
Duh!

I should have thought about that. The first NULL char will destroy display
at that point.

Phil Wilson - 24 Mar 2008 17:20 GMT
What do you mean by "nobody used it"? I used to see it all the time on
printers, some TTY green screens etc. It was (is?) very common in the 8-bit
EBCDIC mainframe world.
Norman Diamond - 25 Mar 2008 01:25 GMT
OK, I never saw the 0x0D 0x25 sequence used in EBCDIC in the TTY green
screen world, I saw the 0x15 single byte newline character used in EBCDIC in
the TTY green screen world.  If the TTY firmware (or hardware) used ASCII
then the driver would have to translate from EBCDIC to ASCII, but I didn't
see anyone do that translation for TTYs at the application level.
