
.NET Forum / .NET Framework / General / August 2006

Characters missing when reading from file.

bart.kowalski@gmail.com - 28 Aug 2006 13:42 GMT
I'm trying to read a text file that contains international
(specifically Polish) characters line by line. I'm using the following
C# code:

FileStream lStream = new FileStream(pFileName, FileMode.Open);
using (StreamReader lReader = new StreamReader(lStream))
{
   string lLine;
   while ((lLine = lReader.ReadLine()) != null)
       ProcessLine(/* blah..blah */);
}

The problem is that all Polish characters are missing. It doesn't even
show them incorrectly. It just completely drops the Polish chars and
the string is shorter than expected as a result. Does anyone know how
to fix this?
Michael - 28 Aug 2006 18:17 GMT
Bart,

Just making a guess on this one.  Do you know what encoding the Polish file
is in?  Check out the StreamReader(Stream, Encoding) constructor.  By default
the stream is read in UTF8Encoding.  Changing to the other constructor allows
you to specify ASCII, Unicode, UTF7 or UTF8.

Michael

> I'm trying to read a text file that contains international
> (specifically Polish) characters line by line. I'm using the following
[quoted text clipped - 12 lines]
> the string is shorter than expected as a result. Does anyone know how
> to fix this?
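
A minimal sketch of the StreamReader(Stream, Encoding) constructor mentioned above. It is an illustration only: the class name DecodeDemo, the helper ReadAll and the sample text are hypothetical, and UTF-16 ("Unicode" in .NET terms) is used purely to show that the same bytes decode differently depending on the encoding passed in. It says nothing about the encoding of the poster's actual file.

```csharp
using System;
using System.IO;
using System.Text;

static class DecodeDemo
{
    // Decodes a byte array through a StreamReader that uses the given encoding.
    public static string ReadAll(byte[] data, Encoding encoding)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // "Wałęsa", written with escapes so the source file's own encoding does not matter.
        string original = "Wa\u0142\u0119sa";

        // GetBytes emits no byte order mark, so the reader cannot auto-detect anything.
        byte[] data = Encoding.Unicode.GetBytes(original);

        string asUtf8 = ReadAll(data, Encoding.UTF8);     // wrong encoding for these bytes
        string asUtf16 = ReadAll(data, Encoding.Unicode); // encoding that produced the bytes

        Console.WriteLine(asUtf8 == original);   // False
        Console.WriteLine(asUtf16 == original);  // True
    }
}
```

The point of the sketch is only that the reader has to be told which encoding the writer used; guessing tends to produce damaged or missing characters.
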
bart.kowalski@gmail.com - 29 Aug 2006 08:38 GMT
> Bart,
>
> Just making a guess on this one.  Do you know what encoding the Polish file
> is in?  Check out the StreamReader(Stream, Encoding) constructor.  By default
> the stream is read in UTF8Encoding.  Changing to the other constructor allows
> you to specify ASCII, Unicode, UTF7 or UTF8.

Thanks. Do you know where I can get more information about the
character encoding?

Regards,
Bart.
Michael - 29 Aug 2006 15:53 GMT
That's the real question isn't it!  :)  Unfortunately, that really depends on
the source of the file.  If you are unable to ask the person that created the
file, try Unicode and keep your fingers crossed!

Michael

> > Bart,
> >
[quoted text clipped - 8 lines]
> Regards,
> Bart.
bart.kowalski@gmail.com - 29 Aug 2006 20:16 GMT
> That's the real question isn't it!  :)  Unfortunately, that really depends on
> the source of the file.  If you are unable to ask the person that created the
> file, try Unicode and keep your fingers crossed!

I found out that the file is in ASCII using the Eastern European code
page, and that's why it doesn't work. My question was where can I get
more information about using character encodings and conversions in
.NET, so that I can make it work. I found the MSDN documentation to be
rather short.

Thanks,
Bart.
Michael - 30 Aug 2006 15:31 GMT
You mean ANSI then, right?  Take a look at
System.Text.Encoding.GetEncoding().  

Resources to help you?  Good question.  I've been fortunate: the last time I
had to deal with this was many years ago, as we have been able to ensure that
the files we needed to parse used UTF8.  Try:

Links -
   overview - http://www.yoda.arachsys.com/csharp/unicode.html
   MS's Global Dev Portal - http://www.microsoft.com/globaldev/default.mspx

Books (I haven't looked at any of these, so I don't know how good they are) -
    .NET Internationalization: The Developer's Guide to Building Global Windows and Web Applications - http://www.bookpool.com/sm/0321341384
    Internationalization and Localization Using Microsoft .NET - http://www.bookpool.com/sm/1590590023

Michael

> > That's the real question isn't it!  :)  Unfortunately, that really depends on
> > the source of the file.  If you are unable to ask the person that created the
[quoted text clipped - 8 lines]
> Thanks,
> Bart.
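
Since the question was also about conversions, here is a hedged sketch of re-encoding a file with System.Text.Encoding.GetEncoding(). It assumes the file uses Windows code page 1250 (the value confirmed later in the thread); ConvertDemo, ConvertFile and the temporary files are hypothetical names for illustration. The RegisterProvider line is only needed on .NET Core / .NET 5 and later, where code page encodings are not registered by default; on the .NET Framework it can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

static class ConvertDemo
{
    // Re-encodes a text file by decoding it to a string and writing it back out.
    public static void ConvertFile(string sourcePath, Encoding sourceEncoding,
                                   string targetPath, Encoding targetEncoding)
    {
        string text = File.ReadAllText(sourcePath, sourceEncoding);
        File.WriteAllText(targetPath, text, targetEncoding);
    }

    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // The same encoding can be looked up by code page number or by name.
        Encoding byNumber = Encoding.GetEncoding(1250);
        Encoding byName = Encoding.GetEncoding("windows-1250");
        Console.WriteLine(byNumber.CodePage == byName.CodePage);  // True

        string original = "Wa\u0142\u0119sa";  // "Wałęsa"
        string source = Path.GetTempFileName();
        string target = Path.GetTempFileName();
        try
        {
            // Create a sample code page 1250 file, then convert it to UTF-8 without a BOM.
            File.WriteAllBytes(source, byNumber.GetBytes(original));
            ConvertFile(source, byNumber, target, new UTF8Encoding(false));

            Console.WriteLine(File.ReadAllText(target, Encoding.UTF8) == original);  // True
        }
        finally
        {
            File.Delete(source);
            File.Delete(target);
        }
    }
}
```

Decoding to a string first and encoding again keeps the two encodings independent of each other, which is why the helper takes both as parameters.
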
bart.kowalski@gmail.com - 31 Aug 2006 07:18 GMT
> You mean ANSI then, right?  Take a look at
> System.Text.Encoding.GetEncoding().
<snip>

Thanks. It works with GetEncoding(1250). The link you provided contains
some useful information too.

Regards,
Bart.
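
For reference, the fix reported above could look roughly like the sketch below: the original read loop with GetEncoding(1250) passed to the StreamReader. Cp1250Reader, ReadLines and the sample file are hypothetical stand-ins (the thread's ProcessLine is not shown, so the lines are simply collected). As an assumption for newer runtimes, RegisterProvider is called first; on the .NET Framework used in 2006 that line is unnecessary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class Cp1250Reader
{
    // Reads a text file line by line using an explicitly chosen encoding.
    public static List<string> ReadLines(string fileName, Encoding encoding)
    {
        var lines = new List<string>();
        using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1250 = Encoding.GetEncoding(1250);

        string path = Path.GetTempFileName();
        try
        {
            // Sample file with two lines of Polish text ("Wałęsa", "Gdańsk") in code page 1250.
            File.WriteAllText(path, "Wa\u0142\u0119sa\r\nGda\u0144sk\r\n", cp1250);

            foreach (string line in ReadLines(path, cp1250))
                Console.WriteLine("{0} ({1} chars)", line, line.Length);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Printing the length of each line is a quick way to check that no characters were dropped, even if the console itself cannot display them.
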
Cor Ligthert [MVP] - 30 Aug 2006 15:54 GMT
Bart,

Maybe this helps you to find the right code page you have to convert from.

http://www.vb-tips.com/dbPages.aspx?ID=cca7e08a-9580-42b3-beff-76c81839e6c9

Just as the v is not used in Polish, the rest of the world, as far as I know,
does not use the l with a stroke through it (ł), and therefore almost everybody
outside Poland says Walensa.

If you are not a fan of him, you should see what "wauwelen" means in Dutch

:-)

Cor

>> That's the real question isn't it!  :)  Unfortunately, that really
>> depends on
[quoted text clipped - 10 lines]
> Thanks,
> Bart.
Marc Gravell - 31 Aug 2006 06:24 GMT
You probably need to find out what encoding (or codepage) was used to
write the file, and pass that in, e.g.

new StreamReader(lStream, Encoding.UTF8)

or - if the file has byte order marks at the start, you /may/ be able
to auto-detect:

new StreamReader(lStream, true)

Marc
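
A small sketch of the byte order mark detection described above, under the assumption of a UTF-16 file written with a BOM; BomDemo, ReadWithBomDetection and the temporary file are hypothetical. Note that single-byte code pages such as Windows-1250 have no byte order mark, so this kind of detection would not identify the file discussed in this thread.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Reads a file, letting StreamReader pick the encoding from a byte order mark if present.
    public static string ReadWithBomDetection(string path, out Encoding detected)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(stream, true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding is only reliable after the first read has happened.
            detected = reader.CurrentEncoding;
            return text;
        }
    }

    static void Main()
    {
        string original = "Wa\u0142\u0119sa";  // "Wałęsa"
        string path = Path.GetTempFileName();
        try
        {
            // Encoding.Unicode is UTF-16 little endian; File.WriteAllText writes its BOM first.
            File.WriteAllText(path, original, Encoding.Unicode);

            Encoding detected;
            string text = ReadWithBomDetection(path, out detected);

            Console.WriteLine(text == original);   // True
            Console.WriteLine(detected.WebName);   // name of the detected encoding
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```
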
