.NET Forum / .NET Framework / General / July 2004
Convert DOS Cyrillic text to Unicode
Nikolay Petrov - 27 Jul 2004 07:44 GMT
How can I convert DOS Cyrillic text to Unicode?
Jon Skeet [C# MVP] - 27 Jul 2004 08:16 GMT
> How can I convert DOS Cyrillic text to Unicode?
See http://www.pobox.com/~skeet/csharp/unicode.html
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet If replying to the group, please do not mail me too
Nikolay Petrov - 27 Jul 2004 09:52 GMT
I have read this and other material on Unicode. My question is how I can do it in VB. I need the code.
> > How can I convert DOS Cyrillic text to Unicode?
> > See http://www.pobox.com/~skeet/csharp/unicode.html
Jon Skeet [C# MVP] - 27 Jul 2004 10:15 GMT
> I have read this and other material on Unicode.
> My question is how I can do it in VB. I need the code.
I provide some C# code to read a file in one encoding and write it in another. It's very simple code - it should be easy to understand and rewrite in VB.NET. The important thing is really just the creation of the StreamReader with the right encoding.
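As a rough illustration of the approach described above (decode on read with the source encoding, encode on write with the target one), here is a minimal sketch. It is written in Python rather than the C# from the linked page or VB.NET, and it assumes the files are in code page 866 ("DOS Cyrillic"), which the thread has not confirmed at this point. The function name `convert_file` and its parameters are hypothetical.

```python
def convert_file(src_path, dst_path, src_encoding="cp866", dst_encoding="utf-8"):
    """Read text in one encoding and write it back out in another.

    The default encodings are assumptions for illustration only.
    """
    # Decoding happens on read: the file's bytes become Unicode text here.
    # newline="" leaves the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Encoding happens on write: the Unicode text becomes bytes again.
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
    return text
```

The only decision that matters is the encoding named when the file is opened for reading, which mirrors the point about creating the reader with the right encoding.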
Nikolay Petrov - 27 Jul 2004 10:25 GMT
My problem is that I don't read a file. The DOS Cyrillic text is pasted into one textbox and should appear in another. That's all. I don't have anything in binary.
> > I have read this and other info in Unicode topic
> > My question is how can I do it in VB. I need the code.
[quoted text clipped - 3 lines]
> rewrite in VB.NET. The important thing is really just the creation of
> the StreamReader with the right encoding.
Jon Skeet [C# MVP] - 27 Jul 2004 10:48 GMT
> My problem is that I don't read a file.
> The DOS Cyrillic text is pasted into one textbox and should appear in another.
> That's all.
> I don't have anything in binary.
If it's in a text box, you should have it as Unicode text already. All strings are in Unicode in .NET.
Cor Ligthert - 27 Jul 2004 10:33 GMT
Hi Jon,
In the language.VB newsgroup I pointed Nikolay to you and to Jay B, who has answered a message in language.VB, though also not completely enough for Nikolay. Jay B will probably not be active on this newsgroup before 13:00 GMT.
I am curious as well: which encoding do you think is the right one for this Cyrillic problem?
Nikolay wrote in the language.VB group that he pasted it from Notepad, so I guess UTF16?
:-) Cor
...
> > I have read this and other info in Unicode topic
> > My question is how can I do it in VB. I need the code.
[quoted text clipped - 5 lines]
> > --
Jon Skeet [C# MVP] - 27 Jul 2004 10:49 GMT
> I pointed Nikolay in the language.VB newsgroup on you and Jay B, who has
> answered a message in language.VB however as well not complete enough for
[quoted text clipped - 3 lines]
> I am curious as well, what is the right encoding you think about for this
> Cyrillic problem?
Not sure - but it sounds like it won't actually be a problem: if he's got the data in Notepad to start with, there's no encoding change required - cut and paste should sort everything out.
> Nikolay wrote in the language.VB group that he pasted it from Notepad,
> so I guess UTF16?
No way - DOS precedes UTF16 by a long time!
Nikolay Petrov - 27 Jul 2004 11:09 GMT
The user pastes text from text files, which contain DOS Cyrillic characters. When they are pasted into a text box, or even into a Notepad window, they look like garbage. I am not sure: can I post a file here as an attachment, so you can see it?
> > I have read this and other info in Unicode topic
> > My question is how can I do it in VB. I need the code.
[quoted text clipped - 3 lines]
> rewrite in VB.NET. The important thing is really just the creation of
> the StreamReader with the right encoding.
Jon Skeet [C# MVP] - 27 Jul 2004 11:26 GMT
> The user pastes text from text files, which contain DOS Cyrillic characters.
What does he have the text open in? It sounds like the existing app is probably not putting it into the clipboard in Unicode :(
> When they are pasted into a text box, or even into a Notepad window, they look
> like garbage.
Ah - I thought you meant he had it working in Notepad to start with.
> I am not sure: can I post a file here as an attachment, so you can see it?
It's probably best if you email it to me.
Cor Ligthert - 27 Jul 2004 11:31 GMT
Hi Jon,
> It's probably best if you email it to me.
I am also interested in this question, so why not mail it to the newsgroup?
Cor
Jon Skeet [C# MVP] - 27 Jul 2004 11:46 GMT
> > It's probably best if you email it to me.
>
> I am also interested in this question, so why not mail it to the
> newsgroup?
It's more that, depending on the way the file is attached, it might get converted during the attachment process - that's less likely to happen in a mail message.
Cor Ligthert - 27 Jul 2004 11:56 GMT
> It's more that depending on the way of attaching the file, it might get
> converted during the attachment process - that's less likely to happen
> in a mail message.
So I'll wait for the results, and then you can maybe send it to me when all is clear?
Cor
Jon Skeet [C# MVP] - 27 Jul 2004 12:09 GMT
> > It's more that depending on the way of attaching the file, it might get
> > converted during the attachment process - that's less likely to happen
> > in a mail message.
> So I'll wait for the results, and then you can maybe send it to me when all is
> clear?
Yup, sure. I suspect there's nothing particularly interesting about the file though - it's just that I should be able to work out what encoding it's in, so that if the OP *does* want to read it directly (rather than with c'n'p) he should be able to.
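One crude way to work out what encoding a file is in can be sketched as follows. This is only a heuristic, not necessarily what anyone in the thread did: decode the bytes with each candidate code page and keep the one that yields the most lowercase Russian letters, since ordinary prose is mostly lowercase. The candidate list and the names `score` and `guess_encoding` are assumptions made for this sketch (in Python).

```python
# Candidate code pages to try; this list is an assumption, not exhaustive.
CANDIDATES = ("cp866", "cp1251", "koi8_r")


def score(text):
    """Count lowercase Russian letters (U+0430..U+044F plus U+0451)."""
    return sum(1 for ch in text if "а" <= ch <= "я" or ch == "ё")


def guess_encoding(data, candidates=CANDIDATES):
    """Return the candidate whose decoding looks most like Russian prose."""
    best_name, best_score = None, -1
    for name in candidates:
        # errors="replace" keeps going when a byte has no mapping
        # in the candidate code page.
        decoded = data.decode(name, errors="replace")
        current = score(decoded)
        if current > best_score:
            best_name, best_score = name, current
    return best_name
```

A guess like this can easily be wrong on very short samples or on text that is mostly capitals and digits, so it is best treated as a hint to confirm by eye.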
Nikolay Petrov - 27 Jul 2004 12:16 GMT
OK guys, I have mailed it to both of you.
I'll also put some of this DOS text here, in case anyone else is interested:
???<?'? ?? 6 ?. 2004??".
> > The user pasts text from text files, which contain DOS Cyrillic characters.
> [quoted text clipped - 9 lines]
> > It's probably best if you email it to me.
Nikolay Petrov - 27 Jul 2004 14:24 GMT
New problem ;-( The text is encoded only partially. All capital letters are fine, and some of the lowercase ones, but not all. What may have caused this?
> How can I convert DOS Cyrillic text to Unicode?
Jon Skeet [C# MVP] - 27 Jul 2004 14:44 GMT
> New problem ;-(
> The text is encoded only partially.
At what stage?
> All capital letters are fine, and some of the lowercase ones, but not all.
> What may have caused this?
No idea - are you saying the original files are corrupt, basically?
Paul Gorodyansky - 31 Jul 2004 00:38 GMT
Hi,
> New problem ;-(
> Text is encoded partially.
> All capital letters are fine, and some of the lowercase ones, but not all.
> What may have caused this?
>
> > How can I convert DOS Cyrillic text to Unicode?
You did not answer Jon's question, but it was critical - in what _program_ does your user open the text file with DOS Cyrillic?
I have been working with Cyrillic encodings since 1995 :) so I have dealt with most of them, including CP-866.
The easiest way in your scenario would be:
Open that DOS Cyrillic .txt file in MS Word 2000 or newer, choosing "Cyrillic (DOS)" encoding in the process: http://ourworld.compuserve.com/homepages/PaulGor/cp_e.htm#open
Now your user should see normal Russian text, already converted to Unicode by Word, and can paste it into your text box.
Otherwise, if you try to open a file that contains text in DOS Cyrillic encoding in some regular MS Windows text editor, you *will* see just gibberish - the editor expects one of the _Windows_ encodings, not a DOS one.
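The gibberish effect, and one possible way to undo it for text that has already been pasted, can be sketched like this (in Python). The sketch assumes the bytes are really CP866 and that the editor decoded them as Windows-1251; the thread confirms neither for Nikolay's case. The function names `misread` and `repair` are hypothetical.

```python
def misread(text, actual="cp866", assumed="cp1251"):
    """Simulate an editor decoding DOS bytes with a Windows code page."""
    # Same bytes, wrong table: every Cyrillic letter maps to a different character.
    return text.encode(actual).decode(assumed)


def repair(garbled, actual="cp866", assumed="cp1251"):
    """Undo the wrong decoding: recover the bytes, then decode them correctly."""
    return garbled.encode(assumed).decode(actual)
```

For example, `repair(misread("Привет"))` returns the original word. The round trip is not guaranteed for every character: byte 0x98, which is capital Ш in CP866, is unassigned in Windows-1251, so text containing it may not survive the wrong decoding intact.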
There are many more ways to get it done: for example, converter programs that make "Cyrillic (Windows), 1251" text from your DOS Cyrillic text, or I18n-aware editors that, like Word, let you specify explicitly what the encoding of your file is, such as http://www.esperanto.mv.ru/UniRed/ENG/ etc., etc.
 Signature Regards, Paul Gorodyansky "Cyrillic (Russian): instructions for Windows and Internet": http://RusWin.net Russian On-screen Keyboard: http://Kbd.RusWin.net