When using StreamReader to read a file, any character that cannot be
converted by the given encoding is changed to a replacement char, like '?'.
Is there a way to know what the original bytes were?
For example, my file has the following lines:
・ 名詞相当語句:
・ think it’s ~ to:
・名詞
The hex values are:
EC 7D F6 BE 81 40 ......
EC 59 81 40......
81 45 .....
The original codes for the three '・' are different:
EC7D, EC59 and 8145.
But when I use the following code to read the file,
they are all changed to '・' (Unicode: 0x30FB, Shift-JIS: 8145).
System.IO.StreamReader sr = new System.IO.StreamReader(
    @"C:\test.txt", System.Text.Encoding.GetEncoding("shift-jis"));
string line = sr.ReadLine();  // first line
line = sr.ReadLine();         // second line
Is there a way to know what the original value of each '・' was?
Mihai N. - 01 Mar 2006 04:27 GMT
> When using StreamReader to read a file, any character that cannot be
> converted by the given encoding is changed to a replacement char, like '?'.
> Is there a way to know what the original bytes were?
Not easy.
One idea is to read the file using the encoding (which gives you a Unicode
string), then convert that string back to the original encoding and compare
the result with the original file at byte level. Any bytes that differ were
changed by the conversion.
But when you do such low level stuff, the ease-of-use of something high-level
like StreamReader is gone.
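
A rough sketch of that round trip, in case it helps. The class and method
names (RoundTripCheck, FirstMismatch) are made up for illustration, and it
assumes the whole file fits in memory. It only reports the offset of the first
byte that does not survive decoding and re-encoding. Note that a mismatch
means "changed by the round trip", which is not always the same as "invalid",
since some code pages contain duplicate mappings.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the .NET Framework.
public static class RoundTripCheck
{
    // Decodes the bytes with the given encoding, encodes the resulting string
    // back, and returns the offset of the first byte that differs from the
    // original, or -1 if the bytes survive the round trip unchanged.
    public static int FirstMismatch(byte[] original, Encoding enc)
    {
        string text = enc.GetString(original);
        byte[] roundTripped = enc.GetBytes(text);

        int n = Math.Min(original.Length, roundTripped.Length);
        for (int i = 0; i < n; i++)
        {
            if (original[i] != roundTripped[i])
            {
                return i;
            }
        }

        // Same prefix but different lengths also counts as a mismatch.
        return original.Length == roundTripped.Length ? -1 : n;
    }
}
```

Usage would be something like
RoundTripCheck.FirstMismatch(File.ReadAllBytes(@"C:\test.txt"), Encoding.GetEncoding("shift-jis")).
Once you have the offset, you can look at the original bytes at that position
yourself. On newer runtimes (.NET Core and later) the Shift-JIS code page may
need to be registered through the code pages encoding provider before
GetEncoding can find it.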

Signature
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net