Hi Kueishiong!
> How do I copy the content of a string in one encoding (in my case big5) to a
> char array (unmanaged) of the same encoding?
> String line[] = S"123水泥";
.NET strings have no special encoding!!! They are always stored in UTF-16.
> char buffer[200];
You need to convert the UTF-16 string to the "big5" string!
> for(int i=0; i<line->get_length(); i++)
> {
[quoted text clipped - 3 lines]
> It works fine for the first 3 Ascii characters, but gets messed up for the
> next 2 Chinese characters. What is wrong here?
You can use the following:
System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
System::Byte big5 __gc[] = e->GetBytes(S"123水泥");
char *szBig5 = new char[big5->get_Count()+1];
System::Byte __pin *b = &big5[0];
strncpy(szBig5, (char*) b, big5->get_Count());
szBig5[big5->get_Count()] = 0;

Signature
Greetings
Jochen
My blog about Win32 and .NET
http://blog.kalmbachnet.de/
Carl Daniel [VC++ MVP] - 16 Oct 2005 17:25 GMT
> Hi Kueishiong!
>> How do I copy the content of a string in one encoding (in my case
[quoted text clipped - 4 lines]
> .NET strings have no special encoding!!! They are always stored in
> UTF-16.
Actually, I believe it's UCS2. It's not UTF16 since there's no multi-word
characters in the .NET representation and code points above 0xffff are
simply not representable.
>> char buffer[200];
>
> You need to convert the UTF-16 string to the "big5" string!
Or store it in wchar_t buffer[200] instead of char to preserve the UCS2
format.
-cd
Kueishiong Tu - 16 Oct 2005 18:03 GMT
Thank you very much for replying. I changed buffer to wchar_t and the copying
works fine.
However, what I ultimately need is a char array, because the code
following it
requires that. How do I convert a wchar_t array to a char array? From my
experience I know a char array can store both one-byte ASCII characters and
two-byte Chinese characters.
Kueishiong Tu
Jochen Kalmbach [MVP] - 16 Oct 2005 18:20 GMT
Hi Carl!
> Actually, I believe it's UCS2. It's not UTF16 since there's no multi-word
In fact there is no multi-word, but there are high/low surrogates...
And this _is_ UTF-16 (everything in windows is using UTF-16).
See: http://www.unicode.org/notes/tn12/
<quote>
Most major software with good Unicode support uses UTF-16 (or 16-bit
Unicode strings). Note that much of the software listed below runs on
Unix/Linux systems as well as Windows and others.
- Everything Microsoft — Windows (including Pocket PC) and application
</quote>
> characters in the .NET representation and code points above 0xffff are
> simply not representable.
That would be very bad; .NET would then not support Unicode!!!
(And by the way: .NET *is* fully Unicode enabled.)
At least with .NET 2.0, they added some classes to query all the
necessary information...
See: StringInfo Class
http://msdn2.microsoft.com/en-us/library/c4hkht93(en-us,VS.80).aspx
See: StringInfo.ParseCombiningCharacters
http://msdn2.microsoft.com/en-us/library/2wayc3ak(en-us,vs.80).aspx
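To make the surrogate point concrete, here is a minimal sketch in standard C++ (not the .NET StringInfo API) that counts code points in a UTF-16 sequence. The helper names `is_high_surrogate`, `is_low_surrogate` and `count_code_points` are made up for illustration, and an unpaired surrogate is simply counted as one code point.

```cpp
#include <cstddef>
#include <string>

// High (leading) surrogates occupy 0xD800..0xDBFF.
inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// Low (trailing) surrogates occupy 0xDC00..0xDFFF.
inline bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts code points in a UTF-16 string. A well-formed surrogate pair
// (high followed by low) counts once; every other 16-bit unit counts once,
// including an unpaired surrogate (a simplification for this sketch).
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() && is_low_surrogate(s[i + 1])) {
            ++i; // skip the low half of the pair
        }
        ++n;
    }
    return n;
}
```

The number of 16-bit units and the number of code points differ as soon as a character above 0xFFFF appears, which is why a string length is not a character count.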

Carl Daniel [VC++ MVP] - 16 Oct 2005 18:55 GMT
> Hi Carl!
>
[quoted text clipped - 3 lines]
> In fact there is no multi-word, but there are high/low surrogates...
> And this _is_ UTF-16 (everything in windows is using UTF-16).
Consider myself educated :) I didn't realize that support for code points
above 0xffff was in fact included in .NET. I'm sure I've missed something,
but I don't recall any very useful character sets in the code points at
10000 and above (e.g. Klingon, Elvish), but I'm happy to see that they're
representable.
-cd
Jochen Kalmbach [MVP] - 16 Oct 2005 19:02 GMT
Hi Carl!
> Consider myself educated :) I didn't realize that support for code points
> above 0xffff was in fact included in .NET. I'm sure I've missed something,
> but I don't recall any very useful character sets in the code points at
> 10000 and above (e.g. Klingon, Elvish), but I'm happy to see that they're
> representable.
Some might be useful (but you are right: most of them will never be used):
10000..1007F; Linear B Syllabary
10080..100FF; Linear B Ideograms
10100..1013F; Aegean Numbers
10140..1018F; Ancient Greek Numbers
10300..1032F; Old Italic
10330..1034F; Gothic
10380..1039F; Ugaritic
103A0..103DF; Old Persian
10400..1044F; Deseret
10450..1047F; Shavian
10480..104AF; Osmanya
10800..1083F; Cypriot Syllabary
10A00..10A5F; Kharoshthi
1D000..1D0FF; Byzantine Musical Symbols
1D100..1D1FF; *Musical Symbols*
1D200..1D24F; Ancient Greek Musical Notation
1D300..1D35F; Tai Xuan Jing Symbols
1D400..1D7FF; *Mathematical Alphanumeric Symbols*
20000..2A6DF; *CJK Unified Ideographs Extension B*
2F800..2FA1F; *CJK Compatibility Ideographs Supplement*
E0000..E007F; Tags
E0100..E01EF; Variation Selectors Supplement
F0000..FFFFF; Supplementary Private Use Area-A
100000..10FFFF; Supplementary Private Use Area-B
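As a rough illustration of how code points from the blocks listed above end up in a UTF-16 string, the following standard C++ sketch splits a code point above 0xFFFF into a high/low surrogate pair. The function name `encode_utf16` is hypothetical, and the sketch assumes its input is a valid code point (at most 0x10FFFF and not itself a surrogate); it does no validation.

```cpp
#include <cstdint>
#include <string>

// Encodes one code point as UTF-16.
// Assumes cp <= 0x10FFFF and cp is not in 0xD800..0xDFFF (not checked here).
std::u16string encode_utf16(std::uint32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: subtract 0x10000, then split the remaining
        // 20 bits into a high surrogate (top 10) and a low surrogate (bottom 10).
        std::uint32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For example, U+20000 (the first code point of CJK Unified Ideographs Extension B) becomes the two units 0xD840 0xDC00.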

Kueishiong Tu - 16 Oct 2005 17:37 GMT
Thank you very much for replying.
> System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
> System::Byte big5 __gc[] = e->GetBytes(S"123水泥");
However the source is something I read from a text file which is in a String.
FileStream* fs = new FileStream(path, FileMode::Open);
StreamReader* sr = new StreamReader(fs, Encoding::GetEncoding("big5"));
String *line = sr->ReadLine();
Following your suggestion, I have to convert a String to a Byte array.
How do I do that?
> char *szBig5 = new char[big5->get_Count()+1];
> System::Byte __pin *b = &big5[0];
> strncpy(szBig5, (char*) b, big5->get_Count());
> szBig5[big5->get_Count()] = 0;
Kueishiong Tu
Jochen Kalmbach [MVP] - 16 Oct 2005 18:22 GMT
Hi Kueishiong!
> However the source is something I read from a text file which is in a String.
>
> FileStream* fs = new FileStream(path, FileMode::Open);
> StreamReader* sr = new StreamReader(fs, Encoding::GetEncoding("big5"));
> String *line = sr->ReadLine();
This does _not_ matter!!!
If you have a "string" then it _is_ unicode. The encoding was only used
while reading the file (and translating the big5-encoding to unicode).
> As your suggestion, I have to convert a String to a Byte array.
> How do I do that?
>> char *szBig5 = new char[big5->get_Count()+1];
My example works very well. What is your problem?

Kueishiong Tu - 17 Oct 2005 15:27 GMT
In your example
> System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
> System::Byte big5 __gc[] = e->GetBytes(S"123水泥");
> char *szBig5 = new char[big5->get_Count()+1];
> System::Byte __pin *b = &big5[0];
> strncpy(szBig5, (char*) b, big5->get_Count());
> szBig5[big5->get_Count()] = 0;
you copy the content of the Byte array pointed at by b to a char array, szBig5.
However, what I need is to copy the content of a String to a char array
(say, from String *b = S"123水泥" to szBig5).
> Hi Kueishiong!
>
[quoted text clipped - 13 lines]
>
> My example works very well. What is your problem?
Jochen Kalmbach [MVP] - 17 Oct 2005 17:19 GMT
Hi Kueishiong!
>>System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
>>System::Byte big5 __gc[] = e->GetBytes(S"123水泥");
[quoted text clipped - 6 lines]
> However what I need is to copy the content of a String to a char array.
> (said String *b = S"123水泥" to szBig5)
Maybe we are talking about different things...
I thought you wanted a char array in big5 encoding? Isn't this what you
wanted???
And this is exactly what my example does...
It converts a "string" into a char array which is encoded in "big5".

Kueishiong Tu - 17 Oct 2005 18:01 GMT
Hi Jochen!
What I want is to copy the content of a String
(
the source is read from a text file using the following StreamReader
sr->ReadLine() call and stored in the String *line
FileStream* fs = new FileStream(path, FileMode::Open);
StreamReader* sr = new StreamReader(fs, Encoding::GetEncoding("big5"));
String *line = sr->ReadLine();
)
to a char array (say, buffer declared as char buffer[200]), i.e.
move the contents of *line to buffer[].
> Hi Kueishiong!
>
[quoted text clipped - 16 lines]
> And this is exactly what my example does...
> It converts a "string" into an char-array which is encoded in "big5".
Jochen Kalmbach [MVP] - 17 Oct 2005 18:15 GMT
Hi Kueishiong!
> What I want is to copy the content of a String
>
[quoted text clipped - 7 lines]
>
> to a char array (said buffer declared as char buffer[200]), i.e.
What is "char" ? 8-bit?
> move the contents in *line to buffer[].
There is no difference between buffer[] and *buffer
System::String *line = S"123水泥";
System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
System::Byte big5 __gc[] = e->GetBytes(line);
char *buffer = new char[big5->get_Count()+1];
System::Byte __pin *b = &big5[0];
strncpy(buffer, (char*) b, big5->get_Count());
buffer[big5->get_Count()] = 0;
// now the buffer contains the char-array encoded in "big5"
// after you have used the buffer, you need to destroy it...
delete [] buffer;
(and this was exactly my 1st reply...)
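If the destination has to be a fixed-size array such as the char buffer[200] from the original question, the copy also needs a bounds check. The following is only a sketch in standard C++ that works on the already-encoded bytes; `is_big5_lead` and `copy_big5` are hypothetical helper names, and the lead-byte range 0x81..0xFE is an assumption about the Big5 variant in use. It truncates on a character boundary so a two-byte character is never cut in half.

```cpp
#include <cstddef>
#include <cstring>

// Assumption: lead bytes of two-byte sequences lie in 0x81..0xFE.
inline bool is_big5_lead(unsigned char c) { return c >= 0x81 && c <= 0xFE; }

// Copies at most cap-1 bytes of encoded text from src into dst and
// null-terminates dst. Stops before any character that would not fit
// completely. Returns the number of bytes copied (excluding the terminator).
std::size_t copy_big5(const unsigned char* src, std::size_t len,
                      char* dst, std::size_t cap) {
    if (cap == 0) return 0; // no room even for the terminator
    std::size_t n = 0;
    while (n < len) {
        // A lead byte followed by another byte forms a two-byte character.
        std::size_t w = (is_big5_lead(src[n]) && n + 1 < len) ? 2 : 1;
        if (n + w > cap - 1) break; // would overflow or split a character
        n += w;
    }
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

memcpy is used instead of strncpy because the byte count is already known, so there is no need to scan for a terminator in the source.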

Kueishiong Tu - 17 Oct 2005 18:58 GMT
Dear Jochen:
> System::Byte big5 __gc[] = e->GetBytes(line);
It is the above line, which converts a String to a Byte array, that I wanted.
I put that in, and the whole program works fine. Thank you very much for your help.
Kueishiong Tu
Norman Diamond - 17 Oct 2005 01:33 GMT
> Thank you very much for replying.
>
[quoted text clipped - 11 lines]
> As your suggestion, I have to convert a String to a Byte array.
> How do I do that?
If the file contains a Byte array (ANSI string) and you need to pass the
same byte array to another routine, then don't read a String (Unicode
string). Read a byte array in the first place.
Kueishiong Tu - 17 Oct 2005 15:36 GMT
Thank you very much for replying.
> If the file contains a Byte array (ANSI string) and you need to pass the
> same byte array to another routine, then don't read a String (Unicode
> string). Read a byte array in the first place.
How do I read the content of a text file in as a Byte array instead of the
String which StreamReader *sr->ReadLine() returns?
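One possible way to follow Norman's suggestion, sketched here in standard C++ rather than with the .NET stream classes, is to open the file in binary mode and read the raw bytes without any decoding. The function name `read_file_bytes` is hypothetical, and the sketch reads the whole file rather than a single line; it returns an empty vector if the file cannot be opened.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads the entire file as raw bytes. No character-set conversion happens,
// so big5-encoded text stays big5-encoded.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_file_bytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) {
        return std::vector<char>();
    }
    return std::vector<char>((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
}
```

Because nothing is decoded, the bytes can be handed to code that expects a big5 char array as they are; splitting into lines would then mean searching for the newline byte yourself.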