
Happy Coding!
Morten Wennevik [C# MVP]
Thank you, I was looking at the Encoding class without seeing that simple
solution :-/
PL.
> Hi PL,
>
[quoted text clipped - 11 lines]
> hold UTF8 encoded data without loss, but if you plan on doing string
> manipulation I would convert it to unicode first.
> Hi PL,
>
[quoted text clipped - 3 lines]
> byte[] data = System.Text.Encoding.UTF8.GetBytes(utf8string);
> string unicodestring = System.Text.Encoding.Unicode.GetString(data);
This is just wrong.
Strings are sequences of characters, not sequences of encodings of
characters, so it is meaningless to have a variable of type System.String
called utf8string.
Consider the simpler situation with Int32:
The integer 10 is not the sequence of characters "10" in decimal, nor is it
"1010" in binary, nor is it the bytes 0x00,0x00,0x00,0x0a. These are all
encodings of it. The two quoted lines above are the equivalent of writing
something like:
int hexInt = 0x42;                   // the value 66
string data = hexInt.ToString("X");  // encoded as hexadecimal text: "42"
int decimalInt = int.Parse(data);    // decoded as decimal text: 42, not 66
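To make the same point with the quoted two-liner itself, here is a rough sketch of what happens when bytes are produced with one encoding and read back with another. The class and method names are made up for illustration; only the System.Text.Encoding calls come from the thread.

```csharp
using System;
using System.Text;

static class MisdecodeDemo
{
    // Reproduces the quoted two-liner: encode as UTF-8, then decode those bytes as UTF-16.
    public static string EncodeUtf8DecodeUtf16(string s)
    {
        byte[] data = Encoding.UTF8.GetBytes(s);
        return Encoding.Unicode.GetString(data);
    }

    // Decodes with the same encoding that produced the bytes.
    public static string EncodeUtf8DecodeUtf8(string s)
    {
        byte[] data = Encoding.UTF8.GetBytes(s);
        return Encoding.UTF8.GetString(data);
    }

    static void Main()
    {
        string original = "abcd";
        string wrong = EncodeUtf8DecodeUtf16(original);
        string right = EncodeUtf8DecodeUtf8(original);

        Console.WriteLine(wrong == original); // False
        Console.WriteLine(wrong.Length);      // 2: each pair of UTF-8 bytes was read as one UTF-16 code unit
        Console.WriteLine(right == original); // True
    }
}
```

The four UTF-8 bytes 61 62 63 64 get read as the two UTF-16 code units 0x6261 and 0x6463, so you end up with two unrelated characters instead of "abcd".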
> Beware that UTF16 can be big endian, in which case use BigEndianUnicode to
> get the string.
This brings up the issue of byte order marks (BOM).
If you use a BOM then the encoding can be inferred from the first few bytes.
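As a minimal sketch of that inference, assuming you only care about UTF-8 and the two UTF-16 byte orders: compare the start of the data against each encoding's preamble. BomSniffer and DetectFromBom are names invented here, not framework APIs; Encoding.GetPreamble() is the real call that returns the BOM bytes.

```csharp
using System;
using System.Text;

static class BomSniffer
{
    // Returns the candidate encoding whose BOM prefixes the data, or null if none matches.
    public static Encoding DetectFromBom(byte[] data)
    {
        Encoding[] candidates = { Encoding.UTF8, Encoding.Unicode, Encoding.BigEndianUnicode };
        foreach (Encoding candidate in candidates)
        {
            byte[] bom = candidate.GetPreamble();
            if (bom.Length > 0 && StartsWith(data, bom))
            {
                return candidate;
            }
        }
        return null; // no BOM: the encoding has to be known some other way
    }

    // True when data begins with exactly the bytes in prefix.
    static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        byte[] bigEndian = { 0xFE, 0xFF, 0x00, 0x41 };
        Encoding detected = DetectFromBom(bigEndian);
        Console.WriteLine(detected == null ? "no BOM" : detected.WebName);
    }
}
```

Note that this sketch ignores UTF-32, whose little endian BOM starts with the same two bytes as the UTF-16 little endian one, so treat it as an illustration rather than a complete detector. If you read through a StreamReader, it can do this kind of detection for you.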
> As for the second question. Yes all strings are unicode, but the content
> of the string does not have to be unicode encoded. I believe a string can
> hold UTF8 encoded data without loss,
A string is not encoded, therefore it is meaningless to say that it holds
UTF8 encoded data.
> but if you plan on doing string manipulation I would convert it to
> unicode first.
There is no other type of string in .NET, therefore all string manipulation
is inherently Unicode.
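A small sketch of the distinction, with made-up helper names: the string has one length in characters, and only the byte arrays you produce from it differ by encoding.

```csharp
using System;
using System.Text;

static class StringVsBytes
{
    // Number of chars (UTF-16 code units) in the string; no byte encoding is involved.
    public static int CharCount(string s)
    {
        return s.Length;
    }

    // Size of the string once encoded as UTF-8 bytes.
    public static int Utf8ByteCount(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    // Size of the string once encoded as UTF-16 bytes.
    public static int Utf16ByteCount(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }

    static void Main()
    {
        string s = "h\u00E9llo"; // "hello" with an e-acute as the second character

        Console.WriteLine(CharCount(s));      // 5
        Console.WriteLine(Utf8ByteCount(s));  // 6: the e-acute takes two bytes in UTF-8
        Console.WriteLine(Utf16ByteCount(s)); // 10: two bytes per char here
    }
}
```

Encoding only enters the picture at the point where the string is turned into bytes or built from bytes.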
To work out what you need to do, you first need to specify how your data
gets into and out of your app. If it arrives as byte arrays, then what you
have is this:
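byte[] utf8Input = .....;
// Decode the UTF-8 bytes into a string.
string inputString = System.Text.Encoding.UTF8.GetString(utf8Input);
// Encode the string as UTF-16 (little endian) bytes for output.
byte[] utf16Output = System.Text.Encoding.Unicode.GetBytes(inputString);
OutputHex(utf16Output);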
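Filled out into something you can compile, it might look roughly like the following. The sample input bytes and the ToHex helper (a stand-in for OutputHex, which is not defined in this thread) are my own assumptions.

```csharp
using System;
using System.Text;

static class Transcoder
{
    // Decodes UTF-8 bytes into a string, then encodes that string as UTF-16 little endian.
    public static byte[] Utf8ToUtf16(byte[] utf8Input)
    {
        string inputString = Encoding.UTF8.GetString(utf8Input);
        return Encoding.Unicode.GetBytes(inputString);
    }

    // Hypothetical stand-in for OutputHex: formats the bytes as dash-separated hex pairs.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        // Assumed sample input: "A" followed by e-acute, encoded as UTF-8.
        byte[] utf8Input = { 0x41, 0xC3, 0xA9 };
        byte[] utf16Output = Utf8ToUtf16(utf8Input);

        Console.WriteLine(ToHex(utf8Input));   // 41-C3-A9
        Console.WriteLine(ToHex(utf16Output)); // 41-00-E9-00
    }
}
```

The string in the middle is the only place where you deal with characters; on either side of it you have bytes in a named encoding. For big endian output you would use Encoding.BigEndianUnicode in place of Encoding.Unicode.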
>> I'm somewhat confused about Unicode but up until now I havent really seen
>> much issues with using it up until recently. We recently started using an
[quoted text clipped - 26 lines]
>> Thank you
>> PL.