.NET Forum / Languages / Managed C++ / August 2007

_vsnwprintf_s seems to be broken

Norman Diamond - 31 Jul 2007 03:09 GMT
I think the current version of _vsnwprintf_s is broken, in ordinary Windows.

I'm not completely sure yet but it looks like this breakage is worse than
previously known Windows CE breakage of StringCchPrintf.  For Windows CE
breakage of StringCchPrintf, since the %S format died instead of converting
ANSI to Unicode, a workaround was to call MultiByteToWideChar and then use
the %s format.

For ordinary Windows breakage of _vsnwprintf_s, the %s format is broken, as
far as I can tell.

The compilation environment is not internationalized.  It's Visual Studio
2005 SP1 + hotfix for Vista, and SDK for Vista, all running on Vista, all in
Japanese, no foreign software involved in this environment.  The project
setting for character set says to use Unicode not ANSI.  Function name
_vsntprintf_s maps to _vsnwprintf_s, _T("") maps to L"", etc., and
everything except _vsnwprintf_s seems to perform properly at execution time.
MFC and ATL are not used.  The CRT is used as a DLL.

The runtime environment where failure was observed is internationalized.
The Chinese MUI pack was downloaded.  The user's locale (viewable format or
something like that), the user's display language, and the system locale
(viewable format for non-Unicode programs) are all set to Chinese
traditional Hong Kong.  The settings were copied to all reserved and default
accounts.  The execution PC was rebooted several times.  The logon screen
and nearly everything else are displayed properly in Chinese.  However, the
CRT DLL is from Vista RTM, not from Visual Studio 2005 SP1.

The user's username is "中文2" (without the quotes).  The user can log on
perfectly.  The Start menu shows the user's name at the top.  Windows
Explorer shows the user's name correctly.  No renaming or anything else has
been done with this user.  Ordinary Windows operations work.  Execution of
my program works, except for calls to _vsnwprintf_s.

Code:
static TCHAR szBuf[2048];
_vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR),
 _T("Username=\"%s\"\n"), userName);

Result:
Username="

_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
Unicode character.
Cezary Noweta - 31 Jul 2007 13:12 GMT
Hello,

> Code:
> static TCHAR szBuf[2048];
[quoted text clipped - 3 lines]
> Result:
> Username="

> _vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
> Unicode character.

What is the type of userName? Is it va_list or TCHAR *? The v* functions take a
va_list parameter, not a TCHAR * one. Maybe you should use _sntprintf_s in place of
_vsntprintf_s?

-- best regards

Cezary Noweta
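
The distinction raised here can be sketched in portable C++. This is only an illustration under stated assumptions: format_log_line is a hypothetical name, and std::vswprintf stands in for the Microsoft-specific _vsnwprintf_s. The variadic wrapper owns the va_list (va_start and va_end), while the v* function only consumes it. The sketch uses %ls, the standard conversion for a wchar_t* argument; the %s used in the thread relies on Microsoft's wide printf behaviour.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper (not from the thread): formats into `out` and returns
// the value reported by std::vswprintf, which is negative on failure, for
// example when the result does not fit in the buffer.
int format_log_line(std::wstring& out, const wchar_t* format, ...)
{
    wchar_t buffer[2048];
    va_list args;
    va_start(args, format);   // the variadic wrapper creates the va_list
    const int written = std::vswprintf(
        buffer, sizeof(buffer) / sizeof(buffer[0]), format, args);
    va_end(args);             // and releases it on every path

    if (written >= 0)
        out.assign(buffer, static_cast<std::size_t>(written));
    else
        out.clear();
    return written;
}
```

Returning the count lets the caller tell a formatting failure apart from a later output failure.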
Norman Diamond - 01 Aug 2007 01:44 GMT
Ouch, I mis-summarized the source code when making this posting.  No wonder it
looks like the source code was at fault.  Here, I'll summarize it more
accurately.

_TCHAR userName[48];
DebugLog(_T("Other string=\"%s\"\n"), _T("Hello foreign language"));
DebugLog(_T("Username=\"%s\"\n"), userName);
[...]

void DebugLog(TCHAR* szForm, ...)
{
 va_list args;
 va_start(args, szForm); // init valiable length argument list
 static TCHAR szBuf[2048]; // same size for HexDump
 _vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
}

Result:
Other string="Hello foreign language"
Username="

_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
Unicode character.  In the Japanese version of Vista, in the Japanese
version of the CRT, the Japanese version of _vsnwprintf_s can't handle
Japanese characters (the Japanese user's username) in Unicode.

Marc - 03 Aug 2007 05:28 GMT
Here is my test program:

#include <tchar.h>

#include <cstdio>
#include <cstdarg>

void DebugLog(TCHAR* szForm, ...)
{
    va_list args;
    va_start(args, szForm);
    static TCHAR szBuf[2048];
    _vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
    _vstprintf_s(szBuf, szForm, args);
    vwprintf(szForm, args);
    va_end(args);
}

int __cdecl _tmain(int argc, _TCHAR* argv[])
{
    _TCHAR userName[48] = _T("\u6211\u662f\u4e2d\u570b\u4eba");
    DebugLog(_T("Username=%s\n"), userName);
    return 0;
}

Tested on Windows XP (SysLocale 0x411), VS 2005 Express (SP0), and it
works like a charm (minus the question marks on the console, but this
was expected).  Cannot test on WiVi.
Norman Diamond - 03 Aug 2007 06:41 GMT
Thank you for suggesting a test program, but it doesn't look like you ran a
useful test.

To repeat for the nth time, the environments where this failed have a
Chinese system locale and user locale, not Japanese.  Only the development
environment was Japanese.  Your test used the Japanese system locale and
unstated user locale.

You said you didn't try Vista, so I think we agree that you didn't observe
if you have a repro on Vista.  But later today I will try your program on
Vista.  (I'll have to see what your characters are though, since we might
perhaps expect failure if they're non-Chinese characters such as kana or
Greek or Cyrillic or accented Italian or whatever.)

Norman Diamond - 01 Aug 2007 05:42 GMT
I have just determined that _vsnwprintf_s is broken in Chinese Vista too,
with no internationalization involved in the execution system.

As posted in my other message a few hours ago, here is a corrected summary
of the source code:

_TCHAR userName[48];
DebugLog(_T("Other string=\"%s\"\n"), _T("Hello foreign language"));
DebugLog(_T("Username=\"%s\"\n"), userName);
[...]

void DebugLog(TCHAR* szForm, ...)
{
 va_list args;
 va_start(args, szForm); // init valiable length argument list
 static TCHAR szBuf[2048]; // same size for HexDump
 _vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
}

Result:
Other string="Hello foreign language"
Username="

_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
Unicode character.  In the Chinese version of Vista, in the Chinese version
of the CRT, the Chinese version of _vsnwprintf_s can't handle Chinese
characters (the Chinese user's username) in Unicode.

The rest of the program works, all except the calls to _vsnwprintf_s.

(By the way the valiable spelling in comments was there in the original.  I
don't know who the original coder was, only that it was coded in Japan.
Today I copied a bit too much source code when using the mouse, but I did
copy it correctly today.)

Jochen Kalmbach [MVP] - 01 Aug 2007 06:37 GMT
Hi Norman!

>I have just determined that _vsnwprintf_s is broken in Chinese Vista too,
> with no internationalization involved in the execution system.

Can you please provide a *full* working example?

And please do not use non-ASCII chars in the source-code,
so that it can be compiled on other systems with the same result.

> _TCHAR userName[48];
> DebugLog(_T("Other string=\"%s\"\n"), _T("Hello foreign language"));
> DebugLog(_T("Username=\"%s\"\n"), userName);

"userName" is not initialized...

Greetings
 Jochen
Norman Diamond - 01 Aug 2007 09:07 GMT
> Can you please provide a *full* working example?

You mean that I should show the assignment of the value of userName?  I
don't know if I can or not, because you proceed to say this:

> And please do not use non-ASCII chars in the source-code,

The user name was "中文2", without the quotes.  I mentioned that part of it
correctly yesterday.

> "userName" is not initialized...

It was not.  It was retrieved from some decryption code which I will not
quote.  Before being encrypted, it was originally retrieved from an API
which I think is one of the NetWksta____ APIs.  The userName value was
retrieved correctly.  The userName value was passed to other APIs for
authentication and succeeded.  To repeat again, everything worked except for
calls to _vsnwprintf_s.

> And please do not use non-ASCII chars in the source-code, so that it can
> be compiled on other systems with the same result.

Hahahaha.  Did I not show enough times that the Japanese and Chinese
versions of _vsnwprintf_s worked OK on ASCII characters?  They only fail
when presented with strings in their own languages.

Jochen Kalmbach [MVP] - 01 Aug 2007 09:50 GMT
Hi Norman!

>> Can you please provide a *full* working example?
>
> You mean that I should show the assignment of the value of userName?  I
> don't know if I can or not, because you proceed to say this:
>
>> And please do not use non-ASCII chars in the source-code,

Maybe you can write:
TCHAR szUserName[] = {0x1234, 0x2345, 0x789A, 0x0000};
?????

>> And please do not use non-ASCII chars in the source-code, so that it can
>> be compiled on other systems with the same result.
>
> Hahahaha.

Maybe you can write:
TCHAR szUserName[] = {0x1234, 0x2345, 0x789A, 0x0000};
?????

Hahahahaha....

So... please provide a small, full working example with ASCII chars in the
source code!

Greetings
 Jochen
Cezary Noweta - 01 Aug 2007 11:15 GMT
Hello,

> > Hahahaha.

> Hahahahaha....

Hey men, what are you smoking? For me, it would be nice to have this stuff now ;-P

-- best regards

Cezary Noweta
Norman Diamond - 03 Aug 2007 01:30 GMT
>>> Can you please provide a *full* working example?
>>
[quoted text clipped - 6 lines]
> TCHAR szUserName[] = {0x1234, 0x2345, 0x789A, 0x0000};
> ?????

中 = U+4E2D
文 = U+6587
2 = U+0032

>>> And please do not use non-ASCII chars in the source-code, so that it can
>>> be compiled on other systems with the same result.
[quoted text clipped - 6 lines]
>
> Hahahahaha....

Well, the user name isn't intended to be constant.  The user name is
intended to be the actual user name of some actual user, and the DLL
receives it by decrypting information that was previously encrypted by some
other DLL that was running under control of the actual user.

> So... please provide a small, full working example with ASCII chars in the
> source code!

TCHAR szUserName[48] = {0x4E2D, 0x6587, 0x0032, 0x0000};

Not tested.  I might have time to test it later today.
Kalle Olavi Niemitalo - 01 Aug 2007 07:21 GMT
> void DebugLog(TCHAR* szForm, ...)
> {
[quoted text clipped - 3 lines]
>  _vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
> }

It seems va_end and output of szBuf[] are missing from this function.

> _vsnwprintf_s dies as soon as the %s format hits a perfectly
> valid simple Unicode character.

Does _vsnwprintf_s crash or call the invalid parameter handler,
or does it return some value (which one)?

> In the Chinese version of Vista, in the Chinese version of the
> CRT, the Chinese version of _vsnwprintf_s can't handle Chinese
> characters (the Chinese user's username) in Unicode.

So presumably you are initializing userName[] in some way.
It would be interesting to know the wchar_t values therein.
(You posted a string earlier but please give the numbers too.)
Norman Diamond - 01 Aug 2007 09:19 GMT
> It seems va_end and output of szBuf[] are missing from this function.

FILE* pf;
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
if (pf) _ftprintf_s(pf, szBuf);
if (pf) fclose(pf);
va_end(args);

Do you also need a transcript of actions in Windows Explorer to open the log
file in Notepad and show the contents which my previous messages
transcribed?

Do you think that maybe the CRT's _vsnwprintf_s could handle the language of
its own version of Windows but the CRT's _ftprintf_s failed because it had
harder work to do?  I don't quite think so.

> Does _vsnwprintf_s crash or call the invalid parameter handler,
> or does it return some value (which one)?

If it called the invalid parameter handler then I think the rest of the code
(the caller of DebugLog) would not proceed to get everything else working
properly with other Windows APIs, I think the rest of the code would abort.

Your question about the return value is a good one.  I will add a meta debug
log of that information.  I probably won't have time this week though
because higher priority work has just come in.

> So presumably you are initializing userName[] in some way.
> It would be interesting to know the wchar_t values therein.
> (You posted a string earlier but please give the numbers too.)

The string is L"中文2" (without the quotes).  If you really need the
numbers, you can look them up as easily as I can.  (The third character is
number U+0032.)

Cezary Noweta - 01 Aug 2007 11:05 GMT
Hello,

>  FILE* pf;
>  _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
[quoted text clipped - 5 lines]
> file in Notepad and show the contents which my previous messages
> transcribed?

It would be nice but not necessary ;)

> Do you think that maybe the CRT's _vsnwprintf_s could handle the language of
> its own version of Windows but the CRT's _ftprintf_s failed because it had
> harder work to do?  I don't quite think so.

Yes - I think so. Wide printf foos stop output when they cannot convert from wide
char to mbcs (current locale CP or console CP). This occurs when writing to the
console, text file and so on. Open the log file in UTF16 mode (i.e. _T("ab") instead
of _T("a")), or use the following code:

======
FILE* pf;
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
if ( pf ) {
    int outchars;
    outchars = _ftprintf_s(pf, szBuf);
    _ftprintf_s(pf, _T("STRLEN: %u; OUTCHARS: %i\n"),
              _tcslen(szBuf),
              outchars);
    fclose(pf);
}
va_end(args);
======

or try to set locale (,,_tsetlocale(LC_CTYPE, _T(".932"))'') to CP 932 before you are
calling ftprintf and compare the results.

> If it called the invalid parameter handler then I think the rest of the code
> (the caller of DebugLog) would not proceed to get everything else working
> properly with other Windows APIs, I think the rest of the code would abort.

It called wctomb() which convert to the current locale (at the beginning it is "C"
which means that all chars >= U+0100 are not converted). After it failed fwprintf_s
has failed too and the foo returned number chars output so far. The rest of the code
runs fine.
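
The conversion step described above can be probed directly. The sketch below is an assumption-laden illustration, not the CRT's internals: convertible_in_current_locale is a hypothetical helper, and it uses the standard std::wcrtomb rather than wctomb so that no hidden conversion state is involved. It only reports whether one wide character has a multibyte form in whatever LC_CTYPE locale is active; it does not settle where exactly the "C" locale cut-off lies on a given CRT.

```cpp
#include <climits>
#include <clocale>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: true if `ch` can be converted to a multibyte
// sequence in the locale currently selected for LC_CTYPE.
bool convertible_in_current_locale(wchar_t ch)
{
    char bytes[MB_LEN_MAX];
    std::mbstate_t state = std::mbstate_t();   // fresh conversion state
    return std::wcrtomb(bytes, ch, &state) != static_cast<std::size_t>(-1);
}
```

Called for U+4E2D right before the file write, a check like this would show whether the locale in effect at that moment can represent the username at all.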

> The string is L"$BCfJ8(B2" (without the quotes).  If you really need the
> numbers, you can look them up as easily as I can.  (The third character is
> number U+0032.)

Oooo... ,,92 86 95 B6 32'' - 14 chars of text. At the beginning I thought that the
first two char codes are confidential and you can not disclose it explicitly ;)
Really could not you enumerate codes even at the price of a solution of your problem?

-- best regards

Cezary Noweta
Kalle Olavi Niemitalo - 02 Aug 2007 17:44 GMT
> Yes - I think so. Wide printf foos stop output when they cannot convert from wide
> char to mbcs (current locale CP or console CP). This occurs when writing to the
> console, text file and so on.

Yes, that could cause the problem.  (I expected to see an
OutputDebugString call.)

> Open the log file in UTF16 mode (i.e. _T("ab") instead of _T("a")),
> or use the following code:

From the documentation of fopen_s and _wfopen_s, it appears that
the "b" flag only affects control characters, and creating a
UTF-16 file requires _T("a, ccs=UTF-16LE") in Visual C++ 2005.

http://msdn2.microsoft.com/library/z5hh6ee9(VS.80).aspx

> FILE* pf;
> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
> if ( pf ) {

The documentation of fopen_s and _wfopen_s does not promise that
they reset the pointer to NULL on error.  On the contrary, they
are specified to leave the contents of pFile unchanged in at
least some error situations.  So I think it would be best to
initialize the pointer to NULL, _and_ check the return value
rather than the pointer.

>     int outchars;
>     outchars = _ftprintf_s(pf, szBuf);

Outputting szBuf as a format string without arguments is likely
to crash the program as soon as percent signs appear.
_fputts(szBuf, pf) would be safer.
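
As a minimal sketch of that advice in standard C++ (std::fputws is the wide-character function that _fputts maps to in a Unicode build; write_verbatim is a hypothetical name), the already-formatted buffer is written as data, so a percent sign in it is never interpreted:

```cpp
#include <cstdio>
#include <cwchar>

// Hypothetical helper: writes already-formatted text verbatim. Any '%' in
// `text` is plain data here because no format string is interpreted.
// std::fputws reports success with a non-negative value, not a character
// count.
bool write_verbatim(std::FILE* stream, const wchar_t* text)
{
    return std::fputws(text, stream) >= 0;
}
```

If a character count is needed for diagnostics, passing the buffer as an argument to a fixed format such as L"%ls" keeps the return value of the printf family without treating the buffer as a format.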

> Oooo... ,,92 86 95 B6 32'' - 14 chars of text. At the beginning I thought that the
> first two char codes are confidential and you can not disclose it explicitly ;)

I got the characters U+4E2D U+6587 U+0032, although I cannot be
certain they weren't corrupted by the software I am using.
Norman Diamond - 03 Aug 2007 02:19 GMT
>> Yes - I think so. Wide printf foos stop output when they cannot convert
>> from wide char to mbcs (current locale CP or console CP). This occurs
>> when writing to the console, text file and so on.
>
> Yes, that could cause the problem.  (I expected to see an
> OutputDebugString call.)

The target computer has no serial port, but it has an i1394 port, so maybe I
can try using Windbag over i1394, if I find a cable and ... hmm, and install
Windbag onto some other host that has an i1394 port...

>> Open the log file in UTF16 mode (i.e. _T("ab") instead of _T("a")), or
>> use the following code:
[quoted text clipped - 4 lines]
>
> http://msdn2.microsoft.com/library/z5hh6ee9(VS.80).aspx

Looks like I need to do more reading and experimenting, when I get the time
to try Cezary Noweta's suggestion.

>> FILE* pf;
>> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
>> if ( pf ) {
>
> The documentation of fopen_s and _wfopen_s does not promise that they
> reset the pointer to NULL on error.

Um, so what?  The problem isn't in _wfopen_s.  New text is being appended to
the existing debug log exactly as hoped.  The problem comes in with either
_vsnwprintf_s or _ftprintf_s.

>> int outchars;
>> outchars = _ftprintf_s(pf, szBuf);
>
> Outputting szBuf as a format string without arguments is likely to crash
> the program as soon as percent signs appear. _fputts(szBuf, pf) would be
> safer.

Hmm, yes, thank you.  Luckily this week's user name has no percent signs,
but I'd better not add any potentially risky metadebugging code to
production code  ^_^

>> Oooo... ,,92 86 95 B6 32'' - 14 chars of text. At the beginning I thought
>> that the first two char codes are confidential and you can not disclose
>> it explicitly ;)
>
> I got the characters U+4E2D U+6587 U+0032, although I cannot be certain
> they weren't corrupted by the software I am using.

Those match the values that I found this morning, looking them up.
Kalle Olavi Niemitalo - 04 Aug 2007 13:45 GMT
>>> FILE* pf;
>>> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
[quoted text clipped - 4 lines]
>
> Um, so what?

So the program can dereference an uninitialized pointer and crash
if it cannot open the log file.  I understand it has been able to
open the file in your experiments; but the possible failure
should be properly handled.
Norman Diamond - 06 Aug 2007 01:10 GMT
>>>> FILE* pf;
>>>> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
[quoted text clipped - 8 lines]
> cannot open the log file.  I understand it has been able to open the file
> in your experiments; but the possible failure should be properly handled.

You mean the "if ( pf ) {" line which you have correctly quoted several
times?

If pf were NULL then the thing wouldn't stop outputting when it hits the
username, the thing wouldn't have output anything at all.
Kalle Olavi Niemitalo - 06 Aug 2007 21:02 GMT
> You mean the "if ( pf ) {" line which you have correctly quoted
> several times?

The scenario is this:

> FILE* pf;

This defines a local variable pf but does not initialize it.  The
initial value is indeterminate.

> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));

Now suppose this call fails due to lack of permissions.
_tfopen_s returns a nonzero error code, but the documentation
does not promise that it would set pf = NULL in this case.
So the value of pf can remain indeterminate.

> if ( pf ) {

This tests pf.  The indeterminate value can very well appear
non-NULL, in which case, execution enters the block in the
if-statement.

>     int outchars;
>     outchars = _ftprintf_s(pf, szBuf);

Here the indeterminate value of pf goes to _ftprintf_s, which is
likely to dereference the pointer and crash.  (There is also the
format string problem.)

> If pf were NULL then the thing wouldn't stop outputting when it hits
> the username, the thing wouldn't have output anything at all.

Yes, as I wrote, the _tfopen_s succeeded in your experiments.
What I'm concerned with is the error handling for cases that you
did not test.  Or were you going to remove this logging code from
the final version?
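
The error handling argued for above might look roughly like this. It is a sketch, not the code from the thread: open_for_append and append_line are hypothetical names, the narrow-character fopen_s is used instead of _tfopen_s, and on compilers without fopen_s the wrapper falls back to std::fopen and deliberately leaves the output pointer untouched on failure, mirroring the scenario described.

```cpp
#include <cerrno>
#include <cstdio>

// Hypothetical wrapper: returns 0 on success or an errno-style code on
// failure. `out` is only meaningful when the return value is 0.
int open_for_append(const char* path, std::FILE*& out)
{
#if defined(_MSC_VER)
    return fopen_s(&out, path, "a");
#else
    errno = 0;
    std::FILE* fp = std::fopen(path, "a");
    if (fp == nullptr)
        return errno != 0 ? errno : EIO;   // `out` is left as it was
    out = fp;
    return 0;
#endif
}

// Caller pattern: initialise the pointer AND branch on the return code.
bool append_line(const char* path, const char* line)
{
    std::FILE* fp = nullptr;
    if (open_for_append(path, fp) != 0 || fp == nullptr)
        return false;
    const bool ok = std::fputs(line, fp) >= 0;   // data, not a format string
    return std::fclose(fp) == 0 && ok;
}
```

With the pointer initialised and the return code checked, a failed open can no longer lead to a write through an indeterminate pointer.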
Norman Diamond - 07 Aug 2007 01:58 GMT
>> You mean the "if ( pf ) {" line which you have correctly quoted several
>> times?
[quoted text clipped - 12 lines]
> does not promise that it would set pf = NULL in this case.
> So the value of pf can remain indeterminate.

Oh, where your previous message said "reset the pointer", it looked like you
were talking about the file pointer, i.e. what would be called a cursor in a
database system, i.e. the location that gets seeked to or sensed.  Opening
for append mode, the file pointer would be reset to the end of the file,
unless the open fails.

But you meant the program's pointer variable, pf.  You are right.  The
standard fopen function returns a FILE* so there is no way to omit receiving
a null pointer when it fails, but _tfopen_s can omit that effect.

Lesson learned yet again:  When reading existing code, the first step is NOT
to assume the existing code is correct.  If existing code calls a variation
of an API that you haven't called yourself, look up the API, don't assume
the code is correct.  Sigh.

Thank you.

> Or were you going to remove this logging code from the final version?

The release version has calls to logging functions #define'd out of
existence, just like calls to assert.  So it doesn't crash on this
particular code.  Two wrongs made an accidental right.
Norman Diamond - 03 Aug 2007 02:00 GMT
[Norman Diamond:]
[Quotation of additional parts of program not originally quoted:]
>>  FILE* pf;
>>  _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
[quoted text clipped - 9 lines]
> from wide char to mbcs (current locale CP or console CP). This occurs when
> writing to the console, text file and so on.

That would be enormously odd.  This problem was reproduced in Chinese Vista
with no internationalization whatsoever.  At the moment I don't recall what
the code page number is, but it is only one code page number, used in
China - Hong Kong, with no customization of the system locale or user
locale.  Language packs can't even be installed on that one because it's
Vista Business not Ultimate.  I did add the Japanese keyboard layout though
because the laptop has a Japanese keyboard built in, not a Chinese keyboard.

Nonetheless, if wide printf foos stop output because they are too stupid to
understand their own native default built-in code page after not being
customized at all, then I understand your suggestion that maybe the breakage
occurs in _ftprintf_s instead of _vsnwprintf_s.  I might have time to
investigate this later today, maybe.

> Open the log file in UTF16 mode (i.e. _T("ab") instead of _T("a")), or use
> the following code:
[quoted text clipped - 12 lines]
> va_end(args);
> ======

That meta-debugging code looks like a good suggestion, and I hope to have
time to try it later today.

> or try to set locale (,,_tsetlocale(LC_CTYPE, _T(".932"))'') to CP 932
> before you are calling ftprintf and compare the results.

That would be expected to cause problems.  In both environments where the
problem has been observed, the actual code page was a Chinese code page not
Japanese:

(1) Japanese Vista Ultimate with system locale and user locale and MUI
language all set to Chinese (Hong Kong) and rebooted several times;

(2) Chinese (Hong Kong) Vista Business with default system locale and user
locale, and no MUI.

>> If it called the invalid parameter handler then I think the rest of the
>> code (the caller of DebugLog) would not proceed to get everything else
[quoted text clipped - 3 lines]
> It called wctomb() which convert to the current locale (at the beginning
> it is "C" which means that all chars >= U+0100 are not converted).

Wait a minute.  I understand the possibility that the CRT might have
initialized the locale to the "C" locale, and I should try to figure out if
that happened.  But if it did, then the point where it breaks and stops
converting characters shouldn't be at U+0100, it should be at U+0080.  And
it should happen no matter what the system locale and user locale are.

> After it failed fwprintf_s has failed too and the foo returned number
> chars output so far. The rest of the code runs fine.
>
>> The string is L"$BCfJ8(B2" (without the quotes

WTF, Outlook Express and every other Microsoft tool involved in these
newsgroup postings, WTF.

I put the cursor after "quotes)." and before "  If".  I hit the Enter key to
put in a line break so I can type this next stuff.  Outlook Express puts the
line break after "quotes" and before ").  If".  More incredible editing
capabilities from Microsoft.

OK, end of second digression, back to first digression.

In my previous posting, I didn't type a raw JIS string with escape sequences
for shift-in and shift-out, I typed the actual characters.  The encoding
format going over the wire was in raw JIS, ISO-2022-JP.  Reading my own
previous message in Outlook Express, the message survived the round trip,
with the characters 中 and 文 and 2.  But when reading your message which
quotes my previous message, Outlook Express is showing raw JIS with escape
sequences and 7-bit byte values.  Oh I see, it's because your message format
is Central European.  I think Central European encoding can't handle these
Chinese characters.  Japanese encoding can handle them because these are
among the characters that were copied from China to Japan during recent
millennia.

Hmm, I guess I should set this current message to use UTF-8 encoding...
Done.

OK, where were we.

>> ).  If you really need the numbers, you can look them up as easily as I
>> can.  (The third character is number U+0032.)
>
> Oooo... ,,92 86 95 B6 32'' - 14 chars of text.

No, you're getting garbage because you're missing fonts and you couldn't
even display the original characters correctly.  I looked them up this
morning so here they are:

中 = U+4E2D
文 = U+6587
2 = U+0032

> At the beginning I thought that the first two char codes are confidential
> and you can not disclose it
> explicitly ;)
> Really could not you enumerate codes even at the price of a solution of
> your problem?

Well, a high-priority task came in two days ago and yes it was higher
priority than meta-debugging of debugging routines that look like they're
depending on broken library routines.  (The actual working code of this DLL
had already been successfully debugged.)  But this morning I had time to
look up the codes.
Norman Diamond - 03 Aug 2007 07:43 GMT
OK, I ran approximately this test.  The log file contains a lot of lines.
After every line of ordinary debugging information, there is a line with
STRLEN and OUTCHARS exactly as defined by Cezary Noweta.

After every line that doesn't contain a username, the values of STRLEN and
OUTCHARS are equal.

After every line that does contain a username, the value of STRLEN is what
it should be if the value of szBuf includes the entire formatted string,
i.e. some constant text before the username, the username itself, and some
constant text after the username.  However, the value of OUTCHARS is -1.
The value of OUTCHARS isn't even the number of characters that _ftprintf_s
wrote before aborting, the value is -1.

So _vsnwprintf_s isn't broken, but at the moment _ftprintf_s seems to be
broken.  _ftprintf_s might not be broken though, if the thing is executing
in the "C" locale as someone guessed.  I'll have to figure that out next.

I had to use the unsafe code
 outchars = _ftprintf_s(pf, szBuf);
as suggested by Cezary Noweta instead of the safer code
 _fputts(szBuf, pf);
as recommended by Kalle Olavi Niemitalo because when _fputts succeeds it
returns a nonzero value which doesn't have to match the number of
characters.

After the above experiment, I tried another one.  Using Notepad, I created
the log file in Unicode with no text.  But _tfopen_s with _T("a") did not
inspect the existing file to decide whether to keep Unicode as Unicode, it
barged ahead and converted Unicode to ANSI and wrote the ANSI.  Then opening
the result in Notepad, since the BOM was still there, Notepad faithfully
tried to display garbage  ^_^

Now I have to add some calls to find out what locale the thing is executing
in at the time, is it the Chinese Hong Kong locale (matching the system
locale and user locale) or is it the "C" locale.

Norman Diamond - 03 Aug 2007 08:25 GMT
It does get worse.

I deleted the file and then ran the program with this code:
 _tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));

http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
*  The flag is only used when no BOM is present or if the file is a new
*  file.

That is a lie.  _tfopen_s created a new file and it created the thing with
ANSI encoding not Unicode.

I deleted the file again, created a file in Notepad containing only an empty
line (CR-LF pair), saved it in Unicode, and then again ran the program with
this code:
 _tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));

http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
*  If mode is "a, ccs=<encoding>", fopen_s will first try to open the file
*  with both read and write access. If it succeeds, it will read the BOM to
*  determine the encoding for this file;

This time _tfopen_s seems to have performed correctly.  Now let's continue.

http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
*  When a Unicode stream-I/O function operates in text mode (the default),
*  the source or destination stream is assumed to be a sequence of multibyte
*  characters. Therefore, the Unicode stream-input functions convert
*  multibyte characters to wide characters (as if by a call to the mbtowc
*  function). For the same reason, the Unicode stream-output functions
*  convert wide characters to multibyte characters (as if by a call to the
*  wctomb function).

In other words, it doesn't matter if _tfopen_s performed correctly because
_ftprintf_s is still going to screw it up.  Let's look for confirmation of
this screw-up.

http://msdn2.microsoft.com/en-us/library/c4cy2b8e(VS.80).aspx
*  For the same reason, the Unicode stream-output functions convert wide
*  characters to multibyte characters (as if by a call to the wctomb
*  function).

Yup, no provision at all for keeping Unicode as Unicode.

However, both of those are half-lies.  Half the time, _ftprintf_s violated
MSDN and it kept Unicode as Unicode in the spirit (but not the letter) of
http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx.
The other half of the time, _ftprintf_s screwed up worse.

Notepad opened the file in Unicode.  The display alternates, a bunch of
readable lines, a few lines of garbage, a bunch of readable lines, a few
lines of garbage, etc.

It seems that ccs=UNICODE is unusable.  It changes the result from being
mostly readable (with a little bit of lossage) to being half readable (with
half garbage).
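
One alternative that nobody in the thread tried, offered only as a sketch, is to bypass the CRT's text-mode conversion entirely: open the log in binary mode and write UTF-16LE bytes produced by hand. The helper name to_utf16le is hypothetical, and the input is assumed to be UTF-16 code units already (which is what wchar_t strings hold on Windows).

```cpp
#include <string>

// Hypothetical helper: encodes UTF-16 code units as little-endian bytes.
// When `with_bom` is true the byte order mark FF FE is emitted first.
std::string to_utf16le(const std::u16string& text, bool with_bom)
{
    std::string bytes;
    if (with_bom)
        bytes.append("\xFF\xFE", 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // high byte
    }
    return bytes;
}
```

The resulting bytes would be passed to std::fwrite on a stream opened with "ab", with the BOM requested only when the file is new or empty, so that no locale-dependent conversion takes part in the output.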

> OK, I ran approximately this test.  The log file contains a lot of lines.
> After every line of ordinary debugging information, there is a line with
[quoted text clipped - 114 lines]
>>
>> Cezary Noweta
Norman Diamond - 03 Aug 2007 08:50 GMT
It gets even more worse.

I added this call:
 _ftprintf_s(pf, _T("%s\n"), _tsetlocale(LC_CTYPE, _T("")));
The output was:
 Chinese_Hong Kong S.A.R..950

So there is absolutely no excuse for _ftprintf_s to screw up on Chinese
characters.  The DLL is not running in the C locale, it's running in the
Chinese Hong Kong locale, code page 950, exactly as it should be.

Here's more MSDN stuff too.
http://msdn2.microsoft.com/en-us/library/x99tb11d(VS.80).aspx
*  LC_CTYPE
*  The character-handling functions (except isdigit, isxdigit, mbstowcs, and
*  mbtowc, which are unaffected).

So mbtowc is one of the exceptions, it wouldn't have been affected even if
the C locale were in use, and presumably it would always use code page 950
and screw up because it's miscoded -- however, wctomb isn't one of the
exceptions, so it would have been affected if the C locale were in use, and
it would screw up differently from the way it actually screws up.

Anyway, thank you whoever it was who said that _vsnwprintf_s isn't broken
and _ftprintf_s is.  Sorry I found it hard to believe you.  You're absolutely
right.  _ftprintf_s is broken.

> OK, I ran approximately this test.  The log file contains a lot of lines.
> After every line of ordinary debugging information, there is a line with
[quoted text clipped - 114 lines]
>>
>> Cezary Noweta
Norman Diamond - 03 Aug 2007 09:01 GMT
OMFG.

When I added this call:
 _ftprintf_s(pf, _T("%s\n"), _tsetlocale(LC_CTYPE, _T("")));
it didn't query the DLL's current locale the way MSDN says it will.  It SET
the current locale, and returned it:
 Chinese_Hong Kong S.A.R..950

And, the result of this setting activity did affect the way wctomb operates.
And the result of that setting activity did affect the way _ftprintf_s
operates.

The result was that _ftprintf_s wrote the user name correctly.

In ANSI.

 中文2

The good news is that there's a workaround for the breakage in _ftprintf_s.
The bad news is that I haven't finished learning how bad Windows can be.

> It gets even more worse.
>
[quoted text clipped - 153 lines]
>>>
>>> Cezary Noweta
Norman Diamond - 03 Aug 2007 09:21 GMT
Oh, I screwed up this last test.  MSDN says my call to _tsetlocale() does
set the locale instead of querying.  A null pointer does a query but a null
string sets it to the default from the OS.

OK, good news anyway, when setting to the default from the OS, it worked.

So I still don't know if the DLL actually started up in the C locale
instead, but I'm not in the mood to test it again.
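
The query-versus-set distinction can be sketched in a few lines.  This is a
minimal illustration in standard C, not the code from the program under
discussion: it uses plain setlocale() where the thread uses the _tsetlocale()
macro, and both function names are hypothetical.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Query only: a null pointer leaves the locale untouched and
   returns the name of the current LC_CTYPE locale. */
const char *query_ctype_locale(void)
{
    return setlocale(LC_CTYPE, NULL);
}

/* Set: an empty string selects the default locale from the
   environment (on Windows, the OS settings).  Returns NULL and
   changes nothing if that locale cannot be honored. */
const char *adopt_environment_ctype_locale(void)
{
    return setlocale(LC_CTYPE, "");
}
```

Calling query_ctype_locale() before anything else in the program is one way
to check which locale the CRT actually starts in.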

> OMFG.
>
[quoted text clipped - 187 lines]
>>>>
>>>> Cezary Noweta
Cezary Noweta - 03 Aug 2007 14:30 GMT
Hello,

> > Oooo... ,,92 86 95 B6 32'' - 14 chars of text.

> No, you're getting garbage because you're missing fonts and you couldn't
> even display the original characters correctly.  I looked them up this
> morning so here they are:

No - this is good. This code sequence is Japanese ANSI (932). You have written that
your dev environment is Japanese and you coded your posts in ISO-2022-JP, so I have
sent JAP ANSI codes and not CHS ones.

> This problem was reproduced in Chinese Vista
> with no internationalization whatsoever.

> Nonetheless, if wide printf foos stop output because they are too stupid to
> understand their own native default built-in code page after not being
> customized at all, then I understand your suggestion that maybe the breakage
> occurs in _ftprintf_s instead of _vsnwprintf_s.

It does not matter what is your system ANSI CP. Also wide printf is not so stupid.
ISO states that:

"At program startup, the equivalent of
   setlocale(LC_ALL, "C");
is executed."

That's all. When you want to play with mixed international language streams, and
especially with Unicode or double-byte character sets, then you must use setlocale().

> I deleted the file and then ran the program with this code:
>   _tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));

> http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
> *  The flag is only used when no BOM is present or if the file is a new
> *  file.

> That is a lie.  _tfopen_s created a new file and it created the thing with
> ANSI encoding not Unicode.

> I deleted the file again, created a file in Notepad containing only an empty
> line (CR-LF pair), saved it in Unicode, and then again ran the program with
> this code:
>   _tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));

Everything is OK - look at the table below. When you are creating a new file you
should use "ccs=UTF-16LE" and not "ccs=UNICODE" as "UNICODE" creates ANSI files.

> I added this call:
>   _ftprintf_s(pf, _T("%s\n"), _tsetlocale(LC_CTYPE, _T("")));
> The output was:
>   Chinese_Hong Kong S.A.R..950

So now you know that you have set locale to your system codepage. setlocale(LC_CTYPE,
NULL) returns your current locale.

> _ftprintf_s is broken.

> The good news is that there's a workaround for the breakage in _ftprintf_s.

fwprintf is not broken. This is not a workaround. This is the normal way to achieve
the effect you wanted. It follows from the ISO standard, not from MSDN or MS at all.

ISO states:

"The wide character output functions convert wide characters to multibyte characters
and write them to the stream as if they were written by successive calls to the
fputwc function. Each conversion occurs as if by a call to the wcrtomb function, with
the conversion state described by the stream’s own mbstate_t object. The byte output
functions write characters to the stream as if by successive calls to the fputc
function."

and

"An encoding error occurs if the character sequence presented to the underlying
mbrtowc function does not form a valid (generalized) multibyte character, or if the
code value passed to the underlying wcrtomb does not correspond to a valid
(generalized) multibyte character. The wide character input/output functions and the
byte input/output functions store the value of the macro EILSEQ in errno if and only
if an encoding error occurs."

After your _ftprintf() has been executed within "C" locale, errno contains EILSEQ.
After you have called setlocale(LC_CTYPE, "") your locale is set to your system
codepage and _ftprintf() works OK. Everything is OK.
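
The encoding error described above can be reproduced without any file I/O.
The following is a rough sketch in standard C, assuming that the "C" locale
cannot represent a CJK character (which is the case for the CRT discussed
here); the helper name convert_one is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Convert one wide character with wcrtomb() under the current
   LC_CTYPE locale.  Returns 0 on success, or the errno value
   (EILSEQ for an encoding error) on failure. */
int convert_one(wchar_t wc)
{
    char out[MB_LEN_MAX];
    mbstate_t state;

    memset(&state, 0, sizeof state);   /* initial conversion state */
    errno = 0;
    if (wcrtomb(out, wc, &state) == (size_t)-1)
        return errno;
    return 0;
}
```

In the "C" locale, convert_one(L'A') succeeds while
convert_one((wchar_t)0x4E2D) reports EILSEQ.  After a successful
setlocale(LC_CTYPE, "") on a system whose code page contains that character,
the second call is expected to succeed as well.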

> The bad news is that I haven't finished learning how bad Windows can be.

Yea, but not this time ;)

-- best regards

Cezary Noweta
Norman Diamond - 06 Aug 2007 01:28 GMT
>> > Oooo... ,,92 86 95 B6 32'' - 14 chars of text.
>>
[quoted text clipped - 5 lines]
> written that your dev environment is Japanese and you coded your posts in
> ISO-2022-JP, so I have sent JAP ANSI codes and not CHS ones.

The default coding for new posts is ISO-2022-JP.  The default for followup
posts is to use the encoding of the post that is being quoted.  But you set
the encoding of your posts to Central European.

The program was running in a Chinese environment where the system and user
code page was 950.

14 bytes of text is not 14 chars of text.

Internal to the program, the coding was Unicode not ANSI.  Ordinarily it
wouldn't convert to ANSI until _ftprintf_s writes to a file.  From this
discussion I learned that even _ftprintf_s won't convert it to ANSI unless
the program does a call to use the system's code page instead of C locale.

> It does not matter what is your system ANSI CP. Also wide printf is not so
> stupid. ISO states that:
[quoted text clipped - 4 lines]
>
> That's all.

I understand that now.  Thank you.

> When you want to play with mixed international language streams, and
> especially with Unicode or double-byte character sets, then you must use
> setlocale().

You still don't understand this part of it though.  There was no mixing of
international language streams.  The execution environment was 100% Chinese.
And the need to use setlocale() applies to single-byte character sets as
much as it does to double-byte character sets.  _ftprintf_s should fail on
several characters in your language when using your code page, and
_ftprintf_s should fail on the English character £ when using Western
Europe's code page, just as quickly as it failed on Chinese characters when
using a Chinese code page.

Everyone has to call setlocale() and tell the CRT to use the system's code
page.  Hopefully we both understand this now.

>> I deleted the file and then ran the program with this code:
>>   _tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));
[quoted text clipped - 14 lines]
> file you should use "ccs=UTF-16LE" and not "ccs=UNICODE" as "UNICODE"
> creates ANSI files.

Oh, you are right.  The flag UNICODE means ANSI.  I wonder why the flag ANSI
doesn't mean UNICODE.  Who allowed some programmer to develop a CRT without
a flag named ANSI and meaning UNICODE?  At least it's a relief to see that I
wasn't the least competent programmer in this discussion  ^_^

Anyway now I understand to put this at the beginning of every program:
 _tsetlocale(LC_CTYPE, _T(""));
Now I wonder how to find out if it's safe to call this from DllMain.
Norman Diamond - 06 Aug 2007 01:34 GMT
> 14 bytes of text is not 14 chars of text.

Ooops, internally I read "char" as "character" instead of "char".  I get a
C- today.

In C, 14 bytes of text is 14 chars of text.  It isn't 14 characters and you
didn't say that it is.  Sorry.
Kalle Olavi Niemitalo - 06 Aug 2007 21:36 GMT
> Anyway now I understand to put this at the beginning of every program:
>  _tsetlocale(LC_CTYPE, _T(""));
> Now I wonder how to find out if it's safe to call this from DllMain.

setlocale() and _wsetlocale() are not exported from Kernel32.dll,
so you should assume that they are not safe to call from DllMain.
If you link the C runtime statically into your DLL and audit its
source code, then it may be safe, until the next library upgrade.

If you link to the C runtime DLL, then setlocale() can also
affect other modules of the program, thereby increasing
dependencies between them.  With _create_locale() and
_ftprintf_s_l(), you could better isolate DLLs from each other.
Kalle Olavi Niemitalo - 04 Aug 2007 14:24 GMT
> I had to use the unsafe code
>  outchars = _ftprintf_s(pf, szBuf);

Surely you could fix the bug with _ftprintf_s(pf, _T("%s"), szBuf).

(In standard C, L"%s" in a format string takes a char * argument,
just like "%s".  In Microsoft C, it instead takes a wchar_t *
argument.  I don't know if there is a way for Microsoft to
correct this violation without breaking countless programs.)
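
One way to sidestep the disagreement is to spell the length modifier out.
In a wide format string, "%ls" takes a wchar_t * argument both in standard C
and in Microsoft's CRT, so the sketch below does not depend on how a bare
"%s" is interpreted.  The function name format_wide is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Copy a wide string into dst using "%ls", which denotes a
   wchar_t * argument regardless of the bare-"%s" convention.
   Returns the number of wide characters written (excluding the
   terminator), or a negative value on failure. */
int format_wide(wchar_t *dst, size_t dst_count, const wchar_t *src)
{
    return swprintf(dst, dst_count, L"%ls", src);
}
```

Code written through the _T() macros cannot use this directly, because in an
ANSI build "%ls" would still expect a wide string while the argument would be
a char *.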
Norman Diamond - 06 Aug 2007 01:32 GMT
>> I had to use the unsafe code
>>  outchars = _ftprintf_s(pf, szBuf);
>
> Surely you could fix the bug with _ftprintf_s(pf, _T("%s"), szBuf).

No, the reason this entire discussion started was because _T("%s") fails.
Finally we understand the reason why _T("%s") fails (it was my fault for not
adding a call to setlocale).  Nonetheless changing _T("%s") to _T("%s")
wouldn't fix it.
