.NET Forum / Languages / Managed C++ / August 2007
_vsnwprintf_s seems to be broken
Norman Diamond - 31 Jul 2007 03:09 GMT
I think the current version of _vsnwprintf_s is broken, in ordinary Windows.
I'm not completely sure yet but it looks like this breakage is worse than previously known Windows CE breakage of StringCchPrintf. For Windows CE breakage of StringCchPrintf, since the %S format died instead of converting ANSI to Unicode, a workaround was to call MultiByteToWideChar and then use the %s format.
For ordinary Windows breakage of _vsnwprintf_s, the %s format is broken, as far as I can tell.
The compilation environment is not internationalized. It's Visual Studio 2005 SP1 + hotfix for Vista, and SDK for Vista, all running on Vista, all in Japanese, no foreign software involved in this environment. The project setting for character set says to use Unicode not ANSI. Function name _vsntprintf_s maps to _vsnwprintf_s, _T("") maps to L"", etc., and everything except _vsnwprintf_s seems to perform properly at execution time. MFC and ATL are not used. The CRT is used as a DLL.
The runtime environment where failure was observed is internationalized. The Chinese MUI pack was downloaded. The user's locale (viewable format or something like that), the user's display language, and the system locale (viewable format for non-Unicode programs) are all set to Chinese traditional Hong Kong. The settings were copied to all reserved and default accounts. The execution PC was rebooted several times. The logon screen and nearly everything else are displayed properly in Chinese. However, the CRT DLL is from Vista RTM, not from Visual Studio 2005 SP1.
The user's username is "中文2" (without the quotes). The user can log on perfectly. The Start menu shows the user's name at the top. Windows Explorer shows the user's name correctly. No renaming or anything else has been done with this user. Ordinary Windows operations work. Execution of my program works, except for calls to _vsnwprintf_s.
Code:
static TCHAR szBuf[2048];
_vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), _T("Username=\"%s\"\n"), userName);

Result:
Username="
_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple Unicode character.
Cezary Noweta - 31 Jul 2007 13:12 GMT
Hello,

> Code:
> static TCHAR szBuf[2048];
[quoted text clipped - 3 lines]
> Result:
> Username="

> _vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
> Unicode character.

What is the type of userName? Is it a va_list or a TCHAR *? The v* functions take a va_list parameter, not a TCHAR * one. Maybe you should use _sntprintf_s in place of _vsntprintf_s?
-- best regards
Cezary Noweta
Norman Diamond - 01 Aug 2007 01:44 GMT
Ouch, I missummarized the source code when making this posting. No wonder it looks like the source code was at fault. Here, I'll summarize it more accurately.

_TCHAR userName[48];
DebugLog(_T("Other string=\"%s\"\n"), _T("Hello foreign language"));
DebugLog(_T("Username=\"%s\"\n"), userName);
[...]

void DebugLog(TCHAR* szForm, ...)
{
    va_list args;
    va_start(args, szForm); // init valiable length argument list
    static TCHAR szBuf[2048]; // same size for HexDump
    _vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
}

Result:
Other string="Hello foreign language"
Username="
_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple Unicode character. In the Japanese version of Vista, in the Japanese version of the CRT, the Japanese version of _vsnwprintf_s can't handle Japanese characters (the Japanese user's username) in Unicode.
> Hello,
> [quoted text clipped - 17 lines]
>
> Cezary Noweta

Marc - 03 Aug 2007 05:28 GMT
Here is my test program:

#include <tchar.h>
#include <cstdio>
#include <cstdarg>

void DebugLog(TCHAR* szForm, ...)
{
    va_list args;
    va_start(args, szForm);
    static TCHAR szBuf[2048];
    _vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
    _vstprintf_s(szBuf, szForm, args);
    vwprintf(szForm, args);
    va_end(args);
}

int __cdecl _tmain(int argc, _TCHAR* argv[])
{
    _TCHAR userName[48] = _T("\u6211\u662f\u4e2d\u570b\u4eba");
    DebugLog(_T("Username=%s\n"), userName);
    return 0;
}
Tested on Windows XP (SysLocale 0x411), VS 2005 Express (SP0), and it works like a charm (minus the question marks on the console, but this was expected). Cannot test on WiVi.
Norman Diamond - 03 Aug 2007 06:41 GMT
Thank you for suggesting a test program, but it doesn't look like you ran a useful test.
To repeat for the nth time, the environments where this failed have a Chinese system locale and user locale, not Japanese. Only the development environment was Japanese. Your test used the Japanese system locale and unstated user locale.
You said you didn't try Vista, so I think we agree that you didn't observe if you have a repro on Vista. But later today I will try your program on Vista. (I'll have to see what your characters are though, since we might perhaps expect failure if they're non-Chinese characters such as kana or Greek or Cyrillic or accented Italian or whatever.)
> Here is my test program:
> [quoted text clipped - 26 lines]
> Cannot test on WiVi.

Norman Diamond - 01 Aug 2007 05:42 GMT
I have just determined that _vsnwprintf_s is broken in Chinese Vista too, with no internationalization involved in the execution system.
As posted in my other message a few hours ago, here is a corrected summary of the source code:
_TCHAR userName[48];
DebugLog(_T("Other string=\"%s\"\n"), _T("Hello foreign language"));
DebugLog(_T("Username=\"%s\"\n"), userName);
[...]

void DebugLog(TCHAR* szForm, ...)
{
    va_list args;
    va_start(args, szForm); // init valiable length argument list
    static TCHAR szBuf[2048]; // same size for HexDump
    _vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
}

Result:
Other string="Hello foreign language"
Username="
_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple Unicode character. In the Chinese version of Vista, in the Chinese version of the CRT, the Chinese version of _vsnwprintf_s can't handle Chinese characters (the Chinese user's username) in Unicode.
The rest of the program works, all except the calls to _vsnwprintf_s.
(By the way the valiable spelling in comments was there in the original. I don't know who the original coder was, only that it was coded in Japan. Today I copied a bit too much source code when using the mouse, but I did copy it correctly today.)
> I think the current version of _vsnwprintf_s is broken, in ordinary
> Windows.
[quoted text clipped - 49 lines]
> _vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
> Unicode character.

Jochen Kalmbach [MVP] - 01 Aug 2007 06:37 GMT
Hi Norman!

> I have just determined that _vsnwprintf_s is broken in Chinese Vista too,
> with no internationalization involved in the execution system.

Can you please provide a *full* working example?
And please do not use non-ASCII chars in the source-code, so that it can be compiled on other systems with the same result.
> _TCHAR userName[48];
> DebugLog(_T("Other string=\"%s\"\n"), _T("Hello foreign language"));
> DebugLog(_T("Username=\"%s\"\n"), userName);

"userName" is not initialized...
Greetings Jochen
Norman Diamond - 01 Aug 2007 09:07 GMT
> Can you please provide a *full* working example?

You mean that I should show the assignment of the value of userName? I don't know if I can or not, because you proceed to say this:

> And please do not use non-ASCII chars in the source-code,

The user name was "中文2", without the quotes. I mentioned that part of it correctly yesterday.

> "userName" is not initialized...

It was not. It was retrieved from some decryption code which I will not quote. Before being encrypted, it was originally retrieved from an API which I think is one of the NetWksta____ APIs. The userName value was retrieved correctly. The userName value was passed to other APIs for authentication and succeeded. To repeat again, everything worked except for calls to _vsnwprintf_s.

> And please do not use non-ASCII chars in the source-code, so that it can
> be compiled on other systems with the same result.

Hahahaha. Did I not show enough times that the Japanese and Chinese versions of _vsnwprintf_s worked OK on ASCII characters? They only fail when presented with strings in their own languages.
> Hi Norman!
> [quoted text clipped - 14 lines]
> Greetings
> Jochen

Jochen Kalmbach [MVP] - 01 Aug 2007 09:50 GMT
Hi Norman!

>> Can you please provide a *full* working example?
>
> You mean that I should show the assignment of the value of userName? I
> don't know if I can or not, because you proceed to say this:
>
>> And please do not use non-ASCII chars in the source-code,

Maybe you can write:
TCHAR szUserName[] = {0x1234, 0x2345, 0x789A, 0x0000};
?????

>> And please do not use non-ASCII chars in the source-code, so that it can
>> be compiled on other systems with the same result.
>
> Hahahaha.
Hahahahaha....
So... please provide a small, full working example with ASCII chars in the source code!
Greetings Jochen
Cezary Noweta - 01 Aug 2007 11:15 GMT
Hello,

> > Hahahaha.

> Hahahahaha....

Hey guys, what are you smoking? I'd like to have some of that stuff right now ;-P
-- best regards
Cezary Noweta
Norman Diamond - 03 Aug 2007 01:30 GMT
>>> Can you please provide a *full* working example?
>>
[quoted text clipped - 6 lines]
> TCHAR szUserName[] = {0x1234, 0x2345, 0x789A, 0x0000};
> ?????

中 = U+4E2D
文 = U+6587
2 = U+0032

>>> And please do not use non-ASCII chars in the source-code, so that it can
>>> be compiled on other systems with the same result.
[quoted text clipped - 6 lines]
>
> Hahahahaha....

Well, the user name isn't intended to be constant. The user name is intended to be the actual user name of some actual user, and the DLL receives it by decrypting information that was previously encrypted by some other DLL that was running under control of the actual user.

> So... please provide a small, full working example with ASCII chars in the
> source code!

TCHAR szUserName[48] = {0x4E2D, 0x6587, 0x0032, 0x0000};
Not tested. I might have time to test it later today.
Kalle Olavi Niemitalo - 01 Aug 2007 07:21 GMT
> void DebugLog(TCHAR* szForm, ...)
> {
[quoted text clipped - 3 lines]
> _vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
> }

It seems va_end and output of szBuf[] are missing from this function.

> _vsnwprintf_s dies as soon as the %s format hits a perfectly
> valid simple Unicode character.

Does _vsnwprintf_s crash or call the invalid parameter handler, or does it return some value (which one)?

> In the Chinese version of Vista, in the Chinese version of the
> CRT, the Chinese version of _vsnwprintf_s can't handle Chinese
> characters (the Chinese user's username) in Unicode.

So presumably you are initializing userName[] in some way. It would be interesting to know the wchar_t values therein. (You posted a string earlier but please give the numbers too.)
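
The va_end and return-value points can be illustrated with a small portable sketch. It is not the thread's DebugLog: it assumes standard vswprintf in place of MSVC's _vsntprintf_s, uses %ls (the portable spelling for a wide-string argument), and the name format_log_line is made up for illustration. It pairs va_start with va_end and hands the formatter's return value back to the caller, so a failure inside the formatter can be told apart from a failure that happens later, when the buffer is written out.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cwchar>

// Hypothetical helper (not from the thread): formats into a caller-supplied
// buffer and returns what the formatter reported instead of discarding it.
// Assumption: standard vswprintf stands in for MSVC's _vsntprintf_s.
int format_log_line(wchar_t* buf, std::size_t bufCount, const wchar_t* fmt, ...)
{
    assert(buf != nullptr && fmt != nullptr && bufCount > 0);

    va_list args;
    va_start(args, fmt);
    const int written = std::vswprintf(buf, bufCount, fmt, args);
    va_end(args);                       // every va_start needs its va_end

    if (written < 0) {
        buf[0] = L'\0';                 // leave a well-defined empty string
    }
    return written;                     // negative means formatting failed
}
```

A caller would log the returned count next to the text; if the count is sane and the buffer holds the whole line, the formatter is not where the output is being lost.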
Norman Diamond - 01 Aug 2007 09:19 GMT
> It seems va_end and output of szBuf[] are missing from this function.

FILE* pf;
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
if (pf) _ftprintf_s(pf, szBuf);
if (pf) fclose(pf);
va_end(args);
Do you also need a transcript of actions in Windows Explorer to open the log file in Notepad and show the contents which my previous messages transcribed?
Do you think that maybe the CRT's _vsnwprintf_s could handle the language of its own version of Windows but the CRT's _ftprintf_s failed because it had harder work to do? I don't quite think so.
> Does _vsnwprintf_s crash or call the invalid parameter handler,
> or does it return some value (which one)?

If it called the invalid parameter handler then I think the rest of the code (the caller of DebugLog) would not proceed to get everything else working properly with other Windows APIs, I think the rest of the code would abort.
Your question about the return value is a good one. I will add a meta debug log of that information. I probably won't have time this week though because higher priority work has just come in.
> So presumably you are initializing userName[] in some way.
> It would be interesting to know the wchar_t values therein.
> (You posted a string earlier but please give the numbers too.)

The string is L"中文2" (without the quotes). If you really need the numbers, you can look them up as easily as I can. (The third character is number U+0032.)
>> void DebugLog(TCHAR* szForm, ...)
>> {
[quoted text clipped - 19 lines]
> It would be interesting to know the wchar_t values therein.
> (You posted a string earlier but please give the numbers too.)

Cezary Noweta - 01 Aug 2007 11:05 GMT
Hello,

> FILE* pf;
> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
[quoted text clipped - 5 lines]
> file in Notepad and show the contents which my previous messages
> transcribed?

It would be nice but not necessary ;)

> Do you think that maybe the CRT's _vsnwprintf_s could handle the language of
> its own version of Windows but the CRT's _ftprintf_s failed because it had
> harder work to do? I don't quite think so.

Yes - I think so. Wide printf foos stop output when they cannot convert from wide char to mbcs (current locale CP or console CP). This occurs when writing to the console, text file and so on. Open the log file in UTF16 mode (i.e. _T("ab") instead of _T("a")), or use the following code:

======
FILE* pf;
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
if ( pf ) {
    int outchars;
    outchars = _ftprintf_s(pf, szBuf);
    _ftprintf_s(pf, _T("STRLEN: %u; OUTCHARS: %i\n"), _tcslen(szBuf), outchars);
    fclose(pf);
}
va_end(args);
======
or try to set locale (,,_tsetlocale(LC_CTYPE, _T(".932"))'') to CP 932 before you are calling ftprintf and compare the results.
> If it called the invalid parameter handler then I think the rest of the code
> (the caller of DebugLog) would not proceed to get everything else working
> properly with other Windows APIs, I think the rest of the code would abort.

It called wctomb() which convert to the current locale (at the beginning it is "C" which means that all chars >= U+0100 are not converted). After it failed fwprintf_s has failed too and the foo returned number chars output so far. The rest of the code runs fine.

> The string is L"$BCfJ8(B2" (without the quotes). If you really need the
> numbers, you can look them up as easily as I can. (The third character is
> number U+0032.)

Oooo... ,,92 86 95 B6 32'' - 14 chars of text. At the beginning I thought that the first two char codes are confidential and you can not disclose it explicitly ;) Really could not you enumerate codes even at the price of a solution of your problem?
-- best regards
Cezary Noweta
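
The wctomb explanation in the post above can be checked in isolation with a short sketch. It assumes only standard C++ library calls; the helper name convertible_in_locale is invented for illustration and does not come from the thread. It asks whether a single wide character can be converted under a named LC_CTYPE locale, which is the step the post says fails in the startup "C" locale.

```cpp
#include <cassert>
#include <climits>
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the thread): reports whether wc can be
// converted to a multibyte character under the named LC_CTYPE locale.
// The previous locale is restored before returning.
bool convertible_in_locale(wchar_t wc, const char* localeName)
{
    assert(localeName != nullptr);

    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";

    if (std::setlocale(LC_CTYPE, localeName) == nullptr) {
        return false;                        // locale not available here
    }

    char bytes[MB_LEN_MAX];
    (void)std::wctomb(nullptr, L'\0');       // reset any shift state
    const int n = std::wctomb(bytes, wc);    // -1 means "cannot convert"

    std::setlocale(LC_CTYPE, saved.c_str());
    return n != -1;
}
```

Calling it with U+0032 and with U+4E2D under "C" shows the difference between the digit and the first character of the user name; calling it with the locale name that setlocale(LC_CTYPE, "") returns on the target machine shows whether the native code page can represent the character.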
Kalle Olavi Niemitalo - 02 Aug 2007 17:44 GMT
> Yes - I think so. Wide printf foos stop output when they cannot convert from wide
> char to mbcs (current locale CP or console CP). This occurs when writing to the
> console, text file and so on.

Yes, that could cause the problem. (I expected to see an OutputDebugString call.)

> Open the log file in UTF16 mode (i.e. _T("ab") instead of _T("a")),
> or use the following code:

From the documentation of fopen_s and _wfopen_s, it appears that the "b" flag only affects control characters, and creating a UTF-16 file requires _T("a, ccs=UTF-16LE") in Visual C++ 2005.
http://msdn2.microsoft.com/library/z5hh6ee9(VS.80).aspx
> FILE* pf;
> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
> if ( pf ) {

The documentation of fopen_s and _wfopen_s does not promise that they reset the pointer to NULL on error. On the contrary, they are specified to leave the contents of pFile unchanged in at least some error situations. So I think it would be best to initialize the pointer to NULL, _and_ check the return value rather than the pointer.

> int outchars;
> outchars = _ftprintf_s(pf, szBuf);

Outputting szBuf as a format string without arguments is likely to crash the program as soon as percent signs appear. _fputts(szBuf, pf) would be safer.

> Oooo... ,,92 86 95 B6 32'' - 14 chars of text. At the beginning I thought that the
> first two char codes are confidential and you can not disclose it explicitly ;)

I got the characters U+4E2D U+6587 U+0032, although I cannot be certain they weren't corrupted by the software I am using.
Norman Diamond - 03 Aug 2007 02:19 GMT
>> Yes - I think so. Wide printf foos stop output when they cannot convert
>> from wide char to mbcs (current locale CP or console CP). This occurs
>> when writing to the console, text file and so on.
>
> Yes, that could cause the problem. (I expected to see an
> OutputDebugString call.)

The target computer has no serial port, but it has an i1394 port, so maybe I can try using Windbag over i1394, if I find a cable and ... hmm, and install Windbag onto some other host that has an i1394 port...

>> Open the log file in UTF16 mode (i.e. _T("ab") instead of _T("a")), or
>> use the following code:
[quoted text clipped - 4 lines]
>
> http://msdn2.microsoft.com/library/z5hh6ee9(VS.80).aspx

Looks like I need to do more reading and experimenting, when I get the time to try Cezary Noweta's suggestion.

>> FILE* pf;
>> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
>> if ( pf ) {
>
> The documentation of fopen_s and _wfopen_s does not promise that they
> reset the pointer to NULL on error.

Um, so what? The problem isn't in _wfopen_s. New text is being appended to the existing debug log exactly as hoped. The problem comes in with either _vsnwprintf_s or _ftprintf_s.

>> int outchars;
>> outchars = _ftprintf_s(pf, szBuf);
>
> Outputting szBuf as a format string without arguments is likely to crash
> the program as soon as percent signs appear. _fputts(szBuf, pf) would be
> safer.

Hmm, yes, thank you. Luckily this week's user name has no percent signs, but I'd better not add any potentially risky metadebugging code to production code ^_^

>> Oooo... ,,92 86 95 B6 32'' - 14 chars of text. At the beginning I thought
>> that the first two char codes are confidential and you can not disclose
>> it explicitly ;)
>
> I got the characters U+4E2D U+6587 U+0032, although I cannot be certain
> they weren't corrupted by the software I am using.

Those match the values that I found this morning, looking them up.
Kalle Olavi Niemitalo - 04 Aug 2007 13:45 GMT
>>> FILE* pf;
>>> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
[quoted text clipped - 4 lines]
>
> Um, so what?

So the program can dereference an uninitialized pointer and crash if it cannot open the log file. I understand it has been able to open the file in your experiments; but the possible failure should be properly handled.

Norman Diamond - 06 Aug 2007 01:10 GMT
>>>> FILE* pf;
>>>> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
[quoted text clipped - 8 lines]
> cannot open the log file. I understand it has been able to open the file
> in your experiments; but the possible failure should be properly handled.

You mean the "if ( pf ) {" line which you have correctly quoted several times?
If pf were NULL then the thing wouldn't stop outputting when it hits the username, the thing wouldn't have output anything at all.
Kalle Olavi Niemitalo - 06 Aug 2007 21:02 GMT
> You mean the "if ( pf ) {" line which you have correctly quoted
> several times?

The scenario is this:

> FILE* pf;

This defines a local variable pf but does not initialize it. The initial value is indeterminate.

> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));

Now suppose this call fails due to lack of permissions. _tfopen_s returns a nonzero error code, but the documentation does not promise that it would set pf = NULL in this case. So the value of pf can remain indeterminate.

> if ( pf ) {

This tests pf. The indeterminate value can very well appear non-NULL, in which case, execution enters the block in the if-statement.

> int outchars;
> outchars = _ftprintf_s(pf, szBuf);

Here the indeterminate value of pf goes to _ftprintf_s, which is likely to dereference the pointer and crash. (There is also the format string problem.)

> If pf were NULL then the thing wouldn't stop outputting when it hits
> the username, the thing wouldn't have output anything at all.

Yes, as I wrote, the _tfopen_s succeeded in your experiments. What I'm concerned with is the error handling for cases that you did not test. Or were you going to remove this logging code from the final version?
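
The scenario walked through above suggests a more defensive shape for the logging tail. The following is only a sketch under stated assumptions: it uses standard std::fopen and std::fputws rather than _tfopen_s and _fputts, and the name append_log is hypothetical. With _tfopen_s the equivalent check would be on its returned error code, since the pointer is not guaranteed to be reset on failure.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>

// Hypothetical helper (not from the thread): appends already-formatted
// text to a log file. Returns true only if the open, the write and the
// close all succeeded.
bool append_log(const char* path, const wchar_t* text)
{
    assert(path != nullptr && text != nullptr);

    std::FILE* pf = nullptr;                 // never left indeterminate
    pf = std::fopen(path, "a");
    if (pf == nullptr) {
        return false;                        // open failed: do not touch pf
    }

    // The text is written as data, not interpreted as a format string,
    // so percent signs in a user name are harmless.
    bool ok = std::fputws(text, pf) >= 0;

    if (std::fclose(pf) != 0) {
        ok = false;
    }
    return ok;
}
```

Note that the wide text still goes through the stream's locale conversion here, so this addresses the pointer and format-string hazards only, not the conversion failure discussed elsewhere in the thread.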
Norman Diamond - 07 Aug 2007 01:58 GMT
>> You mean the "if ( pf ) {" line which you have correctly quoted several
>> times?
[quoted text clipped - 12 lines]
> does not promise that it would set pf = NULL in this case.
> So the value of pf can remain indeterminate.

Oh, where your previous message said "reset the pointer", it looked like you were talking about the file pointer, i.e. what would be called a cursor in a database system, i.e. the location that gets seeked to or sensed. Opening for append mode, the file pointer would be reset to the end of the file, unless the open fails.
But you meant the program's pointer variable, pf. You are right. The standard fopen function returns a FILE* so there is no way to omit receiving a null pointer when it fails, but _tfopen_s can omit that effect.
Lesson learned yet again: When reading existing code, the first step is NOT to assume the existing code is correct. If existing code calls a variation of an API that you haven't called yourself, look up the API, don't assume the code is correct. Sigh.
Thank you.
> Or were you going to remove this logging code from the final version?

The release version has calls to logging functions #define'd out of existence, just like calls to assert. So it doesn't crash on this particular code. Two wrongs made an accidental right.
Norman Diamond - 03 Aug 2007 02:00 GMT
[Norman Diamond:]
[Quotation of additional parts of program not originally quoted:]

>> FILE* pf;
>> _tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
[quoted text clipped - 9 lines]
> from wide char to mbcs (current locale CP or console CP). This occurs when
> writing to the console, text file and so on.

That would be enormously odd. This problem was reproduced in Chinese Vista with no internationalization whatsoever. At the moment I don't recall what the code page number is, but it is only one code page number, used in China - Hong Kong, with no customization of the system locale or user locale. Language packs can't even be installed on that one because it's Vista Business not Ultimate. I did add the Japanese keyboard layout though because the laptop has a Japanese keyboard built in, not a Chinese keyboard.
Nonetheless, if wide printf foos stop output because they are too stupid to understand their own native default built-in code page after not being customized at all, then I understand your suggestion that maybe the breakage occurs in _ftprintf_s instead of _vsnwprintf_s. I might have time to investigate this later today, maybe.
> Open the log file in UTF16 mode (i.e. _T("ab") instead of _T("a")), or use
> the following code:
[quoted text clipped - 12 lines]
> va_end(args);
> ======

That meta-debugging code looks like a good suggestion, and I hope to have time to try it later today.

> or try to set locale (,,_tsetlocale(LC_CTYPE, _T(".932"))'') to CP 932
> before you are calling ftprintf and compare the results.

That would be expected to cause problems. In both environments where the problem has been observed, the actual code page was a Chinese code page not Japanese:
(1) Japanese Vista Ultimate with system locale and user locale and MUI language all set to Chinese (Hong Kong) and rebooted several times;
(2) Chinese (Hong Kong) Vista Business with default system locale and user locale, and no MUI.
>> If it called the invalid parameter handler then I think the rest of the
>> code (the caller of DebugLog) would not proceed to get everything else
[quoted text clipped - 3 lines]
> It called wctomb() which convert to the current locale (at the beginning
> it is "C" which means that all chars >= U+0100 are not converted).

Wait a minute. I understand the possibility that the CRT might have initialized the locale to the "C" locale, and I should try to figure out if that happened. But if it did, then the point where it breaks and stops converting characters shouldn't be at U+0100, it should be at U+0080. And it should happen no matter what the system locale and user locale are.

> After it failed fwprintf_s has failed too and the foo returned number
> chars output so far. The rest of the code runs fine.
>
>> The string is L"$BCfJ8(B2" (without the quotes

WTF, Outlook Express and every other Microsoft tool involved in these newsgroup postings, WTF.
I put the cursor after "quotes)." and before " If". I hit the Enter key to put in a line break so I can type this next stuff. Outlook Express puts the line break after "quotes" and before "). If". More incredible editing capabilities from Microsoft.
OK, end of second digression, back to first digression.
In my previous posting, I didn't type a raw JIS string with escape sequences for shift-in and shift-out, I typed the actual characters. The encoding format going over the wire was in raw JIS, ISO-2022-JP. Reading my own previous message in Outlook Express, the message survived the round trip, with the characters 中 and 文 and 2. But when reading your message which quotes my previous message, Outlook Express is showing raw JIS with escape sequences and 7-bit byte values. Oh I see, it's because your message format is Central European. I think Central European encoding can't handle these Chinese characters. Japanese encoding can handle them because these are among the characters that were copied from China to Japan during recent millennia.
Hmm, I guess I should set this current message to use UTF-8 encoding... Done.
OK, where were we.
>> ). If you really need the numbers, you can look them up as easily as I
>> can. (The third character is number U+0032.)
>
> Oooo... ,,92 86 95 B6 32'' - 14 chars of text.

No, you're getting garbage because you're missing fonts and you couldn't even display the original characters correctly. I looked them up this morning so here they are:

中 = U+4E2D
文 = U+6587
2 = U+0032

> At the beginning I thought that the first two char codes are confidential
> and you can not disclose it explicitly ;)
> Really could not you enumerate codes even at the price of a solution of
> your problem?

Well, a high-priority task came in two days ago and yes it was higher priority than meta-debugging of debugging routines that look like they're depending on broken library routines. (The actual working code of this DLL had already been successfully debugged.) But this morning I had time to look up the codes.
Norman Diamond - 03 Aug 2007 07:43 GMT
OK, I ran approximately this test. The log file contains a lot of lines. After every line of ordinary debugging information, there is a line with STRLEN and OUTCHARS exactly as defined by Cezary Noweta.
After every line that doesn't contain a username, the values of STRLEN and OUTCHARS are equal.
After every line that does contain a username, the value of STRLEN is what it should be if the value of szBuf includes the entire formatted string, i.e. some constant text before the username, the username itself, and some constant text after the username. However, the value of OUTCHARS is -1. The value of OUTCHARS isn't even the number of characters that _ftprintf_s wrote before aborting, the value is -1.
So _vsnwprintf_s isn't broken, but at the moment _ftprintf_s seems to be broken. _ftprintf_s might not be broken though, if the thing is executing in the "C" locale as someone guessed. I'll have to figure that out next.
I had to use the unsafe code
outchars = _ftprintf_s(pf, szBuf);
as suggested by Cezary Noweta instead of the safer code
_fputts(szBuf, pf);
as recommended by Kalle Olavi Niemitalo, because when _fputts succeeds it returns a non-negative value which doesn't have to match the number of characters.
After the above experiment, I tried another one. Using Notepad, I created the log file in Unicode with no text. But _tfopen_s with _T("a") did not inspect the existing file to decide whether to keep Unicode as Unicode, it barged ahead and converted Unicode to ANSI and wrote the ANSI. Then opening the result in Notepad, since the BOM was still there, Notepad faithfully tried to display garbage ^_^
Now I have to add some calls to find out what locale the thing is executing in at the time, is it the Chinese Hong Kong locale (matching the system locale and user locale) or is it the "C" locale.
> Hello,
> [quoted text clipped - 73 lines]
>
> Cezary Noweta

Norman Diamond - 03 Aug 2007 08:25 GMT
It does get worse.

I deleted the file and then ran the program with this code:
_tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));

http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
* The flag is only used when no BOM is present or if the file is a new
* file.
That is a lie. _tfopen_s created a new file and it created the thing with ANSI encoding not Unicode.
I deleted the file again, created a file in Notepad containing only an empty line (CR-LF pair), saved it in Unicode, and then again ran the program with this code:
_tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));

http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
* If mode is "a, ccs=<encoding>", fopen_s will first try to open the file
* with both read and write access. If it succeeds, it will read the BOM to
* determine the encoding for this file;
This time _tfopen_s seems to have performed correctly. Now let's continue.
http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
* When a Unicode stream-I/O function operates in text mode (the default),
* the source or destination stream is assumed to be a sequence of multibyte
* characters. Therefore, the Unicode stream-input functions convert
* multibyte characters to wide characters (as if by a call to the mbtowc
* function). For the same reason, the Unicode stream-output functions
* convert wide characters to multibyte characters (as if by a call to the
* wctomb function).

In other words, it doesn't matter if _tfopen_s performed correctly because _ftprintf_s is still going to screw it up. Let's look for confirmation of this screw-up.

http://msdn2.microsoft.com/en-us/library/c4cy2b8e(VS.80).aspx
* For the same reason, the Unicode stream-output functions convert wide
* characters to multibyte characters (as if by a call to the wctomb
* function).
Yup, no provision at all for keeping Unicode as Unicode.
However, both of those are half-lies. Half the time, _ftprintf_s violated MSDN and it kept Unicode as Unicode in the spirit (but not the letter) of http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx. The other half of the time, _ftprintf_s screwed up worse.
Notepad opened the file in Unicode. The display alternates, a bunch of readable lines, a few lines of garbage, a bunch of readable lines, a few lines of garbage, etc.
It seems that ccs=UNICODE is unusable. It changes the result from being mostly readable (with a little bit of lossage) to being half readable (with half garbage).
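
One way to keep the CRT's text-mode conversion out of the picture altogether is to open the file in binary mode and emit the UTF-16LE bytes by hand. Nobody in the thread proposed exactly this; it is offered as a hedged sketch, using only standard calls, with a hypothetical helper name append_utf16le. It writes a byte-order mark only when the file is empty, so repeated appends do not scatter BOMs through the log.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not from the thread): appends text to a file as
// UTF-16LE bytes, writing a byte-order mark first if the file is empty.
bool append_utf16le(const char* path, const wchar_t* text)
{
    assert(path != nullptr && text != nullptr);

    std::FILE* pf = std::fopen(path, "ab");  // binary: no text conversion
    if (pf == nullptr) {
        return false;
    }

    bool ok = std::fseek(pf, 0, SEEK_END) == 0;
    if (ok && std::ftell(pf) == 0) {
        const unsigned char bom[2] = {0xFF, 0xFE};
        ok = std::fwrite(bom, 1, 2, pf) == 2;
    }

    for (const wchar_t* p = text; ok && *p != L'\0'; ++p) {
        unsigned long cp = static_cast<unsigned long>(*p);
        if (cp > 0x10FFFFul) {               // not a valid code point
            ok = false;
            break;
        }
        unsigned long units[2];
        int count = 0;
        if (cp > 0xFFFFul) {                 // only where wchar_t is 32 bits
            cp -= 0x10000ul;
            units[count++] = 0xD800ul + (cp >> 10);
            units[count++] = 0xDC00ul + (cp & 0x3FFul);
        } else {
            units[count++] = cp;
        }
        for (int i = 0; ok && i < count; ++i) {
            const unsigned char bytes[2] = {
                static_cast<unsigned char>(units[i] & 0xFFul),
                static_cast<unsigned char>((units[i] >> 8) & 0xFFul)};
            ok = std::fwrite(bytes, 1, 2, pf) == 2;
        }
    }

    if (std::fclose(pf) != 0) {
        ok = false;
    }
    return ok;
}
```

Line endings are the caller's responsibility in this sketch: binary mode does not translate "\n" to CR-LF, so a log meant for Notepad would need L"\r\n" in the text.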
> OK, I ran approximately this test. The log file contains a lot of lines.
> After every line of ordinary debugging information, there is a line with
[quoted text clipped - 114 lines]
>>
>> Cezary Noweta

Norman Diamond - 03 Aug 2007 08:50 GMT
It gets even more worse.

I added this call:
_ftprintf_s(pf, _T("%s\n"), _tsetlocale(LC_CTYPE, _T("")));
The output was:
Chinese_Hong Kong S.A.R..950
So there is absolutely no excuse for _ftprintf_s to screw up on Chinese characters. The DLL is not running in the C locale, it's running in the Chinese Hong Kong locale, code page 950, exactly as it should be.
Here's more MSDN stuff too.
http://msdn2.microsoft.com/en-us/library/x99tb11d(VS.80).aspx
* LC_CTYPE
* The character-handling functions (except isdigit, isxdigit, mbstowcs, and
* mbtowc, which are unaffected).
So mbtowc is one of the exceptions, it wouldn't have been affected even if the C locale were in use, and presumably it would always use code page 950 and screw up because it's miscoded -- however, wctomb isn't one of the exceptions, so it would have been affected if the C locale were in use, and it would screw up differently from the way it actually screws up.
Anyway, thank you whoever it was who said that _vsnwprintf_s isn't broken and _ftprintf_s is. Sorry I found it hard to believe you. You're absolutely right. _ftprintf_s is broken.
> OK, I ran approximately this test. The log file contains a lot of lines.
> After every line of ordinary debugging information, there is a line with
[quoted text clipped - 114 lines]
>>
>> Cezary Noweta
Norman Diamond - 03 Aug 2007 09:01 GMT
OMFG.
When I added this call:
    _ftprintf_s(pf, _T("%s\n"), _tsetlocale(LC_CTYPE, _T("")));
it didn't query the DLL's current locale the way MSDN says it will. It SET the current locale, and returned it:
    Chinese_Hong Kong S.A.R..950
And, the result of this setting activity did affect the way wctomb operates. And the result of that setting activity did affect the way _ftprintf_s operates.
The result was that _ftprintf_s wrote the user name correctly.
In ANSI.
中文2
The good news is that there's a workaround for the breakage in _ftprintf_s. The bad news is that I haven't finished learning how bad Windows can be.
> It gets even more worse.
>
[quoted text clipped - 153 lines]
>>>
>>> Cezary Noweta
Norman Diamond - 03 Aug 2007 09:21 GMT
Oh, I screwed up this last test. MSDN says my call to _tsetlocale() does set the locale instead of querying. A null pointer does a query, but an empty string sets the locale to the default from the OS.
OK, good news anyway, when setting to the default from the OS, it worked.
So I still don't know if the DLL actually started up in the C locale instead, but I'm not in the mood to test it again.
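The query-versus-set distinction described in this post can be sketched in portable C. This is only a minimal illustration using the standard setlocale() rather than the thread's _tsetlocale(); the helper names (query_ctype_locale, ctype_locale_is_c, adopt_environment_ctype) are hypothetical and do not come from the original program.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Query only: a null pointer leaves the locale untouched and
   returns the name of the current LC_CTYPE locale. */
const char *query_ctype_locale(void)
{
    return setlocale(LC_CTYPE, NULL);
}

/* Returns 1 while the program is still in the startup "C" locale. */
int ctype_locale_is_c(void)
{
    const char *name = query_ctype_locale();
    return name != NULL && strcmp(name, "C") == 0;
}

/* Set: an empty string switches LC_CTYPE to the environment's
   default locale. Returns the new locale name, or NULL on failure. */
const char *adopt_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "");
}
```

Calling ctype_locale_is_c() before any setlocale() call is one way to check whether the CRT really started in the C locale, which is the question left open above.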
> OMFG.
>
[quoted text clipped - 187 lines]
>>>>
>>>> Cezary Noweta
Cezary Noweta - 03 Aug 2007 14:30 GMT
Hello,
> > Oooo... ,,92 86 95 B6 32'' - 14 chars of text.
> No, you're getting garbage because you're missing fonts and you couldn't
> even display the original characters correctly. I looked them up this
> morning so here they are:
No - this is good. This code sequence is Japanese ANSI (932). You have written that your dev environment is Japanese and you coded your posts in ISO-2022-JP, so I have sent JAP ANSI codes and not CHS ones.
> This problem was reproduced in Chinese Vista > with no internationalization whatsoever.
> Nonetheless, if wide printf foos stop output because they are too stupid to
> understand their own native default built-in code page after not being
> customized at all, then I understand your suggestion that maybe the breakage
> occurs in _ftprintf_s instead of _vsnwprintf_s.
It does not matter what your system ANSI CP is. Also wide printf is not so stupid. ISO states that:
"At program startup, the equivalent of setlocale(LC_ALL, "C"); is executed."
That's all. When you want to play with mixed international language streams, and especially with Unicode or double-byte character sets, then you must use setlocale().
> I deleted the file and then ran the program with this code:
> _tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));
> http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
> * The flag is only used when no BOM is present or if the file is a new
> * file.
> That is a lie. _tfopen_s created a new file and it created the thing with
> ANSI encoding not Unicode.
> I deleted the file again, created a file in Notepad containing only an empty
> line (CR-LF pair), saved it in Unicode, and then again ran the program with
> this code:
> _tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));
Everything is OK - look at the table below. When you are creating a new file you should use "ccs=UTF-16LE" and not "ccs=UNICODE" as "UNICODE" creates ANSI files.
> I added this call:
> _ftprintf_s(pf, _T("%s\n"), _tsetlocale(LC_CTYPE, _T("")));
> The output was:
> Chinese_Hong Kong S.A.R..950
So now you know that you have set the locale to your system codepage. setlocale(LC_CTYPE, NULL) returns your current locale.
> _ftprintf_s is broken.
> The good news is that there's a workaround for the breakage in _ftprintf_s.
fwprintf is not broken. This is not a workaround. This is the normal way to achieve the effect you wanted. It follows from the ISO standard itself, not from MSDN or MS.
ISO states:
"The wide character output functions convert wide characters to multibyte characters and write them to the stream as if they were written by successive calls to the fputwc function. Each conversion occurs as if by a call to the wcrtomb function, with the conversion state described by the stream's own mbstate_t object. The byte output functions write characters to the stream as if by successive calls to the fputc function."
and
"An encoding error occurs if the character sequence presented to the underlying mbrtowc function does not form a valid (generalized) multibyte character, or if the code value passed to the underlying wcrtomb does not correspond to a valid (generalized) multibyte character. The wide character input/output functions and the byte input/output functions store the value of the macro EILSEQ in errno if and only if an encoding error occurs."
After your _ftprintf() has been executed within "C" locale, errno contains EILSEQ. After you have called setlocale(LC_CTYPE, "") your locale is set to your system codepage and _ftprintf() works OK. Everything is OK.
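The encoding error described here can be reproduced without any file I/O by calling wcrtomb directly. The following is a minimal sketch in standard C, not code from the thread; the function name encodes_in_current_locale is hypothetical. It assumes the program has not yet called setlocale(), so the "C" locale is still in effect.

```c
#include <errno.h>
#include <limits.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Returns 1 if wc converts to a multibyte character in the current
   LC_CTYPE locale, 0 if wcrtomb reports an encoding error (EILSEQ). */
int encodes_in_current_locale(wchar_t wc)
{
    char out[MB_LEN_MAX];
    mbstate_t state;

    memset(&state, 0, sizeof state);   /* initial conversion state */
    errno = 0;
    if (wcrtomb(out, wc, &state) == (size_t)-1 && errno == EILSEQ)
        return 0;
    return 1;
}
```

In the startup "C" locale, a call such as encodes_in_current_locale((wchar_t)0x4E2D) (the first character of the username in this thread) is expected to return 0, while an ASCII letter converts fine. After setlocale(LC_CTYPE, "") on a system whose code page contains that character, the same call should succeed.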
> The bad news is that I haven't finished learning how bad Windows can be.
Yea, but not this time ;)
-- best regards
Cezary Noweta
Norman Diamond - 06 Aug 2007 01:28 GMT
>> > Oooo... ,,92 86 95 B6 32'' - 14 chars of text.
>>
[quoted text clipped - 5 lines]
> written that your dev environment is Japanese and you coded your posts in
> ISO-2022-JP, so I have sent JAP ANSI codes and not CHS ones.
The default coding for new posts is ISO-2022-JP. The default for followup posts is to use the encoding of the post that is being quoted. But you set the encoding of your posts to Central European.
The program was running in a Chinese environment where the system and user code page was 950.
14 bytes of text is not 14 chars of text.
Internal to the program, the coding was Unicode not ANSI. Ordinarily it wouldn't convert to ANSI until _ftprintf_s writes to a file. From this discussion I learned that even _ftprintf_s won't convert it to ANSI unless the program does a call to use the system's code page instead of C locale.
> It does not matter what is your system ANSI CP. Also wide printf is not so
> stupid. ISO states that:
[quoted text clipped - 4 lines]
>
> That's all.
I understand that now. Thank you.
> When you want to play with mixed international language streams, and
> especially with Unicode or double-byte character sets, then you must use
> setlocale().
You still don't understand this part of it though. There was no mixing of international language streams. The execution environment was 100% Chinese. And the need to use setlocale() applies to single-byte character sets as much as it does to double-byte character sets. _ftprintf_s should fail on several characters in your language when using your code page, and _ftprintf_s should fail on the English character £ when using Western Europe's code page, just as quickly as it failed on Chinese characters when using a Chinese code page.
Everyone has to call setlocale() and tell the CRT to use the system's code page. Hopefully we both understand this now.
>> I deleted the file and then ran the program with this code:
>> _tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));
[quoted text clipped - 14 lines]
> file you should use "ccs=UTF-16LE" and not "ccs=UNICODE" as "UNICODE"
> creates ANSI files.
Oh, you are right. The flag UNICODE means ANSI. I wonder why the flag ANSI doesn't mean UNICODE. Who allowed some programmer to develop a CRT without a flag named ANSI and meaning UNICODE? At least it's a relief to see that I wasn't the least competent programmer in this discussion ^_^
Anyway now I understand to put this at the beginning of every program:
    _tsetlocale(LC_CTYPE, _T(""));
Now I wonder how to find out if it's safe to call this from DllMain.
Norman Diamond - 06 Aug 2007 01:34 GMT
> 14 bytes of text is not 14 chars of text.
Ooops, internally I read "char" as "character" instead of "char". I get a C- today.
In C, 14 bytes of text is 14 chars of text. It isn't 14 characters and you didn't say that it is. Sorry.
Kalle Olavi Niemitalo - 06 Aug 2007 21:36 GMT
> Anyway now I understand to put this at the beginning of every program:
> _tsetlocale(LC_CTYPE, _T(""));
> Now I wonder how to find out if it's safe to call this from DllMain.
setlocale() and _wsetlocale() are not exported from Kernel32.dll, so you should assume that they are not safe to call from DllMain. If you link the C runtime statically into your DLL and audit its source code, then it may be safe, until the next library upgrade.
If you link to the C runtime DLL, then setlocale() can also affect other modules of the program, thereby increasing dependencies between them. With _create_locale() and _ftprintf_s_l(), you could better isolate DLLs from each other.
Kalle Olavi Niemitalo - 04 Aug 2007 14:24 GMT
> I had to use the unsafe code
> outchars = _ftprintf_s(pf, szBuf);
Surely you could fix the bug with _ftprintf_s(pf, _T("%s"), szBuf).
(In standard C, L"%s" in a format string takes a char * argument, just like "%s". In Microsoft C, it instead takes a wchar_t * argument. I don't know if there is a way for Microsoft to correct this violation without breaking countless programs.)
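One way to sidestep the %s disagreement noted above is to write %ls, which names a wchar_t * argument both in ISO C and in Microsoft's CRT. The sketch below is illustrative only: it uses the standard swprintf rather than the thread's _ftprintf_s, and the function name format_user and the "user=" prefix are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Formats "user=<name>" into out, which holds cap wide characters.
   %ls takes a wchar_t * argument regardless of how the library
   interprets a bare %s in a wide format string.
   Returns the number of wide characters written (excluding the
   terminator), or a negative value on failure. */
int format_user(wchar_t *out, size_t cap, const wchar_t *name)
{
    return swprintf(out, cap, L"user=%ls", name);
}
```

Passing the text as an argument to a fixed format string, rather than as the format string itself, also avoids the problem with _ftprintf_s(pf, szBuf) when szBuf happens to contain a percent sign.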
Norman Diamond - 06 Aug 2007 01:32 GMT
>> I had to use the unsafe code
>> outchars = _ftprintf_s(pf, szBuf);
>
> Surely you could fix the bug with _ftprintf_s(pf, _T("%s"), szBuf).
No, the reason this entire discussion started was because _T("%s") fails. Finally we understand the reason why _T("%s") fails (it was my fault for not adding a call to setlocale). Nonetheless changing _T("%s") to _T("%s") wouldn't fix it.