> Hi,
>
[quoted text clipped - 37 lines]
> reason the C++ compiler team decided not to try to optimize to this
> level?
The LOOP and MOVS instructions are horribly slow on modern CPUs because they
don't make effective use of the deep pipeline in the CPU. The longer
instruction sequence actually executes many times faster.
IIRC, VC++ did generate LOOP/MOVS years ago (VC1-4 maybe?), but has moved
away from those constructs since around the Pentium.
-cd
Egbert Nierop (MVP for IIS) - 20 Mar 2006 16:20 GMT
>> Hi,
>> DWORD anArray [10000];
>>
[quoted text clipped - 24 lines]
> they don't make effective use of the deep pipeline in the CPU. The longer
> instruction sequence actually executes many times faster.
Interesting!
This seems to confirm a remark by some C++ / ASM programmer somewhere on the
web. He stated that he could no longer optimize code any better than the C++
compiler did. Normally, I tend to think 'ok, so he was not up to the task',
but it seems I was wrong (again :-) ).
Egbert Nierop (MVP for IIS) - 20 Mar 2006 17:36 GMT
>> Hi,
Nope,
I've beaten the C++ optimizer by 25% (testing with 100 MB!), but this might
only hold on an Athlon 64 and possibly not on other CPUs...
Anyway, you were right that one cannot say 'the fewer ASM instructions,
the faster'!
PS: The function below is not meant to 'decode' for real (it skips the
Unicode conversion). Just for fun...
void __stdcall AnsiToBstr(PCSTR ansi, BSTR bstr, int writtenLen)
{
//#ifdef _M_IX86
DWORD ticks = GetTickCount();
__asm XOR AH, AH // just to clear the high part of our unicode char (= 2 bytes)
__asm MOV ECX, writtenLen // initialize our loop
__asm DEC ECX // our loop counter
__asm MOV EDI, [bstr] // destination index
__asm MOV ESI, [ansi] // source index
__asm labell:
__asm MOV AL, BYTE PTR [ESI] // copy a string byte
__asm MOV [EDI], AX
__asm INC EDI
__asm INC EDI
__asm INC ESI
__asm DEC ECX
__asm JNZ labell
//#else
wprintf(L"%d\n", GetTickCount() - ticks);
ticks = GetTickCount();
for (int loopit =
writtenLen - 1;
loopit != 0;
loopit--, bstr++, ansi++)
bstr[0] = ansi[0];
wprintf(L"%d\n", GetTickCount() - ticks);
//#endif
}
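
For anyone who wants to repeat the comparison without inline assembly, here is a minimal portable sketch of the same byte-to-16-bit widening loop, plus a small timing helper. The names widen_bytes and elapsed_ms are hypothetical and not from the posts above. Two deliberate differences from the posted function: it copies all len characters (the posted loops copy writtenLen - 1), and it copes with a length of zero. It zero-extends each byte, as the XOR AH, AH / MOV [EDI], AX sequence does, whereas a plain bstr[0] = ansi[0] sign-extends on compilers where char is signed.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not from the thread): widen each source byte to a
// 16-bit code unit. The cast through unsigned char zero-extends the value,
// matching the behaviour of the posted assembly loop.
inline void widen_bytes(const char* src, char16_t* dst, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        dst[i] = static_cast<char16_t>(static_cast<unsigned char>(src[i]));
}

// Hypothetical timing helper: runs fn once and returns the elapsed time in
// milliseconds, measured with a monotonic clock.
template <typename Fn>
double elapsed_ms(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as elapsed_ms([&] { widen_bytes(src, dst, len); }) would time one pass. GetTickCount typically has a resolution of only about 10 to 16 ms, so a finer clock and several repeated runs over a large buffer should give steadier numbers when the difference being measured is in the range of 25%.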