
.NET Forum / Languages / Managed C++ / March 2006

Curious about loop optimization C++ - assembly

Egbert Nierop (MVP for IIS) - 20 Mar 2006 10:42 GMT
Hi,

Out of curiosity, I sometimes look at the assembly produced by a release-mode
build.

What I often see is that C++ always moves values explicitly through registers
to copy them from A to B...

Yet STOSB, STOSW, STOSD, etc., and likewise MOVSx, are single instructions
that use the registers implicitly: STOSx stores AL/AX/EAX at [EDI], and MOVSx
copies from [ESI] (source) to [EDI] (destination).

This seems (IMHO) more efficient; however, C++ never uses this construct...
it always emits a lot more instructions.

Imagine this loop (I simplified the idea; of course, memcpy would normally be
used):

DWORD anArray[10000];
DWORD somesource[10000];
int element = 0;

// copy the array, skipping the odd element positions
for (int mycounter = 5000; mycounter != 0; mycounter--, element += 2)
   anArray[element] = somesource[element];

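could be optimized to:

// set up source and destination
LEA EDI, anArray     // address of the destination array
LEA ESI, somesource  // address of the source array
MOV ECX, 5000        // number of elements to copy
CLD                  // forward copy (ESI/EDI increase)

mylabel:
MOVSD                // actual copy: [ESI] to [EDI], then ESI += 4, EDI += 4
ADD ESI, 4           // skip the odd-position element
ADD EDI, 4
LOOP mylabel         // decrement ECX, jump back while ECX != 0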

Q: Is this construct simply not that efficient, or is there a reason the C++
compiler team decided not to optimize to this level?
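Both the C loop and the string-instruction idea can be modeled in portable C++ to check that they do the same work. This is a sketch under stated assumptions: the function names are hypothetical, the arrays are std::uint32_t instead of DWORD, and the second function only mirrors what the instructions do with the direction flag clear (MOVSD copies [ESI] to [EDI] and advances both by 4 bytes; ADD ESI,4 / ADD EDI,4 skip the odd slot; LOOP decrements ECX and jumps while it is non-zero). It is not what any particular compiler emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The loop from the question: copy every element at an even index and
// leave the odd positions of dst untouched. 'count' is the number of
// elements actually copied (5000 for the 10000-slot arrays in the post).
void copy_even_positions(std::uint32_t* dst, const std::uint32_t* src,
                         std::size_t count) {
    std::size_t element = 0;
    for (std::size_t n = count; n != 0; --n, element += 2)
        dst[element] = src[element];
}

// Hypothetical model of MOVSD + ADD ESI,4 / ADD EDI,4 + LOOP with DF = 0.
// 'ecx' plays the role of the ECX register. LOOP behaves like a do-while:
// the body runs before the first test, so ecx == 0 would wrap around to
// 0xFFFFFFFF and run about 4 billion times (hence the assert).
void movsd_loop_model(std::uint32_t* edi, const std::uint32_t* esi,
                      std::uint32_t ecx) {
    assert(ecx != 0);
    do {
        *edi = *esi;       // MOVSD: copy one DWORD...
        ++edi;             // ...and advance both pointers by 4 bytes
        ++esi;
        ++edi;             // ADD EDI, 4: step over the odd position
        ++esi;             // ADD ESI, 4
    } while (--ecx != 0);  // LOOP: decrement ECX, jump if non-zero
}
```

Running both on the same input and comparing the destination arrays is a quick way to confirm that a hand-written sequence really matches the C loop before timing it.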
Carl Daniel [VC++ MVP] - 20 Mar 2006 15:56 GMT

The LOOP and MOVS instructions are horribly slow on modern CPUs because they
don't make effective use of the deep pipeline in the CPU.  The longer
instruction sequence actually executes many times faster.

IIRC, VC++ did generate LOOP/MOVS years ago (VC1-4 maybe?), but has moved away
from those constructs since, maybe, the Pentium.

-cd
Egbert Nierop (MVP for IIS) - 20 Mar 2006 16:20 GMT

Interesting!

This seems to confirm a remark by some C++/ASM programmer somewhere on the
web, who said he could no longer write faster code than the C++ compiler
produced. Normally I would think 'OK, so he was not up to the task', but it
seems I was wrong (again :-) ).
Egbert Nierop (MVP for IIS) - 20 Mar 2006 17:36 GMT

Nope,
I've beaten the C++ optimizer by 25% (testing with 100 MB!), but that may
only hold on an Athlon 64, possibly not on other CPUs...

Anyway, you were right that one cannot say "the fewer ASM instructions, the
faster"!
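The timings in this post come from GetTickCount, whose resolution is coarse (typically 10 to 16 ms), so small differences can vanish in the noise. A portable sketch of a harness using std::chrono::steady_clock, which repeats each candidate and keeps the fastest run; the names time_best_of and widen_loop are hypothetical, and no particular result is implied for any CPU:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: run 'fn' 'reps' times and return the fastest
// wall-clock duration in microseconds. Taking the minimum filters out
// runs disturbed by interrupts or other processes.
template <typename Fn>
long long time_best_of(Fn fn, int reps) {
    assert(reps > 0);
    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < reps; ++r) {
        auto start = std::chrono::steady_clock::now();
        fn();
        auto stop = std::chrono::steady_clock::now();
        long long us = std::chrono::duration_cast<std::chrono::microseconds>(
                           stop - start).count();
        best = std::min(best, us);
    }
    return best;
}

// Example candidate: widen bytes to 16-bit units, like the loop in the
// post. Bytes are already unsigned here, so they are zero-extended.
void widen_loop(std::uint16_t* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

Usage, assuming `in` and `out` are buffers of `n` elements: `time_best_of([&] { widen_loop(out, in, n); }, 10)`, then the same call for the other variant. Build in release mode, and keep in mind that an optimizer may drop a loop whose result is never read.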

PS: The function below is not meant as a real conversion (it skips proper
Unicode encoding and just zero-extends each byte). Just for fun...

#include <windows.h>
#include <wchar.h>

void __stdcall AnsiToBstr(PCSTR ansi, BSTR bstr, int writtenLen)
{
//#ifdef _M_IX86
DWORD ticks = GetTickCount();
if (writtenLen > 1) // otherwise DEC/JNZ below wraps around (~4 billion passes)
{
__asm XOR AH, AH              // clear the high byte of our Unicode char (= 2 bytes)
__asm MOV ECX, writtenLen     // initialize our loop
__asm DEC ECX                 // loop counter: copy writtenLen - 1 chars
__asm MOV EDI, [bstr]         // destination index
__asm MOV ESI, [ansi]         // source index
__asm labell:
__asm MOV AL, BYTE PTR [ESI]  // copy a string byte
__asm MOV [EDI], AX           // store it as a 16-bit char (high byte is 0)
__asm INC EDI
__asm INC EDI
__asm INC ESI
__asm DEC ECX
__asm JNZ labell
}

//#else
wprintf(L"%lu\n", GetTickCount() - ticks);
ticks = GetTickCount();

// same copy in plain C++; the cast through unsigned char zero-extends
// bytes >= 0x80 like the asm does, instead of sign-extending them
for (int loopit = writtenLen - 1;
     loopit > 0;
     loopit--, bstr++, ansi++)
   bstr[0] = (unsigned char)ansi[0];

wprintf(L"%lu\n", GetTickCount() - ticks);

//#endif
}
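Zero-extending each byte only yields the right characters when the input bytes already equal their Unicode code points, as with ASCII or ISO-8859-1 (Latin-1) text; on Windows the real conversion for other code pages is MultiByteToWideChar. A portable sketch of just the zero-extending copy, assuming char16_t output; the name widen_bytes is hypothetical. The cast through unsigned char matters: without it, a byte like 0xE9 in a signed char would be sign-extended to 0xFFE9 instead of becoming U+00E9.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: widen 'len' bytes to UTF-16 code units by
// zero-extension. This is only a correct conversion for Latin-1 input;
// other code pages need a real conversion.
std::u16string widen_bytes(const char* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    std::u16string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        // Cast through unsigned char so bytes >= 0x80 are zero-extended,
        // matching the XOR AH, AH / MOV AL trick in the asm above.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(src[i])));
    }
    return out;
}
```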

©2008 Advenet LLC