.NET Forum / .NET Framework / General / January 2005
Performance: VC++ 33% slower than Builder 5 on LineTo() API call??
Gustavo L. Fabro - 07 Jan 2005 18:29 GMT
Greetings!
Getting straight to the point, here are the results of my experiment. I've included my comments and questions after them.
The timing:
(The total time is the sum of each line's drawing time. Time is measured in clock ticks (from the QueryPerformanceCounter() API). The counter frequency (QueryPerformanceFrequency()) on my machine is 3579545 ticks per second.)
------------------------------------------
Visual Studio .NET 2003
Total time: 717230
Average: 89.8165625

Borland Builder 5:
Total Time: 482151
Average: 61.0975
The code (for the DLL):
------------------------------------------
DrawDll.h

#ifdef DRAWDLL_EXPORTS
#define DRAWDLL_API __declspec(dllexport)
#else
#define DRAWDLL_API __declspec(dllimport)
#endif

class DRAWDLL_API CDrawDll {
public:
    CDrawDll(void);
    void MyMethod(HWND handle);
};

DrawDll.cpp

#include "stdafx.h"
#include "DrawDll.h"
#include <stdio.h>

BOOL APIENTRY DllMain( HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved )
{
    return TRUE;
}

void CDrawDll::MyMethod(HWND handle)
{
    HDC hDC = ::GetDC(handle);

    LARGE_INTEGER m_StartCounter; // start time
    LARGE_INTEGER m_EndCounter;   // finish time
    __int64 m_ElapsedTime;
    char buff2[255];

    //For 800 different positions
    for(int x=0;x<800;x++)
    {
        //10 times on each position
        for(int rep=0;rep<10;rep++)
        {
            QueryPerformanceCounter (&m_StartCounter);

            ::MoveToEx(hDC, x,0, NULL);
            ::LineTo(hDC, 50+x,50);

            QueryPerformanceCounter (&m_EndCounter);

            //get and store finishing time and calc elapsed time(ticks)
            m_ElapsedTime = (m_EndCounter.QuadPart - m_StartCounter.QuadPart );
            sprintf(buff2, "%d\n", m_ElapsedTime);
            OutputDebugString(buff2);
        }
    }

    ReleaseDC(handle, hDC);
}

CDrawDll::CDrawDll()
{
    return;
}
The explanation
---------------------------------------------------------
While porting a big project to Visual Studio, I started facing some performance problems. Things were much slower in the VS-compiled executables. I went to study what exactly was happening and reached some startling (from my point of view) conclusions.
I made a DLL and compiled it on Builder and Visual C++ .NET, with all optimizations enabled for both compilers. The DLL has a class with only one function, that gets a handle for a DC and draws 8.000 lines on it.
I made 2 executables that run the function from the DLL (compiled with both compilers too).
The results were astonishing, for me, and I'd like an explanation for what is happening.
I've run the test several times and the results are always of the same magnitude. How can that be, if the only thing I'm doing is MoveTo() and LineTo() API calls?
It's something simple! I'm not playing with the disk, loading large chunks of memory, using managed extensions (I created a 'pure' Win32 project under VS), anything that could relate with performance. Only 2 simple API calls.
Is Visual C++ really THAT MUCH slower?
I have the complete code and compiled executables here and will be glad to send them to anyone who wants to replicate the test. As far as this posting is concerned:

- Are VS-compiled DLLs and/or executables inherently slower than, for instance, Builder 5 ones?
- Why does a simple API call take that much longer? Isn't it the same API call? Shouldn't the call be fast and the API function itself take longer?
- Is there anything I can do/try to make the code run faster?

We would like to migrate other big projects to Visual C++, but now we're having second thoughts!
Waiting for a light,
Gustavo L. Fabro
Jonathan Allen - 07 Jan 2005 20:30 GMT
Could you give me an example of when I would want to call that function 8000 times in a tight loop?
Jonathan
> Greetings!
> [quoted text clipped - 133 lines]
>
> Gustavo L. Fabro

Gustavo L. Fabro - 08 Jan 2005 00:16 GMT
> Could you give me an example of when I would want to call that function
> 8000 times in a tight loop?

In a CAD application, for instance. The function is only called once; what it does is draw 8000 lines.
In a regular CAD drawing, many more than 8.000 lines are needed for the complete drawing to take place.
> Jonathan
> [quoted text clipped - 135 lines]
>>
>> Gustavo L. Fabro

Tim Robinson - 08 Jan 2005 00:52 GMT
>>Could you give me an example of when I would want to call that function
>>8000
[quoted text clipped - 7 lines]
> complete
> drawing to take place.

In a CAD program you wouldn't be calling QueryPerformanceCounter or OutputDebugString for each line:
    for(int rep=0;rep<10;rep++)
    {
        QueryPerformanceCounter (&m_StartCounter);

        ::MoveToEx(hDC, x,0, NULL);
        ::LineTo(hDC, 50+x,50);

        QueryPerformanceCounter (&m_EndCounter);

        //get and store finishing time and calc elapsed time(ticks)
        m_ElapsedTime = (m_EndCounter.QuadPart - m_StartCounter.QuadPart );
        sprintf(buff2, "%d\n", m_ElapsedTime);
        OutputDebugString(buff2);
    }
QPC and ODS both have high overhead: each involves a transition to kernel mode and back; QPC samples the hardware timer; and when a debugger is attached, ODS effectively triggers an exception, which causes a full context switch to the debugger and back.
Move the benchmarking code to the outside of the outer loop -- time the whole operation -- and then compare results.
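A minimal sketch of that restructuring follows. It is not the original poster's code: to stay portable it uses std::chrono::steady_clock instead of QueryPerformanceCounter, and DrawLineStub is a hypothetical stand-in for the MoveToEx/LineTo pair. The point is only that the clock is read twice in total, not twice per line, and nothing is traced inside the loop.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the ::MoveToEx / ::LineTo pair. It only counts
// calls so that the loop has an observable effect.
static volatile std::int64_t g_lines_drawn = 0;

static void DrawLineStub(int x)
{
    (void)x;
    g_lines_drawn = g_lines_drawn + 1;
}

// Times the whole 800 x 10 loop with a single pair of clock reads and
// returns the elapsed time in nanoseconds.
std::int64_t TimeWholeLoop()
{
    using Clock = std::chrono::steady_clock;

    const Clock::time_point start = Clock::now();

    for (int x = 0; x < 800; ++x)           // 800 different positions
        for (int rep = 0; rep < 10; ++rep)  // 10 times on each position
            DrawLineStub(x);

    const Clock::time_point end = Clock::now();

    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

In the real DLL the same shape would presumably apply with QueryPerformanceCounter called once before and once after the two loops, and a single OutputDebugString after the second call.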
Tim Robinson (MVP, Windows SDK)
http://mobius.sourceforge.net/
Phil Frisbie, Jr. - 07 Jan 2005 22:20 GMT
> Greetings!
> [quoted text clipped - 15 lines]
> Total Time: 482151
> Average: 61.0975

Did you look at the assembly produced by both compilers?
But artificial tests like this rarely mean anything in real applications...
Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com
Gustavo L. Fabro - 08 Jan 2005 00:48 GMT
> Did you look at the assembly produced by both compilers?

At this time I unfortunately don't know assembly language well enough to tell anything concrete from the two listings. If it helps, I can disassemble both DLLs and post the code here!

> But artificial tests like this rarely mean anything in real
> applications...

I'm afraid this is not the case here. This test is just a replication of something I have seen in practice. Our CAD application took 5 times longer to draw the same file on the screen with the VS-compiled version than with our Builder-compiled one.
As the application itself has lots of classes and DLLs, and we used managed and unmanaged C++ in the middle, I first tried to check whether the API calls themselves, after all the processing (of elements, point positions, etc.), were running at the same speed. If that was true, I would then focus on the managed/unmanaged approach, DLL interaction and other factors.
But when I saw that even the API drawing calls themselves were taking longer, I got intrigued... And decided to do this test! Hence the results shown here and the question: Is it *really* like this?
Fabro
Luis Miguel Huapaya - 10 Jan 2005 17:43 GMT
Just out of curiosity. How come you are not using hardware to render your lines (i.e. DirectX). If performance is an issue, using DirectX to draw lines would give you a seemingly infinite boost in performance compared to rendering your lines in software (even anti-aliased lines).
Just curious.
cheers, Luis Miguel Huapaya
> > Did you look at the assembly produced by both compilers?
> [quoted text clipped - 23 lines]
>
> Fabro

Gustavo L. Fabro - 10 Jan 2005 18:19 GMT
> Just out of curiosity. How come you are not using hardware to render your
> lines (i.e. DirectX). If performance is an issue, using DirectX to draw
> lines would give you a seemingly infinite boost in performance compared to
> rendering your lines in software (even anti-aliased lines).

Hmmm... As far as I know (or knew), GDI calls are accelerated by hardware when available (and when the "Hardware Acceleration" slider in Control Panel, Video, Configuration, Advanced, Problem Solving is not all the way to the left).
The profiling for the problem in this post, for instance, was done on a computer with the "Hardware Acceleration" slider a couple of notches to the left. It took an average of 270206 ticks to draw 8.000 lines. With hardware acceleration fully enabled, the time dropped to 62123.
Am I wrong? If DirectX could give an infinite boost in performance I would definitely be interested!
Fabro
> Just curious.
> [quoted text clipped - 34 lines]
>>
>> Fabro

Tom Widmer - 11 Jan 2005 09:46 GMT
>>Just out of curiosity. How come you are not using hardware to render your
>>lines (i.e. DirectX). If performance is an issue, using DirectX to draw
[quoted text clipped - 14 lines]
> Am I wrong? If DirectX could give an infinite boost in performance
> I would definitely be interested!

Yes, such 2D calls generally are accelerated by hardware. However, for the ultimate in speed, you should perhaps render using 3D hardware, although this requires a lot of extra programming work. This would be appropriate for a CAD application though, perhaps.
Tom
Gustavo L. Fabro - 10 Jan 2005 17:26 GMT
Greetings!
Thanks everybody for the comments. I've run the tests again, and indeed it was my mistake.
As Tim suggested, with the profiling code moved outside the outer loop (eliminating a great deal of overhead) and the call made from a better place (I was using menus, but XP's "fading effect" time was interfering with the timing), the results I got matched what I expected in the first place:
Visual Studio: 269996
Borland: 270206
I can now go through the code and try to find what is really affecting the speed (I had stopped when I saw this).
In answer to Carl: I wasn't compiling as managed code. I will do so later on in my quest to see what is happening in our program.
And as for Ken's reply, I appreciate the tips for reducing the interference of context-switching time in the profiles for better timing. I will use them next time I find myself in a similar situation!
Fabro
Carl Daniel [VC++ MVP] - 08 Jan 2005 00:58 GMT
> Greetings!
>
> Getting straight to the point, here are the results
> of my experiment. I've included my comments and questions
> after them.

What command-line options are you using for the VC++ build? If you're compiling it as managed code (/clr) I wouldn't be surprised to see a 33% speed reduction since you'd be transitioning in and out of managed code several times per iteration of your timing loop.
-cd
Ken Hagan - 10 Jan 2005 11:20 GMT
> I've run the test several times and the results are always of the
> same magnitude. How can that be, if the only thing I'm doing is
[quoted text clipped - 6 lines]
>
> Is Visual C++ really THAT MUCH slower?

Well, first off, as you state yourself, the portion of the code that your compilers generated is only a fraction of the full overhead. The work of the API calls is done by the same (OS) code in both cases, so the results are not comparing VC with Builder 5.
Having said that, if your original application shows the same behaviour then it is quite reasonable for you to ask for an explanation!
Try comparing the interval "m_ElapsedTime" with a millisecond or so (about 3580 ticks in your case, given a frequency of 3579545 ticks per second). If the APIs take more than that, then you've suffered a context switch and you should ignore that time interval. If this is the case, the question ceases to be "why is VC slower" and becomes "why is VC provoking context switches", and the answer probably lies in the run-time library rather than the compiler's code generation.
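The comparison can be sketched as a tick-to-millisecond conversion followed by a threshold check. This is an illustrative sketch only: the function names are hypothetical, and the one-millisecond threshold is just the rough figure suggested above, not a measured scheduler quantum. With the 3579545 ticks-per-second frequency reported in the first post, one millisecond is roughly 3580 ticks.

```cpp
#include <cstdint>

// Converts a tick count to milliseconds, given the counter frequency in
// ticks per second (the value QueryPerformanceFrequency() would report).
double TicksToMilliseconds(std::int64_t ticks, std::int64_t frequency)
{
    return static_cast<double>(ticks) * 1000.0 / static_cast<double>(frequency);
}

// Hypothetical helper: flags a sample that took longer than one millisecond
// and is therefore a candidate for having included a context switch.
bool ExceedsOneMillisecond(std::int64_t ticks, std::int64_t frequency)
{
    return TicksToMilliseconds(ticks, frequency) > 1.0;
}
```

For scale, the reported per-line averages of roughly 60 to 90 ticks are far below this threshold, so only outlier samples would be discarded.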
A similar test is to use an array of "m_ElapsedTime[10]" and collect ten iterations of the inner loop between tracing. Yet another test might be to insert Sleep(0) at the start of the inner loop. If either of these affects the results, your problem is context switching.
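The array idea could look something like the sketch below: ten samples are collected back to back with no tracing in between, and only then written out. It is written against std::chrono::steady_clock for portability, and DoTimedWork is a hypothetical placeholder for the two GDI calls; in the original DLL the timer would be QueryPerformanceCounter and the output would go through OutputDebugString.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical placeholder for the work being timed (the MoveToEx/LineTo pair).
static void DoTimedWork() {}

// Collects ten timing samples (in nanoseconds) without any tracing between
// them, so the cost of the trace call cannot disturb the next sample.
std::array<std::int64_t, 10> CollectTenSamples()
{
    using Clock = std::chrono::steady_clock;
    std::array<std::int64_t, 10> samples{};

    for (std::size_t rep = 0; rep < samples.size(); ++rep)
    {
        const Clock::time_point start = Clock::now();
        DoTimedWork();
        const Clock::time_point end = Clock::now();
        samples[rep] =
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    }
    return samples;
}

// Writes the collected samples out in one go, after all timing is finished.
void TraceSamples(const std::array<std::int64_t, 10>& samples)
{
    for (std::int64_t s : samples)
        std::printf("%lld\n", static_cast<long long>(s));
}
```

If the per-line figures change noticeably once tracing is deferred like this, that points at the tracing and context switching rather than at the drawing calls.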
Another variation is to use the RDTSC instruction...
__declspec(naked) __int64 Rdtsc() { __asm rdtsc; __asm ret; }
This is a higher resolution timer with much lower calling overheads.
Oh, and lastly, %d isn't the correct format for an __int64 variable.
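A short sketch of a correct alternative, under the assumption that a standard 64-bit integer type can be used in place of __int64: the PRId64 macro from <cinttypes> expands to the right conversion specifier for std::int64_t on the target platform. (Microsoft's run-time library also accepts the "%I64d" specifier for __int64.) FormatElapsedTicks is a hypothetical helper name, not part of the original code.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 64-bit tick count followed by a newline, using the
// platform-correct printf specifier for std::int64_t.
std::string FormatElapsedTicks(std::int64_t elapsed)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%" PRId64 "\n", elapsed);
    return std::string(buffer);
}
```

With "%d" only an int-sized argument is expected, so passing a 64-bit value is undefined behaviour and large counts may be printed incorrectly.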
Derrick Coetzee [MSFT] - 11 Jan 2005 00:27 GMT
> for(int rep=0;rep<10;rep++)
> {
[quoted text clipped - 15 lines]
> Total Time: 482151
> Average: 61.0975

I can't explain the speed difference in your experiment, but I can say that if you are writing or porting an application for which drawing primitives are a critical bottleneck, such as the CAD applications you cite in a later post, you should seriously consider using a performance-oriented graphics library such as DirectX or OpenGL, which takes advantage of modern hardware. The GDI is, quite frankly, rarely up to the task of serious graphics work, just simple business graphics such as bar charts and buttons.
Derrick Coetzee, Microsoft Speech Server developer
This posting is provided "AS IS" with no warranties, and confers no rights. Use of included code samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm