Managed vs Unmanaged Bare Bones Performance Test

adhingra - 19 Apr 2007 21:52 GMT
At our company we are currently at a decisive point to choose between managed
and unmanaged code on the basis of their performance. I have read stuff about
this on various blogs and other websites. Then I decided to take my own test
as I am more concerned with basic performance at this point.

By basic I mean, just the basic stuff inside the CLR i.e. function calling
cost, for loop, variable declaration, etc. Let us not consider GC, memory
allocation costs, etc.

To my surprise the managed code I generated in my test through C# was
lagging behind to a considerable degree when compared with the code generated
by the C++ compiler.

I was wondering if someone can take a quick look at this and tell me why
this is the case. I was under the assumption that, once JIT compilation has
happened, the CLR gives the same performance as a native C++ compiler
does (as we are talking basic stuff only - no objects, just pure language
constructs and primitive data types).

I created two sample console applications (one in C# and the other in C++). They
both call a function, passing an int by value, from inside a for loop. Nothing
happens inside the function. I used the QueryPerformanceCounter and
QueryPerformanceFrequency APIs for measurement. (Code is pasted at the bottom
of this posting.)

Here are the results (for release mode running from console, with default
settings in the IDE)

C# Test     for loop (50000 iterations)        0.000023931    (23.9 microseconds)
C++ Test    for loop (50000 iterations)        0.000000350    (0.35 microseconds)

So it's like the C++ compiler is almost 70 times faster than the managed CLR
Jitter (23.931 / 0.350 is about 68). And if I also subtract the time taken by
the QueryPerformanceCounter calls, the difference is even bigger.

Can anyone please elaborate.

Thanks
adhingra

===========================================
C# Code                PROGRAM.CS
===========================================

using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;

namespace ConsoleApp
{
   class Program
   {
       //API declarations for the high-resolution performance counter
       //(both functions return a Win32 BOOL, which marshals as bool, not short)
       [DllImport("kernel32.dll")]
       static extern bool QueryPerformanceCounter(ref long x);
       [DllImport("kernel32.dll")]
       static extern bool QueryPerformanceFrequency(ref long x);

       static long m_lStart = 0, m_lStop = 0, m_lFreq = 0;
       static long m_lOverhead = 0;
       static decimal m_mTotalTime = 0;

       static void Main(string[] args)
       {
           //get the performance counter frequency (ticks per second)
           QueryPerformanceFrequency(ref m_lFreq);

           //record the overhead for calling the performance counter API
           QueryPerformanceCounter(ref m_lStart);
           QueryPerformanceCounter(ref m_lStop);

           m_lOverhead = m_lStop - m_lStart;

           Console.WriteLine("Starting with a simple For Loop calling a simple function");

           QueryPerformanceCounter(ref m_lStart);
           for (int i = 0; i < 50000; i++)
           {
               Run(i);
           }
           QueryPerformanceCounter(ref m_lStop);

           long lDiff = m_lStop - m_lStart;
           Console.WriteLine(lDiff);
           //Comment or Uncomment the overhead lines to see the times drop
           //
           //if (lDiff > m_lOverhead)
           //{
           //    lDiff = lDiff - m_lOverhead;
           //}

           m_mTotalTime = ((Decimal)lDiff)/((Decimal)m_lFreq);
           Console.WriteLine(m_mTotalTime);

           Console.WriteLine("Press Enter to Continue");
           Console.ReadLine();
       }

       static void Run(int i)
       {
           //Console.WriteLine(i);
       }
   }
}

===============================================
C++ Code            ConsoleApp.cpp
===============================================

// ConsoleApp.cpp : Defines the entry point for the console application.
//

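#include "stdafx.h"
#include <windows.h>   // QueryPerformanceCounter, LARGE_INTEGER, LONGLONG
#include <stdio.h>     // printf, getchar
#include <tchar.h>     // _tmain, _TCHAR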

void Run(int i)
{
   //printf("%d\n",i);
}

int _tmain(int argc, _TCHAR* argv[])
{
    LARGE_INTEGER m_start, m_stop, m_freq;
    ::QueryPerformanceFrequency(&m_freq);

    //record the overhead for calling the performance counter API
    ::QueryPerformanceCounter(&m_start);
    ::QueryPerformanceCounter(&m_stop);

    LONGLONG m_overhead = m_stop.QuadPart - m_start.QuadPart;
    m_start.QuadPart = 0;
    m_stop.QuadPart = 0;

    printf("%s\n","Starting with a simple For Loop calling a simple function");

    ::QueryPerformanceCounter(&m_start);
    for (int i = 0; i < 50000; i++)
    {
        Run(i);
    }
    ::QueryPerformanceCounter(&m_stop);

    LONGLONG lDiff = m_stop.QuadPart - m_start.QuadPart;
    //lDiff is a 64-bit value, so it needs the I64 size prefix rather than plain %d
    printf("%I64d\n",lDiff);
    //Comment or Uncomment the overhead lines to see the times drop
    //
    //if (lDiff > m_overhead)
    //{
    //    lDiff = lDiff - m_overhead;
    //}

    double totalTime = ((double)lDiff) / ((double)m_freq.QuadPart);
    printf("%15.15f\n",totalTime);

    printf("%s", "Press Enter to Continue");

    int c = getchar();
    return 0;
}
Willy Denoyette [MVP] - 19 Apr 2007 22:22 GMT
> At our company we are currently at a decisive point to choose between managed
> and unmanaged code on the basis of their performance. I have read stuff about
[quoted text clipped - 155 lines]
> return 0;
> }

This kind of benchmark is meaningless.
The reason for the huge difference is that the C++ compiler hoists the loop, as it sees no
sensible reason to call an empty function 50000 times. The C# compiler does not do this; it
simply calls the function, which only contains a ret.
So what you are comparing is the time taken for a return from QueryPerformanceCounter plus
the time to call QueryPerformanceCounter, against the time taken to call an empty function
50000 times.

Willy.
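
To put a rough number on the timer cost described above, here is a minimal sketch in portable C++. It assumes std::chrono::steady_clock as a stand-in for QueryPerformanceCounter, and timer_overhead_ns is a hypothetical helper name, not something from the original program. It reads the clock twice back to back and keeps the smallest gap, which is roughly what the optimized C++ build ended up timing once the loop was gone.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Smallest observed gap, in nanoseconds, between two back-to-back clock reads.
// Taking the minimum over many samples filters out scheduler noise.
std::int64_t timer_overhead_ns(int samples)
{
    assert(samples > 0);
    using clock = std::chrono::steady_clock;

    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int s = 0; s < samples; ++s)
    {
        const auto start = clock::now();
        const auto stop = clock::now();   // nothing happens between the two reads
        const auto gap =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        best = std::min<std::int64_t>(best, gap);
    }
    return best;
}
```

Calling timer_overhead_ns(1000) returns the smallest gap seen in 1000 samples. The figure depends on the machine and the clock source, so treat it as an order of magnitude only.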
Ben Voigt - 19 Apr 2007 22:38 GMT
> This kind of benchmark is meaningless.
> The reason for the huge difference is that the C++ compiler hoists the
> loop, as it sees no sensible reason to call an empty function 50000 times,
> the C# compiler does not do this, it simply calls the function which only
> contains a ret.

Inlining and optimizing away a call to an empty function is well within the
capabilities of the CLR JIT.

> So what you are comparing is the time taken for a return from
> QueryPerformanceCounter plus the time to call QueryPerformanceCounter,
> against the time taken to call an empty function 50000 times.
>
> Willy.
Jon Skeet [C# MVP] - 19 Apr 2007 22:42 GMT
> > This kind of benchmark is meaningless.
> > The reason for the huge difference is that the C++ compiler hoists the
[quoted text clipped - 4 lines]
> Inlining and optimizing away a call to an empty function is well within the
> capabilities of the CLR JIT.

That was my thought too. I suspect it'll still perform the loop
iteration, however, whereas the C++ compiler may well have removed that
loop completely, which still means it's not a good benchmark.

Signature

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Ben Voigt - 19 Apr 2007 22:48 GMT
>> > This kind of benchmark is meaningless.
>> > The reason for the huge difference is that the C++ compiler hoists the
[quoted text clipped - 11 lines]
> iteration, however, whereas the C++ compiler may well have removed that
> loop completely, which still means it's not a good benchmark.

Oh, and if it's desired not to have the loop optimized away, touch a
volatile variable from inside the function.
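
A minimal sketch of that suggestion in C++, assuming a global named g_sink and a wrapper named run_loop (both hypothetical names). Each call stores to a volatile object, which the compiler has to treat as an observable side effect, so the loop cannot be dropped even if Run gets inlined. In C#, a static volatile int field written from Run should play a similar role.

```cpp
#include <cassert>

// Hypothetical sink: every write to a volatile object is an observable side
// effect, so the compiler must perform all of them.
volatile int g_sink = 0;

void Run(int i)
{
    g_sink = i;   // one volatile store per call
}

// Calls Run() `iterations` times and returns the last value stored.
int run_loop(int iterations)
{
    assert(iterations > 0);
    for (int i = 0; i < iterations; i++)
    {
        Run(i);
    }
    return g_sink;
}
```

With this in place, run_loop(50000) really performs 50000 stores, so timing it measures the loop rather than an empty stretch of code.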
Willy Denoyette [MVP] - 20 Apr 2007 11:15 GMT
>>> > This kind of benchmark is meaningless.
>>> > The reason for the huge difference is that the C++ compiler hoists the
[quoted text clipped - 11 lines]
> Oh, and if it's desired not to have the loop optimized away, touch a volatile variable
> from inside the function.

True, with the following results:

C# (JIT32)
244
0,0000681650880209635582175947
C# (JIT64)
86
0,0000240253998762412541258735

C++ (O2 switch) 32 bit and 64 bit
164
0.000045815878834

See, C++ is faster than the JIT32 code but slower than the JIT64 code for this particular
case (50000 iterations); however, micro-benchmarks of this kind have absolutely no value.
For instance, make the loop count an odd number (e.g. 49999) and you get the same results
for C# JIT64 and C++.
Note that the JIT32 is not known for being a great loop optimizer ;-).

Willy.
Willy Denoyette [MVP] - 20 Apr 2007 10:27 GMT
>> This kind of benchmark is meaningless.
>> The reason for the huge difference is that the C++ compiler hoists the loop, as it sees
[quoted text clipped - 3 lines]
> Inlining and optimizing away a call to an empty function is well within the capabilities
> of the CLR JIT.

True for optimized builds: the call is hoisted, but the loop is not hoisted by the JIT,
whereas the C++ compiler (optimized build) effectively hoists the loop.
What's produced by the JIT depends on the version of the CLR.

This snippet of the code:
           for (int i = 0; i < 50000; i++)
           {
               Run(i);
           }

is turned into:

xor     r11d,r11d
add     r11d,4
cmp     r11d,0C350h
jl      00000642`80150341 (jump to add r11d, 4 if less than)

by the JIT64, while the JIT32 (both v2 of the CLR) produces:

xor     eax,eax
add     eax,1
cmp     eax,0C350h
jl      001e014b (jump to add eax, 1 if less than)

see the subtle difference:
add r11d, 4
and
add eax, 1

Here the JIT64 is cheating. No big deal in this case, but I would prefer more
consistent behavior across JIT versions: either hoist the loop or keep the loop as is,
but don't cheat.

Willy.
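
The add r11d,4 step is what a loop unrolled by four looks like once the body has been inlined to nothing. A rough C++ sketch of that transformation follows. Body, g_calls, count_calls and the remainder loop are assumptions for illustration, not a reconstruction of what the JIT64 actually emits. It also shows why a count that is not a multiple of four, such as 49999, needs extra handling.

```cpp
#include <cassert>

// Hypothetical counter, volatile so that every call stays observable.
volatile int g_calls = 0;

void Body()
{
    g_calls = g_calls + 1;
}

// Straightforward loop: one compare-and-branch per call to Body().
void plain_loop(int n)
{
    for (int i = 0; i < n; i++)
    {
        Body();
    }
}

// Unrolled by four: one compare-and-branch per four calls, followed by a
// remainder loop for the 0..3 iterations left when n is not a multiple of 4.
void unrolled_loop(int n)
{
    assert(n >= 0);
    int i = 0;
    for (; n - i >= 4; i += 4)
    {
        Body();
        Body();
        Body();
        Body();
    }
    for (; i < n; i++)
    {
        Body();
    }
}

// Runs one of the loops above and reports how many times Body() was called.
int count_calls(void (*loop)(int), int n)
{
    g_calls = 0;
    loop(n);
    return g_calls;
}
```

Both versions call Body() the same number of times; the unrolled one just takes a quarter of the branches in its main loop. With an empty body, all that is left of the main loop is the counter stepping by four.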
Ben Voigt - 19 Apr 2007 22:43 GMT
> Here are the results (for release mode running from console, with default
> settings in the IDE)
[quoted text clipped - 6 lines]
> the
> diff is even more

Did you actually measure the time for QueryPerf?  Ok, I see that you did.
Those are native Win32 APIs, C++ will call them much faster than C#.

.35 microseconds is an extremely short time.  Even 23 is too short for a
useful benchmark.  Run more iterations.  In fact, run 50000 iterations
first, ignoring the result, to force .NET to precompile everything.  Then
run a half billion or so iterations and compare the results.
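
A sketch of that procedure in C++, assuming std::chrono::steady_clock for timing and hypothetical names (Touch, g_counter, timed_loop, ns_per_iteration). The first pass is run and its timing thrown away; only the second, much longer pass is reported. In native C++ the warm-up mostly primes caches, while in .NET it would also get the JIT compilation out of the way before the clock starts.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical sink, volatile so the loop body cannot be optimized away.
volatile long long g_counter = 0;

void Touch()
{
    g_counter = g_counter + 1;
}

struct Result
{
    long long iterations;
    double seconds;
};

// Times `iterations` calls to Touch() with a monotonic clock.
Result timed_loop(long long iterations)
{
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    for (long long i = 0; i < iterations; i++)
    {
        Touch();
    }
    const auto stop = clock::now();

    return { iterations, std::chrono::duration<double>(stop - start).count() };
}

// Runs a warm-up pass whose timing is discarded, then a measured pass,
// and returns the average cost of one iteration in nanoseconds.
double ns_per_iteration(long long warmup, long long measured)
{
    timed_loop(warmup);                        // result deliberately ignored
    const Result r = timed_loop(measured);
    return r.seconds * 1e9 / static_cast<double>(r.iterations);
}
```

For example, ns_per_iteration(50000, 500000000) follows the numbers suggested above. Reporting time per iteration makes runs with different counts comparable, and the long measured pass makes the cost of the two clock reads negligible.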
adhingra - 19 Apr 2007 23:10 GMT
Sorry

I am late with my comments. Shortly after posting this, I realized that this
is a problem with my test as the C++ compiler is optimizing the whole thing
away. (Looked at the disassembly)

However, this does not make the benchmark obsolete: rather than measuring
performance, it actually measured the smartness of the two compilers. I did
some more research and talked to one of my colleagues here at work who is an
expert with C++, and even tried making the code do more so that I could fool the
C++ compiler into actually calling the function. But the guy is way too smart, and
I was told the reason behind this extreme smartness is the "Whole Program
Optimization" offered by the VS 2005 linker.

If the compilation unit is different (i.e. my function is in a different cpp
file), this would not have happened in VS2003, but 2005 is a different beast
of its own with this whole program optimization. The linker no longer just
combines objs; it's more like an interpreter now, and smart enough to
chip and chop objs.

But like Ben pointed out, inlining and optimizing are in the feature set of
the Jitter too.
I think I may know why the Jitter in managed code does not do it: the Jitter
is compiling one function at a time and, because of time constraints, does not
have the luxury of checking the whole program to see whether the results of a
function are used anywhere or not.

However, I still think it should have jitted away an empty function.

Thanks All
adhingra
Barry Kelly - 20 Apr 2007 00:06 GMT
I wish you wouldn't multipost.

> However this does not make the benchmark obsolete, rather than measuring the
> performance, it actually measured the smartness of the two compilers.

It measured how good the C++ compiler is at doing nothing, versus the
.NET JIT compiler. I agree, C++ is good for nothing.

:)

> I did
> some more research and talked to one of my colleagues here at work who is an
> expert with C++, and even tried making the code do more so that I could fool the
> C++ compiler into actually calling the function. But the guy is way too smart, and
> I was told the reason behind this extreme smartness is the "Whole Program
> Optimization" offered by the VS 2005 linker.

.NET necessarily does whole program optimization because compilation
happens so late; but it is constrained by the amount of time it has to
work with - compilation must occur quickly. Performance will improve
over time, when .NET adds techniques that are common in Java, such as
recompiling with more aggressive optimization after many iterations.

-- Barry

Signature

http://barrkel.blogspot.com/

Jon Skeet [C# MVP] - 20 Apr 2007 07:27 GMT
<snip>

> Performance will improve over time, when .NET adds techniques that
> are common in Java, such as recompiling with more aggressive
> optimization after many iterations.

It'll be interesting to see whether or not this ever happens. In Java,
it made a huge difference, because by having dynamic optimisation (and
de-optimisation) you can inline virtual methods until they're first
overridden. That's really important when the language makes methods
virtual by default, but not as important in a world which requires you
to specify that methods are virtual (which at least C# does - not sure
about VB.NET).

There are other improvements as well, of course, and it could improve
start-up time (one would hope) but the effects won't be quite as huge
as they were in the Java world.


Jon Skeet [C# MVP] - 20 Apr 2007 07:25 GMT
> I am late with my comments. Shortly after posting this, I realized that this
> is a problem with my test as the C++ compiler is optimizing the whole thing
> away. (Looked at the disassembly)
>
> However this does not make the benchmark obsolete, rather than measuring the
> performance, it actually measured the smartness of the two compilers.

It measures the smartness of the compilers in *one* particular
situation. Do you often run a loop which does nothing? I know I don't.

<snip>

> However I still think it should have jitted away an empty function.

I strongly suspect that it did, by inlining. It just didn't optimise
away the loop itself.


