Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsFree MagazinesWhite PapersSubmit Content
Discussion GroupsASP.NETWindows FormsLanguages.NET FrameworkVisual Studio.NET
Articles.NET FrameworkASP.NETToolsWindows Forms
.NET DirectoryOpen Source ProjectsUser GroupsWeb Resources
Related Topics
Visual Basic 6SQL ServerMS AccessOther DB ProductsMS Server ProductsMore Topics ...

.NET Forum / .NET Framework / CLR / May 2006

Tip: Looking for answers? Try searching our database.

Fast tailcall

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Ole Nielsby - 10 May 2006 23:48 GMT
Are there any things one can do to assist the jit in generating
efficient tail call optimization?

If a method with many parameters ends calling a similar
method with a similar but not identical signature, and some
of the parameters are passed unchanged, will the jit then
discover it can simply leave the parameters on the stack?

If two signatures are of differnt length but have some
common parameters that are, in some cases, passed
unchanged from one call to the other, would it be better
to place these parameters in the beginning or in the end
of the parameter list?

Will tail calls be faster if the signatures are padded with
dummies to have the same length?

Thanks/Ole N.
Tasos Vogiatzoglou - 16 May 2006 14:10 GMT
I remember seeing a tail jump command in CLR but I do not think that
the compiler supports that.

I've tried some experiments that would for sure lead to tail call
optimization but the disassembly didn't indicate a tail jump.

Is tail jump supported by the C# compiler ?

Regards
Tasos
Barry Kelly - 16 May 2006 14:45 GMT
> I remember seeing a tail jump command in CLR but I do not think that
> the compiler supports that.
[quoted text clipped - 3 lines]
>
> Is tail jump supported by the C# compiler ?

I'm fairly sure it isn't. One of the biggest reasons (AFAIK) is because
it removes a frame from the stack for code access security checks.

-- Barry
Tasos Vogiatzoglou - 16 May 2006 19:39 GMT
I don't think this is the reason (after all tailcall opcode exists).
Sure it poses some restrictions but the CLR documentation states that
only full trusted code can perform a tailcall.
Barry Kelly - 16 May 2006 20:27 GMT
> I don't think this is the reason (after all tailcall opcode exists).
> Sure it poses some restrictions but the CLR documentation states that
> only full trusted code can perform a tailcall.

This is a brilliant reason why the optimization can't be implemented
generally, isn't it?

Since the C# compiler would have to choose at *compile* time (not *JIT
compile* time) to use tail.call, how could it know if the code is going
to be fully trusted or not?

It makes perfect sense as the reason to me.

-- Barry
Jeremy Chapman - 16 May 2006 23:44 GMT
Isn't there a way to ensure the code only runs fully trusted (using
attributes)?  If so, then you could know at compile time whether or not you
can implement this optimization.

>> I don't think this is the reason (after all tailcall opcode exists).
>> Sure it poses some restrictions but the CLR documentation states that
[quoted text clipped - 10 lines]
>
> -- Barry
Barry Kelly - 17 May 2006 02:35 GMT
> Isn't there a way to ensure the code only runs fully trusted (using
> attributes)?  If so, then you could know at compile time whether or not you
> can implement this optimization.

It might make sense via a compiler switch. .NET modules(1) can be
compiled separately and linked together later to form an assembly, so
any such security attributes may be in a different module from one being
compiled to support tail.call, so just an attribute probably wouldn't do
the trick.

(1) See csc flags /target:module, /addmodule:<file list>

-- Barry
Ole Nielsby - 16 May 2006 20:59 GMT
> > I'm fairly sure [tail jump] isn't [supported by the C# compiler].
> > One of the biggest reasons (AFAIK) is because it removes a
[quoted text clipped - 3 lines]
> Sure it poses some restrictions but the CLR documentation
> states that only full trusted code can perform a tailcall.

Which seems consistent with Barry K.'s explanation: If security
checks require the frame to be there, untrusted code can't be
allowed to tail jump.

I'm still puzzled by the clr design (was this restriction really
necessary or is it there because of a half-hearted tail call
implementation?) but the C# compiler team's decision not
to support tail calls does make sense to me, at last.

Regards/Ole Nielsby
Tasos Vogiatzoglou - 17 May 2006 07:14 GMT
I assume that if the tail. command is supported by the jit (I do not
think that is supported) it will be used in rather rare conditions of
fully trusted code and perhaps code that does not access the execution
stack (via stacktrace or sth) .

But consider the following :

method private hidebysig instance int32 factorial(int32 x,int32 a) cil
managed
{
     // Code size       19 (0x13)
     .maxstack  4

     IL_0000:  ldarg.1
     IL_0001:  ldc.i4.1

     IL_0002:  bge.s      IL_0006
     IL_0004:  ldarg.2

     IL_0005:  ret
     IL_0006:  ldarg.0

     IL_0007:  ldarg.1
     IL_0008:  ldc.i4.1

     IL_0009:  sub
     IL_000a:  ldarg.1
     IL_000b:  ldarg.2
     IL_000c:  mul

     IL_000d:  call       instance int32
Class1::factorial(int32,int32)
     IL_0012:  ret
} // end of method Class1::factorial

This is an optimized version of a factorial method by the C# compiler.

Consider also this

method private hidebysig instance int32 factorial(int32 x,int32 a) cil
managed
{
     // Code size       19 (0x13)
     .maxstack  4

     IL_0000:  ldarg.1
     IL_0001:  ldc.i4.1

     IL_0002:  bge.s      IL_0006
     IL_0004:  ldarg.2

     IL_0005:  ret
     IL_0006:  ldarg.0

     IL_0007:  ldarg.1
     IL_0008:  ldc.i4.1

     IL_0009:  sub
     IL_000a:  ldarg.1
     IL_000b:  ldarg.2
     IL_000c:  mul
     IL_000d:  tail.
     IL_000e:  call       instance int32
Class1::factorial(int32,int32)
     IL_0012:  ret
} // end of method Class1::factorial

The second is a method that discards the stack (tail.) . If I am not
doing something wrong there is a big drop of the speed when I added the
tail. command.

514 ms (with tailcall) / 77 ms (without tailcall).

I cannot understand this ... Can anyone provide any helpful insight ?

Regards,
Tasos
Barry Kelly - 17 May 2006 14:01 GMT
> I assume that if the tail. command is supported by the jit (I do not
> think that is supported) it will be used in rather rare conditions of
> fully trusted code and perhaps code that does not access the execution
> stack (via stacktrace or sth) .

I think it is not optimized because no mainstream language currently
uses it. It certainly is implemented in so far as the stack does not
grow when you use tail. call to jump to the start of a method.

> 514 ms (with tailcall) / 77 ms (without tailcall).
>
> I cannot understand this ... Can anyone provide any helpful insight ?

The fact is, tail. call is not optimized via JIT to a jump currently.
And when you try to debug all this under VS 2005, it lies to you about
the code!

I implemented a similar piece of code to you. I started with this:

---8<---
using System;

class App
{
   static double ArithmeticSum(int number, double result)
   {
       if (number == 0)
           return result;
       return ArithmeticSum(number - 1, number + result);
   }
   
   static void Main()
   {
       double result = 0;
       for (int i = 0; i < 10000; ++i)
           result = ArithmeticSum(10000, 1);
       Console.WriteLine(result);
   }
}
--->8---

I disassembled with ildasm, and rewrote ArithmeticSum to use tail. call:

---8<---
// ...
   IL_000c:  ldarg.1
   IL_000d:  stloc.0
   IL_000e:  br.s       exit
// ...
   IL_0015:  ldarg.1
   IL_0016:  add
   
   tail. call float64 App::ArithmeticSum(int32, float64)
   ret
   
exit:
    ldloc.0
    ret
 } // end of method App::ArithmeticSum
--->8---

I reassembled with ilasm /debug=OPT and, like you, I found that the
tail-call version was much slower. So, I started "devenv /debugexe
Test.exe", and stepped into the code.

This is what VS says about the JIT compiled code in the disassembly
window:

---8<---
   IL_0000:  nop
00000000  push        ebp  
00000001  mov         ebp,esp
00000003  push        edi  
00000004  push        esi  
00000005  push        ebx  
00000006  push        eax  
00000007  fld         qword ptr [ebp+8]
   IL_0001:  ldarg.0
0000000a  test        ecx,ecx
0000000c  setne       al  
0000000f  movzx       eax,al
   IL_0009:  ldloc.1
00000012  test        eax,eax
00000014  jne         00000018

   IL_000c:  ldarg.1
00000016  jmp         00000039

   IL_0010:  ldarg.0
00000018  mov         dword ptr [ebp-10h],ecx
0000001b  fild        dword ptr [ebp-10h]
0000001e  faddp       st(1),st
00000020  sub         esp,8
00000023  fstp        qword ptr [esp]
00000026  dec         ecx  
00000027  mov         eax,dword ptr ds:[00923028h]
0000002d  push        2    
0000002f  push        2    
00000031  push        1    
00000033  push        eax  
00000034  call        791B69B0         // <------- NOTE
00000039  pop         ecx  
   
exit:
    ldloc.0
0000003a  pop         ebx  
0000003b  pop         esi  
0000003c  pop         edi  
0000003d  pop         ebp  
0000003e  ret         8    
--->8---

The important bit to note is the call to 791B69B0. Even in VS 2005 mixed
mode debugging, it won't let you step into this code. When you try to
step into it, the instruction pointer jumps back to the start of the
method - effectively the call is implementing the tail call, but VS is
"helpfully" hiding the details.

So, its time to crack open SOS: In immediate window:

---8<---
.load sos
!u 791B69B0
--->8---

And I got this:

---8<---
Unmanaged code
791B69B0 F8               clc
791B69B1 ??               db          ffh
791B69B2 ??               db          ffh
791B69B3 FF0400           inc         dword ptr [eax+eax]
791B69B6 0000             add         byte ptr [eax],al
791B69B8 0100             add         dword ptr [eax],eax
791B69BA 0000             add         byte ptr [eax],al
791B69BC 0000             add         byte ptr [eax],al
791B69BE 0C02             or          al,2
791B69C0 1000             adc         byte ptr [eax],al
--->8---

There's something fishy going on here: this code isn't meaningfully
executable!

So, I looked up my dear friend App.ArithmeticSum:

---8<---
!name2ee Test.exe App.ArithmeticSum
Module: 00922c14 (Test.exe)
Token: 0x06000001
MethodDesc: 00922fd8
Name: App.ArithmeticSum(Int32, Double)
JITTED Code Address: 00de0100

!u 00de0100
Normal JIT generated code
App.ArithmeticSum(Int32, Double)
Begin 00de0100, size 41
>>> 00DE0100 55               push        ebp
00DE0101 8BEC             mov         ebp,esp
00DE0103 57               push        edi
00DE0104 56               push        esi
00DE0105 53               push        ebx
00DE0106 50               push        eax
00DE0107 DD4508           fld         qword ptr [ebp+8]
00DE010A 85C9             test        ecx,ecx
00DE010C 0F95C0           setne       al
00DE010F 0FB6C0           movzx       eax,al
00DE0112 85C0             test        eax,eax
00DE0114 7502             jne         00DE0118
00DE0116 EB21             jmp         00DE0139
00DE0118 894DF0           mov         dword ptr [ebp-10h],ecx
00DE011B DB45F0           fild        dword ptr [ebp-10h]
00DE011E DEC1             faddp       st(1),st
00DE0120 83EC08           sub         esp,8
00DE0123 DD1C24           fstp        qword ptr [esp]
00DE0126 49               dec         ecx
00DE0127 8B0528309200     mov         eax,dword ptr ds:[00923028h]
00DE012D 6A02             push        2
00DE012F 6A02             push        2
00DE0131 6A01             push        1
00DE0133 50               push        eax
00DE0134 E877691B79       call        79F96AB0 (JitHelp:
CORINFO_HELP_TAILCALL)
00DE0139 59               pop         ecx
00DE013A 5B               pop         ebx
00DE013B 5E               pop         esi
00DE013C 5F               pop         edi
00DE013D 5D               pop         ebp
00DE013E C20800           ret         8
--->8---

Here, note the LIE THAT VISUAL STUDIO TOLD!

---8<---
   call        79F96AB0 (JitHelp: CORINFO_HELP_TAILCALL)
--->8---

This address, 79F96AB0, is different from the one in VS's most excellent
diassembly view, 791B69B0.

A peek inside this method (which, as we can see from timings, must be
quite expensive and is thus probably pretty complex):

---8<---
!u 79F96AB0
Unmanaged code
79F96AB0 FF155012387A     call        dword ptr ds:[7A381250h]
79F96AB6 50               push        eax
79F96AB7 51               push        ecx
79F96AB8 52               push        edx
79F96AB9 F6058C44397AFF   test        byte ptr ds:[7A39448Ch],0FFh
79F96AC0 7409             je          79F96ACB
79F96AC2 F740045F000000   test        dword ptr [eax+4],5Fh
79F96AC9 7422             je          79F96AED
79F96ACB 68DDDDDDDD       push        0DDDDDDDDh
79F96AD0 68CCCCCCCC       push        0CCCCCCCCh--->8---
--->8---

Wonder what this is doing? A quick grep through the SSCLI 2.0 sources
gives this line:

---8<---
./inc/jithelpers.h:    JITHELPER(CORINFO_HELP_TAILCALL, JIT_TailCall
)
--->8---

Another grep for this JIT_TailCall gives this:

---8<---
./vm/i386/jithelp.asm:PUBLIC JIT_TailCall
--->8---

A peek inside this file gives this information (excerpted):

---8<---
       call    _GetThread  ; eax = Thread*
       push    eax         ; Thread*

       ; save ArgumentRegisters
       push    ecx
       push    edx

ExtraSpace      = 12    ; pThread, ecx, edx

       ; For GC stress, we always need to trip for GC
       test    _g_TailCanSkipTripForGC, 0FFh
       jz      TripForGC
       
       ; Trip for GC only if necessary
       test    dword ptr [eax+Thread_m_State], TS_CatchAtSafePoint_ASM
       jz      NoTripForGC

TripForGC:

; Create a MachState struct on the stack

; return address is already on the stack, but is separated from stack
; arguments by the extra arguments of JIT_TailCall. So we cant use it
directly

       push    0DDDDDDDDh

; Esp on unwind. Not needed as we it is deduced from the target method

       push    0CCCCCCCCh
--->8---

This looks like an exact match for the disassembled code above,
especially given the magic numbers pushed.

So, if one is looking for reasons why tail. call is slow, one must look
inside clr/src/vm/i386/jithelp.asm to see the work it is doing.

Here are some reasons:

* The JIT doesn't attempt in any way to optimize for the tail. call
case, so it doesn't generate machine code which would be compatible with
a simple JMP.

* The JIT_TailCall must work for all possible calls, in the presence of
exception propagation, and be tolerent of possible GCs.

The best hope for getting Microsoft to optimize this is for people to
complain that an important piece of software (i.e. language) that relies
on tail. call runs too slowly on the CLR.

-- Barry
Tasos Vogiatzoglou - 17 May 2006 14:14 GMT
Wonderful :) This was wonderful ...
Barry Kelly - 17 May 2006 15:23 GMT
> So, if one is looking for reasons why tail. call is slow, one must look
> inside clr/src/vm/i386/jithelp.asm to see the work it is doing.

This page hints at some reasons:

http://64.233.183.104/search?q=cache:r3DpWXcJOo0J:beta.blogs.msdn.com/shrib/atom
.aspx+JIT_TailCall&hl=en&ct=clnk&cd=2


-- Barry

Rate this thread:







Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.