.NET Forum / .NET Framework / CLR / January 2007

CLR, Value/Ref types, and the thread stack/heap

Anthony Paul - 10 Jan 2007 13:10 GMT
Hello everyone,

Would anyone happen to know how the CLR determines what types are on
the stack? Specifically, if I have the following code :

static void Main()
{
    Object o = new Object();    // class
    Point p = new Point(0, 0);  // structure
    ...
}

My understanding is that what occurs should be the following (assuming
that this is the first block of code that the CLR encounters) :

// init
1) Determines the types in the block of code, in this case Object and
Point.
2) Creates the Object Type and Point Type on the heap.
3) Allocates memory for the Object and Point variables on the thread
stack.
4) Compiles block of code.

// execution
4) Creates an instance of the Object type (ie. an object of type
Object) and places it on the heap. The internal Type object pointer is
set to point to the Object Type (which was created in step 2.) and the
address of this new instance is saved in the o variable on the thread
stack.
5) Creates an instance of the Point type (ie. a struct of type Point)
and places it on the thread stack (calling the constructor to init the
x & y values to 0, etc...)

Here is where I'm confused... I assumed that in order for the CLR to
determine what o is, it would go to the instance that o pointed to on
the heap, check the 'Type object' pointer and voila! However, since the
'Type object' pointer only exists for reference types, I was wondering
how it determined the type of a value type since value types don't have
the 'Type object' pointer or sync block fields. Then I began to wonder
how the CLR was able to differentiate between the stack variables in
the first place; that is, how does it know that the o variable is a
reference and that p is a value type?

I don't think that my original assumption was correct... the CLR can't
possibly determine the type of the o variable via the 'Type object'
pointer of the instance that it's pointing to because that would not
allow for polymorphism, so somehow this information must be saved on
the stack or elsewhere.

Would one of you gurus clarify this for me or point me to a FAQ or site
that would have an answer?

Thanks!

Anthony

Barry Kelly - 10 Jan 2007 14:35 GMT
> Would anyone happen to know how the CLR determines what types are on
> the stack?

It comes from the MSIL instructions. Read the CLI spec for exhaustive
details, but simple examples:

* ldfld takes a field token, so the type is described by the type of the
field
* ldloc takes a local variable index, so the type is described by the
list of declared locals for that method
* similarly, ldarg takes an argument index, and the type is in the
declared parameters for that method
* call takes a method token, so the type is described by the return type
on the method
* operators like add, sub, etc. push types depending on the argument
types, so the IL needs to be analysed to determine the type. For
example, pushing two I4 (Int32) followed by an 'add' will result in an
I4 on the stack. Similarly for doubles, longs, etc.

> Specifically, if I have the following code :
>
> void main
> {
> Object o = new Object(); // class

This news up an object and stores it in a local. In fact, this is the
code produced:

   // this guy puts a System.Object on the stack, because that's the
   // constructor called
   IL_0001:  newobj     instance void [mscorlib]System.Object::.ctor()

   // and this guy pops it off the stack and stores it in local 0.
   IL_0006:  stloc.0

> My understanding is that what occurs should be the following (assuming
> that this is the first block of code that the CLR encounters) :

Don't confuse the CLR with the C# compiler. The CLR doesn't encounter
C#; the C# compiler converts it into MSIL, and the CLR only encounters
MSIL.

> // init
> 1) Determines the types in the block of code, in this case Object and
> Point.

Yes, a single scan through a method (in MSIL) is enough to validate the
code, but the CLR generates object code when it sees the method. This
means it could be doing a lot more, such as constructing trees / dags
(directed acyclic graphs, which occur esp. when instructions like 'dup'
are used) from the MSIL instruction stream, and then passing those
structures (which are AST-like) to a more traditional compiler back end.

See an introductory compiler text for more details on that.

> 2) Creates the Object Type and Point Type on the heap.

By 'Object Type', do you mean the reference that you get when you
evaluate 'typeof(Object)'? If so, then no, that's not necessary (unless
you evaluate that expression).

What the CLR does have for each type in memory is a method table, which
is analogous to a vtable in COM or C++ parlance. The first hidden
'field' in the memory for instances on the heap (both reference types
and boxed value types) is a pointer to this method table. These guys may
get allocated lazily or not, depending on how the runtime is written (I
don't know or particularly care about the details).

However, you don't ever need to know this in correct, portable,
verifiable, safe etc. C# code.

> 3) Allocates memory for the Object and Point variables on the thread
> stack.

Allocation of local variables can only occur at execution time, not
initialization time. So, I'll interpret this as asking what happens at
execution time.

When the 'newobj' instruction executes, it allocates cleared memory from
the GC, fills in the method table pointer, and passes it off to the
constructor named in the immediate argument to the 'newobj' instruction.
The 'return value' of this constructor is a reference to the initialized
object on the GC heap, and it's this that's stored *conceptually* on the
MSIL stack.

However, the CLR generates assembly code for MSIL, it does not directly
execute MSIL. Thus, the return value from the constructor is most
probably in a register. Also, the local variable 'o' may have been
allocated a register; thus, the instruction immediately following the
call to the Object::.ctor() constructor (which may have been inlined,
don't forget optimizations etc.) will probably be a simple move, to copy
the value from one register to another.

Point is different, because it's a value type. Value types typically get
space allocated on the stack, and get initialized by loading a pointer
to the location and passing this pointer to the constructor, or
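alternatively using 'initobj' to zero out the memory. For a plain
'new Point()' the MSIL is this (with arguments, as in your
'new Point(0, 0)', the address is instead passed to the
Point(int, int) constructor):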

   // This guy loads the address of V_1, which is the local variable
   // for the point.
   IL_0007:  ldloca.s   V_1

   // And this guy zeros out the memory for V_1.
   // This is because default constructors of value types always
   // zero out memory - you can't declare a default constructor (one
   // taking no arguments) on a value type.
   IL_0009:  initobj    [System.Drawing]System.Drawing.Point

BTW, you can use 'ildasm' (part of the .NET SDK) to disassemble any .NET
executable, to see the MSIL code generated.
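
To see the two initialization paths from the C# side, here is a minimal
sketch. Point2 is a hypothetical stand-in struct (not
System.Drawing.Point), used so the example needs no extra assembly
reference. The comments about which IL gets emitted describe typical
compiler output and are worth confirming with ildasm.

```csharp
using System;

// Hypothetical stand-in for System.Drawing.Point.
struct Point2
{
    public int X, Y;
    public Point2(int x, int y) { X = x; Y = y; }
}

static class InitDemo
{
    static void Main()
    {
        // Parameterless form: the memory of the local is simply zeroed
        // (typically ldloca + initobj in the IL).
        Point2 zeroed = new Point2();

        // Two-argument form: the (int, int) constructor runs on the local.
        Point2 built = new Point2(3, 4);

        Console.WriteLine(zeroed.X + "," + zeroed.Y); // 0,0
        Console.WriteLine(built.X + "," + built.Y);   // 3,4

        // Value types are copied on assignment, so changing the copy
        // leaves the original untouched.
        Point2 copy = built;
        copy.X = 99;
        Console.WriteLine(built.X); // 3
    }
}
```

Running ildasm on the resulting executable should show the difference
between the two 'new' expressions directly.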

> 4) Compiles block of code.

Now you've got things backwards. The compilation happens at
initialization time, when a method is first called (if JIT is
happening), or possibly earlier if Pre-JIT of some kind is being used.

The steps are like this:

1) C# code in test.cs
2) call 'csc test.cs' to produce 'test.exe'

*** ship to user

3) run test.exe
  At this point, the CLR may do some initialization. However, many
  methods aren't compiled until first called.
4) method is called (for the first time). The CLR does some analysis and
produces machine code. This is the just in time part of JIT compilation.
On subsequent calls, the machine code is already produced, and is called
directly.
5) method is executed.

> // execution
> 4) Creates an instance of the Object type (ie. an object of type
[quoted text clipped - 5 lines]
> and places it on the thread stack (calling the constructor to init the
> x & y values to 0, etc...)

I covered these guys in the 'allocation of memory' bit above; memory
can't be allocated on the stack until the code is executing, because the
top of the stack keeps moving around as methods are called and returned
from.

> Here is where I'm confused... I assumed that in order for the CLR to
> determine what o is,

The type of 'o' is in the metadata in the assembly on disk. It's
described in the table of local variables at the start of the metadata
for the method.
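
The local variable table can also be inspected at run time through
reflection. This is a minimal sketch, assuming a hypothetical method
named Sample and a hypothetical struct named Pair. The exact list printed
may include compiler-generated temporaries and can differ between debug
and release builds.

```csharp
using System;
using System.Reflection;

static class LocalsDemo
{
    // Hypothetical value type, used only for illustration.
    struct Pair { public int X, Y; }

    static void Sample()
    {
        object o = new object();
        Pair p = new Pair();
        Console.WriteLine(o.GetHashCode() + p.X);
    }

    static void Main()
    {
        MethodInfo m = typeof(LocalsDemo).GetMethod(
            "Sample", BindingFlags.NonPublic | BindingFlags.Static);

        // MethodBody.LocalVariables reflects the declared locals that the
        // compiler wrote into the method's metadata.
        foreach (LocalVariableInfo local in m.GetMethodBody().LocalVariables)
        {
            Console.WriteLine("local {0}: {1} (value type: {2})",
                local.LocalIndex, local.LocalType, local.LocalType.IsValueType);
        }
    }
}
```

Nothing here looks at any object on the heap: the types come from the
method's metadata alone.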

> it would go to the instance that o pointed to on
> the heap, check the 'Type object' pointer and voila! However, since the
> 'Type object' pointer only exists for reference types, I was wondering
> how it determined the type of a value type since value types don't have
> the 'Type object' pointer or sync block fields.

Thus your question is answered: the CLR doesn't do this dynamically,
it's not like Python or Ruby or similar. It's all determined statically.

> Then I began to wonder
> how the CLR was able to differentiate between the stack variables in
> the first place; that is, how does it know that the o variable is a
> reference and that p is a value type?

Again, this is answered: the C# compiler, in producing the assembly's
metadata, writes out the types of all locals as part of the method.

> I don't think that my original assumption was correct... the CLR can't
> possibly determine the type of the o variable via the 'Type object'
> pointer of the instance that it's pointing to because that would not
> allow for polymorphism, so somehow this information must be saved on
> the stack or elsewhere.

> Would one of you gurus clarify this for me or point me to a FAQ or site
> that would have an answer?

If you want to know all the details:

* Read the CLI specification (ECMA-335, Google will find it)
* Write test programs in C#, disassemble with ildasm, fiddle, reassemble
with ilasm, learn by experimentation.
* Even better: write a compiler using System.Reflection.Emit. It's not
 difficult.
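
As a starting point for the System.Reflection.Emit suggestion, here is a
minimal sketch that emits the 'two I4 followed by add' sequence mentioned
earlier. The method name "Add" and the delegate type BinaryOp are
hypothetical names chosen for the example.

```csharp
using System;
using System.Reflection.Emit;

static class EmitDemo
{
    // Hypothetical delegate type matching the emitted method's signature.
    delegate int BinaryOp(int a, int b);

    static void Main()
    {
        DynamicMethod dm = new DynamicMethod(
            "Add", typeof(int), new Type[] { typeof(int), typeof(int) });

        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push first int32 argument
        il.Emit(OpCodes.Ldarg_1); // push second int32 argument
        il.Emit(OpCodes.Add);     // pop both, push their int32 sum
        il.Emit(OpCodes.Ret);     // return the value on top of the stack

        BinaryOp add = (BinaryOp)dm.CreateDelegate(typeof(BinaryOp));
        Console.WriteLine(add(2, 3)); // 5
    }
}
```

The types on the evaluation stack are never stated in the instructions
themselves; they follow from the declared parameter types, which is the
static analysis described above.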

-- Barry

http://barrkel.blogspot.com/

Ben Voigt - 10 Jan 2007 15:48 GMT
>> it would go to the instance that o pointed to on
>> the heap, check the 'Type object' pointer and voila! However, since the
>> 'Type object' pointer only exists for reference types, I was wondering
>> how it determined the type of a value type since value types don't have
>> the 'Type object' pointer or sync block fields.

Either the variable is a value type, in which case the JITter finds the type
from the metadata, or else the variable is of type object and points to a
boxed value type, in which case the boxing overhead contains that "Type
object pointer".  Of course, as Barry has already said, the in-memory
structure of what you call a "Type object" is a virtual dispatch table, plus
I suspect the metadata token is stored as well, in order to find the full
metadata.

Since GetType is a method on System.Object, a reference type, it looks as if
the variable is boxed before calling... but the JIT compiler recognizes this
specific case and will strip away the need for boxing and virtual dispatch.
Sometimes MSIL can lie to you, you really have to read the resulting native
code as well.

> Thus your question is answered: the CLR doesn't do this dynamically,
> it's not like Python or Ruby or similar. It's all determined statically.

True for typeof, not for .GetType().
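
The distinction can be sketched in a few lines. The declared type of a
variable is fixed at compile time, while GetType() reports the type of
the object the variable currently refers to. This is an illustrative
example, not taken from the original posts.

```csharp
using System;

static class TypeDemo
{
    static void Main()
    {
        // Declared (static) type is object; the runtime type is string.
        object o = "hello";
        Console.WriteLine(o.GetType() == typeof(string)); // True
        Console.WriteLine(o.GetType() == typeof(object)); // False

        // Boxing copies the int to the heap together with its type
        // information, so GetType() still works on the boxed value.
        int i = 42;
        object boxed = i;
        Console.WriteLine(boxed.GetType() == typeof(int)); // True
    }
}
```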

>> Then I began to wonder
>> how the CLR was able to differentiate between the stack variables in
[quoted text clipped - 3 lines]
> Again, this is answered: the C# compiler, in producing the assembly's
> metadata, writes out the types of all locals as part of the method.

Barry Kelly - 10 Jan 2007 15:59 GMT
> > > Here is where I'm confused... I assumed that in order for the CLR to
> > > determine what o is,
[quoted text clipped - 13 lines]
>
> True for typeof, not for .GetType().

The question starts out with 'to determine what o is', rather than 'what
the value referred to by o is', so that's the implied context of my
answer here.

However, that context might not have been obvious, so thanks for
pointing out that the CLR does indeed support polymorphism, thus the
need for GetType() etc. :)

-- Barry

http://barrkel.blogspot.com/

Anthony Paul - 10 Jan 2007 16:22 GMT
Hello Barry,

Thanks for answering, I really appreciate your time!

> > Would anyone happen to know how the CLR determines what types are on
> > the stack?
[quoted text clipped - 14 lines]
> example, pushing two I4 (Int32) followed by an 'add' will result in an
> I4 on the stack. Similarly for doubles, longs, etc.

Okay, so I take this to mean that the compiler generates IL code that
already contains the necessary information (via the metadata you point
out later on) needed to identify the types on the stack. This makes
sense and was my gut feeling, but since most everything I was reading
referred to dynamic resolution (vs static) I was confused.

> > Specifically, if I have the following code :
> >
[quoted text clipped - 11 lines]
>     // and this guy pops it off the stack and stores it in local 0.
>     IL_0006:  stloc.0

I take it that by 'local' you mean a register, and by that I mean a
software equivalent of a CPU register.

> > My understanding is that what occurs should be the following (assuming
> > that this is the first block of code that the CLR encounters) :
>
> Don't confuse the CLR with the C# compiler. The CLR doesn't encounter
> C#; the C# compiler converts it into MSIL, and the CLR only encounters
> MSIL.

Sorry, I should have been clear. I am aware that the compiler generates
IL code which is what's being executed; however, I don't know how to
write IL yet and so I used the c# code as a viable representation. That
is, any c# code in my example should be assumed to represent the actual
IL code.

> > // init
> > 1) Determines the types in the block of code, in this case Object and
[quoted text clipped - 8 lines]
>
> See an introductory compiler text for more details on that.

Yes, this is definitely deep, as you seem to have intimate knowledge of
compiler theory. I am currently going through the Wintellect book "CLR
via C#", which so far is an excellent resource that has led to
invaluable insights but also to questions that it doesn't answer, such
as the one I posed this morning. Since my goal is to become an expert on
these subjects, I definitely plan on supplementing this book with the
"Expert .NET 2.0 IL Assembler" book I recently purchased. However, this
thread is making me realize that I may have to temporarily stop reading
the "CLR via C#" book until I thoroughly familiarize myself with
IL. What do you suggest, intimate knowledge of IL before CLR or
vice-versa?

> > 2) Creates the Object Type and Point Type on the heap.
>
> By 'Object Type', do you mean the reference that you get when you
> evaluate 'typeof(Object)'? If so, then no, that's not necessary (unless
> you evaluate that expression).

What I mean by the 'Object Type' is the 'instance' of the 'type'
representing that class of which there is only one instance of and
which would be common to ALL instances of that class. I believe this
Type 'instance' (and this is now coming off the top of my head) would
represent the instance that contains the values of the Type's static
fields. (?)

class MyClass
{
public static int MyStaticField = 0;
}

...

MyClass.MyStaticField++;

Given the above code, there are no actual instances of the 'MyClass'
type (ie. no object allocated on the heap), but there MUST be an
instance of the 'MyClass Type' somewhere, and from my understanding
there is, and it's allocated on the heap only once for each type. This
is what I meant by the 'Object Type'. Notice that I capitalize the T to
make it easier to distinguish.

> > 3) Allocates memory for the Object and Point variables on the thread
> > stack.
>
> Allocation of local variables can only occur at execution time, not
> initialization time. So, I'll interpret this as asking what happens at
> execution time.

What I meant by //init and //execution was (to my understanding) :

All steps under //init are what I believe to be the process the CLR
goes through when it first encounters a block of IL code, in order to
initialize its internal structures for the eventual JIT of the IL block
and its execution. This includes allocating memory on the heap and stack
for the different types and variables it encountered during its initial
scan of the IL block.

All steps under //execution are what I believe to be the process the
actual execution of the JIT'd code (native code) goes through.

> When the 'newobj' instruction executes, it allocates cleared memory from
> the GC, fills in the method table pointer, and passes it off to the
[quoted text clipped - 10 lines]
> don't forget optimizations etc.) will probably be a simple move, to copy
> the value from one register to another.

> Point is different, because it's a value type. Value types typically get
> space allocated on the stack, and get initialized by loading a pointer
[quoted text clipped - 14 lines]
> BTW, you can use 'ildasm' (part of the .NET SDK) to disassemble any .NET
> executable, to see the MSIL code generated.

*nod*

> > 4) Compiles block of code.
>
[quoted text clipped - 17 lines]
> directly.
> 5) method is executed.

Correct. Again my fault, I should have been clear. I should have
specified JIT when I wrote step 4 to avoid confusion. It should read :

4) JIT compiles block of IL code (to native instructions).

> > // execution
> > 4) Creates an instance of the Object type (ie. an object of type
[quoted text clipped - 10 lines]
> top of the stack keeps moving around as methods are called and returned
> from.

Correct, and steps 4 and 5 are meant to represent what occurs during
the actual execution of the code. So step 4 is meant to represent what
occurs when executing the native instructions pertaining to "Object o =
new Object()" and step 5 to represent "Point p = new Point(0, 0)".

I should have written my initial post better to avoid confusion.

> > Here is where I'm confused... I assumed that in order for the CLR to
> > determine what o is,
>
> The type of 'o' is in the metadata in the assembly on disk. It's
> described in the table of local variables at the start of the metadata
> for the method.

!!! ...

> > it would go to the instance that o pointed to on
> > the heap, check the 'Type object' pointer and voila! However, since the
[quoted text clipped - 4 lines]
> Thus your question is answered: the CLR doesn't do this dynamically,
> it's not like Python or Ruby or similar. It's all determined statically.

... Eureka! This is the missing piece I didn't get from the book.

> > Then I began to wonder
> > how the CLR was able to differentiate between the stack variables in
[quoted text clipped - 25 lines]
> --
> http://barrkel.blogspot.com/

Barry, thank you very much for your verbose and well formulated
reply... the missing piece was, of course, the metadata.

I did download the PDF for ecma-335 and looked briefly through the
index but didn't find what I was looking for because 1) I wasn't
exactly sure what I was looking for and 2) I looked in the wrong place.

Besides those two books that I am studying, do you have any other
recommendations? My goal is to thoroughly familiarize myself with the
.NET internals relating to the CLR, perhaps you have some suggestions.

Cheers!

Anthony

Barry Kelly - 10 Jan 2007 17:45 GMT
> >     // this guy puts a System.Object on the stack, because that's the
> >     // constructor called
[quoted text clipped - 5 lines]
> I take it that by 'local' you mean a register, and by that I mean a
> software equivalent of a CPU register.

Actually, I have three mental models in my head:
1) The C# model
2) The CLI abstract machine (which has a stack, and no registers)
3) The native CPU's machine code / assembler

When I'm talking about MSIL (IL, CIL - all words for the same thing),
I'm talking about the CLI abstract machine. (CLR is an implementation of
CLI, but not to be confused with CLS, or CTS :)

So here, I mean local 0, as in, the 0th entry in the table of local
variables at the start of the method.

> I don't know how to
> write IL yet and so I used the c# code as a viable representation. That
> is, any c# code in my example should be assumed to represent the actual
> IL code.

.NET Reflector (Google), ildasm and ilasm, combined with C# test
programs are the best way to learn about this.

.NET Reflector's disassembly views have tooltips on every MSIL
instruction explaining briefly what the instruction does. .NET Reflector
ought to be distributed with the SDK, IMHO - the guy who wrote it works
for MS.

> What do you suggest, intimate knowledge of IL before CLR or
> vice-versa?

I think:

1) Know well how the underlying machine itself works, in an unmanaged
language like Delphi (disclaimer: I work for CodeGear), C or C++, or
even assembler. Delphi, in my biased opinion, is probably the easiest
native language out there with the power of C etc. (Probably you know
all this, but I write just in case.)

2) Get very familiar with the C# language and its semantics.

3) When very familiar with C#, see how C# translates to MSIL, and
understand how the MSIL relates to the underlying machine (and this
requires CLR knowledge). See also my final comment.

So basically, the approach is a pincer movement: attack from the top
with high-level knowledge of C#, and from the bottom with low-level
knowledge of the CPU & memory etc.

> What I mean by the 'Object Type' is the 'instance' of the 'type'
> representing that class of which there is only one instance of and
> which would be common to ALL instances of that class. I believe this
> Type 'instance' (and this is now coming off the top of my head) would
> represent the instance that contains the values of the Type's static
> fields. (?)

There are structures which describe the type, yes: the root one in
memory for the CLR is called EEClass. You can find out lots more with
the SSCLI source code, and by playing with WinDbg and SOS (see final
comment).
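
From the C# side, the 'one per type' behaviour can at least be observed,
even though the internal structures cannot. This is a minimal sketch
reusing the MyClass example from the question. Note that System.Type is
the reflection view of a type, and nothing here shows where the runtime
actually keeps static field storage.

```csharp
using System;

class MyClass
{
    public static int MyStaticField = 0;
}

static class StaticDemo
{
    static void Main()
    {
        // The static field is usable before any MyClass instance exists.
        MyClass.MyStaticField++;
        Console.WriteLine(MyClass.MyStaticField); // 1

        // Every instance reports the same System.Type object.
        MyClass a = new MyClass();
        MyClass b = new MyClass();
        Console.WriteLine(object.ReferenceEquals(a.GetType(), b.GetType()));      // True
        Console.WriteLine(object.ReferenceEquals(a.GetType(), typeof(MyClass))); // True
    }
}
```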

> Besides those two books that I am studying, do you have any other
> recommendations? My goal is to thoroughly familiarize myself with the
> .NET internals relating to the CLR, perhaps you have some suggestions.

Apart from what I said earlier about the pincer movement:

*) Read blogs, especially the archives, such as:
- "Advanced .NET debugging"
- Rico Mariani's blog
- Mike Stall's blog

These guys all come up on Google.

*) Get familiar with WinDbg and SOS (Advanced .NET Debugging has entries
on this in its archives IIRC)

*) Be sure to configure symbol server properly when using WinDbg:

 SRV*c:\symbol-cache*http://msdl.microsoft.com/download/symbols

... and you'll find that the class and function names in the public
symbols largely correspond with the actual code found in SSCLI:

 http://msdn.microsoft.com/net/sscli/

This is a long apprenticeship for guru-dom, though - you may not need to
get this deep - the road is long, yet interesting, if you have the
curiosity for it!

Good luck.

-- Barry

http://barrkel.blogspot.com/

