Size of int (once again, sorry)
Agoston Bejo - 30 Sep 2004 08:52 GMT
Hi, sorry about the multiple posting, technical difficulties....
-----
What exactly does the size of the int datatype depend on in C++? Recently I've heard that it depends on the machine's type, i.e. on 16-bit machines it's 16 bits, on 32-bit machines it's 32, etc. Is this true? Is this true for _all_ C++ compilers?
Sigurd Stenersen - 30 Sep 2004 09:05 GMT
> What exactly does the size of the int datatype depend on in C++?
> Recently I've heard that it depends on the machine's type, i.e. on
> 16-bit machines it's 16 bits, on 32-bit machines it's 32, etc.
> Is this true? Is this true for _all_ C++ compilers?
No, that's wrong. If you run a 16-bit compiler on a 32-bit computer, int will typically be 16 bits.
In other words, it's up to the compiler to decide the size of an int.
 Signature Sigurd http://utvikling.com
Arnaud Debaene - 30 Sep 2004 15:57 GMT
> Hi, sorry about the multiple posting, technical difficulties....
> [quoted text clipped - 4 lines]
> machines it's 16 bit, on 32-bit machines it's 32 etc.
> Is this true? Is this true for _all_ C++ compilers?
An int is defined as being the "natural size" for the target machine, so a 32-bit compiler would have 32-bit ints, and so on.
Arnaud MVP - VC
Doug Harrison [MVP] - 30 Sep 2004 15:58 GMT
>Hi, sorry about the multiple posting, technical difficulties....
> [quoted text clipped - 4 lines]
>machines it's 16 bit, on 32-bit machines it's 32 etc.
>Is this true? Is this true for _all_ C++ compilers?
The following relationship holds for the four basic integer types:
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
There are also minimum ranges:
char >= 8 bits
short >= 16 bits
int >= 16 bits
long >= 32 bits
Certain parts of the standard library become impossible to implement correctly if sizeof(char) == sizeof(int), so on most systems, sizeof(char) < sizeof(int). See <limits.h> for lots of interesting macros which describe the integer types for a given system.
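A minimal sketch of the relationships above, written as compile-time checks against the <climits> macros (the C++ spelling of <limits.h>). It assumes a C++11 compiler for static_assert and constexpr; storage_bits and print_integer_sizes are hypothetical helper names chosen for illustration.

```cpp
#include <climits>
#include <cstdio>

// Ordering of the four basic integer types, checked at compile time.
static_assert(sizeof(char) <= sizeof(short), "char must not be larger than short");
static_assert(sizeof(short) <= sizeof(int), "short must not be larger than int");
static_assert(sizeof(int) <= sizeof(long), "int must not be larger than long");

// Minimum ranges, expressed through the <climits> macros.
static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
static_assert(SHRT_MAX >= 32767, "short covers at least a 16-bit range");
static_assert(INT_MAX >= 32767, "int covers at least a 16-bit range");
static_assert(LONG_MAX >= 2147483647L, "long covers at least a 32-bit range");

// Hypothetical helper: storage size of T in bits (any padding bits included).
template <typename T>
constexpr unsigned storage_bits() {
    return static_cast<unsigned>(sizeof(T) * CHAR_BIT);
}

// Prints what this particular compiler and target chose.
void print_integer_sizes() {
    std::printf("char : %u bits\n", storage_bits<char>());
    std::printf("short: %u bits\n", storage_bits<short>());
    std::printf("int  : %u bits\n", storage_bits<int>());
    std::printf("long : %u bits\n", storage_bits<long>());
}
```

Calling print_integer_sizes() from main shows the choices made by the compiler in use; the numbers can differ between compilers on the same machine.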
Now, to answer your question, int is intended to map to the "natural" word size for the target architecture, and it's intended to be the first type you reach for when you need an integer type. So, a compiler targeting a 16 bit machine will typically define int as 16 bits, a compiler targeting a 32 bit machine will define int as 32 bits, and so on. Well, almost. A 16 bit int really is too small in many cases, but a 32 bit int is usually large enough. So to avoid wasting space with a 64 bit int, not to mention breaking programs that assume 32 bit int, compilers for 64-bit Windows break with tradition and keep int 32 bits. I think that's a reasonable decision, but one has to hope 64-bit CPU designers keep 32 bit ints nice and efficient. (It would certainly be in their interest to do so. :)
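As a rough illustration of the 64-bit Windows case, the sketch below reports the widths instead of assuming them. The helper names are hypothetical, and C++11 constexpr is assumed. On a target that keeps int at 32 bits while pointers are 64 bits, int_narrower_than_pointer() is true.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helpers: they report what this compiler chose
// instead of assuming it.
constexpr std::size_t int_bits()     { return sizeof(int) * CHAR_BIT; }
constexpr std::size_t pointer_bits() { return sizeof(void*) * CHAR_BIT; }

// True on targets where int is narrower than a pointer,
// e.g. 32-bit int with 64-bit pointers.
constexpr bool int_narrower_than_pointer() {
    return int_bits() < pointer_bits();
}

// Code that really depends on a 32-bit int can test for it in one place.
constexpr bool int_is_32_bits() {
    return int_bits() == 32;
}
```

A program that assumes a 32-bit int could guard that assumption with static_assert(int_is_32_bits(), "...") so that a port to a different target fails at compile time rather than at run time.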
 Signature Doug Harrison Microsoft MVP - Visual C++
Walter Briscoe - 30 Sep 2004 16:49 GMT
In message <v07ol0lok9danukateg5peu3urjgg3677q@4ax.com> of Thu, 30 Sep 2004 09:58:59 in microsoft.public.vc.language, "Doug Harrison [MVP]" <dsh@mvps.org> writes
[snip]
>Certain parts of the standard library become impossible to implement
>correctly if sizeof(char) == sizeof(int), so on most systems, sizeof(char) <
>sizeof(int). See <limits.h> for lots of interesting macros which describe
>the integer types for a given system.
What parts and why?
 Signature Walter Briscoe
Doug Harrison [MVP] - 30 Sep 2004 17:00 GMT
>In message <v07ol0lok9danukateg5peu3urjgg3677q@4ax.com> of Thu, 30 Sep
>2004 09:58:59 in microsoft.public.vc.language, "Doug Harrison [MVP]"
[quoted text clipped - 7 lines]
>
>What parts and why?
Off the top of my head, fgetc and the <ctype.h> functions. They need to be able to distinguish EOF from char values, when stored in int. IOW, there needs to be an int value that doesn't correspond to any char value. (I think that might also be true for the C++ char_traits<char> specialization, but there you probably aren't restricted to just int; any standard integer type larger than char would do. I'd have to double-check the standard to be sure about that, though.)
 Signature Doug Harrison Microsoft MVP - Visual C++
Doug Harrison [MVP] - 30 Sep 2004 21:44 GMT
>>In message <v07ol0lok9danukateg5peu3urjgg3677q@4ax.com> of Thu, 30 Sep
>>2004 09:58:59 in microsoft.public.vc.language, "Doug Harrison [MVP]"
[quoted text clipped - 7 lines]
>>
>>What parts and why?
Note: Below I've replaced "char" with "unsigned char", which makes it right.
>Off the top of my head, fgetc and the <ctype.h> functions. They need to be
>able to distinguish EOF from unsigned char values, when stored in int. IOW, there
>needs to be an int value that doesn't correspond to any unsigned char value.
An even better way to put it is this. In order for fgetc and <ctype.h> to work right, int has to be able to faithfully hold all values of unsigned char, plus EOF. Since all bit patterns of unsigned char are valid unsigned char values, this means sizeof(int) has to be greater than sizeof(char).
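The usual way this shows up in code is the rule that the result of fgetc is kept in an int, not a char. A minimal sketch, assuming a hosted implementation where int is wider than char; count_bytes and count_bytes_demo are hypothetical names, and the demo relies on std::tmpfile being able to create a temporary file.

```cpp
#include <cassert>
#include <cstdio>

// Counts the bytes in a stream. The result of fgetc is kept in an int so
// that EOF stays distinguishable from every unsigned char value.
long count_bytes(std::FILE* in) {
    assert(in != nullptr);
    long n = 0;
    int c;  // int, not char: a char could not hold both 0xFF and EOF
    while ((c = std::fgetc(in)) != EOF) {
        ++n;
    }
    return n;
}

// Hypothetical demo: writes three bytes, one of them 0xFF, to a temporary
// file and counts them back. Returns -1 if no temporary file is available.
long count_bytes_demo() {
    std::FILE* f = std::tmpfile();
    if (!f) {
        return -1;
    }
    const unsigned char data[] = {0x41, 0xFF, 0x42};
    std::fwrite(data, 1, sizeof data, f);
    std::rewind(f);
    const long n = count_bytes(f);
    std::fclose(f);
    return n;
}
```

If c were declared as a plain char that happens to be signed, the 0xFF byte would typically compare equal to EOF and the loop would stop after the first byte.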
It's even messier with char_traits<char>, so I didn't try to fix that part, but the same consideration applies to its char_type and int_type types. See the subthread starting here for more on the aforementioned messiness:
http://groups.google.com/groups?selm=379b92df.19966375%40netnews.worldnet.att.net
Someone has recently proposed at least a partial way to clean this up:
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#467
 Signature Doug Harrison Microsoft MVP - Visual C++
Lucas Galfaso - 02 Oct 2004 15:15 GMT Hi,
> (...) Since all bit patterns of unsigned char are valid unsigned
> char values, this means sizeof(int) has to be greater than sizeof(char).
I believe that is somewhat wrong, because C++ does not specify that every value you can put into an unsigned char must correspond to a character. In fact, I used to work with a C++ compiler where sizeof(char) == sizeof(long) and char was 64 bits long.
Lucas/
Doug Harrison [MVP] - 02 Oct 2004 18:14 GMT
>Hi,
>
[quoted text clipped - 4 lines]
>value that you can put into an unsigned char, must correspond to a
>character.
I'm talking about unsigned char, not "characters", in the context of fgetc and <ctype.h> functions.
>In fact, I used to work with a C++ compiler where sizeof(char) == sizeof(long)
>and char was 64 bits long.
Let me give you an example. For that compiler, we can use the relationship I posted earlier to deduce that sizeof(int) is the same as sizeof(char) and sizeof(long). The macro EOF is some integer constant expression, typically an int equal to -1. For the sake of argument, let's assume the usual two's complement signed integer representation. Then EOF has the representation with all bits set. Now consider the definition of fgetc. It reads unsigned chars and returns them as ints. All bit patterns of unsigned char are valid, so unless int can represent all of them, we've got a problem. But two's complement signed ints can do this, so that's not an issue. The problem is that fgetc returns EOF when it reaches end of file or encounters an error, and EOF is -1, which has all bits set in two's complement. Thus, there's no way to distinguish an EOF return value from an unsigned char in the file that had all bits set. So, like I said, a compiler that wants to implement fgetc needs sizeof(int) > sizeof(char).
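The collision can be modelled on an ordinary machine by pretending that "int" and "unsigned char" are both 8 bits wide. This is only a toy sketch: toy_int, toy_uchar, TOY_EOF and the two functions are hypothetical names, std::int8_t and std::uint8_t are assumed to exist, and the conversion of 0xFF to -1 assumes two's complement behavior (implementation-defined before C++20, though mainstream compilers behave this way).

```cpp
#include <cstdint>

// Toy model: "int" and "unsigned char" have the same width (8 bits here).
typedef std::int8_t  toy_int;
typedef std::uint8_t toy_uchar;

const toy_int TOY_EOF = -1;  // all bits set in two's complement

// What a toy fgetc would hand back for a byte read from the file.
toy_int toy_fgetc_result(toy_uchar byte) {
    return static_cast<toy_int>(byte);  // 0xFF becomes -1
}

// True when a perfectly valid byte is indistinguishable from end of file.
bool looks_like_eof(toy_uchar byte) {
    return toy_fgetc_result(byte) == TOY_EOF;
}
```

Here looks_like_eof(0xFF) is true even though 0xFF is ordinary file data, while looks_like_eof(0x41) is false.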
I'm curious how the compiler you described dealt with this. Perhaps it was a freestanding implementation, one that doesn't supply all the standard library, in particular the problematic <ctype.h> and <stdio.h>.
 Signature Doug Harrison Microsoft MVP - Visual C++
Lucas Galfaso - 02 Oct 2004 19:25 GMT
>>Hi,
>>
[quoted text clipped - 34 lines]
> freestanding implementation, one that doesn't supply all the standard
> library, in particular the problematic <ctype.h> and <stdio.h>.
I am not 100% sure (this was 5+ years ago) but I think we did not have fgetc, just the C++ libs. Anyway, the lack of this library does not make the compiler non-standard.
Doug Harrison [MVP] - 02 Oct 2004 19:59 GMT
>I am not 100% sure (this was 5+ years ago) but I think we did not have fgetc,
>just the C++ libs. Anyway, the lack of this library does not make the
>compiler non-standard.
True. Like I said:
<q> Certain parts of the standard library become impossible to implement correctly if sizeof(char) == sizeof(int), so on most systems, sizeof(char) < sizeof(int). </q>
<q> So, like I said, a compiler that wants to implement fgetc needs sizeof(int) > sizeof(char).
I'm curious how the compiler you described dealt with this. Perhaps it was a freestanding implementation, one that doesn't supply all the standard library, in particular the problematic <ctype.h> and <stdio.h>. </q>
Those headers are required by a hosted C++ implementation, i.e. what people normally think of as Standard C++, but not by freestanding implementations, which you might encounter in embedded environments.
 Signature Doug Harrison Microsoft MVP - Visual C++
Bo Persson - 02 Oct 2004 22:18 GMT
>>Hi,
>>
[quoted text clipped - 42 lines]
> implement
> fgetc needs sizeof(int) > sizeof(char).
No, that is not a requirement. If the char type is large enough (like CHAR_BIT == 32), it can't possibly use all bit patterns for valid characters. One pattern can then be reserved for EOF.
That is exactly what we generally do for wchar_t and WEOF.
Bo Persson
Doug Harrison [MVP] - 02 Oct 2004 22:53 GMT
>>So, like I said, a compiler that wants to
>> implement fgetc needs sizeof(int) > sizeof(char).
>
>No, that is not a requirement.
Yes, it is.
>If the char type is large enough (like
>CHAR_BIT == 32), it can't possibly use all bit patterns for valid
>characters. One pattern can then be reserved for EOF.
Please review the documentation for fgetc and <ctype.h>. They deal in unsigned char cast to int. (And why would plain char be restricted as you describe? What if plain char is unsigned?)
>That is exactly what we generally do for wchar_t and WEOF.
I'm not familiar with those details so won't comment.
 Signature Doug Harrison Microsoft MVP - Visual C++
Bo Persson - 03 Oct 2004 09:57 GMT
>>>So, like I said, a compiler that wants to
>>> implement fgetc needs sizeof(int) > sizeof(char).
[quoted text clipped - 11 lines]
> you
> describe? What if plain char is unsigned?)
But if char, unsigned char, and int are all the same size (like 32 bits), not all values will be used for characters so there is room for reserving one value for EOF.
>>That is exactly what we generally do for wchar_t and WEOF.
>
> I'm not familiar with those details so won't comment.
OK, that is C++, where the wide character type wchar_t uses the value wchar_t(-1) as the end-of-file signal WEOF for wide streams.
Bo Persson
Doug Harrison [MVP] - 03 Oct 2004 18:56 GMT
>>>>So, like I said, a compiler that wants to
>>>> implement fgetc needs sizeof(int) > sizeof(char).
[quoted text clipped - 14 lines]
>But if char, unsigned char, and int are all the same size (like 32
>bits)
Note that char and unsigned char are always the same size. For any integer type X, signed X and unsigned X are the same size, and for the type char, plain char is a distinct type otherwise implemented the same as signed char or unsigned char.
>not all values will be used for characters so there is room for
>reserving one value for EOF.
There's no basis for saying that.
At any rate, it's irrelevant for fgetc and <ctype.h>. Did you review their documentation? Here's a proof of what I've been saying:
1. fgetc reads unsigned chars and returns them as ints.
2. For character types, all bits in the object representation participate in the value representation.
3. All bit patterns of unsigned char are valid numbers.
4. Thus, if unsigned char and int are the same size, unsigned char will necessarily use up all the values of int.
5. Therefore, there is no int value left over to represent EOF.
6. One cannot implement fgetc, <ctype.h>, and a handful of other library functions if char and int are the same size, because the int value EOF cannot be distinguished from a valid unsigned char represented by int, and those functions require that distinction.
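The condition behind steps 4 and 5 can be checked for a given implementation with the <climits> macros. This is a sketch with hypothetical function names, assuming C++11 constexpr: when every unsigned char value fits in a non-negative int and EOF is negative, the two can never collide.

```cpp
#include <climits>
#include <cstdio>

// True when every unsigned char value is representable as a non-negative
// int. This fails on an implementation where char and int have the same size.
constexpr bool uchar_values_fit_in_int() {
    return UCHAR_MAX <= INT_MAX;
}

// EOF is specified to be a negative int constant expression.
constexpr bool eof_is_negative() {
    return EOF < 0;
}

// With both conditions true, no unsigned char value read by fgetc can be
// mistaken for EOF.
constexpr bool eof_is_distinguishable() {
    return uchar_values_fit_in_int() && eof_is_negative();
}
```

On the hypothetical compiler with a 64-bit char and a 64-bit int, uchar_values_fit_in_int() would be false, which is the situation the list above describes.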
>>>That is exactly what we generally do for wchar_t and WEOF.
>>
>> I'm not familiar with those details so won't comment.
>
>OK, that is C++, where the wide character type wchar_t uses the value
>wchar_t(-1) as the end-of-file signal WEOF for wide streams.
Actually, WEOF appears in the Standard C header <wchar.h>, and it has the type wint_t, which is not necessarily wchar_t:
http://www.lysator.liu.se/c/na1.html
<q>
typedef ... wint_t;
An integral type unchanged by integral promotion. It must be capable of holding every valid wide character, and also the value WEOF (described below). It can be the same type as wchar_t.
WEOF is an objectlike macro which evaluates to a constant expression of type wint_t. It need not be negative nor equal EOF, but it serves the same purpose: the value, which must not be a valid wide character, is used to represent an end of file or as an error indication. </q>
This is explicitly spelled out compared to the situation with unsigned char, int, EOF, fgetc, and <ctype.h>, for which you have to synthesize the constraints from more fundamental properties.
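On the C++ side, the same split between character type and "character or end-of-file" type is visible in std::char_traits<wchar_t>, whose int_type is wint_t and whose eof() yields WEOF. A small sketch, assuming C++11; is_weof and weof_matches_char_traits are hypothetical names.

```cpp
#include <cwchar>
#include <string>
#include <type_traits>

// The wide-character traits use wint_t, not wchar_t, as the type that can
// hold every wide character plus the end-of-file marker.
static_assert(
    std::is_same<std::char_traits<wchar_t>::int_type, std::wint_t>::value,
    "char_traits<wchar_t>::int_type is wint_t");

// True when a value returned by a wide input function signals end of file.
bool is_weof(std::wint_t c) {
    return c == WEOF;
}

// The end-of-file value of the C++ traits class is the C macro WEOF.
bool weof_matches_char_traits() {
    return std::char_traits<wchar_t>::eof() == WEOF;
}
```

The comparison is always made on the wint_t value, for example on the result of fgetwc, before it is narrowed to wchar_t.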
 Signature Doug Harrison Microsoft MVP - Visual C++