Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsFree MagazinesWhite PapersSubmit Content
Discussion GroupsASP.NETWindows FormsLanguages.NET FrameworkVisual Studio.NET
Articles.NET FrameworkASP.NETToolsWindows Forms
.NET DirectoryOpen Source ProjectsUser GroupsWeb Resources
Related Topics
Visual Basic 6SQL ServerMS AccessOther DB ProductsMS Server ProductsMore Topics ...

.NET Forum / Languages / Managed C++ / May 2005

Tip: Looking for answers? Try searching our database.

alignment is lost with openmp

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Gabest - 13 May 2005 19:01 GMT
I really like the new openmp implementation of vc, but I ran into a
something which may be a bug. Please see the following little example:

void funct()
{
 __declspec(align(16)) BYTE buff[16*16];
 #pragma omp parallel private(buff) num_threads(2)
 {
   #pragma omp for
   for(int y = 0; y < height; y += 16)
   {
     // here a few instructions involving buff and sse2 intrinsics which
require 16 byte alignment
   }
 }
}

If I print the base address of buff inside the loop I can see that it has
lost its alignment and of course it crashes a little later there.
Carl Daniel [VC++ MVP] - 13 May 2005 22:54 GMT
> I really like the new openmp implementation of vc, but I ran into a
> something which may be a bug. Please see the following little example:
[quoted text clipped - 15 lines]
> If I print the base address of buff inside the loop I can see that it
> has lost its alignment and of course it crashes a little later there.

Please post a bug report with a complete repro case to
http://lab.msdn.microsoft.com/productfeedback/

-cd
Gabest - 13 May 2005 23:53 GMT
Done.

http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=06205
24a-2821-4ed5-a82a-32821fc49691

Carl Daniel [VC++ MVP] - 14 May 2005 00:16 GMT
> Done.
>
> http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=06205
24a-2821-4ed5-a82a-32821fc49691

You might have taken the time to post a complete repro (there's no standard
include file named stdafx.h) or to actually fill out all the fields...

In any case, I'm unable to reproduce the problem with your sample.  How
'bout a few more repro steps, such as the exact command-line arguments to
the compiler?

Here's what I get:

C:\Pub\Dev\cppbugs>cl -MD -arch:SSE2 -openmp ompalign0513.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50215.44 for
80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

ompalign0513.cpp
Microsoft (R) Incremental Linker Version 8.00.50215.44
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:ompalign0513.exe
ompalign0513.obj

C:\Pub\Dev\cppbugs>ompalign0513
thread_num=0, i=0, buff=0012FB90
thread_num=2, i=8, buff=00AEFE30
thread_num=3, i=12, buff=00BEFE30
thread_num=3, i=13, buff=00BEFE30
thread_num=0, i=1, buff=0012FB90
thread_num=0, i=2, buff=0012FB90
thread_num=0, i=3, buff=0012FB90
thread_num=3, i=14, buff=00BEFE30
thread_num=2, i=9, buff=00AEFE30
thread_num=2, i=10, buff=00AEFE30
thread_num=3, i=15, buff=00BEFE30
thread_num=1, i=4, buff=009EFE30
thread_num=1, i=5, buff=009EFE30
thread_num=2, i=11, buff=00AEFE30
thread_num=1, i=6, buff=009EFE30
thread_num=1, i=7, buff=009EFE30

-cd
Gabest - 14 May 2005 02:19 GMT
> You might have taken the time to post a complete repro (there's no
> standard include file named stdafx.h) or to actually fill out all the
> fields...

That was the auto generated precompiled header file. Pretty "standard" in
visual c, every new project gets that automagically. This one was a console
application with the default settings, I have only changed openmp support to
yes.

> In any case, I'm unable to reproduce the problem with your sample.  How
> 'bout a few more repro steps, such as the exact command-line arguments to
> the compiler?

As I said, the default settings were used, except the /openmp switch of
course. But if you insist, here is the command line of the debug build:

/Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm
/EHsc /RTC1 /MTd /GS- /openmp /Yu"stdafx.h" /Fp"Debug\omptest.pch"
/Fo"Debug\\" /Fd"Debug\vc80.pdb" /W3 /nologo /c /Wp64 /ZI /TP
/errorReport:prompt

/OUT:"Debug\omptest.exe" /INCREMENTAL /NOLOGO /MANIFEST:NO /DEBUG
/PDB:"i:\Progs\omptest\omptest\Debug\omptest.pdb" /SUBSYSTEM:CONSOLE
/MACHINE:X86 /ERRORREPORT:PROMPT   kernel32.lib user32.lib gdi32.lib
winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
uuid.lib odbc32.lib odbccp32.lib

> Here's what I get:
> ...

Try to change the code around a bit or run it several times, eventually you
will see non 0 ending pointers.
Carl Daniel [VC++ MVP] - 14 May 2005 03:38 GMT
>> You might have taken the time to post a complete repro (there's no
>> standard include file named stdafx.h) or to actually fill out all the
[quoted text clipped - 4 lines]
> This one was a console application with the default settings, I have
> only changed openmp support to yes.

I normally do repros as stand-alone CPP files compiled from the command-line
with the minimal options necessary to elicit the bug.  Old habit :)

>> In any case, I'm unable to reproduce the problem with your sample. How
>> 'bout a few more repro steps, such as the exact command-line
[quoted text clipped - 13 lines]
> winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib
> oleaut32.lib uuid.lib odbc32.lib odbccp32.lib

Ah Ha!

It's /ZI that does it.  I'll add a note to your bug report to that effect.
Thanks for the details.

-cd
Gabest - 14 May 2005 05:52 GMT
> Ah Ha!
>
> It's /ZI that does it.  I'll add a note to your bug report to that effect.
> Thanks for the details.

Well, I don't think that switch does it. Just tried the release build as
well with /Zi, then completly disabled the generation of debug info too, the
alignment was still wrong sometimes.

This is how it builds now for the release conofiguration, but I don't think
any of the switches can affect this problem.

/O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /FD
/EHsc /MT /openmp /Yu"stdafx.h" /Fp"Release\omptest.pch" /Fo"Release\\"
/Fd"Release\vc80.pdb" /W3 /nologo /c /Wp64 /TP /errorReport:prompt

/OUT:"Release\omptest.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST
/MANIFESTFILE:"Release\omptest.exe.intermediate.manifest" /DEBUG
/PDB:"i:\progs\omptest\omptest\release\omptest.pdb" /SUBSYSTEM:CONSOLE
/OPT:REF /OPT:ICF /MACHINE:X86 /ERRORREPORT:PROMPT   kernel32.lib user32.lib
gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib
oleaut32.lib uuid.lib odbc32.lib odbccp32.lib
Carl Daniel [VC++ MVP] - 14 May 2005 06:04 GMT
>> Ah Ha!
>>
[quoted text clipped - 7 lines]
> This is how it builds now for the release conofiguration, but I don't
> think any of the switches can affect this problem.

You're right, it's not as simple as just -ZI.  -Zi also seems to be
sufficient, as well as -O2.  In any case, having at least one concrete repro
will allow someone to triage the bug.  Watch it on product feedback center -
you should see something within a few days.

-cd
Gabest - 14 May 2005 06:56 GMT
> You're right, it's not as simple as just -ZI.  -Zi also seems to be
> sufficient, as well as -O2.  In any case, having at least one concrete
> repro will allow someone to triage the bug.  Watch it on product feedback
> center - you should see something within a few days.

Just stepped through the assembly and found something strange after I
tailored the code to the following. (also turned of security check to not
get in to the way). My comments are inlined.

__declspec(align(16)) BYTE buff[1234];

#pragma omp parallel private(buff) num_threads(2)
{
 #pragma omp for
 for(int i = 0; i < 2; i++)
 {
   _mm_store_si128((__m128i*)&buff[16], _mm_setzero_si128());
 }
}

When the debugger reached this multi-threaded code then I saw this:

#pragma omp for
for(int i = 0; i < 2; i++)
00401000 push ebp
00401001 mov ebp,esp
00401003 and esp,0FFFFFFF0h
00401006 sub esp,4E0h

buff was just aligned to 16 bytes, fine!

0040100C lea eax,[esp]
0040100F push eax
00401010 lea ecx,[esp+8]
00401014 push ecx
00401015 push 1
00401017 push 1
00401019 push 1
0040101B push 0
0040101D call _vcomp_for_static_simple_init (40710Ah)

It pushed 6 arguments on stack (esp -= 0x18) and called
_vcomp_for_static_simple_init.

00401022 mov ecx,dword ptr [esp+1Ch]
00401026 mov eax,dword ptr [esp+18h]
0040102A add esp,18h

Looks like it is cleaning up the stack, esp += 0x18

0040102D cmp ecx,eax
0040102F jg wmain$omp$1+4Bh (40104Bh)
00401031 sub eax,ecx
00401033 pxor xmm0,xmm0
00401037 add eax,1
0040103A lea ebx,[ebx]
00401040 sub eax,1
#include <windows.h>
#include <omp.h>
#include <xmmintrin.h>
#include <emmintrin.h>
int _tmain(int argc, _TCHAR* argv[])
{
__declspec(align(16)) BYTE buff[1234];
buff[0] = 0;
#pragma omp parallel private(buff) num_threads(2)
{
#pragma omp for
for(int i = 0; i < 2; i++)
{
_mm_store_si128((__m128i*)&buff[16], _mm_setzero_si128());
00401043 movdqa xmmword ptr [esp+18h],xmm0

Storing at "esp+18h" ???? esp is aligned to 16 bytes right now
(esp=0x0012fa00), what's that +18h doing there? Also, this 18h looks
familiar, could it be a coincidence?

00401049 jne wmain$omp$1+40h (401040h)
{
#pragma omp for
for(int i = 0; i < 2; i++)
0040104B call _vcomp_for_static_end (407104h)
00401050 call _vcomp_barrier (4070FEh)
}
00401055 mov esp,ebp
00401057 pop ebp
00401058 ret

Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.