.NET Forum / .NET Framework / New Users / May 2004
All Your Thread Are Belong to Us
|
|
Thread rating:  |
C# Learner - 30 May 2004 04:23 GMT Hi all,
I have a nasty bug in my multi-threaded app which I think is being caused by "race conditions" with multiple threads. I'm getting data corruption.
The app is an HTTP server I'm making just for fun. I noticed the bug yesterday night when performing a little stress testing on it. When under stress, I find that a particular variable sometimes has an incorrect value.
I'm not quite a threading expert, so I may be making a stupid mistake...
How can I go about getting help with this? I'd probably have to give someone the whole project to get help, so they can observe the problems themselves, I guess...
Daniel O'Connell [C# MVP] - 30 May 2004 04:40 GMT > Hi all, > [quoted text clipped - 11 lines] > someone the whole project to get help, so they can observe the problems > themselves, I guess... First and foremost, is that variable accessible from *one* place or is it directly accessed in multiple places? If its the latter and the variable isn't used as an out or ref parameter, you might be able to turn the variable into a property temporarily and input some logging to figure out why things are going wrong as well as placing a breakpoint there.
Its probably a race condition, but it could also be some subtle bug on a rarely crossed codepath you havn't noticed yet(such as a casing difference causing an invalid assignment, etc).
C# Learner - 30 May 2004 06:41 GMT > [...] Hi Daniel,
Thanks for the reply -- it made me take a step back and think at a higher level.
After several hours, I've finally fixed the two bugs that were causing the problems.
The first bug was that I was calling ArrayList.Remove instead of ArrayList.RemoveAt. I was passing an 'int' as a parameter to Remove, when the ArrayList was holding only references to instances of class Connection. Doh!
I'm kicking myself now because of this. What seems weird here, though, is that ArrayList.Remove doesn't throw an exception when the passed reference doesn't exist in the list.
The other bug was a race condition which I solved by locking a particular block of code that I hadn't given enough prior thought to.
I appreciated your reply.
I'm gonna have to let my brain rest for awhile now.
See ya.
Daniel O'Connell [C# MVP] - 30 May 2004 06:51 GMT >> [...] > [quoted text clipped - 10 lines] > the ArrayList was holding only references to instances of class > Connection. Doh! Ooops, Generics would have helped in this case, ;).
> I'm kicking myself now because of this. What seems weird here, though, is > that ArrayList.Remove doesn't throw an exception when the passed reference > doesn't exist in the list. It is strange, but as it stands ArrayList.Remove is pretty much a fire and forget method. Most of the existing collectinos are *very* forgiving. This may not stand in the face of generics. For example, Dictionary<T,U> currently throws an exception when a key doesn't exist instead of returning null(there is a discussion on this which I'll post a link to tomorrow, to tired to find it right now). I
Its really hard to say which is better, virtually any answer you get is always specific to what you are doing and if you think hard enough, you can usually come up with an argument for hte other way of doing things.
> The other bug was a race condition which I solved by locking a particular > block of code that I hadn't given enough prior thought to. Multithreading is a demon like that, its generally teh stuff you don't expect more so than what you do, especially when you start evolving a code base that is only multi-thread safe where it is(was) possible for multiple threads to reach. As the code grows, you sometimes add race conditions or deadlocks where you didn't figure they were possible when you designed your locking code.
> I appreciated your reply. np.
> I'm gonna have to let my brain rest for awhile now. > > See ya. C# Learner - 30 May 2004 07:26 GMT [...]
>>The first bug was that I was calling ArrayList.Remove instead of >>ArrayList.RemoveAt. I was passing an 'int' as a parameter to Remove, when >>the ArrayList was holding only references to instances of class >>Connection. Doh! > > Ooops, Generics would have helped in this case, ;). Yeah. :-D
>>I'm kicking myself now because of this. What seems weird here, though, is >>that ArrayList.Remove doesn't throw an exception when the passed reference [quoted text clipped - 6 lines] > null(there is a discussion on this which I'll post a link to tomorrow, to > tired to find it right now). I Alright. :-)
G'night.
[...]
Daniel O'Connell [C# MVP] - 30 May 2004 16:08 GMT >> It is strange, but as it stands ArrayList.Remove is pretty much a fire >> and forget method. Most of the existing collectinos are *very* forgiving. [quoted text clipped - 4 lines] > > Alright. :-) Here ya go: http://blogs.msdn.com/brada/archive/2004/04/26/120438.aspx
> G'night. > > [...] Jon Skeet [C# MVP] - 30 May 2004 06:39 GMT > I have a nasty bug in my multi-threaded app which I think is being > caused by "race conditions" with multiple threads. I'm getting data [quoted text clipped - 10 lines] > someone the whole project to get help, so they can observe the problems > themselves, I guess... I have an unfinished threading tutorial which may still be of some help:
http://www.pobox.com/~skeet/csharp/multithreading.html
Let me know if it raises any further questions.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet If replying to the group, please do not mail me too
C# Learner - 30 May 2004 06:58 GMT > I have an unfinished threading tutorial which may still be of some > help: > > http://www.pobox.com/~skeet/csharp/multithreading.html > > Let me know if it raises any further questions. I read it earlier, in fact. :-) I bookmarked it the other day when you mentioned it, then decided to read it during this bug crisis.
I did notice that it's unfinished because I didn't see an explanation for why you created a new variable just for the purposes of locking (countLock), but it's a very nice, easy-to-understand tutorial. It did help, too, as I'm still pretty new to locking, and it cleared up some doubts I had about it.
My problems seem to be fixed now (as detailed in my reply to Daniel), so things are looking okay for now. I'll double-check all my changes later on, though, to be sure...
Thanks.
Jon Skeet [C# MVP] - 30 May 2004 07:20 GMT > > I have an unfinished threading tutorial which may still be of some > > help: [quoted text clipped - 5 lines] > I read it earlier, in fact. :-) I bookmarked it the other day when you > mentioned it, then decided to read it during this bug crisis. Cool :)
> I did notice that it's unfinished because I didn't see an explanation > for why you created a new variable just for the purposes of locking > (countLock), but it's a very nice, easy-to-understand tutorial. It did > help, too, as I'm still pretty new to locking, and it cleared up some > doubts I had about it. Excellent. If you have any suggestions as to how to improve it (other than to finish it!) please let me know.
> My problems seem to be fixed now (as detailed in my reply to Daniel), so > things are looking okay for now. I'll double-check all my changes > later on, though, to be sure... Glad you're sorted. I find there's a particular satisfaction in squashing threading bugs. Maybe it's because they're usually so hard to find!
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet If replying to the group, please do not mail me too
C# Learner - 30 May 2004 07:46 GMT [...]
>>I did notice that it's unfinished because I didn't see an explanation >>for why you created a new variable just for the purposes of locking [quoted text clipped - 4 lines] > Excellent. If you have any suggestions as to how to improve it (other > than to finish it!) please let me know. I don't remember noticing anything that I felt could've been improved. In any case, I'll read it again later once I've rested my brain, and I'll post here if I find anything.
>>My problems seem to be fixed now (as detailed in my reply to Daniel), so >>things are looking okay for now. I'll double-check all my changes [quoted text clipped - 3 lines] > squashing threading bugs. Maybe it's because they're usually so hard to > find! Yeah, it's a real weight off my shoulders. :-)
Jerry Pisk - 30 May 2004 08:48 GMT Jon,
I do have a comment on the tutorial. When talking about deadlocks you suggest that programmers take care to take multiple locks in the same order. That is a very difficult rule to enforce as you note, I would say you should not acquire more than one lock at a time. If you need to lock two resources at the same time than you need to control that with a single lock (and I mean a single lock, there will be no way to lock just one resource). That's lot easier to enforce than the order in which locks can be acquired.
Jerry
> > > I have an unfinished threading tutorial which may still be of some > > > help: [quoted text clipped - 24 lines] > squashing threading bugs. Maybe it's because they're usually so hard to > find! Jon Skeet [C# MVP] - 30 May 2004 10:10 GMT > I do have a comment on the tutorial. When talking about deadlocks you > suggest that programmers take care to take multiple locks in the same order. [quoted text clipped - 3 lines] > mean a single lock, there will be no way to lock just one resource). That's > lot easier to enforce than the order in which locks can be acquired. I don't think it always makes sense though. For a start, locks can be in different classes, so one combined lock may well be hard to contrive elegantly (as it would be shared between two classes).
I'll have a think about it though - maybe suggest it as an alternative approach.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet If replying to the group, please do not mail me too
Niki Estner - 30 May 2004 10:29 GMT Well, actually locking only one object at a time is merely a special case of always locking in the same order: There is only one possible order for one object ;-)
However, I do think the deadlocks section could mention more pitfalls: for example, you do mention that Monitor.Wait does release and reacquire a lock - chances are good that locks are in wrong order after that... Or: Some methods (like Control.Invoke or anything else that uses a windows message queue) contain "implicit locks"; If the main thread is waiting for a worker thread while that thread issues a Control.Invoke call, the call will deadlock.
And I think a "how do I shut down my worker threads cleanly"-section would be very helpful;
But that's enough critic: I wished I'd have read that article before I started multithreaded programming...
Niki
> Jon, > [quoted text clipped - 36 lines] > > squashing threading bugs. Maybe it's because they're usually so hard to > > find! Bruno Jouhier [MVP] - 30 May 2004 11:52 GMT > Well, actually locking only one object at a time is merely a special case of > always locking in the same order: There is only one possible order for one [quoted text clipped - 3 lines] > example, you do mention that Monitor.Wait does release and reacquire a > lock - chances are good that locks are in wrong order after that... No! If your code paths guarantee that the locks are always acquired in the same order, the will always be, even if there is wait that releases the locks and reacquires them (they will reacquired in the order in which they were acquired before entering the wait). If this was not the case, Wait and Pulse would be completely unusable!
> Or: Some methods (like Control.Invoke or anything else that uses a windows > message queue) contain "implicit locks"; If the main thread is waiting for a > worker thread while that thread issues a Control.Invoke call, the call will > deadlock. Yes, you have to be careful with the locks placed by the framework, and in general with "callbacks" (events, virtual methods overriden by code that you don't control). You should try to issue your callbacks from code sections that are not protected by locks to avoid deadlocks. This is not always easy to achieve.
Bruno.
> And I think a "how do I shut down my worker threads cleanly"-section would > be very helpful; [quoted text clipped - 51 lines] > > > squashing threading bugs. Maybe it's because they're usually so hard to > > > find! Niki Estner - 30 May 2004 12:56 GMT > ... > > However, I do think the deadlocks section could mention more pitfalls: for [quoted text clipped - 5 lines] > locks and reacquires them (they will reacquired in the order in which they > were acquired before entering the wait). Where is that information from? The Docs clearly have a different position: "Wait releases the lock for the specified object only; if the caller is the owner of locks on other objects, these locks are not released" (copied from MSDN Article - Monitor.Wait Method) Let's assume we have two locks, L1 and L2: Thread A aquires L1, acquires L2, then waits for L1 (accoring to the docs releasing L1, not L2) Thread B acquires L1, tries to acquire L2 -> deadlock (Thread B is waiting for thread A, and noone else could Pulse L1, so they both wait forever)
> If this was not the case, Wait and Pulse would be completely unusable! Not at all: calling Wait while holding multiple locks isn't good style anyway; Shared resources should be locked for as short as possible.
> Yes, you have to be careful with the locks placed by the framework, and in > general with "callbacks" (events, virtual methods overriden by code that you > don't control). There's nothing wrong about callbacks - synchronized containers or older COM objects will have the same problems in ordinary function calls. Reading from a socket might cause a deadlock too, even across application-boundaries, without any callbacks in them. (which doesn't mean locking socket access would be a bad thing to do)
> You should try to issue your callbacks from code sections > that are not protected by locks to avoid deadlocks. This is not always easy > to achieve. If you want that a callback is only called by one thread at a time, call it in a lock; Otherwise, don't. That's it.
Niki
Bruno Jouhier [MVP] - 30 May 2004 19:16 GMT > > ... > > > However, I do think the deadlocks section could mention more pitfalls: [quoted text clipped - 21 lines] > Not at all: calling Wait while holding multiple locks isn't good style > anyway; Shared resources should be locked for as short as possible. Ooops. I was not very awake when I wrote this. You are right.
I got confused by the fact that if the same lock has been acquired N times, Wait releases it completely and then reaquires it N times, but what I said about the L1 / L2 case is completely wrong, and your deadlock example is right.
> > Yes, you have to be careful with the locks placed by the framework, and in > > general with "callbacks" (events, virtual methods overriden by code that [quoted text clipped - 3 lines] > There's nothing wrong about callbacks - synchronized containers or older COM > objects will have the same problems in ordinary function calls. I agree, but the problem is more about "who" writes the code. When you call a private or non virtual method, you have complete control on the locks that this method will acquire, but when you call a callback or a virtual method that may be overriden in assemblies written by someone else, you call some piece of code that may try to acquire other locks and that will cause deadlocks if the person who implements this piece of code is not aware that it is being called in the context of a lock. There are two ways around it: a) documentation b) rewrite the code so that the callback is done outside the lock context. Of course, this does not happen in every piece of code but this is something that people who write toolkits and framework components should be aware of.
Bruno.
> Reading from a socket might cause a deadlock too, even across > application-boundaries, without any callbacks in them. (which doesn't mean [quoted text clipped - 11 lines] > > Niki Niki Estner - 30 May 2004 20:18 GMT > ... > I agree, but the problem is more about "who" writes the code. When you call [quoted text clipped - 4 lines] > deadlocks if the person who implements this piece of code is not aware that > it is being called in the context of a lock. I'm not sure I understand: My locks are almost always around "someone else's" code: if that code is not considered thread-safe. That's what locks are for, aren't they? Locking file streams, sockets, container classes, etc.
> There are two ways around it: > a) documentation b) rewrite the code so that the callback is done outside > the lock context. Of course, this does not happen in every piece of code but > this is something that people who write toolkits and framework components > should be aware of. Of course: Everything that's inside a lock should have a damn good reason to be there. If a callback (or any other piece of code that takes time) can live happily outside the lock, then it shouldn't be inside - for safety and efficiency reasons.
Niki
Jon Skeet [C# MVP] - 30 May 2004 14:08 GMT > Well, actually locking only one object at a time is merely a special case of > always locking in the same order: There is only one possible order for one [quoted text clipped - 3 lines] > example, you do mention that Monitor.Wait does release and reacquire a > lock - chances are good that locks are in wrong order after that... Not sure how that could cause a deadlock at the moment.
> Or: Some methods (like Control.Invoke or anything else that uses a windows > message queue) contain "implicit locks"; If the main thread is waiting for a > worker thread while that thread issues a Control.Invoke call, the call will > deadlock. Yup - I'm going to mention that in a section about Control.Invoke. Another problem with Control.Invoke is doing something like:
lock (foo) { bar = newValue; Invoke (SomethingWhichReadsBarHavingLockedFoo); }
(if you see what I mean).
> And I think a "how do I shut down my worker threads cleanly"-section would > be very helpful; Yup, that's coming later.
> But that's enough critic: I wished I'd have read that article before I > started multithreaded programming... Glad it's of some use. I really *must* get it finished though.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet If replying to the group, please do not mail me too
Jon Skeet [C# MVP] - 30 May 2004 14:29 GMT > > Well, actually locking only one object at a time is merely a special case of > > always locking in the same order: There is only one possible order for one [quoted text clipped - 5 lines] > > Not sure how that could cause a deadlock at the moment. Having read your example to Bruno, I see what you mean. I guess that means only Waiting on the "innermost" lock.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet If replying to the group, please do not mail me too
Niki Estner - 30 May 2004 16:42 GMT > ... > Having read your example to Bruno, I see what you mean. I guess that > means only Waiting on the "innermost" lock. Really? Same example as above, but this time they wait/pulse the innermost lock:
Thread A aquires L1, acquires L2, then waits for L2 (releasing L2, not L1) Thread B tries to acquire L1 -> waiting for Thread A, can't pulse L2
You are right: Strictly speaking, this is no deadlock, as some other thread could acquire and Pulse L2; In practice however, there might well be no other piece of code that pulses that lock...
Niki
PS: I've tried that sample; The docs are correct about the behaviour of Wait - it doesn't touch any of the other locks
Jon Skeet [C# MVP] - 30 May 2004 20:18 GMT > > ... > > Having read your example to Bruno, I see what you mean. I guess that [quoted text clipped - 9 lines] > could acquire and Pulse L2; In practice however, there might well be no > other piece of code that pulses that lock... Yup. When Wait and Pulse are involved, I suspect the rules need to be extended quite a lot - unless (as may be the case) it's worth recommending that when using Wait and Pulse, *no* other locks should be held.
The rule about acquiring locks only in a particular order can be generalised to "don't acquire lock A within lock B; you can acquire lock B without having acquired lock A, but that's all". That would get round the above, but it's a bit harder to keep track of. Mind you, no-one's trying to say that it's easy to get all this right :)
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet If replying to the group, please do not mail me too
Bruno Jouhier [MVP] - 31 May 2004 10:33 GMT > The rule about acquiring locks only in a particular order can be > generalised to "don't acquire lock A within lock B; you can acquire > lock B without having acquired lock A, but that's all". That would get > round the above, but it's a bit harder to keep track of. Mind you, > no-one's trying to say that it's easy to get all this right :) That's the rule I am using. It is equivalent to saying that you always need to acquire them in the same order, if you consider that "reacquiring" is equivalent to "acquiring" (instead of considering that "reacquiring" is a nop).
Unfortunately, there is no way to verify this at compile time. But you can write assertions that check it at run-time. For example, you can encapsulate your locks in properties and encode these rules in the accessors:
object LockB { get { return _lockB; } } object LockA { get { Assert(!IsAcquired(_lockB)); return _lockA; } }
Unfortunately (again), there is no API to check if a lock has been acquired or not (I submitted a request for this on the Whidbey NG), but you can get around it with:
static bool IsAcquired(object l) { try { Monitor.Pulse(l); return true; } catch (SynchronizationLockException) { return false; } }
You should only activate these checks in debug mode because they slow things down a lot (you end up throwing and catching an exception every time you use LockA).
I have used assertions like these in my multi-threaded server and I found them very helpful, especially when I had to reorganize code. I did not have to go through extensive stress testing to detect deadlock conditions, I found them as soon as I tried to execute any code path that did not acquire the locks in the right order.
Bruno.
Bruno Jouhier [MVP] - 30 May 2004 19:20 GMT > > > Well, actually locking only one object at a time is merely a special case of > > > always locking in the same order: There is only one possible order for one [quoted text clipped - 8 lines] > Having read your example to Bruno, I see what you mean. I guess that > means only Waiting on the "innermost" lock. Yes, my post was totally confusing.
Niki Estner - 30 May 2004 16:57 GMT > ... > Yup - I'm going to mention that in a section about Control.Invoke. > Another problem with Control.Invoke is doing something like: > ... > Yup, that's coming later. > ... Seems like you got big plans for that article...
> Glad it's of some use. I really *must* get it finished though. If you're done, think about an "advanced topics" sequel: Things like ThreadPool, asynchronous delegates and asynchronous IO callbacks really do need some good introduction, too ;-)
Niki
Bruno Jouhier [MVP] - 30 May 2004 11:33 GMT > Jon, > [quoted text clipped - 5 lines] > mean a single lock, there will be no way to lock just one resource). That's > lot easier to enforce than the order in which locks can be acquired. Of course, you will avoid deadlocks if you replace two locks by a single one, BUT there is also a good chance that you will reduce the level of concurrency (and thus the throughput of your server if you are writing a multithreaded server). So, IMO, the right way to go is to follow Jon's avice, i.e., keep several locks and acquire them always in the same order.
Let's consider the following situation: * You have two locks L1 and L2 * The code controlled by L1 is low level logic that executes very fast (does not perform any I/O) and it is called from many places in your code (not just from code controlled by L2). * The code controlled by L2 is higher level logic that executes more slowly (performs some I/O for example) and it calls code controlled by L1 (L1 is always acquired after L2 to prevent deadlocks).
If you replace L1 and L2 by a single lock, you will end up waiting for L2 when you call the low level code controlled by L1. So operations that are not I/O bound will have to wait for I/O completion on other operations. This is very bad and it can have a dramatic impact on the throughput of a multi-threaded server!
So, multithreading is really a subtle issue. Writing "correct" code, i.e., code that does not deadlock and that accesses shared data in a disciplined way, is already difficult, but writing "efficient" code, i.e., code that maximizes concurrent execution, is even more difficult. You need to elaborate a good "locking strategy" if you want to take full advantage of multi-threading (of course, I am talking about writing multi-threaded servers, there are other scenarios that are not so demanding).
Bruno.
> Jerry > [quoted text clipped - 26 lines] > > squashing threading bugs. Maybe it's because they're usually so hard to > > find! C# Learner - 30 May 2004 13:58 GMT > [...] > Excellent. If you have any suggestions as to how to improve it (other > than to finish it!) please let me know. > [...] Reading through a Subversion tutorial has given me an idea.
It goes something like this:
"Suppose we have two co-workers, Harry and Sally. They each decide to edit the same file at the same time. If Harry saves his changes first, then it's possible that (a few moments later) Sally could accidentally overwrite them with her own new version of the file. Any changes Harry made won't be present in Sally's newer version of the file, because she never saw Harry's changes to begin with."
Perhaps this, used as an analogy, could help newbies to understand race conditions. What do you think?
Jon Skeet [C# MVP] - 30 May 2004 14:13 GMT > Reading through a Subversion tutorial has given me an idea. > [quoted text clipped - 9 lines] > Perhaps this, used as an analogy, could help newbies to understand race > conditions. What do you think? Possibly. I can't remember right now how I've explained it at all. The difficulty is getting the balance right between just enough analogies to get the point across, and giving two many analogies which start confusing things :)
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet If replying to the group, please do not mail me too
C# Learner - 30 May 2004 15:12 GMT [...]
>>Perhaps this, used as an analogy, could help newbies to understand race >>conditions. What do you think? > > Possibly. I can't remember right now how I've explained it at all. By the way, I think your current explanation of it is fine. The analogy idea just came to me while reading something totally unrelated.
> The > difficulty is getting the balance right between just enough analogies > to get the point across, and giving two many analogies which start > confusing things :) Yeah, I see your point.
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|