.NET Forum / .NET Framework / New Users / October 2007
HELP - Why are worker thread's child threads entering WaitSleepJoin? Plus context deadlock....
celoftis - 29 Sep 2007 21:25 GMT
Using VS2005, VB.NET, I have a worker thread (started by the main UI thread) that in turn spawns and monitors child threads to execute several long-running processes. The problem is that the worker thread's children are entering WaitSleepJoin, so my worker thread just hangs, looping and doing nothing. My question is: why are my child threads entering the WaitSleepJoin state? There is no Sleep or SpinWait being called in the child's thread proc.
Also, I am getting the dreaded context deadlock and can't figure out why:
The CLR has been unable to transition from COM context 0x1d1ba0 to COM context 0x1d1e80 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.
The deadlock happens even if the code in StartUpload (the worker thread's proc; see the listing below) executes to the end, which I thought would cause the worker thread to stop. How can I determine where the deadlock is? BTW, I've tried disabling the LoaderLock MDA exception and adding a ...mda.config file, to no avail (I don't think the debugger is triggering the MDA erroneously).
Below are some code excerpts that show (1) the start of the worker thread, (2) the code in the worker thread's proc that spawns and monitors child threads, and (3) the child thread proc. Let me know if you see any problems.
One thing I'm not sure about is the number of events I am raising. These events are handled in my main UI thread so the user can get status updates. The event handlers invoke delegates using BeginInvoke (so they run on the main UI thread), in an attempt to avoid blocking on the RaiseEvent calls in the worker thread.
Selected code excerpts follow:
----------------------------------------------------------------------------------------------------------
Private Sub btnUpload_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnUpload.Click
    Try
        Dim xRoot As XmlElement = xDataDocument.DocumentElement 'xDataDocument is an XML having info about audio files...
        Dim xNodeList As XmlNodeList = xRoot.SelectNodes("// audioFile[bConvertedToMp3='true' and bUploaded='false']/strFilename") 'gets a list of all the files that need to be uploaded to the server...
        If xNodeList Is Nothing OrElse xNodeList.Count <= 0 Then
        Else
            Me.Uploader = New Upload(Me.strCI_Id, xNodeList)
            'Me.Uploader.StartUpload() 'COMMENTED OUT, NOW RUNS IN SEPARATE WORKER THREAD
            Me.tUpload = New Thread(New ThreadStart(AddressOf Me.Uploader.StartUpload))
            Me.tUpload.Priority = ThreadPriority.Normal
            Me.tUpload.SetApartmentState(ApartmentState.STA)
            Me.tUpload.Start() 'STARTS THE WORKER THREAD
        End If
        xNodeList = Nothing
    Catch ex As Exception
        Debug.WriteLine(ex.Message & vbNewLine & ex.StackTrace)
        'TODO handle upload error
    End Try
End Sub
----------------------------------------------------------------------------------------------------------
Public Sub StartUpload() 'Private Sub Process_Upload()
    If Me.xNodeList Is Nothing OrElse Me.xNodeList.Count <= 0 Then
        Throw New Exception("No files to upload (xNodeList is empty)")
    Else
        Try
            Me.bRunning = True
            RaiseEvent DisableUploadAction() 'ALL RAISED EVENTS ARE INVOKED USING BeginInvoke (asynchronously) TO AVOID BLOCKING WHILE THE UI GETS UPDATED
            RaiseEvent ChangeUploadButtonTooltip("Uploading file(s)")
            Dim intNumberFiles = Me.BuildUploadDataSet() 'Converts xNodeList to a DataSet so we can just send an XML string up to the server...
            RaiseEvent StatusChange("Starting upload of " & intNumberFiles & " file" & IIf(intNumberFiles = 1, "", "s"))
            'Send upload list to the server, get a confirmation list back...
            Dim objWebService As New myWebServiceClass
            Try
                Me.strUploadFileList = objWebService.UploadFileList(strCI_Id, dsUpload.GetXml)
            Catch ex As Exception
                Debug.WriteLine(ex.Message & vbNewLine & ex.StackTrace)
                Throw ex
            End Try
            If InStr(Me.strUploadFileList.ToUpper, "ERROR") = 1 Then
                RaiseEvent StatusChange("Error registering upload file list for CIID=" & strCI_Id & "..." & Replace(strUploadFileList, "ERROR", ""))
                Throw New Exception("Error registering upload request for CIID=" & strCI_Id & "..." & Replace(strUploadFileList, "ERROR", ""))
            Else
                Try
                    Dim dsFileList As DataSet = New DataSet
                    Dim srReader As New StringReader(Me.strUploadFileList)
                    dsFileList.ReadXml(srReader, XmlReadMode.InferSchema) 'read the strUploadFileList into a dataset for easier processing below...
                    Dim dr As DataRow
                    Dim intFileSize As Int32 = 0
                    Dim intSecs As Double = 0
                    Dim dblProgress As Double = 0
                    Dim strMessage As String = Nothing
                    Dim bProgress As Boolean = True
                    For i As Int32 = 0 To dsFileList.Tables(0).Rows.Count - 1 'For each file in the upload list....
                        strMessage = "Uploading file " & i + 1 & " of " & dsFileList.Tables(0).Rows.Count
                        RaiseEvent StatusChange(strMessage)
                        dr = dsFileList.Tables(0).Rows(i)
                        Me.strUploadFileName = dr.Item(0)
                        'If the file has no specified path, assume it is in the interview folder
                        If InStr(Me.strUploadFileName, "\") = 0 Then Me.strUploadFileName = STR_INTERVIEW & Me.strUploadFileName
                        Me.dblBytesPerSecond = Me.GetUploadBytesPerSecond() 'Based on past history, speculate an upload speed in bytes/second
                        Try
                            intFileSize = My.Computer.FileSystem.GetFileInfo(Me.strUploadFileName).Length
                            bProgress = True
                        Catch ex As FileNotFoundException
                            bProgress = False
                        End Try
                        tsUploadStart = Nothing
                        tsUploadEnd = Nothing
                        Try
                            'HERE IS WHERE I KICK OFF THE WORKER THREAD'S CHILDREN
                            If Not Me.tUpload Is Nothing AndAlso Me.tUpload.IsAlive Then Me.tUpload.Abort()
                            Me.tUpload = Nothing
                            Me.tUpload = New Thread(New ThreadStart(AddressOf Me.Process_UploadFile))
                            Me.tUpload.SetApartmentState(ApartmentState.STA)
                            Me.tUpload.Priority = ThreadPriority.Normal
                            Me.tUpload.IsBackground = True 'Set this thread to end with parent thread
                            Me.tUpload.Start()
                        Catch ex As Exception
                            Debug.WriteLine("launch of upload thread failed..." & vbNewLine & ex.Message & vbNewLine & ex.StackTrace)
                            If Not Me.tUpload Is Nothing AndAlso Me.tUpload.IsAlive Then Me.tUpload.Abort()
                            Me.tUpload = Nothing
                            'TODO log failure of upload thread...
                            Continue For 'Skip this file...
                            'Throw ex
                        End Try
                        Try
                            'ESTIMATE THE PROGRESS OF THE WORKER THREAD WHILE CHECKING ITS THREADSTATE...
                            While Me.tsUploadStart = Nothing OrElse (Me.tUpload.ThreadState And Threading.ThreadState.Unstarted) = Threading.ThreadState.Unstarted
                                Thread.Sleep(5) 'wait for thread to start...
                            End While
                            While (Me.tUpload.ThreadState And Threading.ThreadState.Stopped) = 0
                                'If Me.tUpload.ThreadState And Threading.ThreadState.WaitSleepJoin = Threading.ThreadState.WaitSleepJoin Then Me.tUpload.Interrupt() 'IS this what I need to do to avoid the waitsleepjoin? Why did the child thread go to sleep?
                                intSecs = ((New TimeSpan(Now.Ticks)).TotalMilliseconds - Me.tsUploadStart.TotalMilliseconds) / 1000.0
                                If bProgress Then
                                    If Me.dblBytesPerSecond = 0 Then
                                        dblProgress = intSecs / (intFileSize / (0.8 * 32768)) 'default upload speed to 80% of ~262kbps ~= 32,768 bytes/sec (typical upload speed of cable modem)
                                    Else
                                        dblProgress = intSecs / (intFileSize / Me.dblBytesPerSecond)
                                    End If
                                    If dblProgress > 0.99 Then dblProgress = 0.99
                                    RaiseEvent StatusChange(strMessage & ", " & Format(dblProgress, "0.#%") & New String(".", intSecs Mod 11))
                                Else
                                    RaiseEvent StatusChange(strMessage & ", " & intSecs & "second(s)" & New String(".", intSecs Mod 11))
                                End If
                                Thread.Sleep(100)
                            End While
                            If Me.strUploadResult Is Nothing Then
                                Debug.WriteLine("processing of upload failed...")
                                RaiseEvent StatusChange(strMessage & " upload failed")
                                If Not Me.tUpload Is Nothing AndAlso Me.tUpload.IsAlive Then Me.tUpload.Abort()
                                Me.tUpload = Nothing
                                'TODO log failure of upload thread...
                                Continue For 'Skip this file...
                                'Throw ex
                                'Exit For 'TODO should we keep trying to upload?
                            ElseIf InStr(Me.strUploadResult.ToUpper, "ERROR") = 1 Then
                                Debug.WriteLine(" upload failed " & Me.strUploadResult)
                                RaiseEvent StatusChange(strMessage & " upload failed " & Replace(Me.strUploadResult, "ERROR", ""))
                                If Not Me.tUpload Is Nothing AndAlso Me.tUpload.IsAlive Then Me.tUpload.Abort()
                                Me.tUpload = Nothing
                                'TODO log failure of upload thread...
                                Continue For 'Skip this file...
                                'Throw ex
                                'Exit For 'TODO should we keep trying to upload?
                            Else
                                RaiseEvent StatusChange(strMessage & " complete")
                                'Calculate elapsed time, store stats for future uploads
                                RaiseEvent TrackUploadStats(intFileSize, (Me.tsUploadEnd.TotalMilliseconds - Me.tsUploadStart.TotalMilliseconds) / 1000)
                                RaiseEvent FileUploadComplete(Me.strUploadFileName, Now) 'Mark this file as being uploaded...
                            End If
                        Catch ex As Exception
                            Debug.WriteLine("processing of upload failed..." & vbNewLine & ex.Message & vbNewLine & ex.StackTrace)
                            RaiseEvent StatusChange(strMessage & " upload failed " & ex.Message)
                            If Not Me.tUpload Is Nothing AndAlso Me.tUpload.IsAlive Then Me.tUpload.Abort()
                            Me.tUpload = Nothing
                            'TODO log failure of upload thread...
                            Continue For 'Skip this file...
                            'Throw ex
                        End Try
                    Next i
                    RaiseEvent StatusChange("Upload complete...")
                    Thread.Sleep(2000) 'Display complete message for a couple of seconds...
                    RaiseEvent StatusChange("")
                    srReader.Dispose()
                    srReader = Nothing
                    dsFileList.Dispose()
                    dsFileList = Nothing
                Catch ex As Exception
                    Debug.WriteLine(ex.Message & vbNewLine & ex.StackTrace)
                    'TODO log error
                    RaiseEvent StatusChange("Error uploading file list for CIID=" & strCI_Id & "..." & ex.Message)
                    Throw ex
                End Try
            End If
            objWebService.Dispose()
            objWebService = Nothing
        Catch ex As Exception
            Debug.WriteLine(ex.Message & vbNewLine & ex.StackTrace)
            'TODO log error
            RaiseEvent StatusChange("Error processing upload action for CIID=" & strCI_Id & "..." & ex.Message)
            Throw ex
        Finally
            'TODO signal main thread that uploading is complete
            'RaiseEvent EnableUploadAction() 'This may not be the correct action...
            'RaiseEvent ChangeUploadButtonTooltip("No cases pending transmission") 'This may not be true
            RaiseEvent Complete()
            Me.bRunning = False
        End Try
    End If
End Sub
----------------------------------------------------------------------------------------------------------
Private Sub Process_UploadFile()
    Try
        Dim strName As String = StrReverse(StrReverse(Me.strUploadFileName).Split("\")(0)).ToUpper
        Dim dt As DateTime = Now 'initialize to today's date...
        Try
            dt = Me.GetInterviewStartDate(Replace(Replace(strName, ".WAV", ""), ".MP3", ""))
        Catch ex As Exception
        End Try
        Dim objWebService As New myWebServiceClass
        Me.strUploadResult = Nothing
        Me.tsUploadStart = New TimeSpan(Now.Ticks) 'Capture start time
        Me.strUploadResult = objWebService.Upload(Me.strCI_Id, dt, "", strName, Me.ConvertFileToBase64(Me.strUploadFileName)) 'THIS CALL TO THE WEB SERVICE UPLOADS A FILE... IT CAN TAKE A WHILE TO RUN
        Me.tsUploadEnd = New TimeSpan(Now.Ticks) 'Capture end time
        objWebService.Dispose()
        objWebService = Nothing
    Catch ex As Exception
        Throw ex
    End Try
End Sub
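On the WaitSleepJoin question itself: that state is not reserved for Thread.Sleep. The ThreadState documentation says a thread reports WaitSleepJoin whenever it is blocked, for example in Thread.Join, while acquiring or waiting on a Monitor lock, or while waiting on a synchronization object such as a ManualResetEvent, so a child blocked inside a long synchronous call may well show up this way. The sketch below (class name hypothetical) uses a ManualResetEvent as a stand-in for the blocking web service call: the child never calls Sleep, yet it reports WaitSleepJoin until it is released.

```vbnet
Imports System
Imports System.Threading

' Hypothetical demo class. ChildProc stands in for Process_UploadFile, and the
' ManualResetEvent stands in for a blocking call such as objWebService.Upload(...).
Public Class WaitStateDemo
    Private ReadOnly gate As New ManualResetEvent(False)

    ' Child thread proc: no Sleep or SpinWait, just a call that blocks.
    Private Sub ChildProc()
        gate.WaitOne()
    End Sub

    ' Starts the child, records its ThreadState while it is blocked, releases it,
    ' and returns {state while blocked, state after Join}.
    Public Function Run() As ThreadState()
        Dim child As New Thread(New ThreadStart(AddressOf ChildProc))
        child.IsBackground = True
        child.Start()

        ' Poll briefly (up to ~5 s) until the child has reached its blocking call.
        Dim deadline As DateTime = DateTime.UtcNow.AddSeconds(5)
        While (child.ThreadState And ThreadState.WaitSleepJoin) = 0 AndAlso DateTime.UtcNow < deadline
            Thread.Sleep(1)
        End While
        Dim whileBlocked As ThreadState = child.ThreadState

        gate.Set()    ' let the "call" return
        child.Join()  ' wait for the child to finish
        Return New ThreadState() {whileBlocked, child.ThreadState}
    End Function
End Class
```

In other words, WaitSleepJoin on its own only says the child is waiting for something; the useful question is what it is waiting on.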
Peter Duniho - 30 Sep 2007 00:49 GMT
> Using VS2005, VB.NET,
> I have a worker thread (started by main UI thread) that in turn
[quoted text clipped - 6 lines]
> the
> child's thred process.
Well, then what statement is a stuck "child" thread waiting on when you break in the debugger once you get the application into that state?
I can tell you that the number of events being raised should not in any way be a factor. It may in fact be related, but if so only because for some reason you haven't properly synchronized code related to the raising of events. The raising of events itself isn't an issue (and in fact, raising an event is not very much different from simply calling a method).
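Pete's point that raising an event is much like calling a method is easy to check. In this sketch (all class and member names are hypothetical), a worker thread raises an event and the handler records which thread it runs on: it is the worker's own thread, and RaiseEvent does not return until the handler has. That is why a handler that updates controls has to marshal the work to the UI thread itself, for example with Control.BeginInvoke as the original post describes.

```vbnet
Imports System
Imports System.Threading

' Hypothetical stand-in for the Upload class: raises StatusChange from whatever
' thread calls Report.
Public Class StatusSource
    Public Event StatusChange(ByVal message As String)

    Public Sub Report(ByVal message As String)
        ' Every attached handler runs right here, synchronously, on the calling thread.
        RaiseEvent StatusChange(message)
    End Sub
End Class

' Records which thread raised the event and which thread ran the handler.
Public Class EventThreadDemo
    Private WithEvents source As New StatusSource()
    Public RaiserThreadId As Integer = -1
    Public HandlerThreadId As Integer = -1

    ' Plays the role of the form's event handler.
    Private Sub OnStatusChange(ByVal message As String) Handles source.StatusChange
        HandlerThreadId = Thread.CurrentThread.ManagedThreadId
    End Sub

    ' Plays the role of the worker thread proc.
    Private Sub WorkerProc()
        RaiserThreadId = Thread.CurrentThread.ManagedThreadId
        source.Report("Uploading file 1 of 1")
    End Sub

    ' Runs WorkerProc on a new thread and waits for it to finish.
    Public Sub Run()
        Dim worker As New Thread(New ThreadStart(AddressOf WorkerProc))
        worker.Start()
        worker.Join()
    End Sub
End Class
```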
For what it's worth, the code you posted is practically useless to anyone trying to help answer your question. It's far too complicated for most people to bother trying to read through it and see what might be going on, and as near as I can tell it's not even a complete sample, so no one can simply compile and run it either.
You will get better help if you come up with a concise-but-complete example of code that reliably reproduces the problem. "Concise" meaning there's not a single thing in the code that isn't directly related to reproducing the problem, and "complete" meaning no one has to add anything else to the code to get it to run.
Personally, it's my opinion that the source of this sort of problem is usually easily identified, if not solved outright, just by breaking in the debugger and finding out what statement a blocked thread is waiting on. That usually provides a great deal of information about what resource a thread is waiting for and why it's not getting it.
But at the very least, you need a much simpler code sample that is complete in order to successfully solicit detailed help.
Pete
celoftis - 30 Sep 2007 01:31 GMT
> Well, then what statement is a stuck "child" thread waiting on when you
> break in the debugger once you get the application into that state?
[quoted text clipped - 3 lines]
> But at the very least, you need a much simpler code sample that is
> complete in order to successfully solicit detailed help.
Pete, very good points about the code I posted; I'll keep that in mind in the future. Now, on to your question: I'm not sure how to tell which line of code (LOC) the child thread is stuck on when I break into the worker (parent) thread. Maybe this is simple, but forgive me for asking: how do I get this information from the debugger?
But without knowing for a fact which LOC the child is stuck on, I strongly suspect one line in the child's thread proc: a call to a long-running process (a web service) that uploads files from the client to the server. So, assuming that is the line the child thread is stuck on, how do I get it unstuck? In my testing I have let the child run and run (many times longer than the suspect LOC should need to complete). Note that the code below is the (concise) child thread's start procedure:
----------------------------------------------------------------------------------------------------------
Private Sub Process_UploadFile()
    Try
        Dim objWebService As New myWebServiceClass
        '----------- THE NEXT LINE IS PROBABLY THE LINE THAT IS BLOCKING IN THE CHILD -------------
        Me.strUploadResult = objWebService.Upload(file) 'For simplicity, the parameters to the web service have been removed....
        objWebService.Dispose()
        objWebService = Nothing
    Catch ex As Exception
        Throw ex
    End Try
End Sub
----------------------------------------------------------------------------------------------------------
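Separately from where the child blocks, the child proc's Catch ex As Exception / Throw ex deserves a look. Since .NET 2.0, an exception left unhandled on any thread terminates the whole process, and Throw ex also replaces the original stack trace (a bare Throw preserves it). A minimal sketch, with hypothetical names, of a child proc that records either its result or its exception so the parent can inspect both after Join:

```vbnet
Imports System
Imports System.Threading

' Hypothetical delegate standing in for a call such as objWebService.Upload(file).
Public Delegate Function UploadCall() As String

' Hypothetical job object: the child thread stores either the result or the
' exception, and the parent reads both after Join.
Public Class UploadJob
    Private ReadOnly target As UploadCall
    Public Result As String
    Public Failure As Exception

    Public Sub New(ByVal target As UploadCall)
        Me.target = target
    End Sub

    ' Child thread proc: ordinary exceptions are recorded, never rethrown.
    Public Sub Run()
        Try
            Result = target.Invoke()
        Catch ex As Exception
            ' Rethrowing here would leave the exception unhandled on this thread,
            ' which terminates the whole process in .NET 2.0 and later.
            Failure = ex
        End Try
    End Sub
End Class

Public Module UploadJobRunner
    ' Runs the job on a background thread and blocks until it has finished.
    Public Function RunOnThread(ByVal job As UploadJob) As UploadJob
        Dim t As New Thread(New ThreadStart(AddressOf job.Run))
        t.IsBackground = True
        t.Start()
        t.Join()
        Return job
    End Function
End Module
```

The parent then checks Failure first and only uses Result when Failure is Nothing.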
Peter Duniho - 30 Sep 2007 02:52 GMT
> [...]
> Now, on to your question - I'm not sure how to tell what line of code
> (LOC) the child thread is stuck on when I break into the worker
> (parent) thread. Maybe this is simple, but forgive me for asking, how
> do I get this information from the bugger?
In Visual Studio 2005, you would select the appropriate thread from the threads dropdown list found in the debugging toolbar. That will cause the callstack to be populated with that thread's information, and of course from there you can inspect any of the stack frames, including the lowest-level one that will tell you where the thread is waiting.
Note that the Express version of Visual Studio 2005 doesn't provide this. It has only very limited thread-debugging features. You could use the Debug.WriteLine() method to trace your threads' execution, and if you already have a good idea of where things are stuck, this is usually not even that hard. But if you're going to do any serious thread debugging, you need the retail version of VS.
If you already have it, then you're good to go. :)
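A minimal sketch of the Debug.WriteLine tracing Pete mentions, for anyone on an Express edition: a hypothetical helper that tags each line with a timestamp, the thread's name and its managed ID, plus an example thread proc that logs in a Finally block, so the trace shows whether a thread really exited. Debug.WriteLine calls are only compiled in when the DEBUG constant is defined, as it is in the default Debug configuration.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

' Hypothetical tracing helper: prefixes each message with a timestamp and the
' current thread's name and managed ID, so interleaved output can be untangled.
Public Module ThreadTrace
    Public Function TraceLine(ByVal message As String) As String
        Dim current As Thread = Thread.CurrentThread
        Dim name As String = current.Name
        If name Is Nothing Then name = "(unnamed)"
        Dim entry As String = String.Format("{0:HH:mm:ss.fff} [{1} #{2}] {3}", DateTime.Now, name, current.ManagedThreadId, message)
        Debug.WriteLine(entry) ' compiled in only when DEBUG is defined
        Return entry
    End Function

    ' Example thread proc: the Finally block records when the thread really exits.
    Public Sub TracedProc()
        TraceLine("enter")
        Try
            Thread.Sleep(10) ' the real work (e.g. the upload call) would go here
        Finally
            TraceLine("exit")
        End Try
    End Sub
End Module
```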
> But without knowing for a fact what LOC the child is stuck on, I
> strongly suspect one line in that child's thread proc - that line is a
> call to a long runnng process (web service) that uploads files from
> the client to the server. So, assuming that this is the line that the
> child thread is stuck on, how do I get it unstuck.
Well, that depends on why it's stuck, of course. :) Presumably you should be able to verify whether the data has in fact been uploaded successfully. That should tell you whether the service is failing to upload the data and thus never returns, or if it's simply failing to return correctly upon completion of the operation.
In either case, how to address it depends on the myWebServiceClass class. If it's buggy, then obviously the bug in it needs to be fixed. As well, you may find that in order to recover from network problems gracefully you need to provide or use a timeout mechanism, or some way to cancel the operation externally after some timeout.
The name "myWebServiceClass" implies that the class itself is yours, which means that if that thread is blocked in that statement, then it should actually be blocked even deeper in your own code. So that's part of the analysis as well. If the class isn't actually yours, then you may have to enlist the help of whoever did write it.
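On the timeout idea: a minimal sketch, with hypothetical names, of how the parent's progress loop in StartUpload could wait with Thread.Join(milliseconds) instead of polling ThreadState and Thread.Sleep, and give up after an overall timeout. Join(ms) returns True as soon as the thread has terminated and False when the interval elapses. Giving up only ends the wait; the blocked call keeps running. If myWebServiceClass is a proxy generated by Add Web Reference (an assumption, since its source isn't shown), it derives from SoapHttpClientProtocol, and its Timeout property (in milliseconds) bounds the call itself, which is the better place for a timeout.

```vbnet
Imports System
Imports System.Threading

' Hypothetical sketch: wait for an upload thread with periodic progress ticks and
' an overall timeout, using Thread.Join(ms) instead of polling ThreadState.
Public Module UploadWait
    ' Hypothetical progress callback: receives the elapsed time in seconds.
    Public Delegate Sub ProgressCallback(ByVal elapsedSeconds As Double)

    ' Returns True if the worker finished within timeoutMs, False if we gave up.
    ' Giving up does not stop the worker; start it with IsBackground = True so an
    ' abandoned call cannot keep the process alive.
    Public Function WaitWithTimeout(ByVal worker As Thread, ByVal timeoutMs As Integer, _
                                    ByVal onTick As ProgressCallback) As Boolean
        Dim started As DateTime = DateTime.UtcNow
        ' Join(100) returns True once the thread has terminated, or False after 100 ms.
        While Not worker.Join(100)
            Dim elapsedMs As Double = (DateTime.UtcNow - started).TotalMilliseconds
            If elapsedMs >= timeoutMs Then Return False
            If onTick IsNot Nothing Then onTick.Invoke(elapsedMs / 1000.0)
        End While
        Return True
    End Function
End Module
```

The onTick callback is where the existing StatusChange/progress estimate would go.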
Pete
celoftis - 01 Oct 2007 15:59 GMT
Pete, thanks for the help.
Well, I haven't been able to reproduce the problem of my child thread getting stuck... I suppose it may have been an issue with my web service code (I think it is unstable enough to cause this problem), so I've moved on to the other problem now: the context deadlock.
Using your tip about looking at the threads in the debugger, I was hoping to determine which of my threads was causing the context deadlock. Before trying it, I gave all my threads unique names so that I could see which one was executing at the time of the deadlock. But when the context deadlock occurs and I break into the code, none of the threads listed has a name. One is highlighted, but I don't know how to determine what that thread is deadlocked on. I've repeated the context deadlock message below; any thoughts on this? Also, note that I only get the deadlock AFTER running the worker thread mentioned in this post. I mean, all the code for the worker thread completes normally, and, as I mentioned above, when the context deadlock is detected no named threads appear in the list. So does this mean that my worker thread (and all its children) have ended too?
Your thoughts appreciated.
Context deadlock error message: The CLR has been unable to transition from COM context 0x1d1ba8 to COM context 0x1d1e88 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.
Peter Duniho - 01 Oct 2007 22:13 GMT
> [...]
> Using the tip you gave me about looking at the threads in the debugger
[quoted text clipped - 5 lines]
> highlighted, but I don't know how to determine what this thread is
> deadlocked on
I don't name my threads. I just select a thread and see where it is. If it's the thread I'm interested in, it'll be on a line of code I recognize. If it's not, it won't, and I check the next thread.
Inefficient maybe, but it works. :)
> - I've repeated the context deadlock message below - any
> thoughts on this? Also, note that I only get the deadlock AFTER
> running worker thread mentioned in this post - I mean, all the code
> for the worker thread completes normally - and as I mentioned above,
> then the context deadlock is detected no named threads appear in the
> list - so does this mean that my worker thread (and all its children)
> ended also?
It's entirely possible that the error message you're getting is a false positive. That is, that you don't have any threads that are stuck. I have only ever seen that message when stepping through code, causing one or more threads to be delayed long enough for the debugger to pop the error up. I don't know if your situation is similar; I don't have enough experience with the error (and in particular, have zero experience with it in a true positive scenario) to say for sure.
I would say that one thing to look at is whether your code appears to be working normally otherwise. If it is, then perhaps there's no problem at all.
Pete
celoftis - 02 Oct 2007 05:48 GMT
Thanks again for the response.
> I don't name my threads. I just select a thread and see where it is.
> If it's the thread I'm interested in, it'll be on a line of code I
> recognize. If it's not, it won't and I check the next thread.
I suppose I need to get more familiar with thread debugging info. In my Output window, I see messages telling me that "thread 0x#### has exited with return code 0x0", which looks good to me (no error number). My question is how I relate the 0x#### number back to the threads that I create: none of the thread attributes/properties produce this number when I inspect them on a known thread. Where does this number come from?
> It's entirely possible that the error message you're getting is a false
> positive. That is, that you don't have any threads that are stuck. I
[quoted text clipped - 7 lines]
> working normally otherwise. If it is, then perhaps there's no problem
> at all.
Yes, I have read that the debugger throws false context deadlock errors when debugging. I was thinking I wasn't falling under that bogus case because I get the error while the app is running, not while it is stopped in the debugger. Maybe just running the app in Debug mode triggers the error? Not a very secure feeling I'm left with on this...
Peter Duniho - 02 Oct 2007 09:14 GMT
> I suppose I need to get more famaliar with thread debugging info. In
> my output window, I messages telling me that "thread 0x#### has exited
[quoted text clipped - 3 lines]
> when I investigate them on a known thread. Where does this number come
> from?
Which number? The "0x####" number? Or the actual return code? I'm assuming the former, for the moment.
To be honest, I've never looked closely at the thread ID output in the console. .NET is always creating threads of its own that start up and exit, which makes that message more of a distraction than a help to me. If I care about the return code from a thread, I've got code somewhere else that checks it.
That said, I'll bet that the ID is either the managed thread ID (Thread.ManagedThreadId) or the unmanaged thread ID (Windows function GetThreadId()). Frankly, I'm still a little confused about the relationship between managed threads and unmanaged threads. One day I'll read that they are one and the same (though not guaranteed to be so); the next day I'll read that .NET already doesn't map a managed thread to a specific unmanaged thread.
Who knows. But it wouldn't be hard to write a little test app to just Debug.WriteLine() the two thread IDs (managed and unmanaged) just before a thread exits, and see which if either matches the ID in the "thread exited" message (you might return some non-zero number from the thread, just in case some other thread exits at the same time, to make it easier to match up the thread exit message with your own test thread).
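A minimal version of that test app might look like the following (module and member names are hypothetical). It prints both IDs in hex, the same 0x form the Output window uses, so they can be compared with the "has exited" line that follows; which one, if either, matches is left for the run to show. A ThreadStart proc cannot return a value, so instead of a custom exit code the sketch gives the thread a name. AppDomain.GetCurrentThreadId is marked obsolete, so expect a compiler warning.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

' Sketch of the test app described above: just before a thread exits, print its
' managed ID and its OS thread ID in hex, for comparison with the debugger's
' "thread 0x#### has exited" line. Module and member names are hypothetical.
Public Module ThreadIdProbe
    Public LastManagedId As Integer
    Public LastOsId As Integer

    Private Sub ProbeProc()
        LastManagedId = Thread.CurrentThread.ManagedThreadId
        ' AppDomain.GetCurrentThreadId is marked obsolete (expect a compiler warning);
        ' on the .NET Framework it returns the operating-system thread ID.
        LastOsId = AppDomain.GetCurrentThreadId()
        Dim report As String = String.Format( _
            "managed ID {0} (0x{0:X}), OS thread ID {1} (0x{1:X})", LastManagedId, LastOsId)
        Debug.WriteLine(report)
        Console.WriteLine(report)
    End Sub

    ' Runs the probe on a fresh, named thread and waits for it to exit.
    Public Sub Run()
        Dim t As New Thread(New ThreadStart(AddressOf ProbeProc))
        t.Name = "IdProbe"
        t.Start()
        t.Join()
    End Sub
End Module
```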
> Yes, I have read that the debugger throws false context deadlock
> errors when debugging - I was thinking that I wasn't falling under
> that bogus case b/c I get the error when the app is running vs. being
> stopped in the debugger. Maybe just running the app in Debug mode
> triggers the error? Not a real secure feeling that I am left with on
> this...
I have the same problem with the LoaderLock MDA exception. All signs suggest I should be able to ignore it in the situations in which I run into it, but it still bothers me that it shows up.
Anyway, if I'm recalling correctly, your deadlock error is similar, and is an MDA exception. So it will only ever show up when you are running with the debugger, and it will _always_ show up if the conditions it describes are met, whether or not those conditions are really a problem.
At the heart of the error is the assumption by the debugger that if a thread with a message queue does not call a function to retrieve a message from the queue after a certain amount of time, then that thread may be deadlocked. But there are other reasons a thread may not get to retrieve a message within a specific amount of time. One of those reasons is that you are stepping slowly through the debugger, interfering with a thread's ability to execute its message pump. Another is that the thread has a message pump, but also has some code that takes a long time to complete in response to processing some message.
Now, I think it could be argued that in that latter case, it may actually be a design flaw to have code like that. I feel that threads with message queues are generally threads that should not have lengthy tasks executed in them. Those threads should instead delegate those tasks to some other thread without a message queue. But that doesn't mean that it's a fundamental programming error to fail to do that. It just means that's not how _I_ would do it. There's no shortage of people out there who will argue that's not a useful metric at all. :)
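One low-cost experiment in this direction, offered as a suggestion rather than a diagnosis: the code in the original post calls SetApartmentState(ApartmentState.STA) on the worker thread and on every child thread, and the MDA message is specifically about STA threads that do not pump. Unless the upload code relies on COM components that need an STA, those threads can be left in their default apartment: since .NET 2.0, a thread whose apartment state is not set before Start is initialized as MTA, and MTA threads are not expected to pump messages. A minimal sketch (names hypothetical):

```vbnet
Imports System
Imports System.Threading

' Hypothetical sketch: run the long upload work on a background thread without
' calling SetApartmentState(ApartmentState.STA). On Windows such a thread starts in
' the MTA; on platforms without COM, GetApartmentState may report Unknown.
Public Module MtaWorker
    Public ObservedApartment As ApartmentState = ApartmentState.Unknown

    ' Stand-in for Upload.StartUpload / Process_UploadFile.
    Private Sub LongRunningWork()
        ObservedApartment = Thread.CurrentThread.GetApartmentState()
        Thread.Sleep(50) ' the long blocking upload would go here
    End Sub

    Public Function StartWorker() As Thread
        Dim t As New Thread(New ThreadStart(AddressOf LongRunningWork))
        t.IsBackground = True ' don't keep the process alive once the UI has closed
        ' Deliberately no t.SetApartmentState(...) call before Start.
        t.Start()
        Return t
    End Function
End Module
```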
IMHO more important is the question of whether there is actually a deadlock condition when you see the error. And one hopes that you would easily detect a deadlock condition, at least from a user's point of view. That is, if deadlock occurs, your application should essentially just stop working, at least partially. It can be tricky to detect deadlock from within the code, but from where the user's sitting it's usually pretty obvious. The program just stops doing anything. :)
So, assuming your program continues to do the work it's supposed to be doing, even when you get this error, I'd say it's at worst suggesting there may be room for improvement in your design. At the very least, in that situation I don't think it's something that you _have_ to fix (even if it's something I might personally look more closely at).
Pete
celoftis - 02 Oct 2007 20:17 GMT
Thanks for the background... and the input. I call this one closed! Again, thanks for the help.