.NET Forum / Languages / C# / January 2008
How do you kill a completely locked up thread?
TheSilverHammer - 16 Jan 2008 14:20 GMT
Because C# has no native SSH class, I am using SharpSSH. Sometimes, for reasons I do not know, a Connect call will totally lock up the thread and never return. I am sure it has something to do with weirdness going on with the server I am talking to. Anyhow, this locked up state happens once in a while (maybe once per day) and I can't figure out how to deal with the locked up thread.
If I issue a Thread.Abort() the exception never gets thrown in the thread because it is locked up. This seems to be the only C# method I know of to kill a thread. Is there some other way to kill off a thread?
A way you can simulate this yourself, is create any thread that connects to a server where the connection takes some time, like 10 to 20 seconds. When the thread is doing this connect (it will happen with even a simple TCP/IP socket connect) issue a Thread.Abort() from another thread (the one that made the Thread Object) and you will see that the ThreadAbortException will NOT be thrown until the Connect call returns.
Another way you can do this is after the connect call is finished and you start to talk to a server: if you are in a receive data call and the server stops sending data but never closes the connection, it will block forever. You will once again not be able to get Thread.Abort() to kill the locked up thread.
Is there anyone, especially a MSVP who can answer this?
Peter Bromberg [C# MVP] - 16 Jan 2008 14:35 GMT
Here is an article with an approach that lets you make any method call "time-outable":
http://www.eggheadcafe.com/tutorials/aspnet/847c94bf-4b8d-4a66-9ae5-5b61f049019f/basics-make-any-method-c.aspx
-- Peter
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
MetaFinder: http://www.blogmetafinder.com
> Because C# has no native SSH class, I am using SharpSSH. Sometimes, for
> reasons I do not know, a Connect call will totally lock up the thread and
[quoted text clipped - 21 lines]
>
> Is there anyone, especially a MSVP who can answer this?

TheSilverHammer - 16 Jan 2008 16:57 GMT
Here is a code snippet from an async callback that I am sure is one of the causes of my thread being locked up when the SharpSSH shell object dies.
    ReadDataCallback = new AsyncCallback(OnReadData);
    shell.IO.BeginRead(RecvBuff, 0, RecvBuff.Length, ReadDataCallback, null);

    while (true == shell.ShellOpened)
    {
        // See if we have data to send
        lock (SendBuffer)
        {
            if (0 != SendBuffer.Length)
            {
                shell.Write(SendBuffer);
                SendBuffer = string.Empty;
            }
        }

        Thread.Sleep(50);
    }
I am not sure how to set the ReadDataCallback up so that it can recover from a hard lockup from the shell object. The method on the egghead cafe page doesn't seem to fit this very well.
> Here is an article with an approach that lets you make any method call
> "time-outable":
[quoted text clipped - 29 lines]
> >
> > Is there anyone, especially a MSVP who can answer this?

Ben Voigt [C++ MVP] - 16 Jan 2008 15:37 GMT
> Because C# has no native SSH class, I am using SharpSSH. Sometimes, for
> reasons I do not know, a Connect call will totally lock up the thread and
[quoted text clipped - 10 lines]
> to kill a thread. Is there some other way to kill off a thread?
If you need to terminate a thread while it's running native code, especially inside a kernel call, you have no way of knowing what state it is modifying and keeping it coherent. You have to assume the whole process is corrupted.
The only safe way to forcibly end a failed thread like you have is to end the process containing it.
Do you have access to the socket handle for the connection? If you shutdown (non-gracefully by setting SO_DONTLINGER) the socket from a different thread then that will probably cause the stuck operation to complete immediately.
TheSilverHammer - 16 Jan 2008 16:43 GMT
Wow, these are some fast replies. Normally I can go several days without one.
Anyway, I am using the SharpSSH class which I did not write, however, I do have the source code and I suppose I could dig through it to find the socket calls.
However I am not sure where all the deadlocks are happening in it so it would be very hard to catch all the problems. In some cases I think the connect succeeds and a lockup may occur in a receive data callback which runs in the main thread of that instance of my object (opposed to the SSH object).
I'll look at the egg-head café solution, but I am not sure how applicable it can be to all the instances of a lockup. For example, if an event handler in your class has been called by another object (e.g. a SharpSSH async callback for data received), you can't wrap that in a method call that can time out, can you?
> > Because C# has no native SSH class, I am using SharpSSH. Sometimes, for
> > reasons I do not know, a Connect call will totally lock up the thread and
[quoted text clipped - 21 lines]
> (non-gracefully by setting SO_DONTLINGER) the socket from a different thread
> then that will probably cause the stuck operation to complete immediately.

TheSilverHammer - 16 Jan 2008 17:25 GMT
> If you need to terminate a thread while it's running native code, especially
> inside a kernel call, you have no way of knowing what state it is modifying
> and keeping it coherent. You have to assume the whole process is corrupted.
>
> The only safe way to forcibly end a failed thread like you have is to end
> the process containing it.
Is there an unsafe way to kill it? I know it can be done; tools like Process Explorer let me select a single thread of my app and kill it.
Jeroen Mostert - 16 Jan 2008 19:26 GMT
> Because C# has no native SSH class, I am using SharpSSH. Sometimes, for
> reasons I do not know, a Connect call will totally lock up the thread and
[quoted text clipped - 6 lines]
> because it is locked up. This seems to be the only C# method I know of to
> kill a thread. Is there some other way to kill off a thread?
Yes, the unmanaged TerminateThread(). However, this doesn't work, in that it will kill off the thread, but leave approximately zero chance for your application to continue running successfully. You are guaranteed to corrupt internal state with this, especially since the CLR gets no chance to cleanly release resources associated with that thread. Seriously, don't do this. Your application will probably just deadlock later on the locks the terminated thread was holding, if it doesn't just crash on corrupted state.
Also, there's no obvious way to find the thread that's blocking. For one thing, you can kill off the thread corresponding to the Thread object, but this is not guaranteed to be the thread doing the actual blocking I/O, it might just be waiting on another thread. As a result, you've just leaked a thread that's still busy blocking, and worse, the actual I/O is still in progress, so the socket is unusable. You don't want to repeat this exercise, as it's a good way to run out of resources fast.
> A way you can simulate this yourself, is create any thread that connects to
> a server where the connection takes some time, like 10 to 20 seconds. When
> the thread is doing this connect (it will happen with even a simple TCP/IP
> socket connect) issue a Thread.Abort() from another thread (the one that made
> the Thread Object) and you will see that the ThreadAbortException will NOT be
> thrown until the Connect call returns.
Correct. The thread is blocking on I/O, in unmanaged code. You can't end it, and this is more or less by design. But you shouldn't be too dismayed, because Thread.Abort() is a bad idea for the same reasons TerminateThread() is. If a thread needs to end, it should be designed to have exit points where the application state is known, and it can check a flag or issue a wait on a user object at those points. Raising an exception in the middle of anywhere is a good way of corrupting global state.
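The "exit points" idea above can be sketched roughly as follows. This is only an illustration under assumptions: the class name StoppableWorker and its members are hypothetical and have nothing to do with SharpSSH, and the "work" is just a counter. The point is that the thread only ever stops between iterations, where its state is known, by waiting on an event instead of being aborted.

```csharp
using System;
using System.Threading;

// Hypothetical worker used only for illustration.
public sealed class StoppableWorker
{
    private readonly ManualResetEvent _stop = new ManualResetEvent(false);
    private readonly Thread _thread;
    private int _iterations;

    public StoppableWorker()
    {
        _thread = new Thread(Run);
        _thread.IsBackground = true;
    }

    // Number of completed work units so far.
    public int Iterations
    {
        get { return Volatile.Read(ref _iterations); }
    }

    public void Start()
    {
        _thread.Start();
    }

    // Signals the worker and waits up to timeoutMs for it to reach an exit point.
    // Returns false if the worker did not exit in time.
    public bool Stop(int timeoutMs)
    {
        _stop.Set();
        return _thread.Join(timeoutMs);
    }

    private void Run()
    {
        // WaitOne is both the pause and the stop check: it returns true as soon
        // as the stop event is signaled, and false after 50 ms otherwise.
        while (!_stop.WaitOne(50))
        {
            // One bounded unit of work; state is consistent between iterations.
            Interlocked.Increment(ref _iterations);
        }
    }
}
```

Calling Start() and later Stop(2000) ends the thread cleanly, but only because every unit of work is bounded. It does nothing for a thread that is already stuck inside a blocking call.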
> Another way you can do this is after the connect call is finished and you
> start to talk to a server, if you are on a receive data call and the server
> stops sending data but never closes the connection, it will block forever.
> You will once again not be able to get Thread.Abort() to kill the locked up
> thread.
Same thing.
> Is there anyone, especially a MSVP who can answer this?
I'm not an MSVP but I've seen this so many times in our codebase that it's not funny anymore. The one way to cancel pending I/O on a socket and unwedge threads blocking on that is to close the socket from another thread and handle the resulting exceptions. Nothing else will do, at least nothing that can be called reliable. Of course, this means tearing down the connection, but that's still a whole lot better than tearing down your process.
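A minimal sketch of closing the socket from another thread, assuming a plain System.Net.Sockets.Socket rather than whatever SharpSSH wraps internally. A loopback listener that accepts but never sends stands in for the stalled server, and the class and method names are made up for the example. Which exception the blocked Receive ends with can differ between runtimes and platforms, so the sketch catches both likely ones and just reports what happened.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

public static class SocketUnblockDemo
{
    // Returns a description of how the blocked Receive call ended.
    public static string Run()
    {
        // Stand-in for a server that accepts a connection and then goes silent.
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        client.Connect(IPAddress.Loopback, port);
        Socket silentPeer = listener.AcceptSocket();

        string outcome = "still blocked";
        Thread reader = new Thread(() =>
        {
            try
            {
                byte[] buffer = new byte[256];
                int n = client.Receive(buffer);   // blocks: nothing is ever sent
                outcome = "returned " + n;
            }
            catch (SocketException ex)
            {
                outcome = "SocketException: " + ex.SocketErrorCode;
            }
            catch (ObjectDisposedException)
            {
                outcome = "ObjectDisposedException";
            }
        });
        reader.IsBackground = true;
        reader.Start();

        Thread.Sleep(500);   // give the reader time to block inside Receive
        client.Close();      // tear the connection down from this thread

        bool exited = reader.Join(5000);
        silentPeer.Close();
        listener.Stop();
        return exited ? outcome : "reader never returned";
    }
}
```

The connection is lost either way, but the reader thread gets back to managed code where it can clean up and exit on its own.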
The other alternative, which is less straightforward but suits some designs better, is to make sure that threads never issue I/O which can take "forever". Almost every I/O call has a timeout parameter, and for those that don't there's always asynchronous I/O and ThreadPool.RegisterWaitForSingleObject(). When the call returns with a timeout, either poll, decide to wait some more or give up and close the socket, which you can then do from the same thread that owns the socket, simplifying error handling.
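As one example of the timeout route, here is a sketch that uses Socket.ReceiveTimeout so a blocking Receive gives up by itself. The helper name ReceiveOrGiveUp and the loopback setup are assumptions made for the demo. Here the sketch chooses to close the socket on timeout; polling again or waiting longer would be equally valid decisions at that point.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class ReceiveTimeoutDemo
{
    // Reads once from the socket. Returns the byte count, or -1 if nothing
    // arrived within timeoutMs, in which case the socket is closed here,
    // on the same thread that owns it.
    public static int ReceiveOrGiveUp(Socket socket, byte[] buffer, int timeoutMs)
    {
        socket.ReceiveTimeout = timeoutMs;   // applies to blocking Receive calls
        try
        {
            return socket.Receive(buffer);
        }
        catch (SocketException ex)
        {
            if (ex.SocketErrorCode != SocketError.TimedOut)
            {
                throw;   // a different failure; let the caller see it
            }
            socket.Close();
            return -1;
        }
    }

    // Demo against a loopback peer that accepts the connection but never sends.
    public static int Run()
    {
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        client.Connect(IPAddress.Loopback, port);
        Socket silentPeer = listener.AcceptSocket();
        try
        {
            return ReceiveOrGiveUp(client, new byte[256], 500);
        }
        finally
        {
            silentPeer.Close();
            listener.Stop();
        }
    }
}
```

Because the timeout is handled by the thread that issued the read, no second thread has to reach in and close anything, which is the simplification in error handling mentioned above.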
I understand it's not your code, but trust me: you'll want to rewrite it anyway, unless you can afford restarting your application every so often.
J.
TheSilverHammer - 16 Jan 2008 20:43 GMT
> I'm not an MSVP but I've seen this so many times in our codebase that it's
> not funny anymore. The one way to cancel pending I/O on a socket and unwedge
[quoted text clipped - 14 lines]
> I understand it's not your code, but trust me: you'll want to rewrite it
> anyway, unless you can afford restarting your application every so often.
Grrr.. Damn post thing asked me to log in again and ate my post... Anyway...
The SharpSSH code base has a bunch of classes and would take a major effort to re-write. It is clear that it is unfinished from looking at it. I am not sure my company wants to fund me re-writing this code set.
However, the solution Peter Bromberg gave on his web site looks good except for what appears to me to be a big hole or leak. I do not understand how C# handles this, so maybe it is a non issue. The following is the code segment from his web site, I hope he doesn't mind me posting it:
    public ArrayList DoWorkNeedsTimeout(ArrayList alin, int secondsToWait)
    {
        ArrayList alOut = new ArrayList();

        // Create an instance of our delegate, pointing to the helper method:
        DoWorkNeedsTimeoutDelegate deleg = new DoWorkNeedsTimeoutDelegate(DoWorkWithTimeout);

        // Call BeginInvoke on delegate.
        // Note on last two parameters of Delegate BeginInvoke Method:
        // 1) callback: not used here, we can pass null
        // 2) state: not used, pass an instance of object in the required parameter location
        // Invoke the delegate passing the parameters and get the IAsyncResult object in "ar":
        IAsyncResult ar = deleg.BeginInvoke(alin, secondsToWait, null, new object());

        // if the WaitOne method times out before we get a result, it will be false:
        if (!ar.AsyncWaitHandle.WaitOne(5000, false))
        {
            // handle timeout logging / notification here - Syslog, Database, Email - whatever you need
            alOut.Add("TIMED OUT!");
        }
        else // we didn't time out:
        {
            // get the result of the method call here
            alOut = deleg.EndInvoke(ar);
        }

        return alOut;
    }
What he is doing is making a delegate to call BeginInvoke with and then using the IAsyncResult to wait for a time period. If the time period expires, then his thread continues on. If it doesn't expire, he calls EndInvoke(). This looks good except for the issue of dealing with a truly locked-up thread.
BeginInvoke() uses a thread from the thread-pool right? So what happens if that thread never returns so you can call EndInvoke? Is it gone from the thread-pool forever? If you repeat this loop 1000s of times and even if 1% of the time you get a locked up thread, won't you run out of threads?
The only way this can work indefinitely, which it may, is if the Garbage collector will reclaim the thread once the delegate and other related objects are out of scope. Is this how it works?
Jeroen Mostert - 16 Jan 2008 22:26 GMT
>> I'm not an MSVP but I've seen this so many times in our codebase that it's
>> not funny anymore. The one way to cancel pending I/O on a socket and unwedge
[quoted text clipped - 20 lines]
> to re-write. It is clear that it is unfinished from looking at it. I am
> not sure my company wants to fund me re-writing this code set.
SSH is widely implemented, though, and you will probably want a proven implementation, given the security concerns. Delegating to a good unmanaged library (if the interface isn't too horrible to P/Invoke to) may be a better option. You can also consider using an ActiveX control: there's good support for this in .NET, and standalone components were all the rage in the VB days for a reason. Alternatively, use a standalone SSH application and pull its strings from the managed application, which is an ugly but venerable hack. Last but certainly not least -- write it in another language where you do have a mature library at your beck and call.
.NET still suffers from the "everything old is new again" syndrome where everyone is reinventing the wheel in the new languages, which under some circumstances can be a big waste of time and money. Just because you're now using C# doesn't mean all your libraries have to be. I see my colleagues falling into the same trap; one of them tried to "leverage" a Java library by automatically converting it to C# and then ignoring the warnings. The results were, as you can imagine, not pretty, and guess who got to fix the crashes? Meanwhile, the Java applications continued to run just fine with their "old" library and "legacy" code.
> However, the solution Peter Bromberg gave on his web site looks good except
> for what appears to me to be a big hole or leak. I do not understand how C#
[quoted text clipped - 38 lines]
> alOut = deleg.EndInvoke(ar);
> }
This is wrong. Every call to .BeginInvoke() must have a corresponding call to .EndInvoke(), to free up any resources that .BeginInvoke() set up. This is irrespective of whether you've happened to hit a timeout waiting on the async handle. People violate this rule all over the place, though, because it seems to work, but even when it actually does work (because .BeginInvoke() happens not to claim any additional resources) it's a bad habit to get into. Don't believe me, believe the MSDN: http://msdn2.microsoft.com/en-us/library/2e08f6yc(VS.80).aspx
Of course, our hands are forced here because .EndInvoke() would block until the underlying method actually completed, but this just demonstrates why this can't actually work. You're leaving a delegate call up in the air, but forgetting about it isn't going to make it go away. (In this case, we can easily fix things by passing a callback to the .BeginInvoke() that will call .EndInvoke(), but it's all irrelevant anyway if the delegate never completes.)
> return alOut;
>
[quoted text clipped - 5 lines]
> EndInvoke(). This looks good except for the issue of dealing with a truly
> locked-up thread.
Yes, exactly. Wrapping everything in another asynchronous invocation does *nothing* for the blocking problem. What you create here is just a wrapper that can indeed be abandoned at will, but this doesn't cancel the underlying blocking method, it just tosses aside the delegate invocation.
> BeginInvoke() uses a thread from the thread-pool right?
Yes.
> So what happens if that thread never returns so you can call EndInvoke?
You can always call .EndInvoke(). It will just block until the delegate completes.
> Is it gone from the thread-pool forever?
Well, it's still a part of the thread pool, it just never becomes available for other tasks again. So the number of available TP threads will steadily decrease.
> If you repeat this loop 1000s of times and even if 1% of the time you get
> a locked up thread, won't you run out of threads?
That's exactly what will happen, and it's easy to test. Use the above code with a delegate that just does "for (;;) Thread.Sleep(10);" and observe.
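A rough way to observe the effect is sketched below. Note the substitution: it queues the never-ending work with ThreadPool.QueueUserWorkItem instead of a delegate's BeginInvoke, on the assumption that both consume pool worker threads in the same way. The class name and the settle delay are made up for the example, and the exact count reported depends on how quickly the pool starts new threads.

```csharp
using System;
using System.Threading;

public static class PoolLeakDemo
{
    // Queues `count` work items that never finish, waits settleMs for the pool
    // to start them, and returns how many worker threads became unavailable.
    public static int LeakWorkers(int count, int settleMs)
    {
        int workersBefore, ioBefore, workersAfter, ioAfter;
        ThreadPool.GetAvailableThreads(out workersBefore, out ioBefore);

        for (int i = 0; i < count; i++)
        {
            // Stand-in for a call that never returns.
            ThreadPool.QueueUserWorkItem(delegate { for (;;) Thread.Sleep(10); });
        }

        Thread.Sleep(settleMs);
        ThreadPool.GetAvailableThreads(out workersAfter, out ioAfter);
        return workersBefore - workersAfter;
    }
}
```

Calling LeakWorkers repeatedly should show the available count shrinking and never recovering, since nothing ever hands those threads back to the pool.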
This approach is only useful if you don't care that you can't abort an action that goes on longer than your timeout, but you just need to log when it does. It doesn't give you any magical ability to abort the action. The action still needs to complete on its own eventually if you don't want to run out of resources.
> The only way this can work indefinitely, which it may, is if the Garbage
> collector will reclaim the thread once the delegate and other related objects
> are out of scope. Is this how it works?
No. If it worked that way, you could never have background threads unless they were referenced by other threads. In a sense, a Thread object is always "referenced" by the underlying thread. They're not collected until the underlying thread exits, and if the underlying thread never exits, well, that's too bad.
J.
TheSilverHammer - 17 Jan 2008 18:33 GMT
So the basic lesson here is that a locked up thread is unrecoverable. The only thing you can do about it is abandon the thread and move on. If you have an application which is supposed to run persistently for days or weeks at a time, it will have to be restarted to reclaim the resources.
In my case, unless I do major repairs on the SharpSSH class, I will have the occasional unrecoverable threads.
This kind of stinks. I wonder if there was a way that MS could write a thread that could be terminated safely. If you can do that with a process, why can't you do it with a thread? Is there a way to create a process as a thread that can be killed?
Peter Duniho - 17 Jan 2008 19:09 GMT
> [...]
> In my case, unless I do major repairs on the SharpSSH class, I will have
> the occasional unrecoverable threads.
Yup. One of the risks of using third-party code is that if the code sucks (whether because it's poorly designed or just a work in progress), there's not much you can do about it. At least in this case, it sounds like you _could_ try to fix the library (I don't know anything about the library, so I'm just taking that from your comments).
> This kind of stinks. I wonder if there was a way that MS could write a
> thread that could be terminated safely. If you can do that with a
> process, why can't you do it with a thread?
You can't really do it with a process either.
This isn't something that Microsoft can really solve. The lack of safety has to do with what the code executing in the thread or process is doing, and in particular the inability for someone outside the code to know for sure what that is. It is possible to write code that, if interrupted unexpectedly, leaves things in an indeterminate state.
If you are the one writing the code executing on a thread, there are some situations in which you could know that aborting the code is safe. But if you're the one writing the code, there's no need to do so. You can just design the code correctly, so that it's abortable in a well-defined way instead.
If you're not the one writing the code, then you don't know whether the nature of the code is such that it's safe to abort at some arbitrary point of execution. Thus, it's not safe to do. But there's not really any practical way for Microsoft to change that. It's not about how the OS manages the thread, it's about the fact that code executing in a thread could be doing _anything_.
Pete
Jeroen Mostert - 17 Jan 2008 19:36 GMT
> So the basic lesson here is that a locked up thread is unrecoverable. The
> only thing you can do about it is abandon the thread and move on.
Well, I'd phrase it differently: threads must never lock up *because* there's no acceptable way to deal with them. If you've got a thread that could block forever, you've got a bug, simple as that. You have to get an answer if you ask "so what guarantees that this wait here will be satisfied eventually?" and if the answer is "the kindness of strangers", you lose.
> If you have an application which is supposed to run persistently for days
> or weeks at a time, it will have to be restarted to reclaim the
> resources.
And that's assuming the application will clean up everything when it stops. The OS will guarantee that most resources are released, but that's not the same thing as exiting cleanly (an open file will be closed, for example, but what's *in* the file when it is?)
> This kind of stinks. I wonder if there was a way that MS could write a
> thread that could be terminated safely. If you can do that with a process,
> why can't you do it with a thread? Is there a way to create a process as a
> thread that can be killed?
You can't terminate a process safely either! The keyword here is "safely". The best thing that happens when you kill off a process is that the OS will reclaim the resources it associated with that process -- forcibly. For memory, this doesn't matter; for a socket, this means a connection reset; for a file, it's probably data loss. This is nothing to get enthusiastic about, even if it's a good step up from crashing the computer.
Terminating a thread means your application state is hosed. There's nothing the OS can do to make this "safe", since it knows diddly about your application's internal state. It can't even track OS resources for every thread to release them, because there's no notion of ownership beyond the process. Threads share the process state, including any resources, so just releasing anything a terminated thread allocated would be wrong.
J.
TheSilverHammer - 17 Jan 2008 22:58 GMT
If they can do it with processes, why can't they do it with threads?
I am sure they can't guarantee that everything will be fine if my code doesn't anticipate a resource disappearing, but if I do, I should be able to do it safely.
For example:
I have a MyThread and then I have the thread procedure which opens a bunch of files, sockets, and all that. If MyThread is killed, the OS can recover all that stuff. If MyApplication is the one calling the ThreadKill, then windows should say, "OK, well you made it, so if you want to kill it, you must know what you are doing."
If in my thread I do something like:
MyList = new List<string>();
And then when I kill the thread, windows says the List was created in the thread and therefore will be nuked, it is my problem. I could write my app in such a way that I know where stuff was allocated so that I could expect MyList to go away. The CLR could go as far as making any references to MyList null or just throwing an exception if I try to use it (besides assigning it a new value).
All a thread has to be is a bag of 'stuff' and if it goes bad, toss it all out, and as long as there are 'rules' which I can expect to follow, I could deal with it. They only need one simple rule: If it was opened, allocated, created in a thread, when the thread is killed (not exits) then it would be Closed, freed, destroyed, etc...
Having said all that, I understand the sentiment about writing good code and how none of this is necessary. Unfortunately, that is an 'if the world were perfect...' point of view in an imperfect world.
In this particular case, I need SSH, which for some reason Microsoft doesn't seem to see fit as being a core protocol for C# (or .NET in general). I suggested this on the community sites, and got a 'resolved' and 'won't fix' with no reasons supplied. The only valid reason I can think of is because SSH support is in the works, however after much googling I can't find any hint about official MS SSH support. With their big security push, and SSH being a cornerstone in network security management, this makes absolutely no sense. Maybe they are waiting until the security crowd starts beating them with a stick and hail it as yet another reason to use Linux. How long would it take a few of MS well trained developers to put out a great SSH suite for .NET? Ignoring the bureaucracy, it should only take a few actual weeks of development time.
This leaves me with a choice of writing my own implementation or using some other library. My employer is not going to want me to spend several weeks to write my own or fix this SharpSSH library. Personally, I wouldn't mind, but really, I have a lot to do.
Considering we are living in an imperfect world, we should try to be accommodating. Yes, the right thing is to NOT screw something up, but it WILL happen. The proper thing isn't to stand around and talk about how it should have been done right, and if it was all your problems would go away.
Microsoft's job on this kind of issue is to make life as a programmer as easy as possible. I will grant you that compared to OS X and Linux stuff, Microsoft is a rock-star, but in a more absolute sense there is a lot they could do much, much better.
For example, the current issue, Locked up threads. Granted a good program will never have this problem, but a realistic response outlook would be that we have to deal with 'bad' things. A better approach would be for MS to figure out a way to create a thread and provide some kind of emergency recovery system. You could make it a special kind of thread used to run unsafe stuff and the architecture will save you from what is in the thread if worst comes to worst. It would be like a container for uranium. You have to use it, and you hope nothing goes wrong, but if it does, it is contained.
Another way (not to drag this rant any longer) to look at this is to look back in the days where there was no memory protection for applications. One rogue application could bring the entire system down. To take today's outlook on threads and apply it to that, it would be the same thing as simply saying, "Clearly the solution to rogue applications is to not run rogue applications." Ignoring the fact that AwsomeApp.exe is the ONLY app that does what you need.
No, I do not expect anyone here to be able to do anything about this. I do not know, and would doubt, that any MS big-wigs (ones with enough power to actually do something) read this kind of stuff and would care enough to do anything about it.
Having said all that, the squeaky wheel gets the kick, so griping about issues like this might instill even more griping until "The powers that be" at MS can't stand it anymore and decide to do something.
Anyway, to all who have helped me, thanks. I would like to suggest to Peter Bromberg that he put a warning on the solution he proposed, or in fact remove it. The solution leaves bound up threads and resources, and if an application repeats that more than 50 times, it will cease working until it is restarted. It is OK for a program that isn't going to iterate over that more than a few times, but it is a death trap for anything that does.
Peter Duniho - 17 Jan 2008 23:42 GMT
> If they can do it with processes, why can't they do it with threads?
Can do what with processes? We've already explained that you can't safely terminate a process any more than you can safely abort a thread.
> I am sure they can't guarantee that everything will be fine if my code
> doesn't anticipate a resource disappearing, but if I do, I should be
> able to do it safely.
It's not an issue of resources "disappearing". It's an issue of them being left in an inconsistent state.
There is no way for the _operating system_ to ensure that things are left in a consistent state. Implementors of various data structures can do things to make sure they are always in a consistent state (e.g. see "journaled" or "transaction-based"), but that's up to the implementor. The OS has no way to do this (though it might provide APIs to help an implementor do it).
> For example:
>
[quoted text clipped - 3 lines]
> recover all that stuff.
No, it can't. All data within a process is owned by the _process_, unless it's been specifically marked as thread data (*). The OS has no way to know whether killing a thread allows that data to be cleaned up or not.
(*) (I'm not sure whether .NET supports this, but it is supported in the unmanaged Windows API... I'm seeing a Thread.AllocateDataSlot() method, and I suspect this addresses the same issue in managed code. In any case, note that it only addresses specific thread-local storage, not the OS objects that might be referenced by that storage, as those are still per-process and cannot be released when the thread terminates.)
But even if it did have a way to know what data could be cleaned up, _that's not the problem_. Cleaning things up is the least of the worries. It's the fact that software _does_ stuff, and if it's interrupted in the middle of _doing_ that stuff, whatever data the software is operating on could be in an inconsistent state.
Most of your rant seems to be about this question of cleaning up, but that's not the main problem. That's not what makes killing threads or processes unsafe, and coming up with a paradigm in which you can ensure things are cleaned up would _not_ make killing threads or processes a safe operation.
As far as your specific problem goes, there's no point in complaining that SSH isn't supported in .NET (assuming it's not...I know .NET does have a lot of crypto stuff in it, and it's possible that you could easily write an SSH implementation just by combining that with the usual network i/o stuff). .NET can't possibly implement _everything_, even as with each iteration it does support more and more.
If a specific library isn't doing what you need or want, you can either find a different library or write it yourself. Programmers all over the world make these kinds of decisions every day, and it's just not a big deal. Note that you are not limited to using a managed code library. With p/invoke you should be able to use pretty much whatever library you find useful.
I will point out that your assertion that Microsoft could publish an SSH library "in a few weeks time" is absurd. No reputable software publishing company does _anything_ "in a few weeks time". It would take _way_ more than a few weeks just to properly _test_ such a library, never mind implement it correctly. Granted I have very little specific knowledge of SSH, but I would guess that it would take at least three staff members (programmer, tester, and a program manager to manage the specification for the feature) something like 6-12 months, for a potential cost of up to three man-years.
Even if it _were_ just a few weeks worth of work, it boggles my mind that you would on the one hand say that Microsoft should do this work, and on the other hand write "My employer is not going to want me to spend several weeks to write my own". Don't you think Microsoft already has their own things they are trying to get done? Surely if this is an important enough feature for your need to justify them implementing it, it's important enough to justify _you_ doing whatever work is needed on your own to get it into your product.
Maybe it will get into .NET eventually, maybe it won't. But making fanciful claims about how easy it would be to implement doesn't help your case any. If it's really that easy, write it yourself.
And please keep in mind that designing and implementing an operating system is a lot harder than you seem to think it is. I think it's safe to say that if dealing with hung threads were really as easy as you claim it is, Windows and every other OS would already do it. But there's not a single OS I can think of off the top of my head that can allow a thread or process to be safely terminated without the risk of causing data integrity problems.
Pete
Ben Voigt [C++ MVP] - 18 Jan 2008 13:10 GMT
>> If they can do it with processes, why can't they do it with threads?
>
> Can do what with processes? We've already explained that you can't safely
> terminate a process any more than you can safely abort a thread.
Sure you can. Ok, maybe not an arbitrary process, but it's fairly easy (depending on what resources are required by your requirements) to design a process that can be terminated at any point in time. It's even easier to manage exiting your own process, even with hung threads. Theoretically you can also create a thread that can be safely terminated, but... not with .NET. .NET holds internal state and accesses it willy-nilly from any threads in a way that's threadsafe but not abort safe. However, .NET doesn't implement any external state on its own, only what you ask it to, so you can manage your external resources in such a way that it's ok for the process to be interrupted (for example, instead of writing data files that could be left inconsistent, store your data in an ACID database using transactions).
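For plain files, a common lighter-weight stand-in for a transactional store is "write a temporary file, then swap it in". This is a different technique from the ACID database suggested above, shown only to illustrate what "ok for the process to be interrupted" can look like. The class name AtomicFile is hypothetical, and how atomic the final swap really is depends on the file system.

```csharp
using System;
using System.IO;
using System.Text;

public static class AtomicFile
{
    // Replaces the contents of `path` so that a process killed part-way through
    // leaves either the old file or the new one, not a half-written mix.
    public static void WriteAllText(string path, string contents)
    {
        string temp = path + ".tmp";   // assumed to be on the same volume as path

        using (FileStream fs = new FileStream(temp, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(contents);
            fs.Write(bytes, 0, bytes.Length);
            fs.Flush(true);   // ask the OS to push the data to disk before the swap
        }

        if (File.Exists(path))
        {
            File.Replace(temp, path, null);   // swap the new file in over the old one
        }
        else
        {
            File.Move(temp, path);
        }
    }
}
```

If the process dies before the swap, the only damage is a stray .tmp file that the next run can delete or overwrite.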
>> I am sure they can't guarantee that everything will be fine if my code >> doesn't anticipate a resources disappearing, but if I do, I should be [quoted text clipped - 10 lines] > The OS has no way to do this (though it might provide APIs to help an > implementor do it). Yup, and the problem is that the .NET implementation uses hidden process-local resources without doing any of this, so no matter what code you tag on top, calling TerminateThread is gonna crash the process.
>> For example: >> [quoted text clipped - 26 lines] > things are cleaned up would _not_ make killing threads or processes a safe > operation. One reasonable approach, as long as this SharpSSH library doesn't use any external resources except sockets, would be to put that component in a separate process, communicate back and forth with Remoting, and provide at least one call that causes said process to free any shared resources and then call ExitProcess (.NET Application.Exit?) to free the hung thread(s).
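A minimal sketch of the separate-process idea, assuming the fragile component has been moved into a helper executable of its own. The `IsolatedWorker` class, the `RunWithTimeout` method, and the timeout value are hypothetical names chosen for illustration. The Remoting channel mentioned above is left out; only the watchdog part is shown, using `Process.WaitForExit` and `Process.Kill`.

```csharp
using System;
using System.Diagnostics;

public static class IsolatedWorker
{
    // Starts a helper executable and waits up to timeoutMs for it to finish.
    // Returns true if the helper exited on its own, false if it had to be killed.
    // The helper path and arguments are supplied by the caller (hypothetical here).
    public static bool RunWithTimeout(string fileName, string arguments, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process helper = Process.Start(startInfo))
        {
            if (helper == null)
                throw new InvalidOperationException("Helper process could not be started.");

            // Normal case: the helper finishes within the allowed time.
            if (helper.WaitForExit(timeoutMs))
                return true;

            // Hung case: the OS tears down the whole process, including any
            // stuck threads, without touching the state of the calling process.
            helper.Kill();
            helper.WaitForExit();
            return false;
        }
    }
}
```

A caller might use it as `IsolatedWorker.RunWithTimeout("SshHelper.exe", "somehost", 30000)`, where `SshHelper.exe` is a made-up name. This only stays safe if the helper keeps no external state that could be left half-written when it is killed, which is the condition stated above.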
Willy Denoyette [MVP] - 18 Jan 2008 14:05 GMT >>> If they can do it with processes, why can't they do it with threads? >> [quoted text clipped - 7 lines] > you can also create a thread that can be safely terminated, but... not > with .NET. Terminating a thread using TerminateThread is safe as long as you know exactly what the thread is doing at the moment the OS kills the thread, this is exactly what's impossible to know when calling into arbitrary code. Whenever your thread runs arbitrary code (third party or not) you can't safely terminate the thread, because you don't have an idea what the thread is doing, this has nothing to do with .NET, this is about Windows. Run a simple native code program and terminate a thread (using TerminateThread Win32 API) while he's allocating memory from the heap, all successive heap alloc's or heap releases from other threads will now block forever. Or terminate a thread while he's executing in a critical section, this CS will never get released (well, actually when the process terminates), another thread that tries to enter the CS will deadlock....
Willy.
Ben Voigt [C++ MVP] - 18 Jan 2008 16:07 GMT >>>> If they can do it with processes, why can't they do it with threads? >>> [quoted text clipped - 19 lines] > successive heap alloc's or heap releases from other threads will now block > forever. Only if it's using a shared heap...
> Or terminate a thread while he's executing in a critical section, this CS > will never get released (well, actually when the process terminates), > another thread that tries to enter the CS will deadlock.... But you can use a kernel mutex instead, then it'll be marked as abandoned and you can recover.
My point was that .NET in particular does a bunch of stuff that is not abort safe. This is far from saying that .NET is the only library that isn't abort safe, but there is nothing inherently unsafe about Win32 itself.
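The abandoned-mutex behaviour can be observed from managed code as well. The following is only a sketch: the worker thread simply exits while still holding the mutex, standing in for a thread that was terminated, and the class and method names are made up for illustration. `Mutex` and `AbandonedMutexException` are the real `System.Threading` types.

```csharp
using System;
using System.Threading;

public static class AbandonedMutexDemo
{
    // Returns true if the second acquirer was told the mutex had been abandoned.
    public static bool AcquireAfterAbandon()
    {
        using (var mutex = new Mutex())
        {
            var worker = new Thread(() =>
            {
                // Take the mutex and exit without calling ReleaseMutex.
                mutex.WaitOne();
            });
            worker.Start();
            worker.Join();

            try
            {
                mutex.WaitOne();
                // Not expected: the mutex was acquired without any warning.
                mutex.ReleaseMutex();
                return false;
            }
            catch (AbandonedMutexException)
            {
                // The caller owns the mutex at this point. The data it protects
                // may be inconsistent, so this is where recovery would go.
                mutex.ReleaseMutex();
                return true;
            }
        }
    }
}
```

Note that the exception only tells you the previous owner went away; validating or repairing the protected state is still up to the application.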
> Willy. Willy Denoyette [MVP] - 18 Jan 2008 16:46 GMT >>>>> If they can do it with processes, why can't they do it with threads? >>>> [quoted text clipped - 21 lines] > > Only if it's using a shared heap... I'm talking about real-world applications (.NET or not) calling into arbitrary code; how would a caller know which allocator is being used? I'm talking about Windows applications calling into library code that allocates from the process heap, CRT heap or from the COM heap, that is, allocates from the heap manager (ntdll). You can't safely kill threads that are executing in these libraries.
>> Or terminate a thread while he's executing in a critical section, this CS >> will never get released (well, actually when the process terminates), >> another thread that tries to enter the CS will deadlock.... > > But you can use a kernel mutex instead, then it'll be marked as abandoned > and you can recover. Again, I'm calling into arbitrary code, say I'm calling into Winsock library like the OP is doing.... and this library is using CS all the way down.
> My point was that .NET in particular does a bunch of stuff that is not > abort safe. This is far from saying that .NET is the only library that > isn't abort safe, but there is nothing inherently unsafe about Win32 > itself. What do you call Win32? A thread that executes arbitrary code (native libraries, whatever) cannot safely be killed with TerminateThread. That's why the CLR refuses to use TerminateThread on a thread that is currently running "unmanaged" code: the CLR waits for the thread to return to managed code, checks whether a thread abort has been requested, and if so aborts the thread gracefully (not using TerminateThread!). Again, it's unsafe to call Win32's TerminateThread unless you know it's not.
Willy.
Ben Voigt [C++ MVP] - 18 Jan 2008 19:10 GMT >>>>>> If they can do it with processes, why can't they do it with threads? >>>>> [quoted text clipped - 30 lines] > allocates from the heap manager (ntdll). You can't safely kill threads > that are executing in these libraries. I wasn't talking about arbitrary code. I was challenging your statement "this has nothing to do with .NET, this is about Windows". It is not possible to write abort safe code in .NET. Of course not all code written without .NET is abort safe, but the act of using .NET prevents being abort safe, whereas the act of using Windows does not. So this isn't exclusive to .NET, but it isn't applicable to Windows in general like it is to .NET.
BTW you can use the heap manager (ntdll) with private heaps. You could not safely free such a heap after a thread was aborted while allocating from it, but it would not block other threads either.
Willy Denoyette [MVP] - 18 Jan 2008 20:52 GMT >>>>>>> If they can do it with processes, why can't they do it with threads? >>>>>> [quoted text clipped - 41 lines] > not safely free such a heap after a thread was aborted while allocating > from it, but it would not block other threads either. I was talking about arbitrary code, and here I mean - calling arbitrary code from whatever environment you see fit( .NET or other) on Windows, that's why I keep saying that it has nothing to do with .NET.
Whenever you call *TerminateThread* to kill a thread that is running code you didn't *completely* implement yourself, you are in danger. That's why MSDN says that TerminateThread is a dangerous API: you should never call it unless you know exactly what the target thread is doing, which is exactly the point of this whole thread; you *never* know what the thread is doing when it runs arbitrary code. (Some say that this service, invoked by TerminateThread, should never have been exposed to user code.)
Note also that "private" heaps based on the heap manager (ntdll) have the same issue: the heap manager protects its internal structures with critical sections when you are allocating or de-allocating, and you don't control the heap manager, do you? Killing a thread while the heap manager is inside a critical section will most likely corrupt the heap and deadlock any other thread that tries to allocate or de-allocate from the same heap.
Simple things like statics and global variables, TLS, FLS etc. are allocated on the heap. Creating a thread (from kernel32) in Windows calls the heap manager (ntdll) several hundred times to allocate from the process heap, and dynamic module loading/unloading allocates from the process heap too. kernel32 and ntdll are the first modules loaded by the OS loader when you create a process; no Win32 process can live without them. You aren't going to rewrite all these Win32 DLLs and runtime libraries so that they use your own heap manager, are you?
You could build your own private heap manager on top of the VM Manager (like the CLR's memory allocator), but just like the heap manager you'll have to protect your internal structures with a CS if you want to handle allocations from multiple threads. So you are back at square one, and it won't solve the other possible issues related to TerminateThread either.
Willy.
Jeroen Mostert - 18 Jan 2008 00:51 GMT > If they can do it with processes, why can't they do it with threads? It's more a case of "THREADS DON'T WORK THAT WAY!" rather than "can't be done".
A thread's supposed to be lightweight; a simple means of achieving multiprocessing. If you follow the reliability angle through and add resource tracking and whatnot you end up with a thread that's basically just as fat as a process. A thread's not supposed to be isolated from anything; that's not their purpose.
What you're looking for actually has less to do with threads and more with isolating components (which may or may not be using separate threads) from each other's failures. But here "failure" has to be defined so generally as to make any form of isolation lower than process level well nigh useless.
> If in my thread I do something like: > [quoted text clipped - 6 lines] > MyList null or just throwing an exception of I try and use it (besides > assigning it a new value). But what's the point?
If you are in a position to terminate the thread properly, you're also in a position to know what resources should be thrown away. So why don't you do that, instead of demanding that the CLR save your bacon at a considerable (and in 99% of the cases, unnecessary) overhead?
Now, if you're using someone else's component, you don't know what resources they're squirreling away, so you could say that's an argument in favor of CLR tracking. But hang on a moment -- how do you know what threads the misbehaving component is using, and how do you select the one that's blocking in a way you don't want it to for termination? If you can dig deep enough to figure that out, can't you also figure out what resources it's abusing and dispose of them?
Indefinitely blocking threads are such a huge pain in the a.s because recognizing when a thread is never going to do something meaningful again is in theory equivalent to the halting problem and in practice not actually that much easier. It's like asking the OS for an infinite loop detector. It could try, but it'd run into unsolvable cases pretty soon.
> Having said all that, I understand the sentiment about writing good code and > how none of this is necessary. Unfortunately, that is a 'if the world were > perfect...' point of view in an imperfect world. If the world were perfect, the operating system and the runtime would join hands to ensure that nothing you ever did could cause state corruption, and every error condition was recoverable. But since that's a theoretical impossibility, they have to settle somewhere before that. Threads were never meant to be an aid in this. They're actually more like aggravating factors.
The process is the one edge where they can reasonably isolate the rest of the system from most of the impact of failure. And even that fails when processes are cooperating to get something done. Try killing off "csrss.exe" sometime. If you succeed, it's rebooting time, baby. Your other processes will be just as doomed.
> In this particular case, I need SSH, which for some reason Microsoft doesn't > seem to see fit as being a core protocol for C# (or .NET in general). Hey, they have to give third-party developers *some* chance at a living, don't they? :-)
> I suggested this on the community sites, and got a 'resolved' and 'won't > fix' with no reasons supplied. The only valid reason I can think of is > because SSH support is in the works, however after much googling I can't > find any hint about official MS SSH support. With their big security > push, and SSH being a cornerstone in network security management, this > makes absolutely no sense. Windows has no native (read: Microsoft-supplied) SSH services. That's the most obvious reason I can think of. .NET heavily focuses on making all of Windows available through the managed API, but it doesn't go out of its way to support stuff that isn't ubiquitous on Windows already. And SSH isn't ubiquitous on Windows -- RDP over VPN is much more common. I say this without offering judgement on how things are or should be.
> Maybe they are waiting until the security crowd starts beating them with > a stick and hail it as yet another reason to use Linux. How long would it > take a few of MS well trained developers to put out a great SSH suite for > .NET? Ignoring the bureaucracy, it should only take a few actual weeks > of development time. It's not a case of "MS has so many resources, they could do this", because every developer and his janitor has a feature they clamor for this way ("why isn't this just in the base classes so I don't have to think about it anymore?"). It's a big win for the developers, but it has to be a win for Microsoft too. If there's not enough business incentive for Microsoft to develop, distribute and support it then they won't do it. Simple as that.
It's weird how in the Unix world everyone cheers when a third-party developer brings out Yet Another implementation of a well-known protocol, but how in the Windows world the developers are looking over at Microsoft expectantly to build everything they need and give it to them. It's true that Microsoft plays a big role in encouraging this attitude, but still.
> This leaves me with a choice of writing my own implementation or using some > other library. My employer is not going to want me to spend several weeks > to write my own or fix this SharpSSH library. Personally, I wouldn't mind, > but really, I have a lot to do. I just googled ".NET SSH". You don't want to know how many hits I got (and some of them relevant, even!) What made SharpSSH the monopolist? What about my suggestion of using an ActiveX control? Is it just a case of not wanting or being able to spend any money? You get what you pay for...
If you're waiting for MS to turn into a charity and do the things your company doesn't have the time or money for, then don't forget to pick up a lottery ticket every day, because you're sure to win in the meantime. Say hi to your competitors for me.
> Considering we are living in an imperfect world, we should try to be > accommodating. Yes, the right thing is to NOT screw something up, but it > WILL happen. The proper thing isn't to stand around and talk about how it > should have been done right, and if it was all your problems would go away. You're absolutely right. The proper thing is not to stand around and talk about it but to *do* things right. There has to be a point, somewhere, where you have to stop talking about general stopgaps and have to get down to where the actual problem is, because stopgaps only go so far. The OS can't fix problems with hung threads for you. It already allows you to kill them off Completely Dead through TerminateThread() if you really think you know what you're doing. (You probably don't, which is why it's so dangerous.) That is not fixing the problems, though. And releasing all resources we somehow deem "belonging" to that thread still isn't fixing the problems.
Tacking on a tracking system for releasing resources is just not a cost-effective tradeoff. For most applications, the problem will *not* be in releasing the resources, it's in the fact that whatever they're doing is going completely wrong. Some applications might just be able to continue without any problem if the particular action the thread was working on fails spectacularly, but most will not. They're more likely to grind to a halt. If you're killing off a thread, you'll probably be killing off your process soon.
> Microsoft's job on this kind of issue is to make life as a programmer as > easy as possible. I will grant you that compared to OS X and Linux stuff, > Microsoft is a rock-star, but in a more absolute sense there is a lot they > could do much, much better. I really have to disagree, at least on this particular issue. You're asking for the impossible. They can give you the Big Red Emergency Button, and it's already present in the form of .Abort, and if that doesn't work TerminateThread(). But you want that button to magically keep your application in serviceable condition as it's killing off an integral part of it, and that can't be done.
> For example, the current issue, Locked up threads. Granted a good program > will never have this problem, but a realistic response outlook would be that > we have to deal with 'bad' things. A better approach would be for MS to > figure out a way to create a thread and provide some kind of emergency > recovery system. TerminateThread() *will* get rid of the thread. But the only one who can "recover" is you. And if the component that failed you is a black box to you, you're just as sunk as the OS would be.
> You could make it a special kind of thread used to run unsafe stuff and > the architecture will save you from what is in the thread if worst comes > to worst. It would be like a container for uranium. You have to use > it, and you hope nothing goes wrong, but if it does, it is contained. Uranium is easy. That's just radiation. Threads can do *anything*. And most of the time they're *cooperating* with other threads to get things done. Good luck automagically containing things.
> Another way (not to drag this rant any longer) to look at this is to look > back in the days where there was no memory protection for applications. [quoted text clipped - 3 lines] > rogue applications." Ignoring the fact that AwsomeApp.exe is the ONLY app > that does what you need. See above for the whole "the buck stops somewhere" point. If you want this protection (and it's indeed a good thing the OS has this), then by all means, isolate the failing component in a process. The OS can guarantee that it will at least keep your main process safe from wrongdoings as far as internal state goes (the failing app might still have corrupted your drive or something annoying like that, but you stand a good chance).
But that's the thing: that's what *processes* are for. Processes only started working that way when the OS said they did: before that, processes could exchange memory directly, as ugly and error-prone as that was. Then the OS said: "No, stop that -- processes are isolated, and if you want to cooperate, do it explicitly". But threads are not for isolation and they never were, they're for integration! They're "lightweight processes", where "lightweight" means "fast because I do the least amount of work possible to manage them, they're all yours".
Your argument simply doesn't hold water for threads: it's impossible for thread X to be "the only thread that does what you need". The thread is just a way to achieve parallel execution! It's not some sort of isolation box for computations that aren't under your control. What you want is to isolate *components*, not threads. Unfortunately, most components can't meaningfully be isolated, since they have to be able to do anything.
 Signature J.
TheSilverHammer - 18 Jan 2008 14:46 GMT Maybe you are all right about making a safe thread that can be killed the way processes can be to recover resources isn't possible. If you are right, I have no idea why beyond it is 'logistically' impossible, not actually impossible.
BTW you can't use TerminateThread() to kill a C# thread (be it from the thread pool or the Thread class) because there is no way to match the managed thread ID with the OS thread ID. The documentation also says that a thread created with the Thread class might be used for multiple things behind the scenes.
So I have been putting as much Duct Tape on SharpSSH as I can and hoping to catch the lockups, which is very hard since I can't reproduce them easily. As far as googling SSH for .NET, I am sure you did find quite a few solutions. Expensive, commercial solutions. Maybe large companies do not have an issue paying for such things, but the smaller ones I work at are very cheap. Do you know how long it took me to get them to upgrade just TWO machines from VS 6.0 to VS 2005? It was like a 2 year long campaign of pestering. Eventually, with Vista on the horizon, I had to invent an unresolvable problem that forced the issue. So yeah, there are other C# SSH solutions. Really the point wasn't so much about that, but locked threads.
The simple answer with regard to recovering a locked thread is: You can't. Not, "You can't safely". No, you simply can't. End of Story. Game Over. Thank you for playing.
Clearly the big issue is / was figuring out why a thread was locking. Even that was very difficult because the lockup would only occur sometime at night when no one was around, and in the morning when my App was seized up, even Dev Studio could not 'break' the App so I could see what was going on with the threads. If I did try and 'break' it, Dev Studio would lock up until I used Task Manager to kill my app, and then Dev studio would say it could not interrupt the Application.
Whoever suggested I use another thread to close the Shell object, I would like to thank. That works although it causes a lot of exceptions and crashes. At least I have a working point and the thread is no longer seized up.
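The same trick can be shown with a plain TCP socket, which is presumably what sits underneath the Shell object. This is a sketch under assumptions, not SharpSSH code: the `UnblockByClosing` class and its `Demo` method are hypothetical, and a loopback listener that accepts the connection and then never sends anything plays the part of the misbehaving server.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading;

public static class UnblockByClosing
{
    // Returns true if a reader thread stuck in a blocking Read was released
    // after another thread closed the connection.
    public static bool Demo(int silenceMs)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var client = new TcpClient();
        client.Connect(IPAddress.Loopback, port);
        TcpClient silentPeer = listener.AcceptTcpClient(); // accepts, never sends
        NetworkStream stream = client.GetStream();

        var reader = new Thread(() =>
        {
            try
            {
                // Blocks: the peer keeps the connection open but stays silent.
                stream.Read(new byte[16], 0, 16);
            }
            catch (IOException) { /* read interrupted by the close */ }
            catch (ObjectDisposedException) { /* stream was already disposed */ }
        });
        reader.IsBackground = true; // do not keep the process alive if it stays stuck
        reader.Start();

        Thread.Sleep(silenceMs);          // give up waiting for data
        client.Close();                   // closing from this thread unblocks the Read
        bool released = reader.Join(5000);

        silentPeer.Close();
        listener.Stop();
        return released;
    }
}
```

As noted above, expect exceptions on the blocked thread when the connection is pulled out from under it; the reader has to catch them and treat the connection as dead.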
Ben Voigt [C++ MVP] - 18 Jan 2008 16:11 GMT > Maybe you are all right about making a safe thread that can be killed the > way [quoted text clipped - 30 lines] > Over. > Thank you for playing. Ah, well, you asked a slightly different question.
How do you kill a locked thread? You can't safely. How do you recover a locked thread in .NET? You can't, period.
> Clearly the big issue is / was figuring out why a thread was locking. > Even [quoted text clipped - 13 lines] > seized > up. You're welcome. Win32 APIs are designed not to force you into a totally unrecoverable state.
I suspect if you had used "native-only debugging" you might have had fewer problems attaching with the debugger.