I'm trying to run multiple independent processes in separate AppDomains.
Ideally, these processes should be restartable after failure (which
hopefully occurs very rarely). I had assumed that, since AppDomains are
supposed to provide isolation, an unhandled exception in an AppDomain would
cause an AppDomain unload and nothing more. Unfortunately, in the default
host, the entire process is terminated instead.
Hosting the CLR gives you the ability to effectively turn off process
termination on unhandled exceptions, using
ICLRPolicyManager::SetUnhandledExceptionPolicy, and even escalate AppDomain
unloads that fail. Unfortunately (this will become a recurring theme, I'm
afraid) there is not actually a way for the process hosting the CLR to
register for a notification that an unhandled exception has occurred.
You can, however, register a managed AppDomainManager type with
ICLRControl::SetAppDomainManagerType. An instance of this type will be
created for every new AppDomain, and it can greatly customize initialization
of the AppDomain. In particular, there will be an AppDomainManager for the
default AppDomain, and you can use this to register an unhandled exception
event handler. Because unhandled exceptions in other AppDomains are also
reported in the default AppDomain, it seems this is the perfect way to
detect unhandled exceptions. Unfortunately, poorly written code in other
AppDomains can prevent this from ever happening.
I've seen the following scenario. When an unhandled exception occurs in an
AppDomain, the (managed) thread that caused the exception actually raises
the UnhandledException event on the AppDomain containing the code that
raised the exception. When and only when the event handler in that AppDomain
terminates will the handler in the default AppDomain be called -- by the
same thread, migrating between domains.
In other words: if a handler for the UnhandledException event ever blocks,
the application keeps running and the unhandled exception is effectively
swallowed. Obviously, it's not acceptable for a reliable host to be foiled
this way. Nor is it acceptable to make it impossible to register
UnhandledException event handlers in the failing domain (which could be
achieved by denying the ControlAppDomain security permission), because the
code should still have the ability to try and clean up what it can. It just
shouldn't be allowed to tie up unhandled exception signaling to the default
AppDomain indefinitely, because this effectively prevents the host from ever
unloading the failing AppDomain as well.
Anyone have some ideas as to how to fix this, or suggestions for a better
approach? (A colleague suggested abandoning AppDomains altogether and just
spawning new processes deftly controlled and monitored by IPC, but I don't
want to go that way unless I absolutely have to, since resource use of all
these processes is a major problem right now.)
S.
Ben Voigt [C++ MVP] - 06 Sep 2007 02:54 GMT
> I'm trying to run multiple independent processes in separate AppDomains.
> Ideally, these processes should be restartable after failure (which
[quoted text clipped - 45 lines]
> want to go that way unless I absolutely have to, since resource use of all
> these processes is a major problem right now.)
Do you have control over the hosted applications' code?
You can deny ControlAppDomain permission, then provide an alternate event
that you fire after handling the real .NET Framework UnhandledException
event and starting a timer that forcibly terminates the AppDomain if the
hosted code's handlers take too long.
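To make the timer idea concrete, here is a rough sketch in plain standard C++. It is not the CLR hosting API: run_with_deadline, handler and escalate are hypothetical names made up for illustration. In a real host the handler would be the hosted code's cleanup callback and escalate would be whatever forces the AppDomain unload.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper (not part of any CLR API): runs `handler` on its own
// thread and waits at most `deadline` for it to finish. Returns true if the
// handler finished in time; otherwise calls `escalate` and returns false.
bool run_with_deadline(std::function<void()> handler,
                       std::chrono::milliseconds deadline,
                       const std::function<void()>& escalate) {
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();

    // Detached, so a handler that never returns cannot block the caller.
    std::thread([handler = std::move(handler), done] {
        try {
            handler();
        } catch (...) {
            // A throwing handler still counts as "finished" in this sketch.
        }
        done->set_value();
    }).detach();

    if (finished.wait_for(deadline) == std::future_status::ready) {
        return true;   // cleanup completed within the deadline
    }
    escalate();        // assumption: the host forces termination here
    return false;
}
```

Note that the sketch only stops waiting for a stuck handler; the handler's thread keeps running until whatever escalate does actually tears it down.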
Perhaps using an AppDomainInitializer to put your handler at the beginning
of the chain for UnhandledException would be enough.
However, I'm not convinced any of this is going to do you any good. Even
blocking UnhandledException handlers won't help at all, because a normal
stack-frame exception handler could block forever, or the appdomain could
lock up outside of any exception handler.
Implement a watchdog instead and kill the appdomain when its heartbeat
stops.
Ben Voigt [C++ MVP] - 06 Sep 2007 03:02 GMT
>> I'm trying to run multiple independent processes in separate AppDomains.
>> Ideally, these processes should be restartable after failure (which
[quoted text clipped - 62 lines]
> Implement a watchdog instead and kill the appdomain when its heartbeat
> stops.
Also, I don't think that you can host multiple instances of the CLR
inside a single process. This means that your watchdog must have a thread
running 100% native code so that it can't be blocked by the .NET garbage
collector. Otherwise a finalizer that doesn't exit could mean big trouble.
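A minimal sketch of such a heartbeat watchdog on a native thread, using only standard C++, might look like the following. The Watchdog class, beat() and the on_expired callback are hypothetical names invented here. How the hosted code reports its heartbeat and how the host actually kills the AppDomain are left open, so treat those as assumptions.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical watchdog: calls `on_expired` once if no heartbeat arrives
// within `timeout`. Everything here runs on a native thread.
class Watchdog {
public:
    using Clock = std::chrono::steady_clock;

    Watchdog(std::chrono::milliseconds timeout, std::function<void()> on_expired)
        : timeout_(timeout),
          on_expired_(std::move(on_expired)),
          last_beat_(Clock::now().time_since_epoch().count()),
          thread_([this] { run(); }) {}

    ~Watchdog() {
        stop_.store(true);
        thread_.join();
    }

    // Called by the monitored code to signal that it is still alive.
    void beat() { last_beat_.store(Clock::now().time_since_epoch().count()); }

    bool expired() const { return expired_.load(); }

private:
    void run() {
        while (!stop_.load()) {
            std::this_thread::sleep_for(timeout_ / 4);  // polling interval
            Clock::time_point last{Clock::duration(last_beat_.load())};
            if (Clock::now() - last > timeout_) {
                expired_.store(true);
                on_expired_();  // assumption: the host unloads the domain here
                return;
            }
        }
    }

    const std::chrono::milliseconds timeout_;
    const std::function<void()> on_expired_;
    std::atomic<Clock::rep> last_beat_;
    std::atomic<bool> stop_{false};
    std::atomic<bool> expired_{false};
    std::thread thread_;  // declared last so it starts after the other members
};
```

The idea would be one Watchdog per AppDomain, with the hosted code calling beat() periodically through whatever bridge the host exposes. The polling loop touches only atomics and the clock, so nothing in it depends on managed code making progress.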