.NET Forum / Languages / C# / May 2008

Socket BeginSend and disconnections

Adam Clauss - 23 May 2008 15:20 GMT
A while back I posted regarding a problem we were having with one of our
applications which was randomly crashing.  Monitoring memory usage revealed
a spike in nonpaged pool memory just prior to the crash each time.

We finally think we have narrowed down the cause of this to a user (located
semi-remotely) who would connect into our system and disconnect
"ungracefully" (literally, by pulling his network cable).  Connections here
are all TCP/IP sockets.

So now we're trying to work out how to properly protect our application
against this type of error in the future.

I have two questions (so far, I think):
1) What determines the length of time between this remote user yanking his
cable out and when we actually start seeing SocketExceptions reporting that the
connection was closed?  Does it just depend on the network pieces involved (i.e.,
it's not consistent)?

2) Is it safe (and maybe even good practice?) to call BeginSend multiple times on a
socket?  Right now we do, and that is what we think is causing our
problem.  Any time our application needs to send something to a particular
client, it basically just calls BeginSend and moves on.  What happened in
this case is that the BeginSend calls never COMPLETED (our callbacks never got
invoked) after the user's cable was pulled.  With the amount of traffic in
our system, the number of "pending sends" sometimes reached into the
thousands.  At that point, one of two things happened.  Occasionally, the network
finally figured out the remote user wasn't there and all the pending
callbacks suddenly completed at once (with SocketExceptions, which is good).
More frequently, the pending calls just built up more and more
until the application finally croaked (we think this is the nonpaged pool
memory spike we are seeing).
So, is there a different approach we should be using to send data out to
clients that would prevent this type of failure in the future?

Advice/comments appreciated. Thanks

-Adam
salman - 23 May 2008 15:31 GMT
> A while back I posted regarding a problem we were having with one of our
> applications which was randomly crashing.  Monitoring memory usage revealed
[quoted text clipped - 33 lines]
>
> -Adam

You should use the socket KeepAlive option (set the keep-alive time to 4-5
seconds) to be notified of the disconnection.
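A minimal C# sketch of this suggestion, assuming a Windows host: keep-alive is switched on with `SetSocketOption`, and the per-socket timing is passed to `Socket.IOControl` with `IOControlCode.KeepAliveValues` (Winsock's `SIO_KEEPALIVE_VALS`). The class name `KeepAliveHelper` and the 5000 ms / 1000 ms values used below are illustrative choices, not something from the original post.

```csharp
using System;
using System.Net.Sockets;

public static class KeepAliveHelper
{
    // Packs the Winsock tcp_keepalive structure used by SIO_KEEPALIVE_VALS:
    // three 32-bit unsigned values (onoff, keepalivetime, keepaliveinterval)
    // in native byte order (little-endian on Windows).
    public static byte[] BuildKeepAliveValues(bool on, uint timeMs, uint intervalMs)
    {
        byte[] values = new byte[12];
        BitConverter.GetBytes(on ? 1u : 0u).CopyTo(values, 0);
        BitConverter.GetBytes(timeMs).CopyTo(values, 4);
        BitConverter.GetBytes(intervalMs).CopyTo(values, 8);
        return values;
    }

    // Turns keep-alive on and overrides the system-wide timing for this socket only.
    // IOControlCode.KeepAliveValues is a Windows-specific control code.
    public static void EnableKeepAlive(Socket socket, uint timeMs, uint intervalMs)
    {
        socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
        socket.IOControl(IOControlCode.KeepAliveValues,
                         BuildKeepAliveValues(true, timeMs, intervalMs), null);
    }
}
```

For example, `KeepAliveHelper.EnableKeepAlive(clientSocket, 5000, 1000);` right after accepting a connection, where `clientSocket` stands for the accepted socket. Note that TCP sends keep-alive probes only while the connection is idle, which bears on the reply below about sockets that are actively sending data.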

Ali
Peter Duniho - 23 May 2008 18:15 GMT
> [...]
> I have two questions (so far, I think):
[quoted text clipped - 4 lines]
> (aka: not consistent?)

_Usually_, if there's a broken connection, you'll see exceptions shortly  
after you start trying to send data, if not as soon as you try to send  
data.  It does depend somewhat on how the connection is broken and where.  
Depending on the network configuration, it is theoretically possible to  
_never_ get an exception.

> 2) Is it safe (and good practice maybe??) to call multiple BeginSend's on a
> socket?  Right now, we are, and that is what we think is causing our
> problem.

That very well could be.  It points to two issues:

    * You may want to put an upper bound on how many send operations you  
perform for any connection without getting a response, or at least a  
completion of the call to BeginSend() (which indicates the data's been
buffered locally, not that the remote endpoint has received it).  In theory,
you should be able to queue as many sends as you need, especially if  
you've set the socket buffer itself to 0 (so that all buffering is managed  
by your own allocated buffers you pass to BeginSend()).  But it does sound  
like you are somehow sending data _so_ quickly that you fill up the  
available memory before the network layer can detect the broken connection.

    * You may have a bug where you don't handle an out-of-memory condition  
gracefully.  Whether this is really a bug depends on your intended  
design.  But it seems to me that for a server-class application, it makes  
sense to catch all exceptions and try to continue gracefully if possible.  
You may find that the program becomes useless until some clean-up is done,  
but at the very minimum it would allow you to report the specific problem,  
and possibly you could include logic that allows you to start pruning your  
client list until things start working again.
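One way to act on both points is to keep at most one `BeginSend` outstanding per connection, hold further messages in a bounded queue, and treat a full queue or any send exception as a dead client. The sketch below shows that pattern in C#; it is a common approach rather than anything described in this thread, and `BoundedSender` and its `maxQueued` limit are hypothetical names. It assumes each message is an already-serialized `byte[]`.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

// Hypothetical per-connection sender: at most one BeginSend in flight,
// plus a bounded backlog of messages waiting behind it.
public sealed class BoundedSender
{
    private readonly Socket _socket;
    private readonly int _maxQueued;
    private readonly Queue<byte[]> _pending = new Queue<byte[]>();
    private readonly object _lock = new object();
    private bool _sendInProgress;
    private bool _closed;

    public BoundedSender(Socket socket, int maxQueued)
    {
        _socket = socket;
        _maxQueued = maxQueued;
    }

    public bool IsClosed { get { lock (_lock) { return _closed; } } }

    // Returns false if the connection is already closed, or if the backlog is
    // full, in which case the client is treated as dead and dropped.
    public bool Send(byte[] message)
    {
        bool overflow = false, start = false;
        lock (_lock)
        {
            if (_closed) return false;
            if (_pending.Count >= _maxQueued)
            {
                overflow = true;
            }
            else
            {
                _pending.Enqueue(message);
                if (!_sendInProgress) { _sendInProgress = true; start = true; }
            }
        }
        if (overflow) { Close(); return false; }
        if (start) StartNext();
        return true;
    }

    private void StartNext()
    {
        byte[] next;
        lock (_lock)
        {
            if (_closed || _pending.Count == 0) { _sendInProgress = false; return; }
            next = _pending.Dequeue();
        }
        SendChunk(next, 0);
    }

    private void SendChunk(byte[] buffer, int offset)
    {
        try
        {
            _socket.BeginSend(buffer, offset, buffer.Length - offset, SocketFlags.None,
                              ar => OnSent(ar, buffer, offset), null);
        }
        catch (SocketException) { Close(); }
        catch (ObjectDisposedException) { Close(); }
    }

    private void OnSent(IAsyncResult ar, byte[] buffer, int offset)
    {
        int sent;
        try { sent = _socket.EndSend(ar); }
        catch (SocketException) { Close(); return; }        // e.g. connection reset
        catch (ObjectDisposedException) { Close(); return; } // socket closed meanwhile

        if (offset + sent >= buffer.Length) StartNext();         // message done
        else if (sent > 0) SendChunk(buffer, offset + sent);     // partial send: finish it
        else Close();                                            // no progress: give up
    }

    public void Close()
    {
        lock (_lock)
        {
            if (_closed) return;
            _closed = true;
            _pending.Clear();   // release queued messages so their memory can be reclaimed
        }
        try { _socket.Shutdown(SocketShutdown.Both); }
        catch (SocketException) { }
        catch (ObjectDisposedException) { }
        _socket.Close();
    }
}
```

With this design, memory per client is bounded by `maxQueued` waiting messages plus the one in flight, and a client whose cable has been pulled is dropped once its backlog fills instead of accumulating thousands of pending operations. `maxQueued` should be at least 1.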

Given that the problem occurs when you try to send data, I'm not convinced  
that enabling keep-alive is going to be useful.  In scenarios where  
keep-alive would detect a problem, so too should trying to send data.  I'm
curious: how long does this failure take to occur, from the time
that the connection is broken until the time that your application starts
seeing errors (either exceptions on the socket or simply failing)?  I
admit, I'm surprised to hear that you're able to make enough attempts to  
send that you run out of resources before the socket itself reports an  
error.  It seems like on a modern computer, you shouldn't be able to  
allocate memory fast enough to cause that to happen.

Pete
Adam Clauss - 23 May 2008 18:34 GMT
> _Usually_, if there's a broken connection, you'll see exceptions shortly
> after you start trying to send data, if not as soon as you try to send
> data.  It does depend somewhat on how the connection is broken and where.
> Depending on the network configuration, it is theoretically possible to
> _never_ get an exception.

For whatever reason (based on my testing in our development environment), I
see about a minute's worth of time go by before a SocketException gets
thrown and the disconnect is recognized.

>     * You may want to put an upper bound on how many send operations you
> perform for any connection without getting a response, or at least a
[quoted text clipped - 6 lines]
> available memory before the network layer can detect the broken
> connection.

The associated error at the time of crash tends to be
EVENT_SRV_NO_NONPAGED_POOL in Event Viewer (aka: we ran out of nonpaged pool
memory).
Is setting the socket buffer to 0 a non-default setting?  (Right now we do not
explicitly set it to that value.)  If set that way, would it maybe
use our application memory (which obviously has a much larger pool to
allocate from) rather than nonpaged memory, and possibly give the socket enough
time to recognize the disconnect?
The socket traffic is XML messages (typically one-way to the client).  They
range from a couple hundred bytes up to several hundred KB for the largest.
I don't remember what the cap is on nonpaged memory, but we
added some counters to compare the number of BeginSend calls with the number of
callbacks received, and the difference quickly grew into the thousands
during this minute or so.
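Counters like these can be kept per connection with `Interlocked`, so they stay accurate when callbacks run on I/O threads. A minimal sketch; `SendCounters` and its method names are hypothetical.

```csharp
using System.Threading;

// Hypothetical per-connection counters: one increment per BeginSend call and
// one per send callback, so the difference is the number of sends still pending.
public sealed class SendCounters
{
    private long _begun;
    private long _completed;

    // Call each time BeginSend returns without throwing.
    public void RecordBeginSend() { Interlocked.Increment(ref _begun); }

    // Call at the top of the send callback, whether EndSend succeeds or throws.
    public void RecordCallback() { Interlocked.Increment(ref _completed); }

    public long Outstanding
    {
        get { return Interlocked.Read(ref _begun) - Interlocked.Read(ref _completed); }
    }
}
```

Checking `Outstanding` before each send, against a threshold of your choosing, turns the diagnostic into a guard: log the condition and drop the client rather than issuing yet another `BeginSend`.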

>     * You may have a bug where you don't handle an out-of-memory condition
> gracefully.  Whether this is really a bug depends on your intended
[quoted text clipped - 15 lines]
> error.  It seems like on a modern computer, you shouldn't be able to
> allocate memory fast enough to cause that to happen.

In our test setup it takes about a minute.  However, our test setup also
doesn't crash.  Watching nonpaged pool memory with perfmon, I do see a spike
begin after I yank the cord, but it does not crash.  A minute goes by, a
"logout" (socket disconnection) gets logged by our application, and memory
falls back to normal.  It seems one of two things is happening in the
production setup:
1) They have FAR more data flowing than we do in our test setup, causing the
spike to be of greater magnitude and big enough to crash the application
before the minute goes by (this is almost certainly true - they DO have more
data); and/or:
2) Their network configuration is not registering the disconnection for a
time period longer than a minute. I am still working to verify exactly how
long it took the application to crash after uncompleted operations started
stacking up.

- Adam
Peter Duniho - 23 May 2008 18:55 GMT
> For whatever reason (based on my testing in our development environment), I
> see about a minute's worth of time go by before a socketexception gets
> thrown and the disconnect recognized.

Yuck.  For what it's worth, I've never seen disconnects take that long to  
detect when actually sending data (obviously, they can take indefinitely  
longer if you don't try to send anything :) ).  I typically see the  
disconnect within a second or two.

It might be worth trying to explore what makes it take so long.  I have  
little enough experience with the lowest levels of networking that I can't  
suggest specifics in that regard.

The only higher-level thing that comes to mind is the possibility that  
there's some thread hogging all the CPU time, which is limiting how  
quickly your i/o thread(s) get to process things.  In that
scenario, the network driver itself would be detecting the disconnect
almost immediately, but wouldn't get a chance to report it until much  
later.

But it'd be hard for me to say for sure even with a code sample.  Without
one, it's just pure speculation.  That said, if you have any code that's
raising thread priorities, you might consider disabling it to see if that
helps (hopefully you don't; it's almost never the right thing to do :) ).
And if you have a thread that is compute-intensive, you might consider
_lowering_ that thread's priority so that in times of high i/o load, it
doesn't get in the way.
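Lowering a compute-heavy thread's priority is a one-line change in .NET. A minimal sketch, assuming the work runs on a dedicated `Thread` rather than the thread pool; `WorkerSetup.StartBackgroundWorker` is a hypothetical helper name, while `ThreadPriority.BelowNormal` is the standard enumeration value.

```csharp
using System.Threading;

public static class WorkerSetup
{
    // Starts a compute-intensive worker below normal priority so that I/O
    // completion threads (which stay at Normal) are scheduled ahead of it.
    public static Thread StartBackgroundWorker(ThreadStart work)
    {
        Thread worker = new Thread(work);
        worker.IsBackground = true;                  // don't keep the process alive on shutdown
        worker.Priority = ThreadPriority.BelowNormal;
        worker.Start();
        return worker;
    }
}
```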

> [...]
> The associated error at the time of crash tends to be
[quoted text clipped - 9 lines]
> time to recognize the disconnect?

Maybe.  However, I'm not really sure why the non-paged pool is involved.  
Typically, the network driver is going to have a fixed sized buffer.  I  
wouldn't expect it to try to expand that buffer or add new ones.  Instead,  
it will either reject an attempt to queue new data (non-blocking i/o) or  
it will force the attempt to wait until there is space (blocking i/o).
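The non-blocking case can be observed from .NET with the synchronous `Socket.Send`: once the stack's send buffer is full, a non-blocking socket fails with `SocketError.WouldBlock` (Winsock's `WSAEWOULDBLOCK`) instead of waiting for room, whereas a blocking socket would wait. A minimal sketch; `NonBlockingSend.TrySend` is a hypothetical helper.

```csharp
using System.Net.Sockets;

public static class NonBlockingSend
{
    // Attempts a non-blocking send. Returns the number of bytes the stack accepted
    // (possibly fewer than data.Length), or 0 if its send buffer is currently full.
    public static int TrySend(Socket socket, byte[] data)
    {
        socket.Blocking = false;
        try
        {
            return socket.Send(data, 0, data.Length, SocketFlags.None);
        }
        catch (SocketException ex)
        {
            // The stack rejected the data rather than waiting for buffer space.
            if (ex.SocketErrorCode == SocketError.WouldBlock)
                return 0;
            throw;
        }
    }
}
```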

AFAIK a 0-sized buffer for your socket is not the default, and it has the  
effect of telling the driver to not buffer at all, but rather to use the  
buffer you provide.  This is common for IOCP implementations of sockets,  
and since the async Socket API uses IOCP, it's something to try.  The main  
advantage is actually one of performance -- it avoids one copy of the data  
-- but I suppose if there's something about the network layer where it's  
trying to allocate non-paged memory as you queue data, telling it not to  
buffer might improve things.
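From .NET, the send buffer size is set through the `Socket.SendBufferSize` property, which maps to `SO_SNDBUF`. A minimal sketch; `UnbufferedSocket.Create` is a hypothetical helper. One hedged caveat: with a zero send buffer, the pages of each pending `BeginSend` buffer typically stay locked in memory until the send completes, so this does not remove the need to limit how many sends are outstanding per connection.

```csharp
using System.Net.Sockets;

public static class UnbufferedSocket
{
    // Creates a TCP socket with SO_SNDBUF = 0, asking Winsock to send directly
    // from the buffers passed to Send/BeginSend instead of copying them first.
    // Other platforms may round 0 up to their minimum buffer size.
    public static Socket Create()
    {
        Socket socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        socket.SendBufferSize = 0;
        return socket;
    }
}
```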

Again, I'm not actually clear myself why non-paged memory would be getting  
allocated at this point.  But then, that's as likely just a gap in my  
knowledge as it is an indication that that's abnormal and/or unrelated to  
your problem.  :)

I apologize for the vagueness in my comments.  The bulk of my socket  
programming experience is with the unmanaged Winsock API.  Inasmuch as the  
.NET Socket class is built on that, my previous knowledge is applicable,  
but there may be details specific to .NET that I'm unaware of.

Pete

©2008 Advenet LLC   Privacy Policy - Terms of Use