Multithreading WebRequests, a good and stable approach?
Nightcrawler - 07 Dec 2007 17:58 GMT
I have a webservice that gets data from three websites and puts the result into a datatable and returns that datatable.
Currently the webservice makes a WebRequest (using parameters in a querystring) to the first website, adds the data into a datatable then moves on to the second website, merges the datatable together and finally gets data from the third website and merges that datatable with the existing one.
This works fine, but I recently changed the client interface to pull the information using AJAX/javascript. A browser like Firefox will raise an error (script has stopped responding) if the javascript does not respond within 10 seconds. This puts more pressure on the webservice to execute and return results within those 10 seconds.
I started looking into multithreading these webRequests and firing the requests at the same time.
My questions are:
1. Is this a good approach? Are there any risks in multithreading multiple webRequests like this?
2. Can anyone point me in the right direction as to how to make these webrequests using multithreading?
3. More importantly, how do I merge the data into one datatable once all three webRequests are completed?
Any feedback is appreciated.
Thanks
Peter Duniho - 07 Dec 2007 18:42 GMT
> [...]
> My questions are:
>
> 1. Is this a good approach? Are there any risks in multithreading
> multiple webRequests like this?
Multithreading always has risks. But I don't think there is anything unusual in this scenario. Just the usual concurrency issues.
> 2. Can anyone point me in the right direction as to how to make these
> webrequests using multithreading?
Use the async methods (the ones that start with "Begin..." and "End..."). This will prevent any thread from being committed to the operation until it actually completes. Or, it should anyway on operating systems that support I/O completion ports (IOCP).
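As a rough sketch of the Begin.../End... pattern mentioned above (not code from this thread): `AsyncFetch`, `BeginFetch` and the `onDone` delegate are hypothetical names chosen for illustration. The request is started with `BeginGetResponse`, the calling thread returns immediately, and `EndGetResponse` is called inside the callback once the response is available.

```csharp
using System;
using System.IO;
using System.Net;

public static class AsyncFetch
{
    // Starts the request and returns immediately. onDone receives the response
    // body (or null if the request failed) on whichever thread completes the I/O.
    public static void BeginFetch(WebRequest request, Action<string> onDone)
    {
        request.BeginGetResponse(delegate(IAsyncResult ar)
        {
            string body = null;
            try
            {
                // EndGetResponse rethrows any error that occurred during the request.
                using (WebResponse response = request.EndGetResponse(ar))
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    body = reader.ReadToEnd();
                }
            }
            catch (WebException)
            {
                // body stays null so the caller can tell this request failed.
            }
            onDone(body);
        }, null);
    }
}
```

Because `onDone` runs on a different thread from the caller, anything it touches that is shared (the DataTable, for example) needs the synchronization discussed further down.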
> 3. More importantly, how do I merge the data into one datatable once
> all three webRequests are completed?
Where's the datatable? What part of the issue are you having problems with?
It sounds as though you're implementing some sort of web components application, where there's server-side and client-side parts. Is the question about how to get the data back to the client when everything's done? Or is it simply about how to manage your DataTable object?
If the latter, it should be pretty much the same as however you do it synchronously, except you'll have to provide synchronization for the DataTable object (using the lock() statement, for example). If you need the data in the DataTable object in some specific order (for example, in the order the requests were started), you'll need to impose that order. How best to do that depends on how you've defined the order.
If the former, I have no idea. Sounds like a web applications question, and I don't know anything about that. :)
Pete
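A minimal sketch of the DataTable synchronization described in the post above, assuming every partial result arrives as its own DataTable with the same columns. `ResultCollector` and its members are hypothetical names; the lock is taken on a private object rather than on the table itself.

```csharp
using System;
using System.Data;

public class ResultCollector
{
    // Private object used only for locking, so no outside code can lock on it.
    private readonly object tableLock = new object();
    private readonly DataTable table;

    public ResultCollector(DataTable emptyTable)
    {
        table = emptyTable;
    }

    // Safe to call from several request callbacks at once.
    public void Add(DataTable partial)
    {
        lock (tableLock)
        {
            table.Merge(partial);
        }
    }

    // Reads take the same lock as writes.
    public int RowCount
    {
        get { lock (tableLock) { return table.Rows.Count; } }
    }
}
```

Rows end up in whatever order the callbacks happen to finish, which matches the "no particular order" case; imposing an order would need extra bookkeeping.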
Nightcrawler - 07 Dec 2007 19:01 GMT
Pete,
This is the way I have it set up. I only included enough code for you to understand the logic, and stripped out everything that is not required. The part I need to optimize is GetSearchResultsArray(). I would like to fire the three GetResults calls at the same time and then be able to merge the data together into one table (no particular order).
Thanks for your help!
[WebMethod]
[System.Web.Script.Services.ScriptMethod(UseHttpGet = true)]
public result[] GetSearchResultsArray()
{
    // BuildDataTable() just returns a datatable with the columns specific
    // to this operation (code not included for simplicity).
    DataTable dt = BuildDataTable();

    dt = GetResults("url1", "parameters1");
    dt.Merge(GetResults("url2", "parameters2"));
    dt.Merge(GetResults("url3", "parameters3"));

    // Takes the datatable and converts it to a List (code not included for simplicity).
}

private DataTable GetResults(string url, string parameters)
{
    string result = GetSearchResults(url, parameters);

    // Does processing of the result in the response string and puts it
    // into a prebuilt datatable (code not included for simplicity).
    return DataTable;
}

private string GetSearchResults(string url, string parameters)
{
    string httpRequest = String.Format("{0}?{1}", url, parameters);

    WebRequest webRequest = WebRequest.Create(httpRequest);
    StreamReader responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());

    string responseString = HttpUtility.UrlDecode(responseReader.ReadToEnd());
    responseReader.Close();

    return responseString;
}
Peter Duniho - 07 Dec 2007 20:24 GMT
> Pete,
> [quoted text clipped - 3 lines]
> would like to fire the three GetResults at the same time and then be
> able to merge the data together into one table (no particular order).
Okay, the "no particular order" is helpful. If the order did matter, that could be easily addressed, but it does make the code simpler to not have to worry about it.
Let's start with the suggestions and code that Nicholas posted, since his basic response is very useful.
Based on that response, I'd offer a couple of observations:
* First, the difference between his two suggestions -- calling EndGetResponse() in sequence for each request, versus setting a waitable event -- is not very great, at least not as he demonstrated it. In either case, the code will simply stop before exiting the method that starts all three requests, so they have the same effect.
Where setting the event handle might be useful is if you had some code _somewhere else_ that would wait on it, in a different thread. For example, let's say you ran the code he posted in the main thread in response to something, but had a different thread sitting around waiting to process completed data retrievals. Then that different thread could use the waitable event as its signal to do more work. Of course, in that scenario you wouldn't create the waitable event in the code that starts the requests. It'd be stored somewhere more accessible so that the other thread could already be waiting on it.
* Second, his sample provides a very good illustration of the synchronization required for the DataTable. I like to follow Jon's advice to not lock using the actual object, but rather to create a separate "object" instance for use in locking. But otherwise, his sample shows what I meant when I wrote of the need to address concurrency issues by synchronizing access to the DataTable.
* Finally, I think Nicholas meant to just write "callback" instead of "callback1", "callback2", and "callback3" when he calls BeginGetResponse().
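The "different thread waiting on the event" scenario from the first point above could be sketched roughly as follows. `CompletedWork`, `Publish` and `TryTake` are hypothetical names; the point is only that the event is created once and stored somewhere shared, so the processing thread can already be waiting on it before any request is started.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class CompletedWork
{
    private readonly Queue<string> queue = new Queue<string>();
    private readonly object queueLock = new object();

    // Created up front and shared, not created by the code that starts the requests.
    private readonly AutoResetEvent itemReady = new AutoResetEvent(false);

    // Called from a request callback when a response has arrived.
    public void Publish(string response)
    {
        lock (queueLock) { queue.Enqueue(response); }
        itemReady.Set();
    }

    // Called on the processing thread. Blocks until an item is available,
    // or returns false if nothing arrives within timeoutMs.
    public bool TryTake(int timeoutMs, out string response)
    {
        while (true)
        {
            lock (queueLock)
            {
                if (queue.Count > 0)
                {
                    response = queue.Dequeue();
                    return true;
                }
            }
            if (!itemReady.WaitOne(timeoutMs))
            {
                response = null;
                return false;
            }
        }
    }
}
```

The queue is checked before each wait because several `Set()` calls on an AutoResetEvent can collapse into a single signal.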
Now, how would I adjust his sample to suit the description you've given above?
I would get rid of the synchronization at the end of his method, as well as the waitable event altogether. I would also, of course, create a new object for locking the DataTable. Finally, without the waitable event, instead I would just call whatever code you have that needs to be called when all of the requests have completed.
So, taking Nicholas's code as the starting point, here's what it'd look like instead:
public void MyMethod()
{
    // Create the three web requests.
    HttpWebRequest wr1 = ...;
    HttpWebRequest wr2 = ...;
    HttpWebRequest wr3 = ...;

    // This is the number of web requests that still have to complete.
    int requestsToComplete = 3;

    // The data table to return.
    DataTable dt = ...;

    // [an object used to synchronize access to the DataTable -- Pete]
    object objLock = new object();

    // The async callback which will process the data. You will need
    // separate code for each if they have different routines to
    // populate the data table.
    AsyncCallback callback = delegate(IAsyncResult ar)
    {
        // Get the request from the state.
        // [note that I've changed to a straight cast from the "as"
        // that Nicholas had. I only use "as" if I've got some code
        // that will actually deal with a failed cast. Otherwise,
        // you just get a delayed exception, and a less-useful one at
        // that, since the exception is a null reference instead of the
        // more informative invalid cast that actually describes what
        // went wrong -- Pete]
        HttpWebRequest request = (HttpWebRequest)ar.AsyncState;

        // Call EndGetResponse.
        using (HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(ar))
        {
            // Add to the data table here. This is the code specific to the request.
            // You have to synchronize access to the table as well.
            lock (objLock)
            {
                // Process the response here and add the rows you need to.

                // [here is where you'd convert the response to DataTable and
                // then call DataTable.Merge() with the results, for example.
                // Noting, of course, that in this scenario it might be easier
                // to just add the data as it's generated from the response to
                // the original table. But if that were really true, maybe you
                // would have done it that way in the original code too, so I don't
                // really know. :) -- Pete]
            }
        }

        // Decrement the count on the requests to complete. If it is
        // zero, all of the requests have completed.
        if (Interlocked.Decrement(ref requestsToComplete) == 0)
        {
            // [here you'd call whatever method needs executing when all of the
            // data has been retrieved. If that method includes any calls to update
            // things in the UI, you'll either need to use Control.Invoke() here to
            // call that method, or in that method use Control.Invoke() to do the
            // UI-specific stuff -- Pete]
        }
    };

    // Begin the calls here.
    wr1.BeginGetResponse(callback, wr1);
    wr2.BeginGetResponse(callback, wr2);
    wr3.BeginGetResponse(callback, wr3);
}
Hope that helps.
Pete
Nightcrawler - 12 Dec 2007 00:17 GMT
Pete,
I am trying your code but it doesn't seem to work.
I tried Nicholas's code and it worked fine. I then adjusted it to match yours by removing the ManualResetEvent and modifying it according to your post, but now it simply returns nothing, almost as if the requests never happened. I have a feeling I am missing a line of code that prevents the method from exiting before the requests are done.
Please let me know.
Thanks
Peter Duniho - 12 Dec 2007 00:49 GMT
> Pete,
> [quoted text clipped - 5 lines]
> happened. I have a feeling I am missing a line of code that prevents
> the method to exit out before the requests are done.
Why do you want to prevent the method from exiting?
I thought the whole point here was that if your code doesn't return, it appears unresponsive to the browser, which then cancels your code.
The code Nicholas posted may speed things a bit by parallelizing the requests, but ultimately you're still waiting, and if any one request takes too long, all of the requests are basically useless.
Presumably, you've got some other code that would be executed after the method returns, taking all of the responses in aggregate and doing something useful with them. In the code I posted, you should execute that code where I indicated by my comments, once the counter reaches zero. You may want to just put all that code into a method, and then call that method where I've indicated.
I don't think there's any reason to prevent the method from exiting, but if that's a requirement of yours for some reason then no, the code I posted isn't going to work for you. I wrote it specifically to return as soon as it could, rather than waiting around for the asynchronous i/o to complete, since that's generally the point of doing asynchronous i/o (as Nicholas points out, even initiating the i/o asynchronously is only going to allow a limited number of the requests to actually proceed in parallel, depending on the system configuration).
Pete
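To make the difference concrete, here is a small sketch of the "run the follow-up code when the counter reaches zero" idea from the post above, with the counter pulled out into a helper. `PendingRequests`, `SignalOne` and `allDone` are hypothetical names. Nothing in it blocks, so the method that starts the requests returns straight away and the follow-up work runs later on another thread.

```csharp
using System;
using System.Threading;

public class PendingRequests
{
    private int remaining;
    private readonly Action allDone;

    // count: how many requests were started.
    // allDone: the code that needs every response (filtering, converting, and so on).
    public PendingRequests(int count, Action allDone)
    {
        this.remaining = count;
        this.allDone = allDone;
    }

    // Each request callback calls this exactly once, after it has merged its rows.
    public void SignalOne()
    {
        if (Interlocked.Decrement(ref remaining) == 0)
        {
            // Runs on the thread of whichever request happened to finish last.
            allDone();
        }
    }
}
```

If the starting method's own return value has to contain the merged data, as with a web method that returns an array, something still has to wait for `allDone`, which is what the ManualResetEvent in Nicholas's version does.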
Nightcrawler - 12 Dec 2007 18:57 GMT
Pete,
The method will be called through an AJAX user interface, so it will be exposed in a webservice.
My current code has 3 different callbacks, since each of them has a separate routine specific to its web request. Once the datatable has been populated through my three callback routines, I do some filtering of the datatable using a dataview, then finally convert all the data in the table to a List and return it as an array to the calling javascript, which will display it to the user.
Are you saying I could return portions using your code? So, if webrequest 1 is done it will return that to the javascript, then if request 3 is done, it will return that, and then finally request 2 (I am assuming they finish in that order).
Thanks for your input.
Peter Duniho - 12 Dec 2007 19:31 GMT
> Pete,
> [quoted text clipped - 7 lines]
> in the table to a List and return it as an array to the calling
> javascript, which will display it to the user.
The basic theory is the same as the code that Nicholas and I posted. In the case of my proposal, you can still use three different callbacks, as long as each includes some logic as I've suggested at the end to detect whether all of the requests have completed.
> Are you saying I could return portions using your code. So, if
> webrequest 1 is done it will return that to the javascript, then if
> request 3 is done, it will return that and then finally request 2 (I
> am assuming the finish in that order).
I have no idea if that would work. It might, but I have no way of knowing. For one, I don't do much web development, and I don't have any idea how .NET interacts with the web client stuff. For another, I don't know enough about your particular implementation and how that would work with the web client to know whether returning intermediate results would work.
What I do know is that assuming you currently have an implementation that returns just the final complete results, and assuming there's some way for that implementation to respond to the web client (with or without the actual results) for it to not generate some kind of timeout error, then there is a simple way (as illustrated in this thread) to asynchronously accumulate the responses as well as know when all have completed so that you can take some appropriate action.
Beyond that, you'll need someone who knows more about the web client aspect of .NET. I know that I've dealt with web pages that take FAR longer than 20 seconds to return their results, both pages that take that long to load and pages that appear to load right away but then have some sort of deferred processing that updates something in the page later. But I've never bothered to take a look at how those are implemented. All I know is that it can be done.
Pete
Nightcrawler - 12 Dec 2007 19:05 GMT
Pete,
On another note, how can I include a regular method call to a table adapter? Say I want to fetch data from 3 webrequests and one request using a dataadapter and my own database. Could I incorporate that logic into this as well?
So theoretically, four threads would work at the same time to populate a datatable (3 webrequests and one dataadapter using sql server), which would then be returned through a webservice.
The reason I have to optimize these requests is simply that browsers like Firefox will show a warning that the javascript stopped working if the webservice call through javascript takes longer than 10 seconds. I want to avoid that at all costs.
Thanks
Peter Duniho - 12 Dec 2007 19:44 GMT
> On another note, how can I include a regular method call to a table
> adapter. Say I want to fetch data from 3 webrequests and one request
> using a dataadapter and my own database. Could I incorporate that
> logic into this as well?
Yes, but since DataAdapter doesn't have an async API, you'll have to handle that yourself. The most straightforward way would be to use a BackgroundWorker. The general idea is the same though: provide a delegate (in this case, used as the handler for the BackgroundWorker.DoWork event) that does the request and then at the end does the same "am I done with all requests yet?" sort of logic that the other async handlers do.
In that case, the BackgroundWorker.RunWorkerAsync() method takes the place of the BeginGetResponse() method. You can either put the "am I done with all requests yet?" logic at the end of the DoWork handler, or you can create a separate delegate to handle the BackgroundWorker.RunWorkerCompleted event. In the latter case, the main advantage is that the event is raised on the same thread that created the BackgroundWorker, but since you need to deal with thread synchronization issues anyway (for the other three requests), this may not be all that useful in your case.
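A rough sketch of that BackgroundWorker arrangement, with the database call abstracted behind a delegate: `DatabaseFetch`, `Start`, `fill` and `onDone` are hypothetical names, and `fill` stands in for whatever synchronous DataAdapter call populates the table.

```csharp
using System;
using System.ComponentModel;
using System.Data;

public static class DatabaseFetch
{
    // fill: the synchronous database call, returning the filled table.
    // onDone: receives the table, or null if fill threw an exception.
    public static void Start(Func<DataTable> fill, Action<DataTable> onDone)
    {
        BackgroundWorker worker = new BackgroundWorker();

        worker.DoWork += delegate(object sender, DoWorkEventArgs e)
        {
            // Runs on a thread-pool thread.
            e.Result = fill();
        };

        worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
        {
            // Reading e.Result rethrows if DoWork failed, so check e.Error first.
            onDone(e.Error == null ? (DataTable)e.Result : null);
            worker.Dispose();
        };

        worker.RunWorkerAsync();
    }
}
```

The `onDone` delegate is where the merge under the lock and the "am I done with all requests yet?" check would go. When there is no UI synchronization context (a console program, for example), RunWorkerCompleted is raised on a thread-pool thread, so the same locking applies there too.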
> So theoretically, four threads would work at the same time to populate
> a datatable (3 webrequests and one dataadapter using sql server) then
[quoted text clipped - 4 lines]
> working if the webservice call through javascript takes longer than 10
> seconds. I want to avoid that at all costs.
Well, as I mentioned in my other reply, I can't really comment very much on the exact interaction with the browser. Whatever the time limit is (10 seconds, 20 seconds, etc.) it seems to me that any one request _could_ take longer than that, and so if you are waiting for them all to complete, then even if they are all done in parallel you could still wind up hitting that limit.
While I don't know how you'd implement this, I think it would be better for the code that the browser is waiting on to return immediately, and then provide some way to update the page later once the requests have all completed (or, if possible, update the page as the intermediate results complete as well).
With web browsers being mainly "pull" data models, I don't really know how that sort of thing would work. But I know I've seen what _seems_ to be a "push" data presentation in a web browser, so it seems like it ought to be doable somehow.
Pete
Nicholas Paldino [.NET/C# MVP] - 07 Dec 2007 19:13 GMT
Thomas,
This is a perfectly fine idea, but it will require a little work. The HttpWebRequest/HttpWebResponse classes absolutely support making calls asynchronously.
The simplest way would be to set up your three web request (HttpWebRequest) instances and then call BeginGetResponse on each of them in succession, storing the IAsyncResult implementations.
Then, right after that, you would call EndGetResponse on the instances, passing each one the IAsyncResult implementation that it returned from BeginGetResponse.
At this point, you would have your three results and you could insert them all into the data table to be returned.
This works because the total time is basically that of the longest request rather than the sum of all three, and a call to EndGetResponse will not hang if its request has already completed by the time it is called. (This assumes the requests go to different websites; I believe the HTTP specification has a note in it about how many concurrent connections can be opened to a website at the same time.)
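The "begin all of them, then end each in turn" approach described above might look roughly like this. `ParallelFetch` and `FetchAll` are hypothetical names, and the method takes `WebRequest` instances so the caller decides how they are created.

```csharp
using System;
using System.IO;
using System.Net;

public static class ParallelFetch
{
    // Returns the response bodies in the same order as the requests.
    public static string[] FetchAll(WebRequest[] requests)
    {
        // Start every request before waiting on any of them.
        IAsyncResult[] pending = new IAsyncResult[requests.Length];
        for (int i = 0; i < requests.Length; i++)
        {
            pending[i] = requests[i].BeginGetResponse(null, null);
        }

        // EndGetResponse blocks until that particular request has finished,
        // and returns at once if it already has.
        string[] bodies = new string[requests.Length];
        for (int i = 0; i < requests.Length; i++)
        {
            using (WebResponse response = requests[i].EndGetResponse(pending[i]))
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                bodies[i] = reader.ReadToEnd();
            }
        }
        return bodies;
    }
}
```

Since everything after the Begin calls runs on the calling thread, no locking is needed in this variant; the cost is that the calling thread is blocked until the slowest request finishes.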
However, you can improve on this if you need to squeeze out more performance. You could pass callback routines to the BeginGetResponse methods, in which you would merge the results with your data set. You could then, when they are all complete, indicate to the waiting main thread that you are done (through an EventWaitHandle of some kind). That would be a little more complex, since you don't want to create an individual event handle for each web request (since you are in a web server, I imagine you are going to be calling this a lot).
Anonymous methods can help though. I would do this:
// This can have any inputs and outputs you like, I'm just using it as an example, but it
// is basically the entry point for your web request.
public DataTable MyMethod()
{
    // Create the three web requests.
    HttpWebRequest wr1 = ...;
    HttpWebRequest wr2 = ...;
    HttpWebRequest wr3 = ...;

    // This is the number of web requests that still have to complete.
    int requestsToComplete = 3;

    // The data table to return.
    DataTable dt = ...;

    // The event which will be set to indicate that processing is done.
    // ("event" is a reserved word in C#, so the variable is named allDone.)
    using (ManualResetEvent allDone = new ManualResetEvent(false))
    {
        // The async callback which will process the data. You will need
        // separate code for each if they have different routines to
        // populate the data table.
        AsyncCallback callback = delegate(IAsyncResult ar)
        {
            // Get the request from the state.
            HttpWebRequest request = ar.AsyncState as HttpWebRequest;

            // Call EndGetResponse.
            using (HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(ar))
            {
                // Add to the data table here. This is the code specific to the request.
                // You have to synchronize access to the table as well.
                lock (dt)
                {
                    // Process the response here and add the rows you need to.
                }
            }

            // Decrement the count on the requests to complete. If it is
            // zero, then fire the event.
            if (Interlocked.Decrement(ref requestsToComplete) == 0)
            {
                // Set the event.
                allDone.Set();
            }
        };

        // Begin the calls here.
        wr1.BeginGetResponse(callback1, wr1);
        wr2.BeginGetResponse(callback2, wr2);
        wr3.BeginGetResponse(callback3, wr3);

        // Wait on the event here.
        allDone.WaitOne();

        // At this point, the data table will be populated, so you can return it.
        return dt;
    }
}
 Signature - Nicholas Paldino [.NET/C# MVP] - mvp@spam.guard.caspershouse.com
> I have a webservice that gets data from three websites and puts the
> result into a datatable and returns that datatable.
[quoted text clipped - 26 lines]
>
> Thanks
Nightcrawler - 07 Dec 2007 19:51 GMT
Nicholas,
Many thanks for your input.
Yes, you are right, this is a web environment so it will be called a lot. Also, yes, each call will have a different routine for working with the data, so I will have to set up three different AsyncCallback callbacks.
I will dive into this right away.
Thanks a bunch!
Peter Duniho - 07 Dec 2007 20:30 GMT
> Yes, you are right, this is a web environment so it will be called
> a lot. Also, yes, each call will have a different routine as to how to
> work with the data so I will have to setup three different
> AsyncCallback callbacks.
For the record, the previous code you posted illustrating what you're doing uses the same method to process all three requests. This suggests that you only need one callback method as well. If there are specific parameters guiding each specific request, those can easily be incorporated into the anonymous method (in fact, IMHO it can be easier when using an anonymous method than if you had to pass them directly, as long as you watch out for variable capturing).
Pete
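A small illustration of the variable-capturing caveat mentioned in the post above, using plain delegates instead of real requests. `CaptureDemo` and both method names are hypothetical. The first method gives each anonymous method its own copy of the per-request parameter; the second shows the pitfall, where every delegate shares the one loop counter.

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    // Each delegate captures its own variable, so each sees its own parameter.
    public static List<Func<string>> BuildPerRequest(string[] parameters)
    {
        List<Func<string>> callbacks = new List<Func<string>>();
        for (int i = 0; i < parameters.Length; i++)
        {
            string mine = parameters[i]; // a fresh variable on every iteration
            callbacks.Add(delegate() { return "handled " + mine; });
        }
        return callbacks;
    }

    // Pitfall: all delegates capture the same loop variable i, so when they
    // run after the loop has ended they all see its final value.
    public static List<Func<int>> BuildShared(int count)
    {
        List<Func<int>> callbacks = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            callbacks.Add(delegate() { return i; });
        }
        return callbacks;
    }
}
```

With async callbacks the delegates normally do run after the loop has finished, which is why copying the value into a local inside the loop body matters.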
Nightcrawler - 07 Dec 2007 21:42 GMT
Thank you both for your input. I greatly appreciate it.
I will test it out and see what kind of improvement I will be able to get in my webservice requests in terms of time.
Thanks