.NET Forum / .NET Framework / Performance / August 2006
Memory Cleanup for Hashtables
|
|
Thread rating:  |
Venkatesh - 18 Jul 2006 09:56 GMT Hi,
I am working on a tool which compares log files from different servers. The application allows comparision of data from 2 servers at a time. A max of 120 logs are compared in one go. To make things simpler, I am loading the data of the servers in 2 huge hash tables and then comparing them. Each file has a max size of 2.5MB. I am facing an issue with the memory usage of the application. If I check in task manager, the application uses close to 1 GB of memory. Even if the data of the 60 files are cached in the RAM, it should not cross more than 2.5 * 60 = 150 MB.
I am spawning 10 threads to load the files so that the initial load is faster. Once the load completes, I am deallocating the threads too. I am not quite sure why the memory usage is extremely high. Can you please provide some pointers or alternative approaches for the implementation.
Thanks! Venkatesh V.
Jon Skeet [C# MVP] - 18 Jul 2006 19:43 GMT > I am working on a tool which compares log files from different servers. > The application allows comparision of data from 2 servers at a time. A max [quoted text clipped - 4 lines] > of memory. Even if the data of the 60 files are cached in the RAM, it should > not cross more than 2.5 * 60 = 150 MB. Well, that depends on how you're storing the data. For a start, if you're loading the data as text and it's in ASCII to start with, you'll double the size of the data due to .NET strings being in Unicode. Beyond that, however, we'll need to know more details.
> I am spawning 10 threads to load the files so that the initial load is > faster. Do you have any evidence that loading the files in parallel is actually speeding things up? Unless loading the files really eats processor time, you're likely to be slowing things down by reading several files at a time. Even if it *does* speed things up, I'd expect the number of useful threads to be significantly less than 10.
> Once the load completes, I am deallocating the threads too. How exactly are you trying to "deallocate" the threads?
> I am not > quite sure why the memory usage is extremely high. Can you please provide > some pointers or alternative approaches for the implementation. Could you post a short but complete program which demonstrates the problem?
See http://www.pobox.com/~skeet/csharp/complete.html for details of what I mean by that.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too
Jon Shemitz - 20 Jul 2006 15:39 GMT > > I am spawning 10 threads to load the files so that the initial load is > > faster. [quoted text clipped - 3 lines] > time, you're likely to be slowing things down by reading several files > at a time. At least in principle, the OS can optimize simultaneous read requests to minimize head shuttling. I have no idea if any version of Windows really does this.
 Signature .NET 2.0 for Delphi Programmers www.midnightbeach.com/.net Delphi skills make .NET easy to learn In print, in stores.
Jon Skeet [C# MVP] - 20 Jul 2006 21:45 GMT > > > I am spawning 10 threads to load the files so that the initial load is > > > faster. [quoted text clipped - 7 lines] > to minimize head shuttling. I have no idea if any version of Windows > really does this. But is that going to make it any *faster* than reading in a single thread, assuming that IO is the limiting factor here? I'd at least hope that either the disk itself or Windows guesses that if I've read the first 50MB of a file, I might be about to read the next few bytes, so any time spent processing should be useful in terms of IO.
Unless there's a *lot* of time spent processing, I can't see the benefit of using multiple threads - and even then, there'd rarely be a benefit in using 10 threads (unless you have 8 cores...)
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too
Barry Kelly - 20 Jul 2006 22:59 GMT > But is that going to make it any *faster* than reading in a single > thread, assuming that IO is the limiting factor here? It could, if the different threads were reading from different drives. Even then, one could avoid redundant threads if one used asynchronous stream accesses - BeginRead etc.
-- Barry
 Signature http://barrkel.blogspot.com/
Jon Shemitz - 21 Jul 2006 04:40 GMT > > At least in principle, the OS can optimize simultaneous read requests > > to minimize head shuttling. I have no idea if any version of Windows > > really does this. > > But is that going to make it any *faster* than reading in a single > thread, assuming that IO is the limiting factor here? Well (again, in principle!) imagine the first file is at track 10, the second file is at track 10,000, the third file is at track 500, and the fourth file is at track 25,000. Even ignoring the time to get from the current location to track 10, it's going to be faster to step the head from 10 to 500 then to 10,000 and then to 25,000 than it is to go from 10 to 10,000 then back to 500 and then forward to 50,000. The extra physical shuttling adds milliseconds to the (essentially fixed) transfer time.
Of course, as I keep saying this is only theoretical. I see no reason to doubt Barry's experience ... though perhaps Vista (or some future server versions) will implement the sort of seek optimizations (that I seem to remember Unix had in the '80's ....)
 Signature .NET 2.0 for Delphi Programmers www.midnightbeach.com/.net Delphi skills make .NET easy to learn In print, in stores.
Ben Voigt - 31 Aug 2006 20:40 GMT >> > At least in principle, the OS can optimize simultaneous read requests >> > to minimize head shuttling. I have no idea if any version of Windows [quoted text clipped - 11 lines] > extra physical shuttling adds milliseconds to the (essentially fixed) > transfer time. You're assuming that the files are not fragmented. In reality, if multiple logfiles are being written simultaneously (I'm thinking logs for different IIS websites on the same machine, for example), they are likely to have interleaved fragments, and a good elevator sort will improve throughput considerably.
> Of course, as I keep saying this is only theoretical. I see no reason > to doubt Barry's experience ... though perhaps Vista (or some future > server versions) will implement the sort of seek optimizations (that I > seem to remember Unix had in the '80's ....) Elevator sort is best implemented on the drive interface electronics, not the OS btw, because the OS can have no knowledge of actual disk layout as determined by number of platters, hardware striping, etc. In contrast read-ahead caching is best implemented by the OS (or better yet application) which knows which data is related.
Barry Kelly - 20 Jul 2006 22:58 GMT > At least in principle, the OS can optimize simultaneous read requests > to minimize head shuttling. I have no idea if any version of Windows > really does this. It's been my experience that simultaneous reads slow everything down by about 100x, if not much more. Windows doesn't appear to be smart about it, it tries to be fair to each thread / process, but instead of giving them a decent run (required given the slowness of disk seeks), it slices them up too finely.
-- Barry
 Signature http://barrkel.blogspot.com/
Chris Mullins - 21 Jul 2006 05:58 GMT > Jon Shemitz <jon@midnightbeach.com> wrote: > [quoted text clipped - 7 lines] > them a decent run (required given the slowness of disk seeks), it slices > them up too finely. Isn't this what technologies such as Command Queueing in SCSI and SATA are specifically designed to address? On an old IDE system sure, you can only actually do 1 read at a time.
On a modern drive with Command Queueing, doesn't the O/S say, "I have these 50 read commands, do 'em all as best you can." ? This is also going to be very depenant upon the driver and a million other variables...
-- Chris Mullins
Barry Kelly - 21 Jul 2006 07:15 GMT > > Jon Shemitz <jon@midnightbeach.com> wrote: > > [quoted text clipped - 15 lines] > 50 read commands, do 'em all as best you can." ? This is also going to be > very depenant upon the driver and a million other variables... It's not too bad if your reads are sequential and there aren't too many of them competing. It gets a lot worse when you've got random competing reads. You can queue up reads at a far faster rate than the disk seeks, and disk seeks take far longer than sequential reads. The OS is faced with Hobson's choice: be unfair to processes and starve them, in order to minimize seeks and maximize sequential reads; or try to be fair, but incur loads of wasted time in redundant seeks. Whether it's the OS, the drive or the driver that gets the final say, there's still a tradeoff. If there's enough competing reads, you can't afford enough cache RAM to really amortize the seeks using optimistic read-ahead.
So, you either end up with unresponsive applications, or inefficient thrashing, or some mix between the two. Until we get solid-state hard drives, I don't think it's going to qualitatively improve. Modern drives sequentially read well over 100MB/sec as you probably know, it's seeks that take all the time.
-- Barry
 Signature http://barrkel.blogspot.com/
Arun - 24 Jul 2006 05:50 GMT > Could you post a short but complete program which demonstrates the > problem? Hi Jon,
I am just writing the functions I am calling to read files & storing the Data in to HashTables / Arraylist.
Method 1:
#region setupReaderThreads /// <summary> /// Setup the Multi Threaded environment to read the Log files /// </summary> private void setupReaderThreads() { int fileNameCount = fileNames.Count;
_fileName = new string[_arrayLength]; _logAnalyserBL = new LogAnalyserBL[_arrayLength]; _logAnalyserBLThread = new Thread[_arrayLength]; for(int i=0; i< _arrayLength; i++) { //fileNames is an arraylist _fileName[i] = fileNames[i + _fileCounter].ToString();
if(! File.Exists(_fileName[i])) { return; } else { _logAnalyserBL[i] = new LogAnalyserBL(false); _logAnalyserBL[i].FileName = _fileName[i]; }
_logAnalyserBLThread[i] = new Thread(new ThreadStart(_logAnalyserBL[i].DoWork)); _logAnalyserBLThread[i].Name = "Log Reader " + i;
_logAnalyserBLThread[i].Start(); } } ************************************************
Method 2:
#region SearchTimer_Tick /// <summary> /// SearchTimer_Tick /// </summary> /// <param name="sender"></param> /// <param name="e"></param> private void SearchTimer_Tick(object sender, System.EventArgs e) { try { SearchTimer.Enabled = false; // // Setting up the Multiple Threades to read the log files // if (_state == 0) { setupReaderThreads(); _state = 1; } // // Checking if the file Reading is finished or it is still going on // else if (_state == 1) { Cursor = Cursors.WaitCursor; bool flag = false;
StatusLabel.Text = _logAnalyserBL[0].statusBar; for(int i=0; i< _arrayLength; i++) { if(!_logAnalyserBL[i].isRunning) flag = true; else { flag = false; break; } }
if(flag) _state = 2;
} // // Adding all the Logfiles Hash Table objects into the arraylist to prepare the Result Nodes // else if (_state == 2) { StatusLabel.Text = string.Format("Preparing Results...", _logs.Count);
// Adding all the log file hashtable objects into the Arraylist "_logs" for(int i=0; i< _arrayLength; i++) { foreach (LogEntryNode n in _logAnalyserBL[i].results) { _logs.Add(n); }
_logAnalyserBL[i] = null; }
//Logic for Increment the counter to read 12 files at a time if(_fileCounter < fileNames.Count - fileNames.Count%12) { _fileCounter += 12; _state = 0; if( _fileCounter == fileNames.Count - fileNames.Count%12) { _arrayLength = fileNames.Count%12; if(_fileCounter == fileNames.Count) { _state = 3; } } else if(_fileCounter == fileNames.Count) { _state = 3; } } else _state = 3; } // // Reading the Log arraylist & Creating the corresponding Hashtables // else if (_state == 3) { _logMatcher = new LogMatcher(); _logMatcher.logs = _logs; ThreadStart threadStart = new ThreadStart(_logMatcher.DoWork); Thread logAnalysisThread = new Thread(threadStart); logAnalysisThread.Name = "Log Analysis"; logAnalysisThread.Start();
_state = 4; } // // Checking the thread is running or not // else if (_state == 4) { if (!_logMatcher.isRunning) _state = 5; else StatusLabel.Text = _logMatcher.status; } // // Updating the Status // else if (_state == 5) { this.StatusLabel.Text = "Converting to TreeView..."; _state = 6; } // // Converting the resultant nodes into TreeView form // else if (_state == 6) { this.StatusLabel.Text = "Converting to TreeView..."; MetaDataTreeView.BeginUpdate();
MetaDataTreeView.Nodes.Add(new TreeNode("QUERY Nodes")); MetaDataTreeView.Nodes.Add(new TreeNode("SAVE Nodes"));
int userCount = 1; foreach (string key in _logMatcher.orderedResults) { ArrayList logEntryArrayList = (ArrayList) _logMatcher.results[key]; TreeNode tNode = null; string startTime = string.Empty; string endTime = string.Empty; string timeTaken = string.Empty; if (_logMatcher.methods.ContainsKey(key)) { LogEntryNode logEntryNode = (LogEntryNode) _logMatcher.methods[key];
startTime = logEntryNode.GetVal("Time");
tNode = new TreeNode(string.Format("{0} {1}", logEntryNode.themethod, logEntryNode.GetVal("Time"))); // get the user string user = logEntryNode.GetVal("Original User"); if (user.Length == 0) { string refString = logEntryNode.GetVal("Ref"); user = refString.Substring(0, refString.IndexOf("-")); } //user += " - Ping";
if (!_users.ContainsKey(user)) _users.Add(user, new LogUserDetails(user, userCount++));
LogUserDetails logUserDetails = (LogUserDetails) _users[user]; tNode.BackColor = logUserDetails.color; logUserDetails.count++; } else { tNode = new TreeNode( "Unknown" ); tNode.BackColor = Color.LightCoral; }
TreeNode keyNode = new TreeNode(((LogEntryNode)logEntryArrayList[0]).key); tNode.Nodes.Add(keyNode); foreach (LogEntryNode n in logEntryArrayList) { tNode.Nodes.Add(n); if( n.Text.IndexOf(": Completed") > 0) { endTime = n.GetVal("Time"); if( startTime != string.Empty && endTime != string.Empty ) { timeTaken = Convert.ToString( Convert.ToDateTime(endTime) - Convert.ToDateTime(startTime)); } } }
TreeNode timeTakenNode = new TreeNode("Time Taken: " + timeTaken ); timeTakenNode.BackColor = Color.Red; timeTakenNode.ForeColor = Color.White; tNode.Nodes.Add(timeTakenNode); // Distinguishing the Log entries on the Basis of the query & Save if(tNode.Text.IndexOf("Query") == 0) { MetaDataTreeView.Nodes[0].Nodes.Add(tNode); } else { MetaDataTreeView.Nodes[1].Nodes.Add(tNode); } }
MetaDataTreeView.EndUpdate(); _logMatcherHash.Clear();
_logMatcherHash.Add("SEARCH_LOG", _logMatcher); _logMatcher = null; _state = 7; } // // Analysing the Treeview Nodes // else if (_state == 7) { this.StatusLabel.Text = "Analysing ..."; this.LogSummaryTreeView.BeginUpdate(); LogSummaryTreeView.Nodes.Clear();
TreeNode usersNode = new TreeNode("Users"); LogSummaryTreeView.Nodes.Add(usersNode); int count = 0; foreach (LogUserDetails ld in this._users.Values) { count += ld.count; ld.Update(); usersNode.Nodes.Add(ld); }
usersNode.Text = string.Format("Users - Total = {0}", count); usersNode.ExpandAll(); _users.Clear();
_state = 8; } else if (_state == 8) { StatusLabel.Text = "Log Entries are available now in Tree View Controls."; LogSummaryTreeView.EndUpdate(); _state = -1;
Cursor = Cursors.Default; } else { _state = -1; SearchTimer.Enabled = false; } if (_state != -1) SearchTimer.Enabled = true; } catch(Exception ex) { MessageBox.Show(ex.Message); } } #endregion
*************************************************
Method 3: LogAnalyserBL.DoWork()
#region DoWork /// <summary> /// Reading the Selected Log files and writing the file contents into ArrayList /// </summary> public void DoWork() { try { statusBar = string.Format( "Reading {0} ...", _fileName); Stream strm = File.Open(_fileName, FileMode.Open, FileAccess.Read, FileShare.Read); logFileReader = new StreamReader(strm); string line = null;
long currentCount = 0; LogEntryNode activeNode = null;
//Reading the selected Log file Line by Line while ((line = logFileReader.ReadLine()) != null) { if (line == "[LOG_ENTRY]") { if (activeNode != null && isCompatability) results.Add(activeNode);
activeNode = new LogEntryNode(isCompatability); } else if (line == "[END_LOG_ENTRY]") { results.Add(activeNode); activeNode = null; } else if (activeNode != null) { //adding the Logfile Lines into the LogEntryNode value activeNode.AddVal(line); } currentCount++; } logFileReader.Close(); logFileReader = null; strm.Close(); statusBar = "Reading Complete";
long count = 0; long repCounter = 0; statusBar = string.Format("Updating {0} Nodes...", results.Count); foreach (LogEntryNode n in results) { // update the value of the node... n.UpdateNow() ;
} } catch(System.Exception ex) { Debug.WriteLine(ex.Message); statusBar = "Error"; MessageBox.Show(ex.Message); return; } finally { isRunning = false; if (logFileReader != null) logFileReader.Close(); } } #endregion ************************************************
Method 4: LogMatcher.DoWork
#region DoWork
public Hashtable results = new Hashtable(); public Hashtable methods = new Hashtable(); public ArrayList orderedResults = new ArrayList(); public ArrayList logs = null; /// <summary> /// Adding the LogEntries in the HashTables /// </summary> public void DoWork() { ArrayList kiddies = null; foreach (LogEntryNode leNode in logs) { if (!results.ContainsKey(leNode.key)) { results.Add(leNode.key, new ArrayList()); orderedResults.Add(leNode.key); }
if (leNode.startCall) { if (!methods.ContainsKey(leNode.key)) methods.Add(leNode.key, leNode); }
kiddies = (ArrayList) results[leNode.key]; kiddies.Add(leNode); } isRunning = false; } #endregion **************************************************
Jon, If u see the above 4 methods then u can get the idea How I am reading the files & how I am storing the Data into Hashtables & ArrayLists Actually as u know the size of each file is 2.5 MB and there r atmost 60 files in each directory.
Now please suggest me the correct solution to reduce the memory usage in my application.
Thanks Arun
> > I am working on a tool which compares log files from different servers. > > The application allows comparision of data from 2 servers at a time. A max [quoted text clipped - 32 lines] > See http://www.pobox.com/~skeet/csharp/complete.html for details of > what I mean by that. Jon Skeet [C# MVP] - 24 Jul 2006 06:55 GMT > > Could you post a short but complete program which demonstrates the > > problem? > > I am just writing the functions I am calling to read files & storing the > Data in to HashTables / Arraylist. Why, when I specifically asked for a complete program? I can't run the code you've given, so I can't verify whether or not it has the same problem on my box.
I'll have a look at it, but it would have been a lot quicker if you'd posted a complete program.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too
Ben Voigt - 31 Aug 2006 20:57 GMT At which step are you looking at the memory usage. A TreeView has a lot of overhead. I also see nowhere that the ArrayList or Hashtable gets freed for garbage collection. So you probably have three copies of your data, in Unicode, plus TreeView overhead, of a 150 MB dataset, hence >900MB.
It doesn't help that you have multiple methods named DoWork... you shouldn't have any actually, it's a horrible name. You gain nothing by starting multiple threads to write to the same Hashtable.
I'd bring all the data processing back to one worker thread, using overlapped I/O and completion callbacks (WaitForSingleObjectEx, let the single object be a cancel event set from the GUI thread). Forget ReadLine, reading a one-page chunk (usually 4k) will be much faster. There's no need to save up the entire block before adding to the Hashtable either, once you've read the key, add the pair (key, new StringBuilder()) and then start appending data to the StringBuilder (or whatever object). And set the hashtable to null after you add the nodes to the treeview (either remove each element as you put in into the treeview, or dereference the entire hashtable at the end).
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|