Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsFree MagazinesWhite PapersSubmit Content
Discussion GroupsASP.NETWindows FormsLanguages.NET FrameworkVisual Studio.NET
Articles.NET FrameworkASP.NETToolsWindows Forms
.NET DirectoryOpen Source ProjectsUser GroupsWeb Resources
Related Topics
Visual Basic 6SQL ServerMS AccessOther DB ProductsMS Server ProductsMore Topics ...

.NET Forum / .NET Framework / Performance / August 2006

Tip: Looking for answers? Try searching our database.

Memory Cleanup for Hashtables

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Venkatesh - 18 Jul 2006 09:56 GMT
Hi,

 I am working on a tool which compares log files from different servers.
The application allows comparision of data from 2 servers at a time.  A max
of 120 logs are compared in one go. To make things simpler, I am loading the
data of the servers in 2 huge hash tables and then comparing them. Each file
has a max size of 2.5MB. I am facing an issue with the memory usage of the
application. If I check in task manager, the application uses close to 1 GB
of memory. Even if the data of the 60 files are cached in the RAM, it should
not cross more than 2.5 * 60 = 150 MB.

I am spawning 10 threads to load the files so that the initial load is
faster. Once the load completes, I am deallocating the threads too. I am not
quite sure why the memory usage is extremely high. Can you please provide
some pointers or alternative approaches for the implementation.

Thanks!
Venkatesh V.
Jon Skeet [C# MVP] - 18 Jul 2006 19:43 GMT
>   I am working on a tool which compares log files from different servers.
> The application allows comparision of data from 2 servers at a time.  A max
[quoted text clipped - 4 lines]
> of memory. Even if the data of the 60 files are cached in the RAM, it should
> not cross more than 2.5 * 60 = 150 MB.

Well, that depends on how you're storing the data. For a start, if
you're loading the data as text and it's in ASCII to start with, you'll
double the size of the data due to .NET strings being in Unicode.
Beyond that, however, we'll need to know more details.

> I am spawning 10 threads to load the files so that the initial load is
> faster.

Do you have any evidence that loading the files in parallel is actually
speeding things up? Unless loading the files really eats processor
time, you're likely to be slowing things down by reading several files
at a time. Even if it *does* speed things up, I'd expect the number of
useful threads to be significantly less than 10.

> Once the load completes, I am deallocating the threads too.

How exactly are you trying to "deallocate" the threads?

> I am not
> quite sure why the memory usage is extremely high. Can you please provide
> some pointers or alternative approaches for the implementation.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

Signature

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Jon Shemitz - 20 Jul 2006 15:39 GMT
> > I am spawning 10 threads to load the files so that the initial load is
> > faster.
[quoted text clipped - 3 lines]
> time, you're likely to be slowing things down by reading several files
> at a time.

At least in principle, the OS can optimize simultaneous read requests
to minimize head shuttling. I have no idea if any version of Windows
really does this.

Signature

.NET 2.0 for Delphi Programmers        www.midnightbeach.com/.net
Delphi skills make .NET easy to learn  In print, in stores.

Jon Skeet [C# MVP] - 20 Jul 2006 21:45 GMT
> > > I am spawning 10 threads to load the files so that the initial load is
> > > faster.
[quoted text clipped - 7 lines]
> to minimize head shuttling. I have no idea if any version of Windows
> really does this.

But is that going to make it any *faster* than reading in a single
thread, assuming that IO is the limiting factor here? I'd at least hope
that either the disk itself or Windows guesses that if I've read the
first 50MB of a file, I might be about to read the next few bytes, so
any time spent processing should be useful in terms of IO.

Unless there's a *lot* of time spent processing, I can't see the
benefit of using multiple threads - and even then, there'd rarely be a
benefit in using 10 threads (unless you have 8 cores...)

Signature

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Barry Kelly - 20 Jul 2006 22:59 GMT
> But is that going to make it any *faster* than reading in a single
> thread, assuming that IO is the limiting factor here?

It could, if the different threads were reading from different drives.
Even then, one could avoid redundant threads if one used asynchronous
stream accesses - BeginRead etc.

-- Barry

Signature

http://barrkel.blogspot.com/

Jon Shemitz - 21 Jul 2006 04:40 GMT
> > At least in principle, the OS can optimize simultaneous read requests
> > to minimize head shuttling. I have no idea if any version of Windows
> > really does this.
>
> But is that going to make it any *faster* than reading in a single
> thread, assuming that IO is the limiting factor here?

Well (again, in principle!) imagine the first file is at track 10, the
second file is at track 10,000, the third file is at track 500, and
the fourth file is at track 25,000. Even ignoring the time to get from
the current location to track 10, it's going to be faster to step the
head from 10 to 500 then to 10,000 and then to 25,000 than it is to go
from 10 to 10,000 then back to 500 and then forward to 50,000. The
extra physical shuttling adds milliseconds to the (essentially fixed)
transfer time.

Of course, as I keep saying this is only theoretical. I see no reason
to doubt Barry's experience ... though perhaps Vista (or some future
server versions) will implement the sort of seek optimizations (that I
seem to remember Unix had in the '80's ....)

Signature

.NET 2.0 for Delphi Programmers        www.midnightbeach.com/.net
Delphi skills make .NET easy to learn  In print, in stores.

Ben Voigt - 31 Aug 2006 20:40 GMT
>> > At least in principle, the OS can optimize simultaneous read requests
>> > to minimize head shuttling. I have no idea if any version of Windows
[quoted text clipped - 11 lines]
> extra physical shuttling adds milliseconds to the (essentially fixed)
> transfer time.

You're assuming that the files are not fragmented.  In reality, if multiple
logfiles are being written simultaneously (I'm thinking logs for different
IIS websites on the same machine, for example), they are likely to have
interleaved fragments, and a good elevator sort  will improve throughput
considerably.

> Of course, as I keep saying this is only theoretical. I see no reason
> to doubt Barry's experience ... though perhaps Vista (or some future
> server versions) will implement the sort of seek optimizations (that I
> seem to remember Unix had in the '80's ....)

Elevator sort is best implemented on the drive interface electronics, not
the OS btw, because the OS can have no knowledge of actual disk layout as
determined by number of platters, hardware striping, etc.  In contrast
read-ahead caching is best implemented by the OS (or better yet application)
which knows which data is related.
Barry Kelly - 20 Jul 2006 22:58 GMT
> At least in principle, the OS can optimize simultaneous read requests
> to minimize head shuttling. I have no idea if any version of Windows
> really does this.

It's been my experience that simultaneous reads slow everything down by
about 100x, if not much more. Windows doesn't appear to be smart about
it, it tries to be fair to each thread / process, but instead of giving
them a decent run (required given the slowness of disk seeks), it slices
them up too finely.

-- Barry

Signature

http://barrkel.blogspot.com/

Chris Mullins - 21 Jul 2006 05:58 GMT
> Jon Shemitz <jon@midnightbeach.com> wrote:
>
[quoted text clipped - 7 lines]
> them a decent run (required given the slowness of disk seeks), it slices
> them up too finely.

Isn't this what technologies such as Command Queueing in SCSI and SATA are
specifically designed to address? On an old IDE system sure, you can only
actually do 1 read at a time.

On a modern drive with Command Queueing, doesn't the O/S say, "I have these
50 read commands, do 'em all as best you can." ? This is also going to be
very depenant upon the driver and a million other variables...

--
Chris Mullins
Barry Kelly - 21 Jul 2006 07:15 GMT
> > Jon Shemitz <jon@midnightbeach.com> wrote:
> >
[quoted text clipped - 15 lines]
> 50 read commands, do 'em all as best you can." ? This is also going to be
> very depenant upon the driver and a million other variables...

It's not too bad if your reads are sequential and there aren't too many
of them competing. It gets a lot worse when you've got random competing
reads. You can queue up reads at a far faster rate than the disk seeks,
and disk seeks take far longer than sequential reads. The OS is faced
with Hobson's choice: be unfair to processes and starve them, in order
to minimize seeks and maximize sequential reads; or try to be fair, but
incur loads of wasted time in redundant seeks. Whether it's the OS, the
drive or the driver that gets the final say, there's still a tradeoff.
If there's enough competing reads, you can't afford enough cache RAM to
really amortize the seeks using optimistic read-ahead.

So, you either end up with unresponsive applications, or inefficient
thrashing, or some mix between the two. Until we get solid-state hard
drives, I don't think it's going to qualitatively improve. Modern drives
sequentially read well over 100MB/sec as you probably know, it's seeks
that take all the time.

-- Barry

Signature

http://barrkel.blogspot.com/

Arun - 24 Jul 2006 05:50 GMT
> Could you post a short but complete program which demonstrates the
> problem?

Hi Jon,

I am just writing the functions I am calling to read files & storing the
Data in to HashTables / Arraylist.

Method 1:

#region setupReaderThreads
        /// <summary>
        /// Setup the Multi Threaded environment to read the Log files       
        /// </summary>       
        private void setupReaderThreads()
        {
           
            int fileNameCount = fileNames.Count;

            _fileName        = new string[_arrayLength];
            _logAnalyserBL        = new LogAnalyserBL[_arrayLength];
            _logAnalyserBLThread     = new Thread[_arrayLength];
         
            for(int i=0; i< _arrayLength; i++)
            {
                //fileNames is an arraylist
               
                _fileName[i] = fileNames[i + _fileCounter].ToString();               

                if(! File.Exists(_fileName[i]))
                {
                    return;
                }
                else
                {
                    _logAnalyserBL[i] = new LogAnalyserBL(false);
                    _logAnalyserBL[i].FileName = _fileName[i];
                }

                _logAnalyserBLThread[i]  = new Thread(new
ThreadStart(_logAnalyserBL[i].DoWork));
                _logAnalyserBLThread[i].Name = "Log Reader " + i;

                _logAnalyserBLThread[i].Start();
                           
            }
       
        }
       
************************************************       

Method 2:

#region SearchTimer_Tick
        /// <summary>
        /// SearchTimer_Tick
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void SearchTimer_Tick(object sender, System.EventArgs e)
        {
            try
            {
                SearchTimer.Enabled = false;
                //
                // Setting up the Multiple Threades to read the log files
                //
                if (_state == 0)
                {                   
                    setupReaderThreads();
                    _state = 1;
                }
                //
                // Checking if the file Reading is finished or it is still going on
                //
                else if (_state == 1)
                {                                       
                    Cursor = Cursors.WaitCursor;
                    bool flag = false;

                    StatusLabel.Text = _logAnalyserBL[0].statusBar;
                                       
                    for(int i=0; i< _arrayLength; i++)
                    {
                        if(!_logAnalyserBL[i].isRunning)
                            flag = true;
                        else
                        {
                            flag = false;
                            break;
                        }
                    }

                    if(flag)
                        _state = 2;

                }
                //
                // Adding all the Logfiles Hash Table objects into the arraylist to
prepare the Result Nodes
                //
                else if (_state == 2)
                {
                    StatusLabel.Text = string.Format("Preparing Results...", _logs.Count);

                    // Adding all the log file hashtable objects into the
Arraylist    "_logs"               
                    for(int i=0; i< _arrayLength; i++)
                    {                   
                        foreach (LogEntryNode n in _logAnalyserBL[i].results)
                        {
                            _logs.Add(n);
                        }

                        _logAnalyserBL[i] = null;
                    }

                    //Logic for Increment the counter to read 12 files at a time
                    if(_fileCounter < fileNames.Count - fileNames.Count%12)
                    {
                        _fileCounter += 12;                       
                       
                        _state = 0;
                                               
                        if( _fileCounter == fileNames.Count - fileNames.Count%12)
                        {
                            _arrayLength  = fileNames.Count%12;   
                           
                            if(_fileCounter == fileNames.Count)
                            {
                                _state = 3;                       
                            }   
                        }
                        else if(_fileCounter == fileNames.Count)
                        {
                            _state = 3;                       
                        }                       
                    }
                    else
                        _state = 3;
                }
                //
                // Reading the Log arraylist & Creating the corresponding Hashtables
                //   
                else if (_state == 3)
                {
                    _logMatcher        = new LogMatcher();
                    _logMatcher.logs = _logs;
                   
                    ThreadStart threadStart    = new ThreadStart(_logMatcher.DoWork);
                    Thread logAnalysisThread = new Thread(threadStart);
                    logAnalysisThread.Name    = "Log Analysis";
                    logAnalysisThread.Start();

                    _state = 4;
                }
                    //
                    // Checking the thread is running or not
                    //   
                else if (_state == 4)
                {
                    if (!_logMatcher.isRunning)
                        _state = 5;
                    else
                        StatusLabel.Text = _logMatcher.status;
                }
                    //
                    // Updating the Status
                    //   
                else if (_state == 5)
                {
                    this.StatusLabel.Text = "Converting to TreeView...";
                    _state = 6;
                }
                    //
                    // Converting the resultant nodes into TreeView form
                    //   
                else if (_state == 6)
                {
                    this.StatusLabel.Text = "Converting to TreeView...";
                    MetaDataTreeView.BeginUpdate();

                    MetaDataTreeView.Nodes.Add(new TreeNode("QUERY Nodes"));
                    MetaDataTreeView.Nodes.Add(new TreeNode("SAVE Nodes"));

                    int userCount = 1;
                   
                    foreach (string key in _logMatcher.orderedResults)
                    {
                       
                        ArrayList logEntryArrayList = (ArrayList) _logMatcher.results[key];
                   
                        TreeNode tNode = null;
                        string startTime = string.Empty;
                        string endTime    = string.Empty;
                        string timeTaken = string.Empty;
                       
                        if (_logMatcher.methods.ContainsKey(key))
                        {
                            LogEntryNode logEntryNode = (LogEntryNode) _logMatcher.methods[key];

                            startTime = logEntryNode.GetVal("Time");

                            tNode = new TreeNode(string.Format("{0} {1}", logEntryNode.themethod,
logEntryNode.GetVal("Time")));
                       
                            // get the user
                            string user = logEntryNode.GetVal("Original User");
                            if (user.Length == 0)
                            {
                                string refString = logEntryNode.GetVal("Ref");
                                user = refString.Substring(0, refString.IndexOf("-"));
                            }
                           
                            //user += " - Ping";

                            if (!_users.ContainsKey(user))
                                _users.Add(user, new LogUserDetails(user, userCount++));

                            LogUserDetails logUserDetails = (LogUserDetails) _users[user];
                            tNode.BackColor = logUserDetails.color;
                            logUserDetails.count++;
                        }
                        else
                        {
                            tNode = new TreeNode( "Unknown" );
                            tNode.BackColor = Color.LightCoral;
                        }

                        TreeNode keyNode = new
TreeNode(((LogEntryNode)logEntryArrayList[0]).key);
                        tNode.Nodes.Add(keyNode);
                       
                        foreach (LogEntryNode n in logEntryArrayList)
                        {                                               
                            tNode.Nodes.Add(n);
                           
                            if( n.Text.IndexOf(": Completed") > 0)
                            {
                                endTime = n.GetVal("Time");                                    
                                if( startTime != string.Empty && endTime != string.Empty )
                                {
                                    timeTaken = Convert.ToString( Convert.ToDateTime(endTime) -
Convert.ToDateTime(startTime));                                   
                                }                                                           
                            }                       
                        }

                        TreeNode timeTakenNode    = new TreeNode("Time Taken: " + timeTaken );
                        timeTakenNode.BackColor = Color.Red;
                        timeTakenNode.ForeColor = Color.White;
                       
                        tNode.Nodes.Add(timeTakenNode);
                                                       
                        // Distinguishing the Log entries on the Basis of the query & Save
                        if(tNode.Text.IndexOf("Query") == 0)
                        {                           
                            MetaDataTreeView.Nodes[0].Nodes.Add(tNode);
                        }
                        else
                        {
                            MetaDataTreeView.Nodes[1].Nodes.Add(tNode);
                        }                       
                    }

                    MetaDataTreeView.EndUpdate();
                   
                    _logMatcherHash.Clear();

                    _logMatcherHash.Add("SEARCH_LOG", _logMatcher);
                    _logMatcher = null;
                    _state = 7;
                }
                    //
                    // Analysing the Treeview Nodes
                    //   
                else if (_state == 7)
                {
                    this.StatusLabel.Text = "Analysing ...";
               
                    this.LogSummaryTreeView.BeginUpdate();
                    LogSummaryTreeView.Nodes.Clear();

                    TreeNode usersNode = new TreeNode("Users");
                    LogSummaryTreeView.Nodes.Add(usersNode);
                   
                    int count = 0;
                    foreach (LogUserDetails ld in this._users.Values)
                    {
                        count += ld.count;
                        ld.Update();
                        usersNode.Nodes.Add(ld);
                    }

                    usersNode.Text = string.Format("Users - Total = {0}", count);
                    usersNode.ExpandAll();
                    _users.Clear();

                    _state = 8;
                }                   
                else if (_state == 8)
                {
                    StatusLabel.Text = "Log Entries are available now in Tree View
Controls.";
                    LogSummaryTreeView.EndUpdate();
                    _state = -1;

                    Cursor = Cursors.Default;
                }
                else
                {
                    _state = -1;               
                    SearchTimer.Enabled = false;           
                }
           
                if (_state != -1)
                    SearchTimer.Enabled = true;           
            }
            catch(Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
        }
        #endregion

*************************************************

Method 3: LogAnalyserBL.DoWork()

#region DoWork
        /// <summary>
        /// Reading the Selected Log files and writing the file contents into
ArrayList
        /// </summary>
        public void DoWork()
        {
            try
            {
                statusBar = string.Format( "Reading {0} ...", _fileName);
                Stream strm = File.Open(_fileName, FileMode.Open, FileAccess.Read,
FileShare.Read);
               
                logFileReader = new StreamReader(strm);
                string line = null;

                long currentCount = 0;
                LogEntryNode activeNode = null;

                //Reading the selected Log file Line by Line
                while ((line = logFileReader.ReadLine()) != null)
                {
                    if (line == "[LOG_ENTRY]")
                    {
                        if (activeNode != null && isCompatability)
                            results.Add(activeNode);

                        activeNode = new LogEntryNode(isCompatability);
                    }
                    else if (line == "[END_LOG_ENTRY]")
                    {
                        results.Add(activeNode);
                        activeNode = null;
                    }
                    else if (activeNode != null)
                    {
                        //adding the Logfile Lines into the LogEntryNode value
                        activeNode.AddVal(line);
                    }
                    currentCount++;
                   
            }   
                logFileReader.Close();
                logFileReader = null;
                strm.Close();
                statusBar = "Reading Complete";

                long count = 0;
                long repCounter = 0;
                statusBar = string.Format("Updating {0} Nodes...", results.Count);
                foreach (LogEntryNode n in results)
                {
                    // update the value of the node...
                    n.UpdateNow() ;

                }                }
            catch(System.Exception ex)
            {
                Debug.WriteLine(ex.Message);
                statusBar = "Error";
                MessageBox.Show(ex.Message);
                return;
            }
            finally
            {
                isRunning = false;
                if (logFileReader != null)
                    logFileReader.Close();
            }
        }
        #endregion
       
************************************************       

Method 4: LogMatcher.DoWork

#region DoWork

    public Hashtable results = new Hashtable();
    public Hashtable methods = new Hashtable();
    public ArrayList orderedResults = new ArrayList();
    public ArrayList logs    = null;
       
        /// <summary>
        /// Adding the LogEntries in the HashTables
        /// </summary>
        public void DoWork()
        {
            ArrayList kiddies = null;
            foreach (LogEntryNode leNode in logs)
            {
                if (!results.ContainsKey(leNode.key))
                {
                   
                    results.Add(leNode.key, new ArrayList());
                           
                    orderedResults.Add(leNode.key);
                }

                if (leNode.startCall)
                {
                    if (!methods.ContainsKey(leNode.key))
                        methods.Add(leNode.key, leNode);
                }

                kiddies = (ArrayList) results[leNode.key];
                kiddies.Add(leNode);
            }
            isRunning = false;
        }
        #endregion
       
**************************************************

Jon, If u see the above 4 methods then u can get the idea How I am reading
the files & how I am storing the Data into Hashtables & ArrayLists
Actually as u know the size of each file is 2.5 MB and there r atmost 60
files in each directory.

Now please suggest me the correct solution to reduce the memory usage in my
application.

Thanks
Arun

> >   I am working on a tool which compares log files from different servers.
> > The application allows comparision of data from 2 servers at a time.  A max
[quoted text clipped - 32 lines]
> See http://www.pobox.com/~skeet/csharp/complete.html for details of
> what I mean by that.
Jon Skeet [C# MVP] - 24 Jul 2006 06:55 GMT
> > Could you post a short but complete program which demonstrates the
> > problem?
>
> I am just writing the functions I am calling to read files & storing the
> Data in to HashTables / Arraylist.

Why, when I specifically asked for a complete program? I can't run the
code you've given, so I can't verify whether or not it has the same
problem on my box.

I'll have a look at it, but it would have been a lot quicker if you'd
posted a complete program.

Signature

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Ben Voigt - 31 Aug 2006 20:57 GMT
At which step are you looking at the memory usage.  A TreeView has a lot of
overhead.  I also see nowhere that the ArrayList or Hashtable gets freed for
garbage collection.  So you probably have three copies of your data, in
Unicode, plus TreeView overhead, of a 150 MB dataset, hence >900MB.

It doesn't help that you have multiple methods named DoWork... you shouldn't
have any actually, it's a horrible name.  You gain nothing by starting
multiple threads to write to the same Hashtable.

I'd bring all the data processing back to one worker thread, using
overlapped I/O and completion callbacks (WaitForSingleObjectEx, let the
single object be a cancel event set from the GUI thread).  Forget ReadLine,
reading a one-page chunk (usually 4k) will be much faster.  There's no need
to save up the entire block before adding to the Hashtable either, once
you've read the key, add the pair (key, new StringBuilder()) and then start
appending data to the StringBuilder (or whatever object).  And set the
hashtable to null after you add the nodes to the treeview (either remove
each element as you put in into the treeview, or dereference the entire
hashtable at the end).

Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.