.NET Forum / Languages / C# / January 2008

Looking for a collection that uses hard disk as storage

GG - 17 Jan 2008 22:11 GMT
Does anybody know of a collection that is stored on the hard disk
instead of in memory?

Thanks
Tom Dacon - 17 Jan 2008 22:25 GMT
does the word DataTable ring any bells for you?

Tom Dacon
Dacon Software Consulting

> Does anybody know of a collection that is stored on the hard disk
> instead of in memory?
>
> Thanks
GG - 17 Jan 2008 22:51 GMT
I just checked the docs, and they say "Represents one table of in-memory data."
In addition, I added 1 million rows of data and RAM went from 18 MB to 1 GB.
If the data were stored on the hard disk, the RAM should not have grown
dramatically from 18 MB.

Thanks
Ignacio Machin ( .NET/ C# MVP ) - 18 Jan 2008 14:09 GMT
Hi,

There is one; it's called MS SQL Server (or SQL Server Express).

Ignacio Machin
http://www.laceupsolutions.com
Mobile & warehouse Solutions.

>I just checked the docs, and they say "Represents one table of in-memory data."
> In addition, I added 1 million rows of data and RAM went from 18 MB to 1
[quoted text clipped - 5 lines]
Marc Gravell - 18 Jan 2008 14:59 GMT
Minor aside; we've had multiple suggestions to the OP that a database
is the right way to go here [hint: it really is!] - but if the OP is
worried about being able to support both in-memory and disk (database)
based queries, then perhaps LINQ-to-SQL has the answer?

i.e. use LINQ to generate the entities (otherwise it'll never work
later), but in your code simply work with IQueryable<Whatever>, and
supply the source at runtime (based on the data volume).
For instance, a List<Whatever> will work just fine
with .AsQueryable(), but so will a Table<Whatever>.

Note that the list obviously won't be indexed automatically.
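
As a rough illustration of coding against IQueryable<T> and choosing the source at runtime, here is a minimal sketch. The Message class, the MessageQueries helper, and the sample data are invented for this example; with LINQ-to-SQL the entity would normally be generated (with mapping attributes) rather than hand-written.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, standing in for a generated LINQ-to-SQL class.
public class Message
{
    public int Id { get; set; }
    public string Body { get; set; }
}

public static class MessageQueries
{
    // Written against IQueryable<Message>, so the caller decides
    // whether the data lives in memory or in a database table.
    public static List<int> IdsContaining(IQueryable<Message> source, string text)
    {
        return source.Where(m => m.Body.Contains(text))
                     .OrderBy(m => m.Id)
                     .Select(m => m.Id)
                     .ToList();
    }

    public static List<int> Demo()
    {
        var inMemory = new List<Message>
        {
            new Message { Id = 1, Body = "retry order 17" },
            new Message { Id = 2, Body = "ack" },
            new Message { Id = 3, Body = "retry order 42" }
        };

        // In-memory source: AsQueryable() wraps the list.
        // A database-backed source (for example a Table<Message> obtained
        // from a DataContext) could be passed to the same method instead.
        return IdsContaining(inMemory.AsQueryable(), "retry");
    }
}
```

The query method never mentions List<T> or Table<T>, which is the point: only the call site knows which one is in use.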

Actually, even the above is probably over-designing it. Keep it simple
and code to the worst-case scenario: use the database approach from
the outset. If it is running locally, then it should be pretty quick
regardless of the data volume (or at least, the performance will
degrade at an acceptable rate, unlike in-memory which doesn't scale so
well). If you need more data, buy a bigger database (i.e. start with
SQL Express).

Marc
GG - 18 Jan 2008 19:17 GMT
Thanks, all, for the comments. I should have been more specific.

We do use SQL Server as the backend. We are inserting records into SQL Server
using bcp from a different source. However, sometimes we may need to
replay messages but only pick the ones that did not make it into the
database. Going to SQL Server to check for record existence is very
expensive. Inserting dup keys with ignore_dup_key on will be very slow.

Probably Arne Vajhøj's idea of
"A collection that's designed to run off the disk will probably have an
indexing system"
would be the ideal solution for us.

Does anybody know of such a solution?

Thanks
Marc Gravell - 18 Jan 2008 21:57 GMT
> However, sometimes we may need to
> replay messages but only pick the ones that did not make it into the
> database. Going to SQL Server to check for record existence is very
> expensive. Inserting dup keys with ignore_dup_key on will be very slow.

Yes, round-tripping in this way would be a bad idea.
In this scenario, I would bulk-insert into a staging table, and then
either remove the duplicates via a JOIN to the actual table, or just
do an insert from staging => actual where the unique key* doesn't
exist in the actual data. If you want to minimise the size of the
working set, you could do this in batches of a few thousand. It would
be fairly easy to write an IDataReader implementation (for use with
SqlBulkCopy) that splits the input file into batches.
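
A minimal sketch of the staging approach, under stated assumptions: the table and column names (MessagesStaging, Messages, MessageId, Payload) are invented, the staging table is assumed to contain no duplicate keys of its own, and the batch size is arbitrary. It needs a real SQL Server instance to run.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class ReplayLoader
{
    // Hypothetical schema: copies only the staged rows whose key is
    // not already present in the real table, then empties staging.
    public const string MergeSql = @"
INSERT INTO Messages (MessageId, Payload)
SELECT s.MessageId, s.Payload
FROM MessagesStaging s
WHERE NOT EXISTS (SELECT 1 FROM Messages m WHERE m.MessageId = s.MessageId);
TRUNCATE TABLE MessagesStaging;";

    public static void Load(string connectionString, IDataReader replayRows)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // Step 1: bulk-insert the replayed rows into the staging table.
            using (var bulk = new SqlBulkCopy(connection))
            {
                bulk.DestinationTableName = "MessagesStaging";
                bulk.BatchSize = 5000; // rows sent to the server per batch
                bulk.WriteToServer(replayRows);
            }

            // Step 2: let the database work out which keys are new.
            using (var command = new SqlCommand(MergeSql, connection))
            {
                command.ExecuteNonQuery();
            }
        }
    }
}
```

The duplicate check happens in one set-based statement on the server, so there is no per-row round trip from the client.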

If you want to just find out which keys are missing, then perhaps just
build up (again in batches) a set of the keys (as CSV/TSV); send that
to the db and ask the db which ones it doesn't have; then commit the
missing data.

But again I claim that this should be largely a database
implementation (perhaps just with some C# to orchestrate the
SqlBulkCopy). C#/.NET is a great tool, but don't be fooled that it is
the right tool for every job.

*=I'm /hoping/ that there is a unique key you can use for this
purpose!

Marc
Arne Vajhøj - 19 Jan 2008 02:27 GMT
> We do use SQL Server as the backend. We are inserting records into SQL Server
> using bcp from a different source. However, sometimes we may need to
[quoted text clipped - 6 lines]
> indexing system"
> would be the ideal solution for us.

Actually that was something Jesse McGrew wrote and I just quoted.

Arne
Arne Vajhøj - 19 Jan 2008 02:32 GMT
> Thank all for the comments. I should have been more specific.
>
[quoted text clipped - 8 lines]
> indexing system"
> would be the ideal solution for us.

One idea: start the "replay" code by reading all primary keys
into a Hashtable/Dictionary<> and check existence there before
inserting.
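
A minimal sketch of that idea, assuming the keys are 64-bit integers and fit in memory. ReplayFilter and MissingKeys are names invented here; HashSet<T> is used because only key existence is needed, not a value per key.

```csharp
using System.Collections.Generic;

public static class ReplayFilter
{
    // existingKeys: primary keys read once from the database up front.
    // replayedKeys: keys of the messages being replayed.
    // Returns the replayed keys that still need to be inserted.
    public static List<long> MissingKeys(IEnumerable<long> existingKeys,
                                         IEnumerable<long> replayedKeys)
    {
        var seen = new HashSet<long>(existingKeys);
        var missing = new List<long>();

        foreach (long key in replayedKeys)
        {
            // Add returns false when the key is already in the set, so this
            // also drops keys that repeat within the replay itself.
            if (seen.Add(key))
            {
                missing.Add(key);
            }
        }

        return missing;
    }
}
```

For example, with existing keys 1, 2, 3 and replayed keys 2, 4, 4, 5, only 4 and 5 come back. The memory cost grows with the number of keys, so this only helps while the key set stays comfortably in RAM.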

Arne
christery@gmail.com - 17 Jan 2008 22:56 GMT
Marc Gravell - 17 Jan 2008 22:57 GMT
> does the word DataTable ring any bells for you?

DataTable is in-memory... it can be *persisted* to a database or flat
file, but that isn't the same thing.

> Does anybody know of a collection that is stored on the hard disk
> instead of in memory?

What is the reason for this requirement? I would suggest that in most
cases where this is an issue, a database would be the best option. For
example, SQL Server Express Edition is free, pretty robust, with good
support for most constructs (including xml support etc).

Marc
Arne Vajhøj - 18 Jan 2008 01:14 GMT
> Does anybody know of a collection that is stored on the hard disk
> instead of in memory?

As others have stated, a database is one obvious solution.

If you are on 64 bit you could also just use a collection.

Whatever there is no room for in RAM will be on disk in the
pagefile.

Arne
Michael C - 18 Jan 2008 02:19 GMT
> As others have stated, a database is one obvious solution.
>
> If you are on 64 bit you could also just use a collection.
>
> Whatever there is no room for in RAM will be on disk in the
> pagefile.

Not really the best solution as it would slow the machine down dramatically.
It could work ok if you could specify it to use the swap file only but afaik
this is not an option.

Michael
Arne Vajhøj - 18 Jan 2008 02:27 GMT
>> As others have stated, a database is one obvious solution.
>>
[quoted text clipped - 4 lines]
>
> Not really the best solution as it would slow the machine down dramatically.

Why should a page file be slower than any other disk file?

> It could work ok if you could specify it to use the swap file only but afaik
> this is not an option.

Since Windows only has a page file then ...

Arne
Jesse McGrew - 18 Jan 2008 02:44 GMT
> >> As others have stated, a database is one obvious solution.
>
[quoted text clipped - 6 lines]
>
> Why should a page file be slower than any other disk file?

An in-memory collection whose contents are being paged to and from the
disk by the OS will have worse performance than a collection designed
to operate off the disk, as soon as you do any kind of search on it.

A collection that's designed to run off the disk will probably have an
indexing system so it doesn't have to load the entire file to find a
single element. But searching through a massive memory-based
collection will cause many pages to be swapped in, possibly causing
other useful pages to be swapped out and lowering performance down the
road.

For example, a binary search on a memory-based collection might end up
having to load half the file into memory, one page at a time, while a
disk-based collection could keep all the necessary indexing data in a
single page that never gets swapped out.
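
The cost being argued about here is the number of distinct pages a single lookup touches. A minimal sketch for counting that in the binary-search case, assuming a sorted array of the values 0 to count-1 packed itemsPerPage to a page. The class, method, and parameter names are invented for illustration, and this only simulates index arithmetic; it does not measure real paging or any real collection.

```csharp
using System.Collections.Generic;

public static class PageTouchSketch
{
    // Binary-searches a virtual sorted array holding 0, 1, ..., count-1
    // and returns how many distinct pages the probed indexes fall on.
    public static int PagesTouchedByBinarySearch(int count, int itemsPerPage, int target)
    {
        var pages = new HashSet<int>();
        int lo = 0;
        int hi = count - 1;

        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            pages.Add(mid / itemsPerPage); // page that holds the probed element

            if (mid == target)
            {
                break;
            }

            if (mid < target)
            {
                lo = mid + 1;
            }
            else
            {
                hi = mid - 1;
            }
        }

        return pages.Count;
    }
}
```

Calling it with different counts and page sizes gives a feel for how the number of pages per lookup grows, which can then be compared with whatever a disk-oriented index would need.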

Jesse
Arne Vajhøj - 18 Jan 2008 03:41 GMT
>>>> As others have stated, a database is one obvious solution.
>>>> If you are on 64 bit you could also just use a collection.
[quoted text clipped - 18 lines]
> disk-based collection could keep all the necessary indexing data in a
> single page that never gets swapped out.

"A collection that's designed to run off the disk will probably have an
indexing system"

Sounds like a great idea.

But I get an even better idea. Let us implement that in memory as well.
We could call it Hashtable or Dictionary.

:-)

Arne
Peter Duniho - 18 Jan 2008 03:48 GMT
>> Why should a page file be slower than any other disk file?
>
> An in-memory collection whose contents are being paged to and from the
> disk by the OS will have worse performance than a collection designed
> to operate off the disk, as soon as you do any kind of search on it.

Why?  There's no a priori reason to believe this is true, even though it's  
true that _some_ in-memory collections may not be as efficient as a
disk-based database.

> A collection that's designed to run off the disk will probably have an
> indexing system so it doesn't have to load the entire file to find a
> single element. But searching through a massive memory-based
> collection will cause many pages to be swapped in, possibly causing
> other useful pages to be swapped out and lowering performance down the
> road.

You're assuming that the in-memory structure would not have a similar  
indexing mechanism.

Now, I don't know the implementation of DataTable.  But as a general  
concept, there's absolutely no reason it couldn't be indexed in basically  
the same way as a database.  Conversely, if a database implements (for  
example) an index as a simple sorted array that uses a binary search, it's  
going to have the exact same liability that an in-memory structure paged  
to the disk using the same indexing scheme would have.

> For example, a binary search on a memory-based collection might end up
> having to load half the file into memory, one page at a time, while a
> disk-based collection could keep all the necessary indexing data in a
> single page that never gets swapped out.

If that's a concern, why wouldn't someone just have a similar "index only"  
data section for their data structure for the in-memory implementation?

It seems to me that if all you know is that one implementation is a  
disk-based database and another is an in-memory data structure, that that  
is not nearly enough information to tell you which will perform better.

Pete
Jesse McGrew - 20 Jan 2008 10:00 GMT
On Jan 17, 7:48 pm, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:
> >> Why should a page file be slower than any other disk file?
>
[quoted text clipped - 5 lines]
> true that _some_ in-memory collections may not be as efficient as a
> disk-based database.

Yes, it's possible to write an in-memory collection that will perform
as well when paged to disk as a disk-based collection. But the
specific collection types that have been mentioned here don't fit the
bill, nor do any of the other standard in-memory collections (AFAIK).

Jesse
Peter Duniho - 20 Jan 2008 18:19 GMT
> Yes, it's possible to write an in-memory collection that will perform
> as well when paged to disk as a disk-based collection. But the
> specific collection types that have been mentioned here don't fit the
> bill, nor do any of the other standard in-memory collections (AFAIK).

I don't know about that.  They may not be optimized for swapping behavior,  
but any of the indexed collections are implemented with an index that is  
separate from the data collection itself and finding an element requires  
only iterating in some way through the index, not the entire data  
collection.  In the same way that a file-based, index-based collection  
only needs to pull in data from an index file, rather than the entire  
database itself, so too does a Dictionary<> (for example) only need to  
pull in data from the hashed index, rather than the entire collection in  
order to find a specific element.

Is there something you see as being particularly different about the two  
scenarios?  I'm not seeing it myself.  I could imagine that there are  
subtle differences in performance, but I don't see anything fundamentally  
different about them.

Pete
Arne Vajhøj - 21 Jan 2008 01:18 GMT
> On Jan 17, 7:48 pm, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
> wrote:
[quoted text clipped - 10 lines]
> specific collection types that have been mentioned here don't fit the
> bill, nor do any of the other standard in-memory collections (AFAIK).

It is not obvious to me why Hashtable/Dictionary<> should not
have nice O(1) characteristics for number of pages needed
to be read from disk.

Can you elaborate a bit ?

Arne
