>> [...]
>> What techniques/APIs can I use to speed up file searching?
[quoted text clipped - 8 lines]
>
> Surely there is, but I'm sure you can't do it via .NET directly, nor am
If its indirect, I'm fine with that.
> I at all convinced it would significantly speed up your search, since I
> suspect that Windows is already pulling the "raw data" off the disk
> about as fast as is possible, and feeding it to you with relatively
> little overhead.
Maybe, however I think net may be pulling in things a simply don't need.
> Beyond that, I have to question a design that requires the search of
> 100,000 files or more during startup. At the very least, if you feel
[quoted text clipped - 3 lines]
> to change it so that you don't have to look at every file in every
> directory in the first place.
It shouldn't be that slow. I know the file-system is all over the place
but the amount of data for 100,000 is only about 2mbs worth of
filenames. But maybe windows is doing a good job of the search and its
just how the table is layed out.
> How you might do that, I don't know. I don't know why you have to
> search for this particular file on startup, so I can't suggest ways you
[quoted text clipped - 6 lines]
> directories. Or maybe you detect a file that gets removed before your
> search is done.
I'm certain the user won't change things while they are using the tool.
> Again, since I don't know exactly how the file gets created or deleted,
> or why it is you want to look for it at startup, I can't really comment
> on the exact nuances of searching for it. But the above issues would
> apply in the general case, at least.
In this design its is necessary. Have you ever used visual assist? It
needs to work kinda like that. The whole idea is that the fastest way
to get to something is to simply type it and see the list of
possibilities shrink as you get closer to what you want. Its a optimal
design thing.
> If in the end you decide you don't want to change the design, you may
> want to consider including some sort of auxiliary background process
[quoted text clipped - 3 lines]
> indexing, but wouldn't require that you figure out how to get at it
> (assuming it's possible at all).
This is a good idea. I would probably need to run the service all the
time. There is a way to monitor file-system changes. I may give that a
shot. I hope it doesn't slow down computer time by much like
visualstudio does. Actually I noticed the google desktop searcher does
this.
Thanks for your thoughts.
> Pete
Peter Duniho - 27 May 2007 18:26 GMT
>>> Is there a way to more directly look at the raw data that make up the
>>> file-tables?
>> Surely there is, but I'm sure you can't do it via .NET directly, nor am
>
> If its indirect, I'm fine with that.
Well, there's always p/invoke. If you can find a way in the native Win32
API, you can use p/invoke to call outside of managed code.
>> I at all convinced it would significantly speed up your search, since I
>> suspect that Windows is already pulling the "raw data" off the disk
>> about as fast as is possible, and feeding it to you with relatively
>> little overhead.
>
> Maybe, however I think net may be pulling in things a simply don't need.
I would be surprised if it was accounting for even 10% overhead.
> It shouldn't be that slow. I know the file-system is all over the place
> but the amount of data for 100,000 is only about 2mbs worth of
> filenames. But maybe windows is doing a good job of the search and its
> just how the table is layed out.
You can't just consider the total amount of data. In fact, for the kind
of search you're doing, the total amount of data is probably the
least-relevant piece of information. Most important is how the data is
distributed on the disk, and how many seeks it takes to get at it all. It
all depends on how the directory data is laid out. If it's not all in one
place, seeks and cache misses are going to really slow things down.
> I'm certain the user won't change things while they are using the tool.
Then why not just run your search in the background, using the most recent
previous results until you have updated the data? The user will get
nearly 100% functionality during that window after startup before the
search has completed, without causing a delay in the application starting
up.
> In this design its is necessary. Have you ever used visual assist? It
> needs to work kinda like that. The whole idea is that the fastest way
> to get to something is to simply type it and see the list of
> possibilities shrink as you get closer to what you want. Its a optimal
> design thing.
I have never heard of "virtual assist". I found this link using Google:
http://virtualassistoncall.com/ I don't see what searching 100,000 files
has to do with that though. Your explanation also doesn't actually give
me any insight into why it is that searching 100,000 files when you start
your application is necessary. "Optimal design" and "search 100,000 files
on startup" don't strike me as correlated, at least not at first glance.
> This is a good idea. I would probably need to run the service all the
> time. There is a way to monitor file-system changes. I may give that a
> shot. I hope it doesn't slow down computer time by much like
> visualstudio does. Actually I noticed the google desktop searcher does
> this.
Yes, all of the various desktop search utilities have to do something
similar. Of course, it begs the question as to whether it really makes
sense to have a half-dozen different applications installed, all of which
are trying to do some sort of indexing on the disk. :) But it's
technically possible.
Pete
janderson - 27 May 2007 19:11 GMT
>>>> Is there a way to more directly look at the raw data that make up
>>>> the file-tables?
[quoted text clipped - 47 lines]
> "search 100,000 files on startup" don't strike me as correlated, at
> least not at first glance.
Sorry I meant visual assist.
http://www.wholetomato.com/
>> This is a good idea. I would probably need to run the service all the
>> time. There is a way to monitor file-system changes. I may give that
[quoted text clipped - 9 lines]
>
> Pete
Peter Duniho - 27 May 2007 19:27 GMT
> Sorry I meant visual assist.
>
> http://www.wholetomato.com/
Actually, that's what you wrote. I must be getting dyslexic in my old
age. :)
Sorry for the confusion. Still having seen that, it doesn't seem a lot
different than Visual Studio's built-in "Intellisense" (though it does add
to that). VS doesn't have to search through 100,000 files to accomplish
what it does, nor does it prevent you from using VS before Intellisense
has caught up (in the rare cases when it hasn't...I usually only see the
"still building database" message on my really slow PIII 550Mhz PC anyway).
I do remain unconvinced that you're going to make the search of 100,000
files much faster (unless the code you're using now is *really* broken
somehow, and it doesn't sound like it is). It's more likely that you'll
have success with an approach that simply moves the search out of the
user's way so it's not a bottleneck (whether that's doing it in a
different process, or doing in the background the way Intellisense does).
And of course, I still wonder at the need to do the search in the first
place, but without more details I can only take as granted that you're
correct about that need.
Pete
janderson - 24 Jun 2007 21:53 GMT
>> Sorry I meant visual assist.
>>
[quoted text clipped - 10 lines]
> only see the "still building database" message on my really slow PIII
> 550Mhz PC anyway).
intellisense in visual assist is nice, but thats not what I'm talking
about. You are able to type alt+O partial name of the file u want,
enter. Bam your at that file. Rather then navigating a slow tree. A
mouse movement takes 1-4 seconds to navigate each branch, thats if you
don't have to scroll (which is extremely inefficient) and read 20
different names 3 levels deep. Maybe I'm slow at navigating trees but
it takes me about 20-30 seconds to find a file in VS.
A keyboard click takes (at 60 words a minute) 1 second or less, normally
you only need to type 3 or 4 letters to get what you want. So 7
keystrokes/seconds in total is a lot more efficient. (Visual assist
takes me on average 3 seconds to get to a file).
I guess its a case of you have be be there.
> I do remain unconvinced that you're going to make the search of 100,000
> files much faster (unless the code you're using now is *really* broken
[quoted text clipped - 8 lines]
>
> Pete
I came across this today, thought it may be useful.
http://video.google.com/videoplay?docid=-6856727143023456694&hl=en
Peter Duniho - 24 Jun 2007 22:14 GMT
> I came across this today, thought it may be useful.
>
> http://video.google.com/videoplay?docid=-6856727143023456694&hl=en
At 90 minutes, no. Not useful at all.
From your previous description, you seem to be talking about an indexed
file search. Which is fine, and a common enough practice. But you don't
get that for free. If you intend to index even just the filenames of
100,000 files, you have to read that data from the disk, and that will
take a non-trivial amount of time.
Note that all along, I have assumed that in spite of my failure to
understand the exact application you're trying to write, you really do
need to do what you're asking about. I haven't been questioning that. I
have only questioned whether your expectation that you can do it in some
unnoticeable amount of time is a reasonable one.
I could be wrong about that, but the last post in this thread was four
weeks ago. That seems like plenty of time to have explored your options
and found a faster way to get the file information, if one exists. So
that begs the question: have you in fact found any significant improvement
on the file i/o aspect of your problem?
Absent any progress on the performance aspect, I stand by my previous
suggestion: the way to do this is to do it as other indexing
implementations handle it. Index in the background, and offer the user
whatever functionality you can, however minimal at any given moment, with
whatever has been indexed so far. Once an initial index has been built,
this will provide nearly perfect results immediately, and should only take
a few minutes or so (maybe 10 or 15 if you throttle your indexing so that
the user's other work isn't impeded noticeably) for the index to be
brought up to date.
Pete