.NET Forum / Languages / VB.NET / March 2008

Fastest way to search for a string in a large text file (75 to 100 MB)

Clinto - 28 Feb 2008 04:25 GMT
Hi,
I am trying to find the fastest way to search a text file for a
particular string and return the line that contains the string. So far
I have just used the most basic method: I initialize a variable as an
IO.StreamReader, read each line, and use an If-Then to check whether
var.Contains(mystring) is true or false. If true, I have my line; if
false, it reads the next line. This takes forever. Is there anything I
can do to speed this up?
Thanks.
Chris K. - 28 Feb 2008 04:53 GMT
How big are these files?

> Hi,
> I am trying to find the fastest way to search a txt file for a
[quoted text clipped - 5 lines]
> can do to speed this up?
> Thanks.
Clinto - 29 Feb 2008 03:18 GMT
On Feb 27, 10:53 pm, "Chris K." <ckoeber[Do Not
Spam]@googlesemailservice.figureitout> wrote:
> How big are these files?
>
[quoted text clipped - 9 lines]

Usually anywhere from 75 to 100 MB.
Tom Shelton - 28 Feb 2008 05:49 GMT
> Hi,
> I am trying to find the fastest way to search a txt file for a
[quoted text clipped - 5 lines]
> can do to speed this up?
> Thanks.

If the file is only 100 MB, then, seriously, I would just read the
entire file at once and process it in memory. If you read line by
line, then you are going to hit the disk a lot, and that will really
slow you down.

If you don't want to do it that way, then you might want to read the
file in chunks as binary data, then convert your bytes to strings and
do your compares. Of course, that is going to make it a little
tricky, because a chunk might end in the middle of a line.

--
Tom Shelton
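
A rough sketch of the chunked approach, for anyone who wants to try it. It is only one possible reading of the suggestion above: it reads blocks of characters through a TextReader instead of raw bytes, so decoding across block boundaries is left to the reader, and it carries the unfinished last line of each block over to the next one. The names FindLineChunked, ExtractLine and the 1 MB block size are made up for illustration, and the search text is assumed to be non-empty and to contain no line break.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module ChunkedSearch
    ' Hypothetical helper: scans the reader block by block and returns the
    ' first line containing searchText, or Nothing if there is no match.
    Function FindLineChunked(ByVal reader As TextReader, ByVal searchText As String, ByVal chunkSize As Integer) As String
        Dim buffer(chunkSize - 1) As Char
        Dim carry As String = ""   ' unfinished last line of the previous block
        Dim count As Integer = reader.Read(buffer, 0, buffer.Length)
        While count > 0
            Dim block As String = carry & New String(buffer, 0, count)
            Dim lastBreak As Integer = block.LastIndexOf(ControlChars.Lf)
            If lastBreak >= 0 Then
                ' Search only the complete lines; keep the tail for the next round.
                Dim complete As String = block.Substring(0, lastBreak)
                Dim hit As Integer = complete.IndexOf(searchText, StringComparison.Ordinal)
                If hit >= 0 Then Return ExtractLine(complete, hit)
                carry = block.Substring(lastBreak + 1)
            Else
                carry = block
            End If
            count = reader.Read(buffer, 0, buffer.Length)
        End While
        ' Whatever is left is the final line (no trailing line break).
        If carry.IndexOf(searchText, StringComparison.Ordinal) >= 0 Then
            Return carry.TrimEnd(ControlChars.Cr)
        End If
        Return Nothing
    End Function

    ' Expands a match position to the full line around it.
    Private Function ExtractLine(ByVal text As String, ByVal hit As Integer) As String
        Dim lineStart As Integer = 0
        If hit > 0 Then lineStart = text.LastIndexOf(ControlChars.Lf, hit - 1) + 1
        Dim lineEnd As Integer = text.IndexOf(ControlChars.Lf, hit)
        If lineEnd < 0 Then lineEnd = text.Length
        Return text.Substring(lineStart, lineEnd - lineStart).TrimEnd(ControlChars.Cr)
    End Function

    Sub Main(ByVal args As String())
        ' Assumed usage: args(0) = file path, args(1) = search text.
        Using reader As New StreamReader(args(0))
            Dim line As String = FindLineChunked(reader, args(1), 1048576)
            If line Is Nothing Then
                Console.WriteLine("Not found")
            Else
                Console.WriteLine(line)
            End If
        End Using
    End Sub
End Module
```

Whether this beats a plain ReadLine loop would need measuring; StreamReader already buffers its reads, so the gain may be small.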
kimiraikkonen - 28 Feb 2008 08:40 GMT
> > Hi,
> > I am trying to find the fastest way to search a txt file for a
[quoted text clipped - 18 lines]
> --
> Tom Shelton

I agree, Tom. About 100 MB is a huge size for a text file. Reading it as
raw bytes and then converting them to strings is a good idea, but the key
point is how to do it programmatically :-)
(O)enone - 28 Feb 2008 12:32 GMT
> I agree Tom, about 100mb is a huge size for a text file, reading it as
> raw then converting each byte to string is a good idea, but the key
> point is how to do it programmatically :-)

The most efficient way would presumably be to read the entire file into a
single string using IO.File.ReadAllText and see whether your search string
is contained within the file at all (which you can then do using a single
call to .Contains). If it's not there then there's no point trying to work
out which line it's on, and you can stop looking any further straight away.

If you do find the search string, you can count the line breaks that appear
before the search string to work out which line it's on.

HTH,

(O)enone
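
A minimal sketch of the ReadAllText idea, under a few assumptions: FindLine is a made-up helper name, the search is an ordinal (case-sensitive) match via IndexOf, the search text contains no line break, and lines end in LF or CRLF. It counts the line feeds before the match to get the line number and then widens the match to the surrounding line breaks to return the line itself.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module ReadAllTextSearch
    ' Hypothetical helper: returns the first line containing searchText and
    ' sets lineNumber (1-based); returns Nothing and 0 if there is no match.
    Function FindLine(ByVal text As String, ByVal searchText As String, ByRef lineNumber As Integer) As String
        lineNumber = 0
        Dim hit As Integer = text.IndexOf(searchText, StringComparison.Ordinal)
        If hit < 0 Then Return Nothing

        ' Count the line feeds that appear before the match.
        lineNumber = 1
        For i As Integer = 0 To hit - 1
            If text(i) = ControlChars.Lf Then lineNumber += 1
        Next

        ' Widen the match to the enclosing line.
        Dim lineStart As Integer = 0
        If hit > 0 Then lineStart = text.LastIndexOf(ControlChars.Lf, hit - 1) + 1
        Dim lineEnd As Integer = text.IndexOf(ControlChars.Lf, hit)
        If lineEnd < 0 Then lineEnd = text.Length
        Return text.Substring(lineStart, lineEnd - lineStart).TrimEnd(ControlChars.Cr)
    End Function

    Sub Main(ByVal args As String())
        ' Assumed usage: args(0) = file path, args(1) = search text.
        Dim lineNumber As Integer
        Dim line As String = FindLine(File.ReadAllText(args(0)), args(1), lineNumber)
        If line Is Nothing Then
            Console.WriteLine("Not found")
        Else
            Console.WriteLine("Line {0}: {1}", lineNumber, line)
        End If
    End Sub
End Module
```

Keep in mind that .NET strings are UTF-16, so a 100 MB file of mostly ASCII text needs roughly twice that much memory once loaded.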

Family Tree Mike - 28 Feb 2008 12:45 GMT
> > I agree Tom, about 100mb is a huge size for a text file, reading it as
> > raw then converting each byte to string is a good idea, but the key
[quoted text clipped - 10 lines]
>
> HTH,

I would use System.IO.File.ReadAllLines(Filename), because this returns the
lines split out for you. You then just loop through the array of individual
lines.
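
Something along these lines, as a sketch only: FindLineIndex is a made-up name, and the match is assumed to be an ordinal (case-sensitive) one.

```vbnet
Imports System
Imports System.IO

Module ReadAllLinesSearch
    ' Hypothetical helper: returns the zero-based index of the first line
    ' containing searchText, or -1 if no line matches.
    Function FindLineIndex(ByVal lines As String(), ByVal searchText As String) As Integer
        For i As Integer = 0 To lines.Length - 1
            If lines(i).IndexOf(searchText, StringComparison.Ordinal) >= 0 Then Return i
        Next
        Return -1
    End Function

    Sub Main(ByVal args As String())
        ' Assumed usage: args(0) = file path, args(1) = search text.
        Dim lines As String() = File.ReadAllLines(args(0))
        Dim index As Integer = FindLineIndex(lines, args(1))
        If index < 0 Then
            Console.WriteLine("Not found")
        Else
            Console.WriteLine("Line {0}: {1}", index + 1, lines(index))
        End If
    End Sub
End Module
```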
(O)enone - 28 Feb 2008 12:58 GMT
> I would use System.IO.File.ReadAllLines(Filename), because this
> returns the lines split out for you.  You just loop through the array
> of individual lines in the array.

I did originally write the same thing in my message but then chose to remove
it before I posted it. I think the ReadAllText approach may be quicker
because you can check whether the string exists at all without having to
loop. You could then possibly determine the line by using a call to
Replace() on the part of the string before the search result position,
changing the two-character line break to a one-character replacement string,
and then seeing how much smaller the string has got; the number of characters
it shrinks by is the number of line breaks before the match, which gives you
the line number (counting from zero).

Maybe needs someone to try it to see which is more efficient.

(O)enone

Clinto - 01 Mar 2008 04:12 GMT
On Feb 28, 6:58 am, "\(O\)enone" <oen...@nowhere.com> wrote:
> > I would use System.IO.File.ReadAllLines(Filename), because this
> > returns the lines split out for you.  You just loop through the array
[quoted text clipped - 14 lines]
>
> (O)enone

Thanks everyone, I appreciate the responses. I tried several methods
(ReadAllText, IO.FileStream, ReadAllLines) and all seem about the same.
It became apparent that I am also fighting a slow server connection,
which increases the time to open the files.
Cor Ligthert[MVP] - 01 Mar 2008 18:52 GMT
Clinto,

Use the Visual Basic Find as that is optimized for strings, any other method
will go slower, just because those are optimized for characters.

Cor
