
Regular Expression and Multiple Group Captures

Amy L. - 29 Jul 2005 04:51 GMT
I am having a hard time figuring out why this regular expression does not
have multiple captures for the group.  When checking the regular expression
in a testing tool like "Expresso" it seems to work fine.

Input (All on one line - watch for wordwrap):
Student Results [weight=103]: SMITH=PASS JONES=WARN WRIGHT=WARN JOHNSON=WARN

Regular Expression:
(?<studentname>([\w-!\[\].,;:?/!@#$%^&*()<>{}|\~`'"=+-]+))=\w+(?:\s+|$)

Expected Output: A Group with multiple captures of: "SMITH", "JONES",
"WRIGHT".

Code I was using:

// sText holds the input line shown above.
Regex myRegexTest = new Regex(
@"(?<studentname>([\w-!\[\].,;:?/!@#$%^&*()<>{}|\~`'""=+-]+))=\w+(?:\s+|$)",
RegexOptions.IgnoreCase | RegexOptions.Compiled ) ;
Match m = myRegexTest.Match( sText.ToString() ) ;  // Match returns only the first match
Console.WriteLine( "Groups Count: " + m.Groups.Count ) ;
// Groups[0] is the whole match, Groups[1] the unnamed inner group, Groups[2] is "studentname"
Console.WriteLine( "Groups Capture 0: " + m.Groups[0].Captures.Count ) ;
Console.WriteLine( "Groups Capture 1: " + m.Groups[1].Captures.Count ) ;
Console.WriteLine( "Groups Capture 2: " + m.Groups[2].Captures.Count ) ;

When I look at the output I get 3 groups, each with one capture.  When I look
at what's captured I always end up with just "SMITH" and never the other
names.

Any help would be greatly appreciated.
Amy.
Oliver Sturm - 29 Jul 2005 10:48 GMT
> I am having a hard time figuring out why this regular expression does not
> have multiple captures for the group.  When checking the regular expression
[quoted text clipped - 5 lines]
> Regular Expression:
> (?<studentname>([\w-!\[\].,;:?/!@#$%^&*()<>{}|\~`'"=+-]+))=\w+(?:\s+|$)

At a quick glance, the problem is that the "studentname" group doesn't
have a quantifier behind it. You can't get multiple captures without a
quantifier (such as * or +) behind the group or behind a group that
encloses it.

What you do get, currently, and what a regular expression tool might
show you, are multiple matches. The complete expression is matched more
than once to the input string and each of these matches has its own
"studentname" group, that's what you are probably seeing.

Now, two choices: Either you just evaluate the various matches in your
code (use the Matches method instead of Match to retrieve them all) or
you rewrite the expression to include a quantified group so that you'll
actually get multiple captures. In a simple case, like this:

Student\sResults.*?\:\s*(?<assignment>(?<studentname>[\w-!\[\].,;:?/!@#$%^&*()<>{}|\~`'"=+-]+)=\w+(?:\s+|$))*

This should give you two named groups "assignment" and "studentname",
each of which has multiple captures. Hope this helps!
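
To make the two options above concrete, here is a minimal C# sketch run against the sample input from the first post. It is only an illustration: the name pattern is simplified to `[^\s=]+` (an assumption, not the character class used in the thread), and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Simplified name pattern (assumption): any run of characters
    // that are neither whitespace nor '='.
    const string Name = @"(?<studentname>[^\s=]+)";

    // Option 1: unquantified group, applied repeatedly with Matches.
    // Each Match carries its own "studentname" group with one capture.
    public static List<string> NamesFromMatches(string input)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(input, Name + @"=\w+(?:\s+|$)"))
            names.Add(m.Groups["studentname"].Value);
        return names;
    }

    // Option 2: a single Match whose enclosing "assignment" group is
    // quantified with *, so "studentname" collects one capture per repetition.
    public static List<string> NamesFromCaptures(string input)
    {
        var names = new List<string>();
        Match m = Regex.Match(input,
            @"Student\sResults.*?:\s*(?<assignment>" + Name + @"=\w+(?:\s+|$))*");
        foreach (Capture c in m.Groups["studentname"].Captures)
            names.Add(c.Value);
        return names;
    }

    public static void Main()
    {
        string input =
            "Student Results [weight=103]: SMITH=PASS JONES=WARN WRIGHT=WARN JOHNSON=WARN";
        Console.WriteLine(string.Join(",", NamesFromMatches(input)));
        Console.WriteLine(string.Join(",", NamesFromCaptures(input)));
    }
}
```

Both calls should print SMITH,JONES,WRIGHT,JOHNSON for this input. Note that `m.Groups["studentname"].Value` in the second method would return only the last capture, which is why the loop goes through `Captures`.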

               Oliver Sturm
Signature

omnibus ex nihilo ducendis sufficit unum
Spaces inserted to prevent google email destruction:
MSN oliver @ sturmnet.org Jabber sturm @ amessage.de
ICQ 27142619 http://www.sturmnet.org/blog

Amy L. - 29 Jul 2005 20:45 GMT
Thank you so much.  After testing your regular expression in the tool, I see
the difference between multiple matches and multiple captures.

Do you have an opinion on what is more efficient - iterating through
multiple matches or iterating through multiple captures under one group?

Amy.

Oliver Sturm - 30 Jul 2005 16:30 GMT
> Thank you so much.  After testing your regular expression in the tool, I see
> the difference between multiple matches and multiple captures.
>
> Do you have an opinion on what is more efficient - iterating through
> multiple matches or iterating through multiple captures under one group?

I'm willing to have an opinion, but I can't really think of one :-)

Generally I would think that finding multiple captures may involve less
overhead in the regular expression engine, because it's an intrinsic
part of the algorithm, while finding multiple matches involves running
the expression against the input multiple times. But then this depends
on the implementation details and quality of the engine, and even in the
case where multiple captures are found, additional runs are made anyway
to look for additional matches, even if none are found.

I'd say that a carefully implemented engine shouldn't show much of a
difference between the two, but I wouldn't be surprised if many engines
did actually show quite a difference, depending on the pattern, the
input and probably other parameters. Might be interesting to do some
tests here with the .NET implementation ...
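
A rough way to try this on the .NET implementation is a simple Stopwatch loop. The sketch below is only an assumption about how such a test could look, not the test that was later published; the patterns use a simplified name class (`[^\s=]+`), and the class name, method names and iteration count are hypothetical. Timings will vary by machine and runtime, so none are shown.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CaptureTiming
{
    // One key=value pair per match (simplified name pattern, an assumption).
    static readonly Regex PerPair = new Regex(
        @"(?<studentname>[^\s=]+)=\w+(?:\s+|$)", RegexOptions.Compiled);

    // Whole line in one match; the quantified outer group yields multiple captures.
    static readonly Regex WholeLine = new Regex(
        @"Student\sResults.*?:\s*(?:(?<studentname>[^\s=]+)=\w+(?:\s+|$))*",
        RegexOptions.Compiled);

    public static int CountViaMatches(string input)
    {
        return PerPair.Matches(input).Count;
    }

    public static int CountViaCaptures(string input)
    {
        return WholeLine.Match(input).Groups["studentname"].Captures.Count;
    }

    // Runs the given function repeatedly and returns elapsed milliseconds.
    public static long TimeMs(Func<string, int> f, string input, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            f(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string input =
            "Student Results [weight=103]: SMITH=PASS JONES=WARN WRIGHT=WARN JOHNSON=WARN";
        const int iterations = 100000; // arbitrary choice for illustration
        Console.WriteLine("Matches:  " + TimeMs(CountViaMatches, input, iterations) + " ms");
        Console.WriteLine("Captures: " + TimeMs(CountViaCaptures, input, iterations) + " ms");
    }
}
```

Both functions find the same four names on the sample line, so the loop compares like with like; a serious measurement would also need warm-up runs and realistic input sizes.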

               Oliver Sturm

Oliver Sturm - 30 Jul 2005 17:59 GMT
> Might be interesting to do some tests here with the .NET implementation ...

Well, I just did :-) Here are the results:

http://www.sturmnet.org/blog/archives/2005/07/30/regex-multiple-matches-captures/

               Oliver Sturm

Nick Malik [Microsoft] - 31 Jul 2005 08:32 GMT
Clever code.  If I ever inherit it, I will chuck it and replace it with a
simple set of parsing expressions.

Your code has a really high "bus factor."  That means that if you are ever
hit by a bus, your team is screwed.

Just a heads-up.
Signature

--- Nick Malik [Microsoft]
   MCSD, CFPS, Certified Scrummaster
   http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
  I do not answer questions on behalf of my employer.  I'm just a
programmer helping programmers.
Amy L. - 02 Aug 2005 02:18 GMT
Nick,

I would have to agree with you - the original code was implemented using a
simpler method of splitting the string and grabbing what we needed.
However, our dataset consists of multiple files that are easily over a gig
each and when you have to process roughly 30 at a time it takes a bit of
time.  We looked to see if regular expression parsing was faster than what
we had implemented to begin with.  Long story short it was not :)
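
For comparison, a split-based parser of the kind described here could look roughly like the sketch below. The original splitting code is not shown in this thread, so this is an assumption: the class and method names are hypothetical, and it presumes that the header ends at the first ':' and that names contain no spaces or '=' characters.

```csharp
using System;
using System.Collections.Generic;

public static class SplitParser
{
    // Returns the names to the left of '=' in each space-separated pair
    // that follows the first ':' on the line.
    public static List<string> Names(string line)
    {
        var names = new List<string>();
        int colon = line.IndexOf(':');
        if (colon < 0)
            return names; // no header separator, nothing to parse

        string[] pairs = line.Substring(colon + 1)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string pair in pairs)
        {
            int eq = pair.IndexOf('=');
            if (eq > 0) // skip tokens without a non-empty name before '='
                names.Add(pair.Substring(0, eq));
        }
        return names;
    }

    public static void Main()
    {
        string input =
            "Student Results [weight=103]: SMITH=PASS JONES=WARN WRIGHT=WARN JOHNSON=WARN";
        Console.WriteLine(string.Join(",", Names(input)));
    }
}
```

On the sample line this should print SMITH,JONES,WRIGHT,JOHNSON.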

Amy.

