> I am having a hard time figuring out why this regular expression does not
> have multiple captures for the group. When checking the regular expression
[quoted text clipped - 5 lines]
> Regular Expression:
> (?<studentname>([\w-!\[\].,;:?/!@#$%^&*()<>{}|\~`'"=+-]+))=\w+(?:\s+|$)
At a quick glance, the problem is that the "studentname" group doesn't
have a quantifier behind it. You can't get multiple captures without a
quantifier (*, +, ?) behind the group.
What you do get, currently, and what a regular expression tool might
show you, are multiple matches. The complete expression is matched more
than once to the input string and each of these matches has its own
"studentname" group, that's what you are probably seeing.
Now, two choices: Either you just evaluate the various matches in your
code (use the Matches method instead of Match to retrieve them all) or
you rewrite the expression to include a quantified group so that you'll
actually get multiple captures. In a simple case, like this:
Student\sResults.*?\:\s*(?<assignment>(?<studentname>[\w-!\[\].,;:?/!@#$%^&*()<>{}|\~`'"=+-]+)=\w+(?:\s+|$))*
This should give you two named groups "assignment" and "studentname",
each of which has multiple captures. Hope this helps!
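
A minimal C# sketch of both options, for illustration only. The sample
input line is hypothetical (the real input was clipped from this thread),
and the character class is shortened to [\w.-] to keep the patterns
readable; swap in the full class from the expressions above as needed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CaptureDemo
{
    // Hypothetical sample line; the original input is not shown in the thread.
    public const string Input = "Student Results for week 1: alice=A bob=B carol=C";

    // Option 1: unquantified group. Each name=value pair is a separate match,
    // and every match carries its own single "studentname" group.
    public static List<string> NamesFromMatches(string input)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(?<studentname>[\w.-]+)=\w+(?:\s+|$)"))
            names.Add(m.Groups["studentname"].Value);
        return names;
    }

    // Option 2: the pair is wrapped in a quantified group, so a single match
    // records one capture per repetition in Groups[...].Captures.
    public static List<string> NamesFromCaptures(string input)
    {
        var names = new List<string>();
        Match m = Regex.Match(input,
            @"Student\sResults.*?:\s*(?<assignment>(?<studentname>[\w.-]+)=\w+(?:\s+|$))*");
        foreach (Capture c in m.Groups["studentname"].Captures)
            names.Add(c.Value);
        return names;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", NamesFromMatches(Input)));
        Console.WriteLine(string.Join(",", NamesFromCaptures(Input)));
    }
}
```

With the sample line, both methods should return the same three names;
the difference is only in whether they come from three matches or from
three captures inside one match.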
Oliver Sturm

Signature
omnibus ex nihilo ducendis sufficit unum
Spaces inserted to prevent google email destruction:
MSN oliver @ sturmnet.org Jabber sturm @ amessage.de
ICQ 27142619 http://www.sturmnet.org/blog
Amy L. - 29 Jul 2005 20:45 GMT
Thank you so much. After testing your regular expression, I can see in the
tool the difference between multiple matches and multiple captures.
Do you have an opinion on what is more efficient - iterating through
multiple matches or iterating through multiple captures under one group?
Amy.
>> I am having a hard time figuring out why this regular expression does not
>> have multiple captures for the group. When checking the regular
[quoted text clipped - 27 lines]
>
> Oliver Sturm
Oliver Sturm - 30 Jul 2005 16:30 GMT
> Thank you so much. After testing your regular expression, I can see in the
> tool the difference between multiple matches and multiple captures.
>
> Do you have an opinion on what is more efficient - iterating through
> multiple matches or iterating through multiple captures under one group?
I'm willing to have an opinion, but I can't really think of one :-)
Generally I would think that finding multiple captures may involve less
overhead in the regular expression engine, because it's an intrinsic
part of the algorithm, while finding multiple matches involves running
the expression against the input multiple times. But then this depends
on the implementation details and quality of the engine, and even in the
case where multiple captures are found, additional runs are made anyway
to look for additional matches, even if none are found.
I'd say that a carefully implemented engine shouldn't show much of a
difference between the two, but I wouldn't be surprised if many engines
did actually show quite a difference, depending on the pattern, the
input and probably other parameters. Might be interesting to do some
tests here with the .NET implementation ...
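
One way such a test could be set up is sketched below in C#. Everything
here is an assumption made for illustration: the generated input, the
shortened [\w.-] character class, the number of pairs and rounds, and the
use of RegexOptions.Compiled. It only measures elapsed time for the two
approaches; the outcome will depend on the pattern, the input and the
runtime version.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class CaptureBenchmark
{
    // One match per name=value pair.
    static readonly Regex PerMatch = new Regex(
        @"(?<studentname>[\w.-]+)=\w+(?:\s+|$)", RegexOptions.Compiled);

    // One match overall, with one capture per name=value pair.
    static readonly Regex PerCapture = new Regex(
        @"Student\sResults.*?:\s*(?<assignment>(?<studentname>[\w.-]+)=\w+(?:\s+|$))*",
        RegexOptions.Compiled);

    // Builds a synthetic line such as "Student Results: student0=A student1=A".
    public static string BuildInput(int pairs)
    {
        var sb = new StringBuilder("Student Results: ");
        for (int i = 0; i < pairs; i++)
            sb.Append("student").Append(i).Append("=A ");
        return sb.ToString().TrimEnd();
    }

    public static int CountViaMatches(string input)
    {
        return PerMatch.Matches(input).Count;
    }

    public static int CountViaCaptures(string input)
    {
        return PerCapture.Match(input).Groups["studentname"].Captures.Count;
    }

    static void Main()
    {
        string input = BuildInput(1000);
        const int rounds = 200;

        // Warm up both expressions so compilation cost is not measured.
        CountViaMatches(input);
        CountViaCaptures(input);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < rounds; i++) CountViaMatches(input);
        Console.WriteLine("multiple matches:  {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < rounds; i++) CountViaCaptures(input);
        Console.WriteLine("multiple captures: {0} ms", sw.ElapsedMilliseconds);
    }
}
```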
Oliver Sturm

Oliver Sturm - 30 Jul 2005 17:59 GMT
> Might be interesting to do some tests here with the .NET implementation ...
Well, I just did :-) Here are the results:
http://www.sturmnet.org/blog/archives/2005/07/30/regex-multiple-matches-captures/
Oliver Sturm

clever code. If I ever inherit it, I will chuck it and replace it with a
simple set of parsing expressions.
Your code has a really high "bus factor." That means that if you are ever
hit by a bus, your team is screwed.
Just a heads-up.

Signature
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik
Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
>I am having a hard time figuring out why this regular expression does not
>have multiple captures for the group. When checking the regular expression
[quoted text clipped - 27 lines]
> Any help would be greatly appreciated.
> Amy.
Amy L. - 02 Aug 2005 02:18 GMT
Nick,
I would have to agree with you - the original code was implemented using a
simpler method of splitting the string and grabbing what we needed.
However, our dataset consists of multiple files that are easily over a gig
each and when you have to process roughly 30 at a time it takes a bit of
time. We looked to see if regular expression parsing was faster than what
we had implemented to begin with. Long story short it was not :)
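
For comparison, a split-based parser along those lines might look like the
C# sketch below. It is not the code described above, only a guess at its
general shape: the line format ("Student Results ...: name=value
name=value") is inferred from the regular expression in this thread, and
the SplitParser name is made up for the example.

```csharp
using System;
using System.Collections.Generic;

static class SplitParser
{
    // Assumes a line shaped like "Student Results ...: name=value name=value".
    public static Dictionary<string, string> Parse(string line)
    {
        var result = new Dictionary<string, string>();

        // Everything after the first colon is treated as the list of pairs.
        int colon = line.IndexOf(':');
        if (colon < 0) return result;

        string[] pairs = line.Substring(colon + 1)
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string pair in pairs)
        {
            // Split on the last '=' because the name itself may contain '='.
            int eq = pair.LastIndexOf('=');
            if (eq > 0)
                result[pair.Substring(0, eq)] = pair.Substring(eq + 1);
        }
        return result;
    }

    static void Main()
    {
        foreach (var kv in Parse("Student Results for week 1: alice=A bob=B"))
            Console.WriteLine("{0} -> {1}", kv.Key, kv.Value);
    }
}
```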
Amy.
> clever code. If I ever inherit it, I will chuck it and replace it with a
> simple set of parsing expressions.
[quoted text clipped - 34 lines]
>> Any help would be greatly appreciated.
>> Amy.