> Given the regular expression:
>
> S"^([a-zA-Z]+|[a-zA-z]+\\s[a-zA-Z]+)$"
>
> 1) Isn't the "[a-zA-Z]+|[a-zA-z]+" part redundant? As far as I can
> understand it means exactly the same as "[a-zA-Z]+" alone.
No, because of the alternation - the group reads as
[a-zA-Z]+
-or-
[a-zA-z]+\\s[a-zA-Z]+
> 2) Isn't the parenthesis grouping redundant?
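Not here. The ^ and $ anchors sit outside the group, and alternation binds
more loosely than they do, so without the parentheses ^ would apply only to
the first alternative and $ only to the second. If the alternation were the
entire expression, with nothing outside it, the grouping would be redundant
(apart from capturing). Embedded inside a larger regex, it defines the
limits of the alternation.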
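A quick way to check what the outer parentheses contribute is to compare the pattern with and without them. This is a minimal sketch in Python's `re` module, used here only as a runnable stand-in for the .NET regex class the thread is about; the helper name `accepts` is made up for the illustration, and the [A-z] typo is written as [A-Z].

```python
import re

# The thread's pattern, and the same pattern with the outer parentheses
# removed. Raw strings are used, so the backslash is not doubled.
grouped = re.compile(r"^([a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+)$")
ungrouped = re.compile(r"^[a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+$")

def accepts(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None

for text in ["bob", "bob smith", "bob 42", "42 bob smith"]:
    print(repr(text), accepts(grouped, text), accepts(ungrouped, text))
```

Without the group, the pattern reads as "starts with letters" or "ends with letters, whitespace, letters", so inputs such as "bob 42" and "42 bob smith" are accepted by the ungrouped form but not by the grouped one.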
> 3) How can we define the parenthesis characters themselves as expected
> characters in a match?
Just escape them: \\( and \\). Escape the right paren as well as the left:
in .NET an unescaped ) with no matching ( is a parse error.
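As a small illustration of matching literal parentheses, here is a sketch in Python's `re` (a stand-in for the .NET engine; the helper `compiles` is hypothetical). Note that the doubled backslash in the thread comes from the C++ string literal; in a Python raw string a single backslash is enough. In Python, at least, an unescaped right paren with no matching left one is rejected, so escaping both is the safe habit.

```python
import re

# \( and \) match literal parentheses; the unescaped pair still groups.
pattern = re.compile(r"^\(([a-zA-Z]+)\)$")

m = pattern.match("(bob)")
print(m.group(1))  # the letters between the literal parentheses

# A character class is another way to name the literal characters.
print(re.findall(r"[()]", "f(x)"))

def compiles(source):
    """Return True if the regex source compiles, False if it is rejected."""
    try:
        re.compile(source)
        return True
    except re.error:
        return False

print(compiles(r"bob\)"))  # escaped right paren
print(compiles(r"bob)"))   # unescaped right paren with no matching left
```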
-cd
Ioannis Vranos - 26 Feb 2005 08:36 GMT
>>1) Isn't the "[a-zA-Z]+|[a-zA-z]+" part redundant? As far as I can
>>understand it means exactly the same as "[a-zA-Z]+" alone.
[quoted text clipped - 6 lines]
>
> [a-zA-z]+\\s[a-zA-Z]+
I did not understand what you mean by the above. Could you explain in
more detail?
>>2) Isn't the parenthesis grouping redundant?
>
[quoted text clipped - 6 lines]
> Just escape them: \\( and \\). Escape the right paren as well as the left:
> in .NET an unescaped ) with no matching ( is a parse error.
Ok, thanks for the info.

Ioannis Vranos
Carl Daniel [VC++ MVP] - 26 Feb 2005 14:36 GMT
>>> 1) Isn't the "[a-zA-Z]+|[a-zA-z]+" part redundant? As far as I can
>>> understand it means exactly the same as "[a-zA-Z]+" alone.
[quoted text clipped - 9 lines]
> I did not understand what you mean by the above. Could you explain
> in more detail?
The alternation operator has low precedence - lower than concatenation, so
(bob|joe|sue)
parses as 'bob' or 'joe' or 'sue', not as
'bo'+('b' or 'j')+'o'+('e' or 's')+'ue'
Similarly,
[a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+
parses as
'[a-zA-Z]+' or '[a-zA-Z]+\\s[a-zA-Z]+'
instead of
('[a-zA-Z]+' or '[a-zA-Z]+')\\s[a-zA-Z]+
Does that make sense?
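The two readings can be told apart by writing each one out explicitly and trying a few inputs. The sketch below uses Python's `re` as a stand-in for the .NET engine; the variable names are illustrative only.

```python
import re

# Alternation binds more loosely than concatenation, so this is
# 'bob' or 'joe' or 'sue' ...
names = re.compile(r"^(bob|joe|sue)$")
# ... while the other reading has to be spelled out with explicit groups.
spliced = re.compile(r"^bo(b|j)o(e|s)ue$")

print(bool(names.match("joe")), bool(names.match("bojosue")))
print(bool(spliced.match("joe")), bool(spliced.match("bojosue")))

# The same comparison for the expression from the thread.
as_parsed = re.compile(r"^([a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+)$")
other_reading = re.compile(r"^([a-zA-Z]+|[a-zA-Z]+)\s[a-zA-Z]+$")

print(bool(as_parsed.match("bob")), bool(other_reading.match("bob")))
```

A single word such as "bob" is accepted by the expression as it actually parses, but not by the other reading, which always requires the whitespace and the second word.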
The original expression could be factored, since the alternatives have a
common prefix:
[a-zA-Z]+(\\s[a-zA-Z]+)?
I would expect a DFA-based regex engine might well do that factoring as a
matter of course when computing the DFA.
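Whether a factored form is really equivalent can be spot-checked on a handful of inputs. This sketch (Python's `re`, standing in for the .NET engine) compares the original expression with a factored form that keeps the single \s of the original, and also with a \s+ variant, which is slightly looser because it accepts runs of whitespace. The sample strings and names are made up for the illustration.

```python
import re

original = re.compile(r"^([a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+)$")
# Common prefix factored out, keeping the single \s of the original.
factored = re.compile(r"^[a-zA-Z]+(\s[a-zA-Z]+)?$")
# Same idea with \s+, which also allows several whitespace characters.
looser = re.compile(r"^[a-zA-Z]+(\s+[a-zA-Z]+)?$")

samples = ["bob", "bob smith", "bob  smith", "bob smith jones", "", "bob "]

for s in samples:
    print(repr(s),
          bool(original.match(s)),
          bool(factored.match(s)),
          bool(looser.match(s)))
```

On these samples the original and the single-\s factored form agree everywhere; the \s+ form differs only on the input with two spaces.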
-cd
Ioannis Vranos - 26 Feb 2005 15:25 GMT
> The alternation operator has low precedence - lower than concatenation, so
>
[quoted text clipped - 24 lines]
> I would expect a DFA-based regex engine might well do that factoring as a
> matter of course when computing the DFA.
Thanks for the explanation.

Ioannis Vranos
IV> S"^([a-zA-Z]+|[a-zA-z]+\\s[a-zA-Z]+)$"
Note that the [A-z] range listed above (in the second alternative) includes
non-alphabetic characters: the six ASCII characters between Z and a,
namely [ \ ] ^ _ and `.
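The extra characters can be listed mechanically. A small sketch in Python's `re` (standing in for the .NET engine; variable names are illustrative), which also shows an input that the typo lets through:

```python
import re

# Characters in the ASCII range A..z that [A-z] matches but [a-zA-Z] does not.
extras = [chr(c) for c in range(ord("A"), ord("z") + 1)
          if re.fullmatch(r"[A-z]", chr(c))
          and not re.fullmatch(r"[a-zA-Z]", chr(c))]
print(extras)

# The pattern as posted (with the typo) and as intended.
typo = re.compile(r"^([a-zA-Z]+|[a-zA-z]+\s[a-zA-Z]+)$")
fixed = re.compile(r"^([a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+)$")

text = "foo_bar baz"
print(bool(typo.match(text)), bool(fixed.match(text)))
```

The underscore falls inside A-z, so "foo_bar baz" is accepted by the pattern with the typo and rejected by the corrected one.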

Serge
Ioannis Vranos - 26 Feb 2005 19:20 GMT
> IV> S"^([a-zA-Z]+|[a-zA-z]+\\s[a-zA-Z]+)$"
>
> Note that the [A-z] character set listed above (in the second group)
> includes non-alphabetic characters.
Thanks for the correction. It was just a typo of mine; it was meant to be:
S"^([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)$"

Ioannis Vranos