.NET Forum / Languages / Managed C++ / March 2005

Regular expressions question

Ioannis Vranos - 11 Mar 2005 07:40 GMT
What is the difference among the following? Please correct me if I am
wrong (this is not homework; I am just exploring
System::Text::RegularExpressions::Regex these days and have not figured
out everything yet).

"a*": As far as I know it means nothing (=it matches even the empty
string ""), or a string consisting of character 'a' followed by 0 or
more characters(?).

"a+": It means a string (=up to the first whitespace as is the case of
all the regular expressions) consisting of character 'a' followed by 0
or more characters.

"a.*": Character 'a' followed by one and 0 or more characters.

"a.+": Character 'a' followed by one and 0 or more characters (same
effect as the above?).

"a.": Character 'a' followed by one character.

Ioannis Vranos

http://www23.brinkster.com/noicys

adebaene@club-internet.fr - 11 Mar 2005 08:32 GMT
> What is the difference among the following. Please correct me if I am
> wrong (this is not a homework, I am just checking System::Regex these
> days and have not figured out everything yet).
>
> "a*": As far as I know it means nothing (=it matches even the empty
> string ""), or a string consisting of character 'a' followed by 0 or
> more characters(?).
It means 0 or more "a". So it matches an empty string, "a", "aaaaa", but
not the whole of "bced".

> "a+": It means a string (=up to the first whitespace as is the case of
> all the regular expressions)
Why? A regex can contain whitespace!

> consisting of character 'a' followed by 0
> or more characters.
It means one or more instances of "a". So it matches "a", "aaaa", but
not the empty string or "vfef".

> "a.*": Character 'a' followed by one and 0 or more characters.
"a", followed by 0 or more <anything>.

> "a.+": Character 'a' followed by one and 0 or more characters (same
> effect as the above?).
"a", followed by one or more <anything>

> "a.": Character 'a' followed by one character.
Yes, that's right.

Arnaud
MVP - VC
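
The five patterns discussed above can be checked with a small sketch. Note the assumptions: it uses std::regex from standard C++ rather than System::Text::RegularExpressions, so it only approximates what .NET does, and the helper name matches_whole is hypothetical. std::regex_match succeeds only when the whole input matches, which is the sense in which the answers above describe each pattern.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only when the ENTIRE input matches the pattern.
// std::regex_match is anchored at both ends, unlike a substring search.
bool matches_whole(const std::string& input, const std::string& pattern)
{
    return std::regex_match(input, std::regex(pattern));
}
```

With this helper, "a*" accepts "" and "aaaaa" but rejects "bced"; "a+" rejects "" but accepts "aaaa"; "a.*" accepts a lone "a" while "a.+" does not; and "a." accepts "a " (the dot matches the space) but rejects "abc".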
Ioannis Vranos - 11 Mar 2005 10:21 GMT
>>"a*": As far as I know it means nothing (=it matches even the empty
>>string ""), or a string consisting of character 'a' followed by 0 or
>>more characters(?).
>
> It means 0 or more "a". So it matches an empty string, "a", "aaaaa", but
> not "bced".

However under VC++ 2005 Express February 2005 CTP we get for the code:

// This is the main project file for VC++ application project
// generated using an Application Wizard.

#include "stdafx.h"

using namespace System;

int main()
{
    using namespace System::Text::RegularExpressions;

    String ^s="bcdefghij";

    Console::WriteLine(Regex::IsMatch(s, "a*"));
}

True
Press any key to continue . . .

>>"a+": It means a string (=up to the first whitespace as is the case

> Why? A regex can contains whitespaces!!

What I mean is that whitespaces are considered another character class
from alphabetic characters.
adebaene@club-internet.fr - 11 Mar 2005 12:54 GMT
> >>"a*": As far as I know it means nothing (=it matches even the empty
> >>string ""), or a string consisting of character 'a' followed by 0 or
[quoted text clipped - 13 lines]
>      Console::WriteLine(Regex::IsMatch(s, "a*"));
> }

Yes: I said that the regex matches an empty string, so here you match
the empty string at the beginning of "bcdefghij".
What you failed to see is that the IsMatch method tries to find a match
anywhere inside the given string; it doesn't check that the full string
is matched. Use the Regex.Match method to get the Match object: you'll
see that it matches an empty string (length=0) at index 0 of the
input string.

If you use Regex.Matches (to get all the matches), you'll see that in
fact it finds a 0-length match at each position of the input string
(indexes 0 through 9, counting the end of the string), so you get 10
matches!
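
The zero-length matches described here can be listed with a short sketch. It is an analogy only: std::sregex_iterator from standard C++ stands in for Regex.Matches, and the helper name all_matches is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (position, length) for every match found
// while scanning the input from left to right.
std::vector<std::pair<std::size_t, std::size_t>>
all_matches(const std::string& input, const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::pair<std::size_t, std::size_t>> found;
    for (std::sregex_iterator it(input.begin(), input.end(), re), last;
         it != last; ++it) {
        const std::ssub_match& whole = (*it)[0];  // the full match
        found.emplace_back(
            static_cast<std::size_t>(whole.first - input.begin()),
            static_cast<std::size_t>(whole.length()));
    }
    return found;
}
```

Calling all_matches("bcdefghij", "a*") yields ten entries, each of length 0, at positions 0 through 9.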

> >>"a+": It means a string (=up to the first whitespace as is the case
>
> > Why? A regex can contains whitespaces!!
>
> What I mean is that whitespaces are considered another character class
> from alphabetic characters.

Yes, but "." matches any character except a newline, so it does match
spaces and tabs.

Arnaud
MVP - VC
Ioannis Vranos - 11 Mar 2005 14:53 GMT
>>However under VC++ 2005 Express February 2005 CTP we get for the
>
[quoted text clipped - 11 lines]
> Yes : I said that the regex matches an empty string : So here you match
> the empty string at the beginning of "bcdefghij".

I am not sure I understood this. There is no empty string in there.

> What you failed to see is that the IsMatch method try to find a match
> inside the given string, it doesn't check that the full string is
> matched.

If I wanted the entire string to be matched, shouldn't I use
Regex::IsMatch(s, "^a*$")?

> Use the Regex.Match method to get the Match object : you'll
> see that it matches an empty string (length=0) at index 0 from the
[quoted text clipped - 3 lines]
> fact it find a 0 length match at each position of the input string, so
> you get 10 matches!

So in essence it matches everything and is equivalent to
Regex::IsMatch(s, ".*")?

BTW why does Regex::IsMatch(s, "*") crash?

Unhandled Exception: System.ArgumentException: parsing "*" - Quantifier {x,y} following nothing.
   at System.Text.RegularExpressions.RegexParser.ScanRegex()
   at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
   at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, Boolean useCache)
   at System.Text.RegularExpressions.Regex.IsMatch(String input, String pattern)
   at main() in c:\documents and settings\administrator\my documents\visual studio\projects\test\test\test.cpp:line 14
Press any key to continue . . .

> Yes, but "." match anything, including whitespaces.

Thanks, I did not know that.
Tom Widmer - 11 Mar 2005 16:10 GMT
>>> However under VC++ 2005 Express February 2005 CTP we get for the
>>
[quoted text clipped - 13 lines]
>
> I am not sure I understood this. There is no empty string in there.

The empty string is a substring of every string: an n-character string
contains n + 1 empty substrings, one at each position from index 0
through index n.

>> What you failed to see is that the IsMatch method try to find a match
>> inside the given string, it doesn't check that the full string is
>> matched.

Yes, IsMatch sees if any substring of the string matches the regex.

> If I wanted the entire string to be matched, shouldn't I use
> Regex::IsMatch(s, "^a*$")?

Yes.
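
A minimal sketch of the difference the anchors make, again assuming std::regex from standard C++ as a stand-in for the .NET Regex class; the helper name found_anywhere is hypothetical and plays the role of IsMatch (a substring search).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when ANY substring of the input matches,
// which is the behavior attributed to IsMatch in this thread.
bool found_anywhere(const std::string& input, const std::string& pattern)
{
    return std::regex_search(input, std::regex(pattern));
}
```

Here found_anywhere("bcdefghij", "a*") is true because of the empty match at index 0, while found_anywhere("bcdefghij", "^a*$") is false: the anchors force the match to span the whole string, and only strings made entirely of 'a' (including the empty string) qualify.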

>> Use the Regex.Match method to get the Match object : you'll
>> see that it matches an empty string (length=0) at index 0 from the
[quoted text clipped - 6 lines]
> So in essence it matches everything and is equivalent to
> Regex::IsMatch(s, ".*")?

"a*"? For IsMatch, yes they are equivalent, but as RegExes, they are
not. If you have the string:

"abaabb"

then the substrings that ".*" can match are:
"" 7x
a 3x
ab 2x
aba 1x
abaa 1x
etc.

whereas
"a*" can match only:
"" 7x
"a" 3x
"aa" 1x

Matching isn't just a yes/no (unless you use IsMatch) - the regex
matches against some substring of the string.

> BTW why does Regex::IsMatch(s, "*") crash?

"*" is not a valid Regex. 0 to many of what? Similarly "+" and "{0,4}"
are not valid.

(apologies for any misinformation, regexp is not a major area of
expertise for me)

Tom
Arnaud Debaene - 11 Mar 2005 21:39 GMT
>>> However under VC++ 2005 Express February 2005 CTP we get for the
>>
[quoted text clipped - 13 lines]
>
> I am not sure I understood this. There is no empty string in there.

Yes, there are many! There is an empty string at index 0, another at index
1, another at index 2, etc. This is true for any string.

> If I wanted the entire string to be matched, shouldn't I use
> Regex::IsMatch(s, "^a*$")?

Yes, but this is a rather useless regex (as is "a*"): a regex that matches
the empty string doesn't make much sense unless you filter the matches
afterwards, say, keeping only matches more than x characters long. But in
that case, you'd better write a regex that does this filtering directly.

> So in essence it matches everything and is equivalent to
> Regex::IsMatch(s, ".*")?

As Tom explained, it is a bit more complex. In order to experiment, I
suggest you display all the Matches from both regexes on a given input
string.
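
One way to run that experiment is sketched below. It assumes std::sregex_iterator from standard C++ in place of Regex.Matches, so results on .NET should be verified separately; the helper name list_matches is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of each successive match, in the
// order a left-to-right scan of the input finds them.
std::vector<std::string> list_matches(const std::string& input,
                                      const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> texts;
    for (std::sregex_iterator it(input.begin(), input.end(), re), last;
         it != last; ++it) {
        texts.push_back(it->str());  // empty string for a zero-length match
    }
    return texts;
}
```

On "abaabb", ".*" produces two matches (the whole string, then an empty match at the end), whereas "a*" produces six: "a", "", "aa", "", "", "". A scan reports only these successive non-overlapping matches, not every substring the pattern is capable of matching.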

> BTW why does Regex::IsMatch(s, "*") crash?
>
> Unhandled Exception: System.ArgumentException: parsing "*" -
> Quantifier {x,y} following nothing.
The error description seems quite clear, no? "*" is a quantifier : it
specifies "0 to n instances of the token before it" : There is nothing
before the quantifier in your regex, so it is an invalid regex.
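
The same rejection can be observed without crashing by catching the error. This sketch assumes std::regex from standard C++, which throws std::regex_error where .NET throws System.ArgumentException; the helper name is_valid_pattern is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the pattern compiles, false when the
// regex constructor rejects it (for example a quantifier with nothing
// in front of it).
bool is_valid_pattern(const std::string& pattern)
{
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

With this helper, "*", "+", "?" and "*a" are all reported as invalid, while "a*" is accepted.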

Arnaud
MVP - VC
ismailp - 13 Mar 2005 17:32 GMT
Yes, "*" is an illegal regular expression; it represents nothing. These
quantifiers, as Tom and Arnaud said, must follow something. * matches
0 to n occurrences of the preceding item (a character, a character
group, or a subexpression). A bare * is meaningless and therefore
illegal. The same goes for ?, +, and {} when they do not follow anything.
"*a" is also wrong.
