regex syntax

jg - 14 Aug 2005 05:12 GMT
I am new to using both dotnet and regex.  I have done the basic reading, to
the point where I thought I knew how to use regex to extract a date string.  But I
ran into problems.

What is the best regex to look for month names, or date strings for
that matter?

From my testing, I can use
   "((JAN)|(FEB)|(MAR)|(APR)|(MAY)|(JUN)|(JUL)|(AUG)|(SEP)|(OCT)|(NOV)|(DEC))"
but not
   '([ADFJMNOS][ACEOPU][BCGLNPRTVY])"
In other words, I get a syntax problem with the month pattern.

I am working towards handling the various date formats I encounter.
My objective is to get the entire date string and parse it into yyyy-mm-dd or
whatever the dotnet conversion routine will take.
I will have to deal with many long strings of 64K to 200K.  This is the
reason I am looking for a good regex, to minimize processing
delays.

I know I have to deal with
   yyyy-mm-dd ( and variants thereof with dot or slash as separator instead
of dash, single digit month or day)
   yyyy-MMM-dd ( or just space instead of -)
   MMM d, yy    ( or yyyy)
and the tougher ones like
   d MMM yyyy
   d MMM yy
Alvin Bruney [MVP - ASP.NET] - 14 Aug 2005 20:39 GMT
Have a look at regexlib.com for customized expressions.

Signature

Regards,
Alvin Bruney
[Shameless Author Plug]
The Microsoft Office Web Components Black Book with .NET
available at www.lulu.com/owc, Amazon, B&H etc

Forth-coming VSTO.NET
-------------------------------------------------------------------------------

>I am new to using both dotnet and regex.  I have done the basic reading, to
>the point where I thought I knew how to use regex to extract a date string.  But I
[quoted text clipped - 25 lines]
>    d MMM yyyy
>    d MMM yy
jg - 18 Aug 2005 06:29 GMT
Thank you.

However, I had no luck accessing that content.  All I got was the green
logos; I did not see anything else.

> Have a look at regexlib.com for customized expressions.
>
[quoted text clipped - 27 lines]
>>    d MMM yyyy
>>    d MMM yy
Oliver Sturm - 18 Aug 2005 10:03 GMT
> I know I have to deal with
>     yyyy-mm-dd ( and variants thereof with dot or slash as separator instead
[quoted text clipped - 4 lines]
>     d MMM yyyy
>     d MMM yy

I have created a regex for you that works with all those samples. Here
it is:

(?<year>\d{4})[-\./\s](?<month>\d{1,2})[-\./\s](?<day>\d{1,2})$ |
(?<year>\d{4})[-\s](?<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[-\s](?<day>\d{1,2})$ |
(?<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s(?<day>\d{1,2}),\s*?(?<year>\d{4}|\d{2})$ |
(?<day>\d{1,2})\s(?<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s(?<year>\d{4}|\d{2})$

I tried this with the following samples, constructed from the templates
you gave:

2005-03-08
2005.03.08
2005/03/08
2005 03 08
2005 3 08
2005 3 8
2005 03 8
2005-MAR-08
2005 MAR 08
2005 MAR 8
MAR 8, 2005
MAR 08, 2005
MAR 8, 05
MAR 08, 05
8 MAR 2005
8 MAR 05
08 MAR 2005
08 MAR 05

As you can see, the expression is comprised of four different parts.
Each of these has a $ sign at the end, which you'll want to get rid of
before using the expression with your own long string. This is only
needed to test the expression in Regulator with multiple samples.

I tried this with the IgnoreWhitespace (RegexOptions.IgnorePatternWhitespace
in code) and the IgnoreCase options switched on.

Hope this helps!

(If you have any trouble with the regex, I could send you the saved
Regulator file. Just in case things get mangled in the message or
something.)
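
A minimal sketch of how the expression above might be applied to a long string from C#, assuming the $ anchors are removed as described and the four parts are joined with |. The class and method names (DateFinder, FindDates) are hypothetical and not part of the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
static class DateFinder
{
    // The four alternatives from the post, with the trailing $ removed.
    // IgnorePatternWhitespace lets the pattern be laid out over several lines.
    static readonly Regex DatePattern = new Regex(@"
          (?<year>\d{4})[-\./\s](?<month>\d{1,2})[-\./\s](?<day>\d{1,2})
        | (?<year>\d{4})[-\s](?<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[-\s](?<day>\d{1,2})
        | (?<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s(?<day>\d{1,2}),\s*?(?<year>\d{4}|\d{2})
        | (?<day>\d{1,2})\s(?<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s(?<year>\d{4}|\d{2})",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // Returns every date-like substring found in the text, in order of appearance.
    public static List<string> FindDates(string text)
    {
        var found = new List<string>();
        foreach (Match m in DatePattern.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

Inside the loop, m.Groups["year"].Value, m.Groups["month"].Value and m.Groups["day"].Value give the individual parts, since .NET allows the same group name to appear in several alternatives. Turning those parts into a DateTime (in particular, deciding which century a two-digit year belongs to) is left out here because the thread does not say how that should be done.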

               Oliver Sturm
Signature

omnibus ex nihilo ducendis sufficit unum
Spaces inserted to prevent google email destruction:
MSN oliver @ sturmnet.org Jabber sturm @ amessage.de
ICQ 27142619 http://www.sturmnet.org/blog

jg - 18 Aug 2005 18:52 GMT
That is absolutely wonderful and helpful.  Thank you very much. Your efforts
are well appreciated.
Thank you very much again for testing and explaining.

I will try that out.

>> I know I have to deal with
>>     yyyy-mm-dd ( and variants thereof with dot or slash as separator
[quoted text clipped - 52 lines]
>
>                Oliver Sturm
jg - 19 Aug 2005 06:38 GMT
Great, it works even after taking out the $ and the spaces around the |.  I
did add \b before the entire expression to make sure the first part of the
date is on a word boundary.  This way I can avoid some supposedly low
probability errors like some strange catalogue dot or dash notations.

Now all I have to do is to make it work with January, February, ... (fully
spelled month names). I guess I can always add another 12 | parts to the
month expressions.

> that is absolutely wonderful and helpful.  Thank you very much. Your
> efforts are well appreciated.
[quoted text clipped - 58 lines]
>>
>>                Oliver Sturm
Oliver Sturm - 19 Aug 2005 09:41 GMT
> Great, it works even after taking out the $ and the spaces around the |.  I
> did add \b before the entire expression to make sure the first part of the
> date is on a word boundary.  This way I can avoid some supposedly low
> probability errors like some strange catalogue dot or dash notations.

Sure, I didn't know your exact circumstances, so you'd have to make
modifications to my sample to make it work for you completely.

> Now all I have to do is to make it work with January, February, ... (fully
> spelled month names). I guess I can always add another 12 | parts to the
> month expressions.

Sure you can. If you find the whole thing growing too much, maybe you
could define the various parts you need (the month expression, the day
expression, the two digit year, the four digit year) as string constants
in your code and use a String.Format to put them together to form the
complete regular expression before you use it. That way it might be a
bit more maintainable - otherwise you'll have to make every change to
one of the parts in many places, increasing the probability of an error.
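
A rough sketch of that idea, assuming the four date layouts discussed earlier in the thread. The class name, constant names and the trailing \b are illustrative assumptions, not something taken from the posts. The month constant uses an optional suffix (for example MAR(?:CH)?) so that both abbreviated and fully spelled names match without adding twelve more alternatives.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical builder; all names here are illustrative only.
static class DatePatternBuilder
{
    // Each building block is defined once and reused in every alternative.
    const string Year4 = @"\d{4}";
    const string YearAny = @"\d{4}|\d{2}";
    const string MonthNumber = @"\d{1,2}";
    const string Day = @"\d{1,2}";
    const string MonthName =
        "JAN(?:UARY)?|FEB(?:RUARY)?|MAR(?:CH)?|APR(?:IL)?|MAY|JUNE?|JULY?|" +
        "AUG(?:UST)?|SEP(?:TEMBER)?|OCT(?:OBER)?|NOV(?:EMBER)?|DEC(?:EMBER)?";

    public static Regex Build()
    {
        // The format string contains no literal braces: quantifiers such as
        // {1,2} live in the arguments, because String.Format would otherwise
        // read them as format items (literal braces must be doubled).
        string pattern = String.Format(
            @"\b(?:" +
            @"(?<year>{0})[-\./\s](?<month>{1})[-\./\s](?<day>{2})" +
            @"|(?<year>{0})[-\s](?<month>{3})[-\s](?<day>{2})" +
            @"|(?<month>{3})\s(?<day>{2}),\s*(?<year>{4})" +
            @"|(?<day>{2})\s(?<month>{3})\s(?<year>{4})" +
            @")\b",
            Year4, MonthNumber, Day, MonthName, YearAny);

        return new Regex(pattern, RegexOptions.IgnoreCase);
    }
}
```

With this layout, a change to the month expression is made in one place and picked up by all three alternatives that use it.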

               Oliver Sturm

jg - 23 Aug 2005 06:32 GMT
Thank you again. You are wonderfully helpful.

I did find the pattern string getting too huge, so I started to split the date
pattern into 3 components before using them to compose the final pattern,
although I did not use the String.Format method.

>> Great, it works even after taking out the $ and the spaces around the |.
>> I did add \b before the entire expression to make sure the first part of
[quoted text clipped - 17 lines]
>
>                Oliver Sturm
Oliver Sturm - 23 Aug 2005 10:14 GMT
> I did find the pattern string getting too huge, so I started to split the date
> pattern into 3 components before using them to compose the final pattern,
> although I did not use the String.Format method.

Well, if you ask me, you should always use String.Format when putting
together strings from more than two parts. A String.Format call can
create an arbitrarily complicated string in one operation, while a
concatenation a + b + c takes two operations at least. Strings are
immutable in .NET, so a + b + c will end up allocating several new
strings before the final result is ready.

The argument against this is that the compiler might get rid of some of
the overhead for you, at least when a, b and c are static strings. But I
don't like to depend on that, especially when the String.Format call is
usually so much better readable:

 "At " + time.ToString() + ", the user " + user +
     " had a problem accessing the " + resource + " resource."

 String.Format(
     "At {0}, the user {1} had a problem accessing the {2} resource.",
     time, user, resource);

               Oliver Sturm

jg - 24 Aug 2005 05:09 GMT
Now I see. Pardon my ignorance.

Thank you again. Much appreciated.

>> I did find the pattern string getting too huge, so I started to split the
>> date pattern into 3 components before using them to compose the final
[quoted text clipped - 19 lines]
>
>                Oliver Sturm
Jon Skeet [C# MVP] - 24 Aug 2005 07:18 GMT
> Well, if you ask me, you should always use String.Format when putting
> together strings from more than two parts.

I disagree.

> A String.Format call can
> create an arbitrarily complicated string in one operation, while a
> concatenation a + b + c takes two operations at least.

What do you count as an operation? Bear in mind that String.Format has
to do a lot more work in terms of parsing etc - I very much doubt that
there are many cases where it's more efficient.

> Strings are
> immutable in .NET, so a + b + c will end up allocating several new
> strings before the final result is ready.

That's not true if a, b and c are already strings. a+b+c will simply
result in a call to String.Concat(a, b, c) which creates one string
without creating any intermediate ones. It's not like a+b+c is compiled
into (a+b)+c, evaluating a+b first.

string a = "a";
string b = "b";
string c = "c";
       
string x = a+b+c;

is compiled into:

 IL_0000:  ldstr      "a"
 IL_0005:  stloc.0
 IL_0006:  ldstr      "b"
 IL_000b:  stloc.1
 IL_000c:  ldstr      "c"
 IL_0011:  stloc.2
 IL_0012:  ldloc.0
 IL_0013:  ldloc.1
 IL_0014:  ldloc.2
 IL_0015:  call       string [mscorlib]System.String::Concat(string,
                                                             string,
                                                             string)
 IL_001a:  stloc.3

> The argument against this is that the compiler might get rid of some of
> the overhead for you, at least when a, b and c are static strings. But I
> don't like to depend on that

You can depend on it in C# at least - it's in the specification, IIRC.
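
A small sketch of what that guarantee looks like in practice, assuming "static" is read as "constant". The class and member names are made up for illustration. A concatenation of constants is itself a constant expression, so the compiler stores the finished literal and no concatenation happens at run time.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
static class ConcatDemo
{
    const string A = "a";
    const string B = "b";
    const string C = "c";

    // A + B + C is a constant expression, so this is compiled as the literal "abc".
    public const string Folded = A + B + C;

    // String literals in an assembly are interned, so the folded constant
    // and the literal "abc" refer to the same string object.
    public static bool FoldedIsSameObjectAsLiteral()
    {
        return Object.ReferenceEquals(Folded, "abc");
    }

    // With non-constant operands the compiler emits a single
    // String.Concat(a, b, c) call, as in the IL listing above.
    public static string ConcatAtRunTime(string a, string b, string c)
    {
        return a + b + c;
    }
}
```

FoldedIsSameObjectAsLiteral() returning true is one way to see that no run-time concatenation took place for the constant case.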

> especially when the String.Format call is
> usually so much better readable:
[quoted text clipped - 4 lines]
>   String.Format("At {0}, the user {1} had a problem accessing the {2}
> resource.", time, user, resource);

Sometimes String.Format is more readable; sometimes it's less readable.
In almost all cases, readability should be the key to determining which
to use.

Signature

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Oliver Sturm - 24 Aug 2005 09:46 GMT
>>Well, if you ask me, you should always use String.Format when putting
>>together strings from more than two parts.
>
> I disagree.

I guess I should have qualified my statement better. I might have added
conditions like "and at least one of the parts is not a string in itself".

>>The argument against this is that the compiler might get rid of some of
>>the overhead for you, at least when a, b and c are static strings. But I
>>don't like to depend on that
>
> You can depend on it in C# at least - it's in the specification, IIRC.

I would readily assume it even without reading the specs. I would make a
test if it were in any way important to me. Until then, I wouldn't
depend on it.

>>especially when the String.Format call is
>>usually so much better readable:
[quoted text clipped - 8 lines]
> In almost all cases, readability should be the key to determining which
> to use.

Right, that was my most important point as well. But apart from
concatenations of literal strings or variables/constants holding
strings, I can't imagine cases where the + concatenation would be more
readable (see above, IMO). Even in these cases I might tend to use
String.Format because during the course of development I find it much
easier to extend and change. I can always change it if the profiler says
it's a problem.

               Oliver Sturm

Jon Skeet [C# MVP] - 25 Aug 2005 09:17 GMT
> >>Well, if you ask me, you should always use String.Format when putting
> >>together strings from more than two parts.
[quoted text clipped - 3 lines]
> I guess I should have qualified my statement better. I might have added
> conditions like "and at least one of the parts is not a string in itself".

Do you have evidence that String.Format doesn't itself convert the
arguments to intermediate strings? If it does, I can't see that using
it is saving any operations.

> > You can depend on it in C# at least - it's in the specification, IIRC.
>
> I would readily assume it even without reading the specs. I would make a
> test if it were in any way important to me. Until then, I wouldn't
> depend on it.

Well, take it from me - you *can* depend on it. (That's assuming that
by "static" you mean "constant".)

> > Sometimes String.Format is more readable; sometimes it's less readable.
> > In almost all cases, readability should be the key to determining which
[quoted text clipped - 7 lines]
> easier to extend and change. I can always change it if the profiler says
> it's a problem.

In cases with a single parameter you want at the end of the string, I
think it's more readable to have:

string x = "Age: "+age;

than:

string x = string.Format("Age: {0}", age);

It's very easy to change the former to the latter if you ever *do* want
to do anything more complicated.


