.NET Forum / Languages / C# / December 2005

regular expression help

Trevor Braun - 10 Dec 2005 20:28 GMT
Hi, I'm not sure that this is the right forum for this, but I've been having
a very tough time completing this expression, and I was hoping someone might
have some suggestions for me.
I am trying to read measurements out of a text description, and I have a
working expression, but it captures a pile of empty matches.  I am obviously
not interested in them, but I break the expression whenever I try to get
rid of them.

My expression is:
(?:(?:(?<Feet>[0-9]*)\'){0,1}(?:(?:(?<WholeInches>[0-9]*(?![/\w])){0,1}(?:[ ,\-]){0,1}(?<Fraction>[0-9]*\/[0-9]*){0,1}(?<Decimal>\d*\.\d*){0,1}\")){0,1})

Some test strings are:
1/4" x 2" Flat 44W x 20'
1 1/4" x 2" Flat 44W x 20'
1/4" x 2.5" Flat 44W x 20'
1/4" x 2" Flat 44W x 20' 3"
1/4" x 2" Flat 44W x 20' 3.5"
1/4" x 2" Flat 44W x 20' 1/2"
1/8" x 4" C-1018 flat x 14' 5-1/4"

I really could use some help on this.  I've been working on this on and off
for several months now, and just can't seem to get it right.
Trevor Braun - 10 Dec 2005 21:44 GMT
Sorry, it's been a hectic day... I didn't finish my post, but somehow
managed to send it anyway....

In the strings, there are always random numbers, and I want them ignored.
I only want matches on the measurements which can be written about a million
different ways.  This is for pulling data out of a legacy inventory
application.

Any thoughts or suggestions would be very, very much appreciated.  Right
now, my app uses this expression, and removes matches to the empty groups,
but this is just not how it should work.

Thanks,
Trevor_B
jeremiah johnson - 11 Dec 2005 15:54 GMT
> Sorry, it's been a hectic day... I didn't finish my post, but somehow
> managed to send it anyway....
[quoted text clipped - 10 lines]
> Thanks,
> Trevor_B

shoot me an email and i'll work with you on these.  there's no need to
flood a C# newsgroup with a bunch of back and forth messages about
regular expressions, when they're just between you and me.

send me a long list of the test strings and i'll see what i can do for
you.  i've never written a regular expression this complicated and i
would love to give it a try.

jeremiah
Marc Noon - 11 Dec 2005 19:40 GMT
I disagree... regular expressions are fun.

-Marc N.
>> Sorry, it's been a hectic day... I didn't finish my post, but somehow
>> managed to send it anyway....
[quoted text clipped - 20 lines]
>
> jeremiah
yoshijg - 11 Dec 2005 20:18 GMT
Hey trevor,

 It may be easier to write multiple regex strings than one large regex
string capable of handling all situations.  There is always going to be
a legacy string that will fail your regex.  So, instead have a set of
regex strings that you will loop through and try to match.  If no match
is found, then you know you need to create a new regex.

It's like a bunch of security check points.  If it fails one, then it
goes through another checkpoint.  Having one large centralized
checkpoint can cause a lot of complications.

Give it a whirl because sometimes it's easier to have a bunch of little
tasks than one large complicated task.
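
A minimal sketch of the checkpoint idea, assuming each measurement has already been isolated into its own string. The class name, the method name, and the three patterns are made up for illustration; they are not from Trevor's application and cover only a few of the formats in his test strings.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternChain
{
    // Hypothetical checkpoints, most specific first.
    static readonly Regex[] Checkpoints =
    {
        new Regex(@"^(?<Feet>\d+)'\s*(?<Inches>\d+(\.\d+)?)""$"), // 20' 3" or 20' 3.5"
        new Regex(@"^(?<Feet>\d+)'$"),                             // 20'
        new Regex(@"^(?<Inches>\d+(\.\d+)?)""$"),                  // 2" or 2.5"
    };

    // Returns the first successful match, or null when no pattern applies.
    public static Match FirstMatch(string measurement)
    {
        foreach (Regex checkpoint in Checkpoints)
        {
            Match m = checkpoint.Match(measurement);
            if (m.Success)
                return m;
        }
        return null;
    }
}
```

A null result is the signal that a legacy string has turned up in a format none of the checkpoints know about, so a new pattern probably needs to be added to the array.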

josh
Greg Bacon - 14 Dec 2005 22:16 GMT
: Hi, I'm not sure that this is the right forum for this, but I've been
: having a very tough time completing this expression, and I was hoping
[quoted text clipped - 18 lines]
: I really could use some help on this. I've been working on this on and
: off for several months now, and just can't seem to get it right.

One easy suggestion is that you can write "{0,1}" more succinctly as
"?", e.g., "a{0,1}" and "a?" are equivalent.

If you want to insist that one of the groups matches, then say what
you mean.  Remember that the ? and * quantifiers *always* succeed
because they can match nothing.
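
To see both points in isolation, here is a small sketch. The helper name AllMatches is hypothetical; it only collects the text of every match so the results can be compared.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class EmptyMatchDemo
{
    // Returns the text of every match, including zero-length ones.
    public static List<string> AllMatches(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            results.Add(m.Value);
        return results;
    }
}
```

Calling AllMatches(@"\d*", "a1b") returns empty strings alongside "1", because \d* succeeds wherever there is no digit by matching nothing. AllMatches(@"\d+", "a1b") returns only "1", since + insists on at least one digit. Swapping \d{0,1} for \d? should leave the results unchanged.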

For complex patterns, I like to use RegexOptions.IgnorePatternWhitespace,
which lets you lay the pattern out across several lines.

Your subpatterns are inconsistent, e.g., some included the unit and
some didn't, and even with your followup, I may not be clear on what
you're trying to capture.

Take a look at the code below.  Note how the pattern requires one of
the alternatives to match non-empty strings.

   using System;
   using System.Text.RegularExpressions;

   class MeasurementDemo
   {
     static void Main(string[] args)
     {
       // Every alternative must consume at least one digit, so no match
       // can be empty.  Inside a verbatim string, "" is one literal
       // double quote (the inch mark).
       Regex measurements = new Regex(
         @"
         (?<Fraction>    (\d+\s+)?\d+/\d+"" ) |
         (?<Decimal>     \d+\.\d+""         ) |
         (?<Feet>        \d+'               ) |
         (?<WholeInches> \d+(?![/\w])       )
         ",
         RegexOptions.IgnorePatternWhitespace |
         RegexOptions.ExplicitCapture);

       // The test strings from the original post.
       string[] inputs = {
         "1/4\" x 2\" Flat 44W x 20'",
         "1 1/4\" x 2\" Flat 44W x 20'",
         "1/4\" x 2.5\" Flat 44W x 20'",
         "1/4\" x 2\" Flat 44W x 20' 3\"",
         "1/4\" x 2\" Flat 44W x 20' 3.5\"",
         "1/4\" x 2\" Flat 44W x 20' 1/2\"",
         "1/8\" x 4\" C-1018 flat x 14' 5-1/4\"",
       };

       string[] groups = {
         "Feet", "WholeInches", "Fraction", "Decimal",
       };

       foreach (string input in inputs)
       {
         Console.WriteLine("[" + input + "]:");

         int count = 1;
         foreach (Match m in measurements.Matches(input))
         {
           Console.WriteLine("  - {0}:", count++);

           // Only the group for the alternative that matched is non-empty.
           foreach (string group in groups)
             Console.WriteLine("    - {0}: [{1}]",
               group, m.Groups[group].Value);
         }
       }
     }
   }

Is it at least a start in the right direction?  Should an input such
as [20' 3"] produce one match or two (one for the feet component and
one for the inches component)?  What else needs fixing?
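
If the answer turns out to be one match, a sketch along these lines might do. It assumes feet are always present and that the inches part is optional; the class name, the Find method, and the Inches group are hypothetical, and a bare inches value such as 2" is deliberately not handled here.

```csharp
using System;
using System.Text.RegularExpressions;

static class FeetAndInches
{
    // Feet are required; a trailing inches component is optional and
    // becomes part of the same match.
    static readonly Regex Length = new Regex(
        @"
        (?<Feet> \d+ ) '
        ( \s* (?<Inches> \d+(\.\d+)? | (\d+[\s\-])?\d+/\d+ ) "" )?
        ",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

    // Returns the first feet (and optional inches) measurement in the input.
    public static Match Find(string input)
    {
        return Length.Match(input);
    }
}
```

With this layout, Find applied to a description ending in 14' 5-1/4" should give a single match with Feet holding 14 and Inches holding 5-1/4, while m.Groups["Inches"].Success stays false when only feet are present.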

I agree with Marc Noon: regular expressions are fun, so I look forward
to hearing back from you.

Hope this helps,
Greg

"Those who deliberately sign their names to deception will be punished,"
[President Bush] said, leaving out that this is precisely what happens
every time he signs a budget or a law, or Congress votes.
   -- Lew Rockwell
