Regex. How can this match?

Ethan Strauss - 05 Jul 2007 17:15 GMT
Hi,

   I have written a regular expression which is supposed to pull a
direction (forward or reverse) designation from a file name.

   Unfortunately, the direction designation can either be the whole word
("Forward" or "Reverse") or just a single letter ("F" or "R") and the rest
of the name is not as consistent as I would like. For example,
"P1|1_G10_Forward_primer.ab1" or "K8_I1_A01_F.ab1".

   At the time I am processing the file names, I have already stripped off
the extension.

   I have written the Regular Expression
public static Regex DirectionFromFIleName = new
Regex("_(?<Direction>[Forward_|Reverse_|R$|F$])");

This looks for the underscore, followed by "Forward" or "Reverse" or an "F"
as the last character in the string or an "R" as the last character in the
string, or so I thought.

In fact,

   when Designation = "P1|1_G10_Forward_primer"
   RegexLibrary.DirectionFromFIleName.Match(Designation).Groups["Direction"].Value
= "F"!

How can it pick up that F when it is not the last character? I assume it has
something to do with putting the $ inside the square brackets, but I can't
figure out exactly what it is.

I can figure out a bunch of different workarounds for this, but I would
like to understand what the regular expression is doing for the future.

Thanks!
Ethan
Nicholas Paldino [.NET/C# MVP] - 05 Jul 2007 17:29 GMT
Ethan,

   For this, I don't know that I would use that logic.  I would split the
filename on its separator characters and then look for the word or letters
in the resulting pieces.  Basically, you would use the regular expression
pattern "[\W_]" (non-word characters plus the underscore, since "\W" alone
treats "_" as a word character) and then call the Split method on the
regular expression, passing your string.

   In the array of strings that is returned, look for Forward, Reverse, R
or F.
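
A minimal sketch of that split-and-scan idea, in case it helps. The class
name SplitDemo, the method FindDirection, and the choice to normalize "F"
and "R" to the full words are assumptions made for illustration, not
anything from the original code. The separator pattern is "[\W_]+" rather
than a bare "\W" because "\W" does not match the underscore.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits on runs of non-word characters or underscores.
    private static readonly Regex Separator = new Regex(@"[\W_]+");

    // Returns "Forward", "Reverse", or null when no direction token is found.
    // (Hypothetical helper; the return convention is an assumption.)
    public static string FindDirection(string name)
    {
        foreach (string token in Separator.Split(name))
        {
            switch (token)
            {
                case "Forward":
                case "F":
                    return "Forward";
                case "Reverse":
                case "R":
                    return "Reverse";
            }
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(FindDirection("P1|1_G10_Forward_primer")); // Forward
        Console.WriteLine(FindDirection("K8_I1_A01_F"));             // Forward
    }
}
```

Note that, unlike the anchored regex, this accepts a lone "F" or "R" token
anywhere in the name, not only at the end.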

         - Nicholas Paldino [.NET/C# MVP]
         - mvp@spam.guard.caspershouse.com

Jesse Houwing - 05 Jul 2007 20:45 GMT
* Ethan Strauss wrote, On 5-7-2007 18:15:
> Hi,
>
[quoted text clipped - 20 lines]
> I can figure out a bunch of different work arounds for this, but I would
> like to understand what the regular expression is doing for the future.

There are a few errors in your regex. Let me try to explain:

_                                : Find a '_'
[Forward_|Reverse_|R$|F$]        : followed by any single character from
                                   the following set: 'F','o','r','w'
                                   ... ...  's', 'e', '_', '|', 'R', '$'
                                   (inside [] the '|' and '$' are literal)
(?<Direction>)                   : Capture these in a named group called
                                   Direction

Of course this isn't what you wanted ;)

This should work better:

_(?<Direction>Forward_|Reverse_|R$|F$)

Just removing the [] would make things work. Now it reads:

_                                : Find a '_'
Forward_|Reverse_|R$|F$          : Find either "Forward_", "Reverse_",
                                   "R" followed by end of the string,
                                   "F" followed by end of the string
(?<Direction>)                   : Capture these in a named group called
                                   Direction
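
To see the difference side by side, here is a small sketch that runs both
patterns against the two sample names from the question. The class name
BracketDemo and the helper Direction are made up for this example; the two
patterns are the ones discussed above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketDemo
{
    // Original pattern: [...] is a character class, so the group
    // captures exactly one character from the set.
    public static readonly Regex CharClass =
        new Regex("_(?<Direction>[Forward_|Reverse_|R$|F$])");

    // Without the brackets, | separates whole alternatives and $ is an anchor.
    public static readonly Regex Alternation =
        new Regex("_(?<Direction>Forward_|Reverse_|R$|F$)");

    // Returns the captured Direction group, or "" when there is no match.
    public static string Direction(Regex regex, string name)
    {
        Match m = regex.Match(name);
        return m.Success ? m.Groups["Direction"].Value : "";
    }

    public static void Main()
    {
        string[] names = { "P1|1_G10_Forward_primer", "K8_I1_A01_F" };
        foreach (string name in names)
        {
            Console.WriteLine("{0}: class='{1}' alternation='{2}'",
                name, Direction(CharClass, name), Direction(Alternation, name));
        }
    }
}
```

With the character class, the first name yields "F" because "_F" is the
first place an underscore is followed by any one character from the set.
With the alternation, the same name yields "Forward_".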

I present a course in Regular expressions and you've fallen into the
trap many of my students have before you. Be very sure what each kind of
brace, bracket etc means in which context.

() Group, Capture, Set options
[] Character set, any characters in there match
{} Quantifier
<> Naming of groups, look around

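As a rough illustration of the four kinds of brackets listed above, the
following sketch tries each one on a tiny input. The class name
BracketKinds, the helper Matches, and the sample strings are invented for
the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketKinds
{
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // () groups a sequence: "ab" repeated, in that order.
        Console.WriteLine(Matches("abab", "^(ab)+$")); // True
        Console.WriteLine(Matches("bbaa", "^(ab)+$")); // False

        // [] is a set: each position may be 'a' or 'b', in any order.
        Console.WriteLine(Matches("bbaa", "^[ab]+$")); // True

        // {} is a quantifier: exactly three 'a' characters.
        Console.WriteLine(Matches("aaa", "^a{3}$"));   // True
        Console.WriteLine(Matches("aa", "^a{3}$"));    // False

        // <> names a group inside (?<Name>...).
        Match named = Regex.Match("lane42", @"(?<Digits>\d+)");
        Console.WriteLine(named.Groups["Digits"].Value); // 42

        // (?<=...) is a lookbehind: require '_' before 'F' without consuming it.
        Match look = Regex.Match("A01_F", "(?<=_)F$");
        Console.WriteLine(look.Value + " at index " + look.Index); // F at index 4
    }
}
```
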
Jesse
Ethan Strauss - 06 Jul 2007 15:22 GMT
Thanks Jesse,
   I had actually figured it out and was about to post the answer, but you
beat me to it.
   The Regex which ended up working as I wanted is
               "_(?<Direction>Forward|Reverse|R$|F$)"

       This is slightly different from what is below, but only because I
changed my mind about what characters to capture...
Ethan

