
.NET Forum / Languages / C# / March 2008


Parsing in between strings using Regex

CJ - 03 Mar 2008 01:44 GMT
Is this the format to parse a string and return the value between the item?

Regex pRE = new Regex("<File_Name>.*>(?<insideText>.*)</File_Name>");

I am trying to parse this string.

<File_Name>Services</File_Name>

Thanks
Arne Vajhøj - 03 Mar 2008 02:25 GMT
> Is this the format to parse a string and return the value between the item?
>
[quoted text clipped - 3 lines]
>
> <File_Name>Services</File_Name>

Regex re = new Regex("<File_Name>(?<insideText>.*)</File_Name>");
string fn = re.Match(s).Groups["insideText"].Value;

seems to work.

Arne
CJ - 03 Mar 2008 02:33 GMT
Thanks Arne,

Seems like the ".*" was messing me up.

This regular expression is so hard at times, I don't know how
you guys have this thing figured out.

CJ

>> Is this the format to parse a string and return the value between the
>> item?
[quoted text clipped - 11 lines]
>
> Arne
Jesse Houwing - 03 Mar 2008 11:00 GMT
Hello cj,

> Thanks Arne,
>
> Seems like the ".*" was messing me up.
>
> This regular expression is so hard at times, I don't know how you guys
> have this thing figured out.

This looks a lot like XML data. If it is, you really should try to avoid
regex and use XPath to fetch the data you need.
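
As a rough illustration of the XPath route, here is a minimal sketch using XmlDocument.SelectSingleNode. It assumes the input is well-formed XML with a single root element; the class name, the method name and the <Root> wrapper used in the note below are made up for the example.

```csharp
using System.Xml;

public static class XPathExample
{
    // Returns the text of the first File_Name element anywhere in the
    // document, or null when no such element exists.
    public static string FirstFileName(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed

        // "//File_Name" selects File_Name elements at any depth.
        XmlNode node = doc.SelectSingleNode("//File_Name");
        return node == null ? null : node.InnerText;
    }
}
```

Calling XPathExample.FirstFileName("<Root><File_Name>Services</File_Name></Root>") should give back "Services". The parser deals with whitespace, nesting and repeated elements, which is most of what the regex variants have to work around.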

If it isn't well-formed, Regex can help you, but the regex you have still has
a few issues in it.

For one, ".*" is greedy: it matches as much as it can. If your input
contained "<File_Name>bbbbbbbbb</File_Name><File_Name>aaaaaaaaaaaa</File_Name>"
you would get this as your whole value:
"bbbbbbbbb</File_Name><File_Name>aaaaaaaaaaaa". Obviously not what's required.

You can adjust your regex to prevent this from happening in two ways:

1) Use Reluctant Matching
Regex re = new Regex("<File_Name>(?<insideText>.*?)</File_Name>");
string fn = re.Match(s).Groups["insideText"].Value;

2) Use a negative Look Ahead
Regex re = new Regex("<File_Name>(?<insideText>((?!</File_Name>).)*)</File_Name>");
string fn = re.Match(s).Groups["insideText"].Value;
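
To see the difference side by side, the three patterns can be run against the same two-element input. This is only a sketch; the class, constant and method names are invented for the comparison.

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsReluctant
{
    // Greedy: .* runs to the LAST closing tag in the input.
    public const string Greedy =
        "<File_Name>(?<insideText>.*)</File_Name>";

    // Reluctant: .*? stops at the FIRST closing tag.
    public const string Reluctant =
        "<File_Name>(?<insideText>.*?)</File_Name>";

    // Negative look-ahead: each character is consumed only if the
    // closing tag does not start at that position.
    public const string LookAhead =
        "<File_Name>(?<insideText>((?!</File_Name>).)*)</File_Name>";

    // Returns the insideText group of the first match ("" if no match).
    public static string Capture(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups["insideText"].Value;
    }
}
```

With the input "<File_Name>bbb</File_Name><File_Name>aaa</File_Name>", Capture with Greedy returns "bbb</File_Name><File_Name>aaa", while Reluctant and LookAhead both return "bbb".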

One thing that might also catch up with you is a file that is formatted like
this (let's hope the newsreader will leave this intact):
<File_Name>
bbbbbbbbb
</File_Name>

This is probably syntactically correct, but as . normally doesn't match a
newline, you will need to pass an extra option to the Regex constructor
(for either of the two variants above) which allows . to match newline.
Regex re = new Regex("your regex here", RegexOptions.Singleline);
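
A small sketch of what the option changes, using the reluctant pattern from above. The class and method names are made up for the illustration.

```csharp
using System.Text.RegularExpressions;

public static class SinglelineExample
{
    private const string Pattern =
        "<File_Name>(?<insideText>.*?)</File_Name>";

    // Default options: '.' does not match '\n', so an element whose
    // content spans several lines is not found at all.
    public static bool MatchesWithDefaultOptions(string input)
    {
        return new Regex(Pattern).IsMatch(input);
    }

    // RegexOptions.Singleline lets '.' match '\n' as well. The captured
    // value then includes the line breaks around the name.
    public static string CaptureSingleline(string input)
    {
        Regex re = new Regex(Pattern, RegexOptions.Singleline);
        return re.Match(input).Groups["insideText"].Value;
    }
}
```

For the input "<File_Name>\nbbbbbbbbb\n</File_Name>", MatchesWithDefaultOptions returns false, and CaptureSingleline returns "\nbbbbbbbbb\n", newlines included, so the caller would still have to trim the value.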

Alternatively you could 'eat up' all whitespace around the File_Name. But
only if you're very sure the filename itself will never contain a newline
or have whitespace in it at the start or end of the filename.

1)
Regex re = new Regex(@"<File_Name>\s*(?<insideText>.*?)\s*</File_Name>");
2)
Regex re = new Regex(@"<File_Name>\s*(?<insideText>((?!</File_Name>).)*?)\s*</File_Name>");
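
A sketch of the whitespace-eating variant in use. The pattern is written as a verbatim string (@"...") because "\s" in a regular C# string literal is an unrecognized escape sequence and will not compile. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class TrimmedCapture
{
    // \s* on both sides absorbs spaces and line breaks around the name,
    // so RegexOptions.Singleline is not needed as long as the name
    // itself contains no newline.
    private const string Pattern =
        @"<File_Name>\s*(?<insideText>.*?)\s*</File_Name>";

    // Returns the name without surrounding whitespace ("" if no match).
    public static string Capture(string input)
    {
        return Regex.Match(input, Pattern).Groups["insideText"].Value;
    }
}
```

For "<File_Name>\n  bbbbbbbbb  \n</File_Name>" this returns "bbbbbbbbb", and for "<File_Name>Services</File_Name>" it returns "Services". Whitespace inside the name, as in "my file", is kept.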

Kind Regards,

Jesse Houwing

> CJ
>
[quoted text clipped - 14 lines]
>>
>> Arne

--
Jesse Houwing
jesse.houwing at sogeti.nl
Ignacio Machin ( .NET/ C# MVP ) - 03 Mar 2008 12:50 GMT
Hi,

> Thanks Arne,
>
> Seems like the ".*" was messing me up.
>
> This regular expression is so hard at times, I don't know how
> you guys have this thing figured out.

Practice. You should try it a couple of times until you find the correct way.

Also a book would help you ;)
Jesse Houwing - 03 Mar 2008 22:09 GMT
Hello Ignacio,

> Hi,
>
[quoted text clipped - 9 lines]
>
> Also a book would help you ;)

Dan Appleman has a very affordable ebook available on Amazon which is made
specifically for the .NET syntax. It is from the 1.1 timeframe, so there
are a few features that aren't described, but you probably won't miss them.

A great resource on regular expressions as a whole (once a free ebook) is
Mastering Regular Expressions from O'Reilly (probably 4th edition by now;
last time I checked they had released the 3rd).

--
Jesse Houwing
jesse.houwing at sogeti.nl

©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.