.NET Forum / Languages / C# / August 2006

Regex help

Stephen Brown - 18 Aug 2006 00:13 GMT
I have a simple regex need and I've already wasted too much time on it
spinning in circles.  Can a regex god help a stranded soul?  I just need to
replace all non-escaped ampersands in a file.  It needs to skip escaped
ampersands such as &amp; and &nbsp;.

&[a-zA-Z0-9]+; will get the escaped ampersands (some improper escapes will
slip by, but that's good enough for my purposes), but I need to replace all the
ampersands that aren't escaped.

example:
&abc;&def& ghi&_jkl
after replace:
&abc;*def* ghi*_jkl
Paul E Collins - 18 Aug 2006 00:41 GMT
> &[a-zA-Z0-9]+; will get the escaped ampersands
> (some improper escapes will slip by, but good enough
[quoted text clipped - 4 lines]
> after replace:
> &abc;*def* ghi*_jkl

You can't do this unambiguously. If you've got a file that's somehow
been *partially* escaped, it's no longer in a state that makes any
sense, and you can't tell "&123;" (intended to be an escaped
character) from the identical "&123;" (an unescaped ampersand that
just happens to be followed by the string "123"). Where are you
getting this input from?

Eq.
Jesse Houwing - 18 Aug 2006 00:54 GMT
* Stephen Brown wrote, On 18-8-2006 1:13:
> I have a simple regex need and I've already wasted too much time on it
> spinning in circles.  Can a regex god help a stranded soul?  I just need to
[quoted text clipped - 9 lines]
> after replace:
> &abc;*def* ghi*_jkl

This will probably do for most circumstances, though Paul's remark does
apply of course.

&(?![a-zA-Z0-9]+;)

This will find every '&' that is not directly followed by one or more
letters or digits and then a ';'.
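
As a rough sketch of how that lookahead could be applied from C#, assuming System.Text.RegularExpressions and a hypothetical helper class named AmpersandFixer (the class and method names are illustrative, not from this thread). It uses the character class [a-zA-Z0-9] so that mixed-case names such as &Auml; are skipped as well:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class AmpersandFixer
{
    // '&' NOT followed by one or more letters/digits and then ';'.
    // The negative lookahead (?!...) consumes no characters, so only
    // the '&' itself is matched and replaced.
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?![a-zA-Z0-9]+;)", RegexOptions.Compiled);

    // Replaces every bare '&' in the input with the given replacement.
    // Note: '$' has a special meaning in .NET replacement strings.
    public static string ReplaceBareAmpersands(string input, string replacement)
    {
        return BareAmpersand.Replace(input, replacement);
    }
}
```

Called as AmpersandFixer.ReplaceBareAmpersands("&abc;&def& ghi&_jkl", "*"), this should give the result asked for in the original post, "&abc;*def* ghi*_jkl". As Paul points out, anything that merely looks like an escape (for example "&123;") is left alone whether or not it was intended as one.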

To really ensure you're only escaping unescaped '&'s you'll need to
write a very long regex that looks like:

&(?!([0-9]+|copy|euml|amp|......);)

Where you'll have to fill the dots with all allowed escape sequences and
optimize afterwards (so that 'amp' and 'auml' become 'a(?:uml|mp)') for
improved speed. But my guess is that in most cases the first option will
suffice.
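
Rather than typing the long alternation by hand, it could be built from a list of names. The sketch below is only an illustration: the class name EntityAwareAmpersands is hypothetical, the KnownEntities array is a small sample and deliberately not the full set of allowed escapes, and the optional '#' before the digits is an added assumption so that numeric references such as &#169; are skipped too.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class EntityAwareAmpersands
{
    // Sample list only; a real list would need every allowed entity name.
    private static readonly string[] KnownEntities =
        { "amp", "lt", "gt", "quot", "apos", "nbsp", "copy", "auml", "euml" };

    // Builds &(?!(?:#?[0-9]+|amp|lt|...);) from the list above.
    // Entity names are case-sensitive, so IgnoreCase is not used.
    private static readonly Regex BareAmpersand = new Regex(
        "&(?!(?:#?[0-9]+|"
            + string.Join("|", KnownEntities.Select(Regex.Escape))
            + ");)",
        RegexOptions.Compiled);

    // Replaces every '&' that does not start a listed escape sequence.
    public static string Replace(string input, string replacement)
    {
        return BareAmpersand.Replace(input, replacement);
    }
}
```

With this version, EntityAwareAmpersands.Replace("&copy;&foo; & &#169;", "&amp;") leaves "&copy;" and "&#169;" alone but rewrites "&foo;" and the lone "&", because "foo" is not in the list. The manual prefix optimization is left out of the sketch; whether it matters for speed would need measuring.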

Jesse Houwing
